External datasets relative to current directory or HDF5 file as documented?

The documentation for H5Pset_efile_prefix() states:

The default behavior of the library is to search for the dataset’s external storage raw data files in the same directory as the HDF5 file which contains the dataset.

https://docs.hdfgroup.org/hdf5/v1_12/group___d_a_p_l.html#gad487f84157fd0944cbe1cbd4dea4e1b8

However, the source code indicates that the default prefix is the program's current working directory and NOT “the same directory as the HDF5 file”. Either the documentation is incorrect here, or this is a bug. The relevant part of H5D__build_file_prefix():

    /* Prefix has to be checked for NULL / empty string again because the
     * code above might have updated it.
     */
    if (prefix == NULL || *prefix == '\0' || HDstrcmp(prefix, ".") == 0) {
        /* filename is interpreted as relative to the current directory,
         * does not need to be expanded
         */
        *file_prefix = NULL;
    } /* end if */
    else {
        if (HDstrncmp(prefix, "${ORIGIN}", HDstrlen("${ORIGIN}")) == 0) {
            /* Replace ${ORIGIN} at beginning of prefix by directory of HDF5 file */
            filepath_len    = HDstrlen(filepath);
            prefix_len      = HDstrlen(prefix);
            file_prefix_len = filepath_len + prefix_len - HDstrlen("${ORIGIN}") + 1;

            if (NULL == (*file_prefix = (char *)H5MM_malloc(file_prefix_len)))
                HGOTO_ERROR(H5E_RESOURCE, H5E_NOSPACE, FAIL, "unable to allocate buffer")
            HDsnprintf(*file_prefix, file_prefix_len, "%s%s", filepath, prefix + HDstrlen("${ORIGIN}"));
        } /* end if */
        else {
            if (NULL == (*file_prefix = (char *)H5MM_strdup(prefix)))
                HGOTO_ERROR(H5E_RESOURCE, H5E_NOSPACE, FAIL, "memory allocation failed")
        } /* end else */
    }     /* end else */

done:
    FUNC_LEAVE_NOAPI(ret_value)
} /* H5D__build_file_prefix() */

Thanks for the report. I’ve created this GitHub issue.

Thanks, Gerd. To clarify: the existing and intended behavior is that, by default, external dataset files are resolved relative to the current working directory of the calling process. This applies when the prefix is not set, is set to an empty string, or is set to “.” via H5Pset_efile_prefix.
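
For anyone who wants to see the rule in isolation, here is a minimal standalone sketch in plain C that mirrors the logic quoted from H5D__build_file_prefix(). It is not HDF5 code: resolve_efile_prefix, ORIGIN_TOKEN, and the file_dir parameter are hypothetical names made up for illustration, and the sketch assumes file_dir already holds the directory of the HDF5 file.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ORIGIN_TOKEN "${ORIGIN}"

/* Hypothetical helper (NOT part of the HDF5 API).
 * On success returns 0 and stores the effective prefix in *out:
 *   - NULL means "leave the external file name relative to the
 *     current working directory";
 *   - otherwise a malloc'ed string the caller must free.
 * Returns -1 if memory allocation fails. */
int resolve_efile_prefix(const char *prefix, const char *file_dir, char **out)
{
    const size_t token_len = strlen(ORIGIN_TOKEN);
    size_t       len;

    assert(file_dir != NULL && out != NULL);
    *out = NULL;

    /* Unset, empty, or ".": nothing to expand, current directory is used. */
    if (prefix == NULL || *prefix == '\0' || strcmp(prefix, ".") == 0)
        return 0;

    if (strncmp(prefix, ORIGIN_TOKEN, token_len) == 0) {
        /* Replace a leading ${ORIGIN} with the HDF5 file's directory. */
        len = strlen(file_dir) + strlen(prefix) - token_len + 1;
        if ((*out = (char *)malloc(len)) == NULL)
            return -1;
        snprintf(*out, len, "%s%s", file_dir, prefix + token_len);
        return 0;
    }

    /* Any other prefix is used verbatim. */
    len = strlen(prefix) + 1;
    if ((*out = (char *)malloc(len)) == NULL)
        return -1;
    memcpy(*out, prefix, len);
    return 0;
}
```

With file_dir set to "/data", a prefix of "${ORIGIN}/raw" resolves to "/data/raw", while NULL, "", and "." all leave *out as NULL, i.e. the current working directory.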

To make the location of the external dataset file relative to the directory of the HDF5 file (the behavior the documentation incorrectly describes as the default), call H5Pset_efile_prefix with the prefix "${ORIGIN}".