Provide more details on the limitations of H5Dget_offset in the documentation



Referring to and quoting from the documentation of H5Dget_offset:

      Returns dataset address in file.
      H5Dget_offset returns the address in the file of the dataset dset_id. That address is expressed as the offset in bytes from the beginning of the file.
      Returns the offset in bytes; otherwise returns HADDR_UNDEF, a negative value.

But the source of H5D__get_offset in the current release shows that only
a narrow subset of real-world use cases is actually supported:

    haddr_t ret_value = HADDR_UNDEF; [...]

    switch(dset->shared->layout.type) {
        case H5D_CHUNKED:
        case H5D_COMPACT:

        case H5D_CONTIGUOUS:
            /* If dataspace hasn't been allocated or dataset is stored in
             * an external file, the value will be HADDR_UNDEF. */
            if(dset->shared->dcpl_cache.efl.nused == 0 || H5F_addr_defined(dset->shared->
                /* Return the absolute dataset offset from the beginning of file. */
                ret_value = dset->shared-> + H5F_BASE_ADDR(dset->oloc.file);

        case H5D_LAYOUT_ERROR:
        case H5D_NLAYOUTS:
            HGOTO_ERROR(H5E_ARGS, H5E_BADTYPE, HADDR_UNDEF, "unknown dataset layout type")
    } /*lint !e788 All appropriate cases are covered */

The function only returns a usable value if the layout type is H5D_CONTIGUOUS
_and_ the dataset's storage is allocated in the HDF5 file itself, without any external file usage.
In all other cases (storage not yet allocated, external file usage, chunked/compact/other layout)
it always returns HADDR_UNDEF.
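
The conditions above can be turned into a caller-side check. The following is only a minimal stand-alone sketch, not HDF5 code: `layout_kind`, `OFFSET_UNDEF`, and `usable_offset` are hypothetical stand-ins so that it compiles without the HDF5 headers. In a real program the three inputs would presumably come from H5Pget_layout, H5Pget_external_count, and H5Dget_offset, and `OFFSET_UNDEF` assumes that HADDR_UNDEF is the all-bits-set value of an unsigned 64-bit address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins so the sketch compiles without the HDF5 headers. */
typedef enum { LAYOUT_COMPACT, LAYOUT_CONTIGUOUS, LAYOUT_CHUNKED } layout_kind;
#define OFFSET_UNDEF UINT64_MAX /* stand-in for HADDR_UNDEF (assumed all bits set) */

/*
 * Decide whether an offset reported for a dataset can be trusted.
 * Returns true and stores the offset in *out only for a contiguous
 * dataset with no external files and a defined address.
 */
bool usable_offset(layout_kind layout, int external_count,
                   uint64_t reported, uint64_t *out)
{
    if (layout != LAYOUT_CONTIGUOUS)
        return false; /* chunked, compact, and anything else */
    if (external_count > 0)
        return false; /* raw data lives in external files */
    if (reported == OFFSET_UNDEF)
        return false; /* storage has not been allocated yet */
    *out = reported;
    return true;
}
```

With this helper, a call such as `usable_offset(LAYOUT_CHUNKED, 0, 2048, &off)` is rejected regardless of the reported value, which mirrors the behavior described above.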

This behavior should be documented in the reference guide, as such cases are likely to occur rather frequently.
As far as I understand, the function therefore returns a usable value in only a few cases.

Regards, Alex.


Just ran into this one myself. The documentation has still not been updated. Maybe add a sentence to the H5Dget_offset docs…

Note: returns HADDR_UNDEF for any layout other than CONTIGUOUS.

As an aside, maybe it could provide the address of the first chunk or, at a minimum, of the only chunk for single-chunk datasets?