How to know if a chunk has been allocated

Hi,

Is it possible to determine for an existing hdf5 dataset if a specific chunk has been allocated?

My situation is the following: I have a large two-dimensional int32 dataset representing an image, for which I want to create a number of downscaled copies (Gaussian image pyramid). Each copy is half the size of the one before.
Now, the base dataset might be filled quite sparsely, so only a small part (~30%) of the area has actually been written to; the rest is just the default fill value. When computing the downscaled copies, I would like to skip areas containing only the default value. Or, more precisely, copy only chunks that have actually been allocated. Is there a way to determine if a chunk (maybe identified by an element position x,y) has been allocated?

Thanks,
Daniel

PS
A big thank-you to all HDF5 developers for this great library!!

Hi Daniel,

Unfortunately, there is no way to determine whether a specific chunk has been allocated with the current public API. This feature has been requested on the forum in the past, and we would implement it if we had funding.

We have been trying to get funding to work on the APIs listed below, which would write and read chunks without going through hyperslab selection and iterate over written chunks, but without much success ;-\ Currently you have to "track" the chunks that contain data of interest yourself and use the hyperslab selection mechanism to read those chunks back.

Description of work to perform: We will create new API routines for the HDF5 library which provide access to the on-disk form of chunks for a dataset, along with supporting API routines for this access pattern.

API routines (draft):

herr_t H5Dchunk_import(hid_t dset_id, const hsize_t *chunk_coords, size_t nbytes, const void *buf, hid_t dxpl_id);

herr_t H5Dchunk_export(hid_t dset_id, const hsize_t *chunk_coords, hsize_t start, hsize_t length, void *buf, hid_t dxpl_id);

herr_t H5Dchunk_size(hid_t dset_id, const hsize_t *chunk_coords, hsize_t *nbytes);

herr_t H5Dchunk_offset(hid_t dset_id, const hsize_t *chunk_coords, haddr_t *addr);

herr_t H5Dchunk_iterate(hid_t dset_id, hid_t dxpl_id, <func>, void *udata);

            <func>(hid_t dset_id, const hsize_t *chunk_coords, size_t *nbytes, const void *buf, void *udata);

“read/write by chunk” convenience routines:

herr_t H5Dchunk_read(hid_t dset_id, const hsize_t *chunk_coords, void *buf, hid_t dxpl_id);

herr_t H5Dchunk_write(hid_t dset_id, const hsize_t *chunk_coords, const void *buf, hid_t dxpl_id);
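
Until routines like these exist, the manual "tracking" mentioned above could look roughly like the following. This is only a minimal sketch in plain C, not part of HDF5: the `chunk_tracker` type and all `tracker_*` functions are hypothetical names made up for illustration, and it assumes a two-dimensional dataset whose extent and chunk size are known to the application. The application marks a chunk whenever it writes into it, and later asks for the start and count of that chunk, clipped at the dataset edge. Those two arrays, copied into `hsize_t` arrays, are the kind of values one would pass to `H5Sselect_hyperslab` to read the chunk back.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper type (not an HDF5 type): remembers which chunks of a
 * 2-D dataset the application itself has written to. */
typedef struct {
    uint64_t dims[2];       /* dataset extent: rows, columns */
    uint64_t chunk[2];      /* chunk extent: rows, columns */
    uint64_t nchunks[2];    /* number of chunks along each dimension */
    unsigned char *written; /* one flag per chunk, row-major order */
} chunk_tracker;

/* Set up the tracker. Returns 0 on success, -1 if allocation fails. */
int tracker_init(chunk_tracker *t, uint64_t rows, uint64_t cols,
                 uint64_t chunk_rows, uint64_t chunk_cols)
{
    assert(rows > 0 && cols > 0 && chunk_rows > 0 && chunk_cols > 0);
    t->dims[0] = rows;
    t->dims[1] = cols;
    t->chunk[0] = chunk_rows;
    t->chunk[1] = chunk_cols;
    /* Ceiling division: a partial chunk at the edge still counts. */
    t->nchunks[0] = (rows + chunk_rows - 1) / chunk_rows;
    t->nchunks[1] = (cols + chunk_cols - 1) / chunk_cols;
    t->written = calloc((size_t)(t->nchunks[0] * t->nchunks[1]), 1);
    return t->written ? 0 : -1;
}

void tracker_free(chunk_tracker *t)
{
    free(t->written);
    t->written = NULL;
}

/* Flat index of the chunk that contains element (row, col). */
size_t tracker_index(const chunk_tracker *t, uint64_t row, uint64_t col)
{
    assert(row < t->dims[0] && col < t->dims[1]);
    return (size_t)((row / t->chunk[0]) * t->nchunks[1] + col / t->chunk[1]);
}

/* Call this whenever the application writes element (row, col). */
void tracker_mark(chunk_tracker *t, uint64_t row, uint64_t col)
{
    t->written[tracker_index(t, row, col)] = 1;
}

/* Nonzero if the chunk containing (row, col) has been marked as written. */
int tracker_is_written(const chunk_tracker *t, uint64_t row, uint64_t col)
{
    return t->written[tracker_index(t, row, col)] != 0;
}

/* Start and count of the region covered by the chunk containing (row, col),
 * clipped so that it does not extend past the dataset extent. */
void tracker_chunk_slab(const chunk_tracker *t, uint64_t row, uint64_t col,
                        uint64_t start[2], uint64_t count[2])
{
    const uint64_t pos[2] = { row, col };
    for (int d = 0; d < 2; d++) {
        assert(pos[d] < t->dims[d]);
        start[d] = (pos[d] / t->chunk[d]) * t->chunk[d];
        uint64_t remaining = t->dims[d] - start[d];
        count[d] = remaining < t->chunk[d] ? remaining : t->chunk[d];
    }
}
```

Note that the tracker only knows what the application tells it. A write that spans several chunks has to mark each of them, and the flags would have to be stored somewhere (for example in a small companion dataset) if they need to survive between runs.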

Thank you!

Elena

On Jul 25, 2012, at 2:04 PM, Daniel Martens wrote:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Elena Pourmal The HDF Group http://hdfgroup.org
1800 So. Oak St., Suite 203, Champaign IL 61820
217.531.6112
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
