Allocation time of chunked dataset (collective calls)

Hi,

When I create a filtered chunked dataset, I noticed that the filter is called twice: once when the empty dataset is created, and again when the data is written.

For efficiency, I would like to eliminate the first filter call (at dataset creation) and, if possible, avoid the initial disk allocation of empty data.
I’ve tried options such as H5D_ALLOC_TIME_INCR, but without success so far.

I’m using the C++ wrappers of HDF5 1.10.3 with collective calls.

Thanks,
Hans

Hans, you should consider precreating the dataset (in serial) on rank 0 and then re-opening it in parallel.

You can avoid writing fill values via H5Pset_fill_time(dcpl, H5D_FILL_TIME_NEVER).
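
Purely as a sketch, the creation-side property settings on rank 0 would look roughly like this (dcpl and chunk_dims are placeholders, not something from your code):

hsize_t chunk_dims[1] = {1024};                      // placeholder chunk shape
hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
H5Pset_chunk(dcpl, 1, chunk_dims);                   // chunked layout
H5Pset_fill_time(dcpl, H5D_FILL_TIME_NEVER);         // never write fill values
// ... attach your filter, create the dataset serially on rank 0, then close the file ...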

G.

Hi Gerd,

Thanks for your reply.

I’ll give it a try. I have another question though: the file is being created collectively; is it OK to create a dataset sequentially within a collectively created file?

Regards,
Hans

Hi Gerd,

I’m not sure how to do this.

The file is being created collectively. I create the dataset like this:
H5::DSetCreatPropList ds_creatplist;
ds_creatplist.setFillTime(H5D_FILL_TIME_NEVER);
ds_creatplist.setFilter((H5Z_filter_t)32768, H5Z_FLAG_OPTIONAL);
ds_creatplist.setAllocTime(H5D_ALLOC_TIME_LATE);
createDataSet("ds", H5::PredType::NATIVE_INT, dataspace, ds_creatplist);

As far as I can see, DSetCreatPropList has no options related to sequential versus collective calls.

Do you mean create the dataset and the file sequentially on rank 0, and reopen them collectively?

Regards,
Hans

No, if the file is opened via an MPI communicator, all processes in that communicator need to “witness” the dataset creation (or attribute or group creation, etc.). That’s part of the parallel HDF5 etiquette.

Yes (… create the dataset and the file sequentially on rank 0, and reopen them collectively?)
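
A minimal sketch of that pattern with the C++ wrappers, under the assumption that comm, info, dataspace, and chunk_dims come from your application and that the custom filter 32768 is registered on all ranks:

int mpi_rank;
MPI_Comm_rank(comm, &mpi_rank);

// Rank 0 creates the file and the (unallocated, unfiltered) dataset serially.
if (mpi_rank == 0) {
    H5::DSetCreatPropList dcpl;
    dcpl.setChunk(1, chunk_dims);                          // chunked layout
    dcpl.setFillTime(H5D_FILL_TIME_NEVER);                 // no fill values
    dcpl.setAllocTime(H5D_ALLOC_TIME_LATE);                // defer allocation
    dcpl.setFilter((H5Z_filter_t)32768, H5Z_FLAG_OPTIONAL);
    H5::H5File serial_file("test.h5", H5F_ACC_TRUNC);
    serial_file.createDataSet("ds", H5::PredType::NATIVE_INT, dataspace, dcpl);
}
MPI_Barrier(comm);                                         // file must exist before everyone reopens it

// All ranks reopen the file with the MPI-I/O driver and write collectively.
H5::FileAccPropList fapl;
H5Pset_fapl_mpio(fapl.getId(), comm, info);
H5::H5File h5file("test.h5", H5F_ACC_RDWR, H5::FileCreatPropList::DEFAULT, fapl);
H5::DataSet dataset = h5file.openDataSet("ds");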

Hi Gerd,

I’ve implemented what you suggested.

The sequential file and dataset creation works as expected. The filter is not called at this stage.

I reopen the file collectively, and then open the dataset like this:

H5::FileAccPropList fileAccPropList;
H5Pset_fapl_mpio(fileAccPropList.getId(), comm, info);
H5Pset_all_coll_metadata_ops(fileAccPropList.getId(), true);
H5::H5File h5file("test.h5", H5F_ACC_RDWR, H5::FileCreatPropList::DEFAULT, fileAccPropList);
H5::DSetMemXferPropList xfer_plist;
H5Pset_dxpl_mpio(xfer_plist.getId(), H5FD_MPIO_COLLECTIVE);

H5::DataSet dataset = h5file.openDataSet(dataset_name);

Unfortunately, the openDataSet call invokes the filter to compress the (empty) chunks. I don’t know how to avoid this call.
A possible workaround: in the filter, test for a zero-filled buffer and write a “magic number” instead of running the expensive compression algorithm. That way the filter is still called twice, but the first call is cheap (see the sketch below).
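
A rough sketch of that idea (the names ZERO_CHUNK_MAGIC and zero_aware_filter are made up, and the real compression and decompression paths are elided):

#include <hdf5.h>
#include <cstdint>
#include <cstdlib>
#include <cstring>

static const uint64_t ZERO_CHUNK_MAGIC = 0x5A45524F43484B31ULL;   // arbitrary tag for "all-zero chunk"

// H5Z_func_t-style callback: returns the number of valid bytes in *buf, or 0 on failure.
static size_t zero_aware_filter(unsigned int flags, size_t /*cd_nelmts*/,
                                const unsigned int* /*cd_values*/,
                                size_t nbytes, size_t* buf_size, void** buf)
{
    unsigned char* data = static_cast<unsigned char*>(*buf);

    if (flags & H5Z_FLAG_REVERSE) {                                // decompression path
        uint64_t header[2];
        if (nbytes >= sizeof(header)) {
            std::memcpy(header, data, sizeof(header));
            if (header[0] == ZERO_CHUNK_MAGIC) {                   // expand the token back to zeros
                size_t orig_size = static_cast<size_t>(header[1]);
                void* out = std::calloc(1, orig_size);
                if (out == NULL) return 0;
                std::free(*buf);
                *buf = out;
                *buf_size = orig_size;
                return orig_size;
            }
        }
        // ... otherwise run the real decompression here ...
        return nbytes;
    }

    // Compression path: replace an all-zero chunk with a small token instead of
    // running the expensive compressor.
    bool all_zero = true;
    for (size_t i = 0; i < nbytes; ++i)
        if (data[i] != 0) { all_zero = false; break; }

    if (all_zero) {
        uint64_t header[2] = { ZERO_CHUNK_MAGIC, (uint64_t)nbytes };
        void* out = std::malloc(sizeof(header));
        if (out == NULL) return 0;
        std::memcpy(out, header, sizeof(header));
        std::free(*buf);
        *buf = out;
        *buf_size = sizeof(header);
        return sizeof(header);
    }

    // ... otherwise run the real compression here ...
    return nbytes;
}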

Regards,
Hans

Hi Hans,

I believe that there is currently no way to get around the first call to the filter.

If you look here https://support.hdfgroup.org/HDF5/doc_resource/H5Fill_Values.html under section VI, the first part of the first bullet point mentions that certain VFDs (such as the MPI-I/O one) require that the space for the dataset be allocated at creation time. From what I understand, in your case the library realizes upon dataset open that the space wasn’t allocated. At that point the MPI-I/O driver forces the space to be allocated, and the side effect is that the chunks in the dataset get filtered during allocation. That way, later reads of those chunks see filtered data, as expected, instead of unfiltered data.

You may be able to optimize with your suggestion of testing the buffer, but if fill values weren’t written, you are not guaranteed that the chunk’s data buffer is actually zero-filled.

Hi Jordan,

Thanks for your reply. Your post confirms my suspicions. The link is very helpful.

Regards,
Hans