pHDF5 1.12: data transform in combination with compression filter

jan-willem.blokland · July 30, 2021, 12:23pm

Hello,

Currently, we are working on a library built on top of the HDF5 library. Using this library, one can combine the parallel version of HDF5, a compression filter such as Szip, and a data transform function. With this combination, the parallel HDF5 library reports the following error during a write:

Error: #005: H5Dio.c line 1170 in H5D__ioinfo_adjust(): Can't perform independent write with filters in pipeline.
    The following caused a break from collective I/O:
        Local causes: data transforms needed to be applied
        Global causes: data transforms needed to be applied
    major: Low-level I/O
    minor: Can't perform independent IO
Error: #004: H5Dio.c line 757 in H5D__write(): unable to adjust I/O info for parallel I/O
    major: Dataset
    minor: Unable to initialize object
Error: #003: H5VLnative_dataset.c line 207 in H5VL__native_dataset_write(): can't write data
    major: Dataset
    minor: Write failed
Error: #002: H5VLcallback.c line 2080 in H5VL__dataset_write(): dataset write failed
    major: Virtual Object Layer
    minor: Write failed
Error: #001: H5VLcallback.c line 2113 in H5VL_dataset_write(): dataset write failed
    major: Virtual Object Layer
    minor: Write failed
Error: #000: H5Dio.c line 291 in H5Dwrite(): can't write data
    major: Dataset
    minor: Write failed

From this error, I understand that this combination is unfortunately not supported. Is there a special reason for this? If I switch from the parallel to the serial version, it all works.

Furthermore, I also tried pHDF5 with only a compression filter. This combination works only if one does a collective write. This is fine for me, but it may be a good idea to mention it in the documentation. At least, it is not mentioned in the H5P_SET_SZIP documentation.

Best regards,
Jan-Willem

gheber · August 3, 2021, 2:43pm

Jan-Willem, in principle a dataset transformation should not pose an insurmountable challenge for the parallel implementation. My hunch is that it was just dropped to simplify (?) the first implementation of parallel compression, and not documented.

The collective write requirement is mentioned in passing in the HDF5 1.10.2 release notes (see section “Using compression with HDF5 parallel applications”), and it applies to all compression methods. I will take an action item and append a note to all H5Pset_compressionX calls alerting readers to these limitations.

Best, G.

gheber · August 4, 2021, 12:49pm

I’ve added a reminder to the relevant APIs. See for example H5Pset_deflate. OK?

Best, G.

jan-willem.blokland · August 9, 2021, 8:17am

Thanks for the explanation and for adding a reminder in the relevant APIs.

Are there any plans to extend the current implementation of parallel compression? For us, it would be an extremely useful feature if we could combine compression and a data transform function in the parallel version of the HDF5 library.

gheber · August 10, 2021, 1:36pm

No plans as of now, but I’ve created a GitHub issue.
