I have thousands of HDF5 files that need to be merged into a single file.
Merging here simply means appending all groups and datasets of one file
after another into a new output file. The group names in the input files
are all distinct from one another, and all datasets are chunked and
compressed. My question is: how can I merge these files in parallel?
My implementation consists of the following steps:
- all MPI processes collectively open all input files,
- all MPI processes collectively create the single shared output file,
- all processes read the metadata of groups and datasets of all input
  files and recreate the same objects in the output file,
- the input files are assigned evenly among all processes,
- each process independently reads the datasets from its assigned files
  and writes them to the output file (a minimal sketch of this step
  follows the list). My intent is that each process can read and write
  disjoint sets of datasets in parallel.
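
For reference, here is a minimal sketch of that last write step (the file
and dataset names, datatype, and buffer handling below are simplified
placeholders, not my actual code):

#include <hdf5.h>
#include <mpi.h>

/* Called after the output file has been created collectively with an
 * MPI-IO file access property list, e.g.:
 *   hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
 *   H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
 *   hid_t out_file = H5Fcreate("merged.h5", H5F_ACC_TRUNC,
 *                              H5P_DEFAULT, fapl);
 * and after all groups and datasets have been created in the shared
 * metadata phase. */
static void write_assigned_dataset(hid_t out_file, const char *dset_path,
                                   const void *buf)
{
    /* Independent transfer mode: each rank writes its own datasets
     * without coordinating with the others.  This is the call that
     * fails with the error shown below. */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_INDEPENDENT);

    hid_t dset = H5Dopen(out_file, dset_path, H5P_DEFAULT);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, dxpl, buf);

    H5Dclose(dset);
    H5Pclose(dxpl);
}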
However, I encountered the error below, which indicates that compressed
datasets can only be written in collective I/O mode.
HDF5-DIAG: Error detected in HDF5 (1.12.0) MPI-process 15:
#000: …/…/hdf5-1.12.0/src/H5Dio.c line 314 in H5Dwrite(): can’t write data
#005: …/…/hdf5-1.12.0/src/H5Dio.c line 1194 in H5D__ioinfo_adjust(): Can’t perform independent write with filters in pipeline.
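
If I understand the message correctly, the filter pipeline forces
collective transfer mode, which would mean every rank has to participate
in every H5Dwrite call, with non-owning ranks contributing an empty
selection, roughly like this (again only a sketch of my understanding,
untested, with placeholder names):

/* Collective write: only owner_rank contributes data, but all ranks
 * must make the matching H5Dwrite call on the same dataset. */
hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);

hid_t dset      = H5Dopen(out_file, dset_path, H5P_DEFAULT);
hid_t filespace = H5Dget_space(dset);
hid_t memspace  = H5Scopy(filespace);

if (my_rank != owner_rank) {
    /* Non-owners select nothing, so they transfer no data but still
     * take part in the collective call. */
    H5Sselect_none(filespace);
    H5Sselect_none(memspace);
}

double dummy = 0.0;  /* non-owners still pass a valid buffer pointer */
H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl,
         (my_rank == owner_rank) ? buf : &dummy);

H5Sclose(memspace);
H5Sclose(filespace);
H5Dclose(dset);
H5Pclose(dxpl);

This, however, would require every rank to loop over every dataset, which
defeats my intent of having each process write its disjoint set of
datasets independently.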
Can this file merge operation be done in parallel for compressed datasets?