Append HDF5 files in parallel

I have thousands of HDF5 files that need to be merged into a single file.
Merging simply means appending all the groups and datasets of each input
file, one after another, into a new output file. The group names of the input files are all
different from one another. In addition, all datasets are chunked and
compressed.

My question is how do I merge the files in parallel?

My implementation consists of the following steps:

  1. all MPI processes collectively open all input files,
  2. all MPI processes collectively create the single shared output file,
  3. all processes read the metadata of the groups and datasets in all input
    files and create the same groups and datasets in the output file,
  4. the input files are assigned evenly among all processes,
  5. each process independently reads datasets from its assigned files and
    writes them to the output file. My intent is that each process can read
    and write disjoint sets of datasets in parallel (see the sketch after
    this list).
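
For reference, here is a minimal sketch of the property-list setup this scheme assumes (the helper name and the omission of error checking are mine, not from the actual program): an MPI-IO file access property list for the collective opens in steps 1 and 2, and an independent data transfer property list for the per-process writes in step 5.

```c
#include <mpi.h>
#include <hdf5.h>

/* Hypothetical helper: create the shared output file and the transfer
 * property list used for the independent writes in step 5. */
hid_t create_output(MPI_Comm comm, const char *out_name, hid_t *dxpl_out)
{
    /* Steps 1-2: all ranks open/create files collectively via MPI-IO. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, comm, MPI_INFO_NULL);
    hid_t out_id = H5Fcreate(out_name, H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
    H5Pclose(fapl);

    /* Step 5: each rank writes the datasets of its assigned files on its
     * own, so the transfer property list requests independent I/O. */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_INDEPENDENT);
    *dxpl_out = dxpl;

    return out_id;
}
```

Since each rank owns whole input files, no two ranks ever touch the same dataset, which is why independent transfer mode looked like the natural choice here.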

However, I encountered the error below, indicating that compressed datasets
can only be written in collective mode.
HDF5-DIAG: Error detected in HDF5 (1.12.0) MPI-process 15:
#000: …/…/hdf5-1.12.0/src/H5Dio.c line 314 in H5Dwrite(): can’t write data
#005: …/…/hdf5-1.12.0/src/H5Dio.c line 1194 in H5D__ioinfo_adjust(): Can’t perform independent write with filters in pipeline.

Can this file merge operation be done in parallel for compressed datasets?

Wei-keng

I think you definitely want to use H5D[read,write]_chunk() to bypass the filter pipeline. Unfortunately, they aren’t supported w/ MPI, at the moment. How big are the chunks? To maximize bandwidth utilization, maybe you want to H5Dread_chunk as many chunks as you can on different ranks from different files, and buffer them in RAM, while you have a single writer that is being fed the chunks (locally and via the interconnect) and that does the H5Dwrite_chunk writing. This is admittedly a little kludgy but might get the job done in a reasonable amount of time.

Best, G.
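
To make the suggestion concrete, here is a rough sketch of the raw-chunk copy for a single dataset (the helper name is made up, error checking is omitted, and the buffering/funneling of chunks to a single writer rank is left out). The chunks are moved still compressed, so the filter pipeline is never invoked:

```c
#include <stdlib.h>
#include <hdf5.h>

/* Hypothetical helper: copy every raw (still-compressed) chunk of one
 * dataset from an input file to the matching dataset in the output file. */
static herr_t copy_chunks(hid_t src_dset, hid_t dst_dset)
{
    hid_t   space   = H5Dget_space(src_dset);
    hsize_t nchunks = 0;
    H5Dget_num_chunks(src_dset, space, &nchunks);

    for (hsize_t i = 0; i < nchunks; i++) {
        hsize_t  offset[H5S_MAX_RANK];
        unsigned filter_mask = 0;
        haddr_t  addr;
        hsize_t  chunk_bytes = 0;

        /* Where the chunk sits in the dataset and how big it is on disk. */
        H5Dget_chunk_info(src_dset, space, i, offset, &filter_mask,
                          &addr, &chunk_bytes);

        void *buf = malloc((size_t)chunk_bytes);

        /* Read the chunk raw, bypassing the filter (decompression) pipeline. */
        H5Dread_chunk(src_dset, H5P_DEFAULT, offset, &filter_mask, buf);

        /* Write it raw into the output dataset, again bypassing filters. */
        H5Dwrite_chunk(dst_dset, H5P_DEFAULT, filter_mask, offset,
                       (size_t)chunk_bytes, buf);
        free(buf);
    }
    H5Sclose(space);
    return 0;
}
```

In the scheme described above, the H5Dwrite_chunk half of this loop would run only on the single writer rank, fed chunk buffers from the reader ranks over MPI.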

If you don’t actually require everything to get merged into one file (but just want it to look that way from the application’s point of view), you could just create a master file with a bunch of external links to the groups/datasets in the other files. This would be easy enough that a single process could create the master file.
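
For example, a single serial program along these lines would do it (the file and link names here are invented):

```c
#include <stdio.h>
#include <hdf5.h>

int main(void)
{
    /* One process is enough: the master file only holds links,
     * no dataset data is copied. */
    hid_t master = H5Fcreate("master.h5", H5F_ACC_TRUNC,
                             H5P_DEFAULT, H5P_DEFAULT);

    for (int i = 0; i < 1000; i++) {   /* hypothetical number of inputs */
        char fname[64], lname[64];
        snprintf(fname, sizeof(fname), "input_%04d.h5", i);  /* hypothetical names */
        snprintf(lname, sizeof(lname), "file_%04d", i);

        /* Link the root group of each input file into the master file;
         * opening master.h5:/file_0042 later resolves into input_0042.h5. */
        H5Lcreate_external(fname, "/", master, lname,
                           H5P_DEFAULT, H5P_DEFAULT);
    }

    H5Fclose(master);
    return 0;
}
```

Opening master.h5 then lets the application traverse into each input file through the links, without any dataset data being copied.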

Using external links is an interesting idea.
I will give it a try. Thanks.

I used H5Dread_chunk and H5Dwrite_chunk as suggested.
The error message went away. This appears to work in
independent I/O mode.

However, I encountered a new problem: the program hangs at H5Fclose.
After some digging into the HDF5 source code, I found that the hang occurs
inside H5FD__mpio_truncate(), and the same problem was reported earlier
by @gdsjaar in Hang in H5Fclose.

I added a printf statement to check the values of ‘size’ and ‘needed_eof’.
Below is what I got from my program running on 2 MPI processes.
[0] …/…/hdf5-1.12.0/src/H5FDmpio.c line 1654: size=5113724 needed_eof=4683672
[1] …/…/hdf5-1.12.0/src/H5FDmpio.c line 1654: size=5113724 needed_eof=5113724

My program hangs because of the MPI_Barrier called inside the if condition:
if(size != needed_eof) {
Because needed_eof differs across the ranks, rank 0 (where size != needed_eof) takes the branch and waits at the barrier, while rank 1 (where the two values are equal) skips the branch and never reaches it, so the barrier never completes.