I would like to do a filtered collective write on data that lives on GPU (Nvidia). The filter would compress the GPU data using a fast GPU algorithm, then transfer the data to CPU to let HDF5 continue the write operation.
What happens is that the collective write crashes before the filter is ever called. In the debugger, the call stack shows a call to H5D_gather_mem. The memory gather calls a memcpy, which most probably fails because it is trying to copy from GPU device memory on the host.
Assuming the data must be compressed on the GPU, a work-around could look like this:
1. Copy the data from GPU to CPU.
2. Create the dataset.
3. Write the data.
4. In the filter function, copy the data back to the GPU, compress it, then copy the compressed data back to the CPU.
The number of copy operations can have a big impact on performance. Is there a solution to this problem?
If I understand correctly, your data is already on the GPU and you'd like to bypass the HDF5 filter pipeline. You can achieve this with the H5Dwrite_chunk function ( https://portal.hdfgroup.org/display/HDF5/H5D_WRITE_CHUNK ). This way you can pass the already-compressed data directly to HDF5 to write to the file, so there is no need to copy the uncompressed data to CPU memory.
Thanks for your reply. I was not aware of this function.
I gave it a try, and H5Dwrite_chunk seems to be working if I don’t use collective-IO.
If I use collective-IO with any number of MPI-ranks (including 1), I get the following errors:
HDF5-DIAG: Error detected in HDF5 (1.10.3) MPI-process 0:
#000: H5Dio.c line 404 in H5Dwrite_chunk(): can't write unprocessed chunk data
major: Dataset
minor: Write failed
#001: H5Dchunk.c line 460 in H5D__chunk_direct_write(): unable to write raw data to file
major: Dataset
minor: Write failed
#002: H5Fio.c line 165 in H5F_block_write(): write through page buffer failed
major: Low-level I/O
minor: Write failed
#003: H5PB.c line 1028 in H5PB_write(): write through metadata accumulator failed
major: Page Buffering
minor: Write failed
#004: H5Faccum.c line 826 in H5F__accum_write(): file write failed
major: Low-level I/O
minor: Write failed
#005: H5FDint.c line 258 in H5FD_write(): driver write request failed
major: Virtual File Layer
minor: Write failed
#006: H5FDmpio.c line 1775 in H5FD_mpio_write(): MPI_File_set_view failed
major: Internal error (too specific to document in detail)
minor: Some MPI function failed
#007: H5FDmpio.c line 1775 in H5FD_mpio_write(): Invalid datatype, error stack:
MPIR_Ext_datatype_iscommitted(70): Invalid datatype
major: Internal error (too specific to document in detail)
minor: MPI Error String
I call H5Dwrite_chunk like this:
H5::DSetMemXferPropList xfer_plist;
H5Pset_dxpl_mpio(xfer_plist.getId(), H5FD_MPIO_COLLECTIVE);

// ... filter GPU data on GPU, copy data to CPU ...

// Write chunk
H5Dwrite_chunk(dset_id, xfer_plist.getId(), 0, offset,
               chunk_dims[0] * chunk_dims[1] * chunk_dims[2] * sizeof(int),
               cpuData);
I think parallel writing of compressed chunks is a relatively new addition to HDF5, so unfortunately I'm not that familiar with its features. It seems, however, that there has been some discussion of this feature in the following thread: https://forum.hdfgroup.org/t/compressed-parallel-writing-problem/4979
Thanks for your reply and for pointing me to the other thread.
From that thread, it seems that version 1.10.3 (the version I've been using) has a bug that could explain my problem.
I’ll download and try v1.10.5.