Computational storage with HDF5-UDF

Greetings!

I’m happy to announce the availability of a new experimental backend for HDF5-UDF that lets you populate dataset values using CUDA kernels. Moreover, if the user-defined function takes input from other datasets in the HDF5 file, those dependencies are DMA-transferred from disk to GPU memory using NVIDIA’s GPUDirect Storage.

Here’s a screenshot that gives you an idea of how to use this backend. Note how simple it is to invoke the kernel: the data retrieved with lib.getData() is allocated in GPU memory, so it’s readily available to the CUDA kernel. HDF5-UDF takes care of copying the results from device memory to the host, too, so no explicit calls to NVIDIA APIs are needed to get started.

A current limitation of this implementation is that DMA transfers are only possible if dependencies have a contiguous layout on disk. It would be nice if we had an API such as H5Dget_chunk_offsets(hid_t dset_id) that returned the extents where the dataset chunks are stored. With that, we could both DMA-transfer chunked datasets and decompress them on the GPU itself.
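To illustrate what such extent information would make possible, here is a minimal sketch of how per-chunk extents could be turned into a short list of read requests. It assumes each chunk is described by a (file address, size in bytes) pair and that chunks never written to the file are reported with size 0. The `Extent` layout and the `plan_reads` helper are hypothetical names used only for illustration; they are not part of HDF5-UDF or the HDF5 API.

```python
from typing import Iterable, List, Tuple

# Hypothetical layout: (byte address in the file, size in bytes)
Extent = Tuple[int, int]


def plan_reads(extents: Iterable[Extent]) -> List[Extent]:
    """Merge chunk extents that touch on disk into larger read requests."""
    # Drop chunks that were never written (assumed to be reported with size 0),
    # then order the rest by their address in the file.
    present = sorted((addr, size) for addr, size in extents if size > 0)
    reads: List[Extent] = []
    for addr, size in present:
        if reads and reads[-1][0] + reads[-1][1] == addr:
            # This chunk starts exactly where the previous read ends: extend it.
            prev_addr, prev_size = reads[-1]
            reads[-1] = (prev_addr, prev_size + size)
        else:
            reads.append((addr, size))
    return reads


if __name__ == "__main__":
    chunks = [(4096, 1024), (5120, 1024), (0, 0), (8192, 512)]
    for addr, size in plan_reads(chunks):
        print(f"read {size} bytes at offset {addr}")
```

Each resulting range could then be handed to a DMA read. Note that merging discards chunk boundaries, so for compressed chunks the original per-chunk sizes would still have to be kept around for the decompression step.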

Please visit the project’s GPUDirect Storage branch if you’re interested in testing this feature.

Have fun!
Lucas

Hi Lucas,

Thank you for the nice new feature! Are you looking for the H5Dget_chunk_info function?
https://portal.hdfgroup.org/display/HDF5/H5D_GET_CHUNK_INFO
It provides the address of a chunk in the file.

Elena

H5D_GET_CHUNK_INFO

H5D_GET_CHUNK_INFO retrieves the offset coordinates offset, filter mask filter_mask, size size and address addr for the dataset specified by the identifier dset_id and the chunk specified by the index index. The chunk belongs to a set of chunks in the selection specified by fspace_id. If the queried chunk does not exist in the file, the size will be set to 0 and address to HADDR_UNDEF.

portal.hdfgroup.org

Oh, this is brand new information for me – I was unaware of this new API! Yes, H5D_GET_CHUNK_INFO should definitely do it. I will look into incorporating support for chunked datasets soon.

Thanks for the pointer!
Lucas