I’m happy to announce the availability of a new experimental backend for HDF5-UDF that lets one populate dataset values using CUDA kernels. Moreover, if the user-defined function takes input from other datasets in the HDF5 file, those dependencies are DMA-transferred from disk to GPU memory using NVIDIA’s GPUDirect Storage.
Here’s a screenshot that gives you an idea of how to use this backend. Note how simple it is to invoke the kernel: the data retrieved with lib.getData() is allocated in GPU memory, so it’s readily available to the CUDA kernel. HDF5-UDF takes care of copying the results from device memory to the host, too, so no explicit calls to NVIDIA APIs are needed to get started.
A current limitation of this implementation is that DMA transfers are only possible if dependencies have a contiguous layout on disk. It would be nice if we had an API such as H5Dget_chunk_offsets(hid_t dset_id) that returned the file extents where the dataset chunks are stored. With that, we could both DMA-transfer chunked datasets and decompress them on the GPU itself.
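To make the idea a bit more concrete, here is a minimal sketch of what could be done with such a list of extents. It is not part of HDF5-UDF: the function name coalesce_extents and the (offset, length) tuple format are assumptions of mine, standing in for whatever a call like H5Dget_chunk_offsets() might return. The sketch only merges chunks that sit back to back in the file, so that each run could be fetched with a single DMA read.

```python
from typing import List, Tuple

# Hypothetical extent format: (byte offset in the file, length in bytes).
Extent = Tuple[int, int]


def coalesce_extents(extents: List[Extent]) -> List[Extent]:
    """Merge chunk extents that are adjacent on disk into single reads."""
    merged: List[Extent] = []
    for offset, length in sorted(extents):
        if length <= 0:
            continue  # skip empty or unallocated chunks
        if merged and merged[-1][0] + merged[-1][1] == offset:
            # This chunk starts exactly where the previous run ends.
            prev_offset, prev_length = merged[-1]
            merged[-1] = (prev_offset, prev_length + length)
        else:
            merged.append((offset, length))
    return merged


# Made-up extents for a dataset with four chunks.
extents = [(4096, 1024), (5120, 1024), (16384, 512), (6144, 1024)]
for offset, length in coalesce_extents(extents):
    print(f"DMA read: offset={offset} length={length}")
```

Note that merging like this only makes sense for uncompressed chunks. For compressed chunks the individual boundaries would have to be kept so that each one can be decompressed on its own, and in either case the mapping from file extents back to chunk positions in the dataset would need to be tracked separately.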
Please visit the project’s GPUDirect Storage branch if you’re interested in testing this feature.