Hello,
I am updating a Monte Carlo particle transport code written in Fortran to
be run on Titan (http://en.wikipedia.org/wiki/Titan_(supercomputer)),
and I'm having some trouble achieving good write performance when writing many
small pieces of large datasets with independent I/O. Monitoring with du, it
looks like I'm writing at only about 50 MB/s while creating a 78 GB file. I was
hoping for much better from the Lustre filesystem.
For my current case I'm trying to write to 3000 1D datasets in the same file,
each containing 3,534,960 doubles, where each individual write is always a
contiguous 1D chunk of 206 doubles. These writes come from 980 different MPI
processes, and each 206-double chunk could land in any of the 3000 datasets,
anywhere within the large 1D array. I'm doing things in batches: each MPI
process accumulates a number of these chunks before doing a round of writing.
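(If I'm doing the arithmetic right, that's 3000 × 3,534,960 × 8 bytes ≈ 85 GB,
or about 79 GiB, which is consistent with the ~78 GB file size above, and each
206-double write is only 206 × 8 = 1,648 bytes.)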
It's trivial to calculate where in its dataset each chunk should be written, so
at first I just looped over the chunks and selected the appropriate hyperslab
before each write. I figured this might be slow because it was jumping all
around the datasets for each little write, so I switched to a scheme where I
first sort the writes into the order they appear in the dataset, then build one
large union of irregularly-spaced hyperslabs and do a single big write, similar
to the difference discussed here:
http://hdf-forum.184993.n3.nabble.com/reading-multiple-irregularly-spaced-hyperslabs-td426929.html
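For reference, the per-chunk version looked roughly like this inside the loop
over datasets (simplified, using the same variable names as the full example at
the end):

! First version (simplified): one small write per 206-double chunk
call h5screate_simple_f(1, block, memspace, hdf5_err)
do j = 1, n_chunks
   start = (chunk_starts(j) - 1) * block
   call h5sselect_hyperslab_f(dspace, H5S_SELECT_SET_F, start, count, &
        hdf5_err, block = block)
   f_ptr = c_loc(chunks((j - 1) * 206 + 1))
   call h5dwrite_f(dset, H5T_NATIVE_DOUBLE, f_ptr, hdf5_err, &
        file_space_id = dspace, mem_space_id = memspace, xfer_prp = plist)
end do
call h5sclose_f(memspace, hdf5_err)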
This gave me about a 2-3x speedup over where I started, but I'm still pretty
unsatisfied with these write speeds. I'm still fairly new to HDF5, so I'm
worried I might be missing something fundamental. I've also played around a
little with setting the chunking on these datasets to multiples of 206, but I
haven't had any success there.
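In case it matters, the chunking experiments just swapped the dataset creation
step for something roughly like this (the chunk size here is just an arbitrary
multiple of 206 for illustration; the handles are the same as in the example
below):

integer(HID_T) :: dcpl
integer(HSIZE_T) :: chunk_dims(1)
! Dataset creation property list with a chunked layout
call h5pcreate_f(H5P_DATASET_CREATE_F, dcpl, hdf5_err)
chunk_dims(1) = 206 * 64   ! an illustrative multiple of 206
call h5pset_chunk_f(dcpl, 1, chunk_dims, hdf5_err)
call h5dcreate_f(group_id, 'data', H5T_NATIVE_DOUBLE, dspace, dset, &
     hdf5_err, dcpl_id = dcpl)
call h5pclose_f(dcpl, hdf5_err)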
Is there a more efficient way to accomplish the write pattern I described than
what I'm doing now? Example code showing my current approach is below. This is
with HDF5 1.8.11.
Thanks in advance for any advice!
Nick
---
use hdf5
use iso_c_binding

integer(HID_T) :: file_id, group_id, dset, dspace, memspace, plist
integer(HSIZE_T) :: dims(1), block(1), count(1), start(1)
integer :: hdf5_err, i, j, idx, n_chunks
integer, allocatable :: chunk_starts(:)
real(8), allocatable, target :: chunks(:)
type(c_ptr) :: f_ptr
! (filename, groupname and chunks_for_group are defined elsewhere)
! All the datasets are created up front once
dims(1) = 3534960
do i = 1, 3000
   call h5fopen_f(filename, H5F_ACC_RDWR_F, file_id, hdf5_err)
   call h5gopen_f(file_id, groupname(i), group_id, hdf5_err)
   call h5screate_simple_f(1, dims, dspace, hdf5_err)
   call h5dcreate_f(group_id, 'data', H5T_NATIVE_DOUBLE, dspace, dset, &
        hdf5_err)
   call h5dclose_f(dset, hdf5_err)
   call h5sclose_f(dspace, hdf5_err)
   call h5gclose_f(group_id, hdf5_err)
   call h5fclose_f(file_id, hdf5_err)
end do
...
! The rest runs on each MPI process, which has n_chunks chunks of 206 doubles to write
! The 1-based index of each chunk's 206-double block within its dataset is stored in chunk_starts(j)
! The chunks themselves are stored in one large contiguous array of doubles: chunks
! Set up the file access property list with parallel I/O access
call h5pcreate_f(H5P_FILE_ACCESS_F, plist, hdf5_err)
call h5pset_fapl_mpio_f(plist, MPI_COMM_WORLD, MPI_INFO_NULL, hdf5_err)
! Open the file
call h5fopen_f(filename, H5F_ACC_RDWR_F, file_id, hdf5_err, &
     access_prp = plist)
! Close property list
call h5pclose_f(plist, hdf5_err)
! Create the property list to describe independent parallel I/O
call h5pcreate_f(H5P_DATASET_XFER_F, plist, hdf5_err)
call h5pset_dxpl_mpio_f(plist, H5FD_MPIO_INDEPENDENT_F, hdf5_err)
do i = 1, 3000
   block = 206
   count = 1
   ! Open the group and dataset
   call h5gopen_f(file_id, groupname(i), group_id, hdf5_err)
   call h5dopen_f(group_id, 'data', dset, hdf5_err)
   ! Get the file dataspace and create the memory space
   ! (n_chunks here is the number of chunks destined for this dataset)
   call h5dget_space_f(dset, dspace, hdf5_err)
   call h5screate_simple_f(1, block * n_chunks, memspace, hdf5_err)
   ! Build the union of irregularly-spaced hyperslabs
   call h5sselect_none_f(dspace, hdf5_err)
   do j = 1, n_chunks
      idx = chunk_starts(j)
      start = (idx - 1) * block
      call h5sselect_hyperslab_f(dspace, H5S_SELECT_OR_F, start, &
           count, hdf5_err, block = block)
   end do
   ! Gather this dataset's chunks into the contiguous buffer and write them
   chunks = chunks_for_group(i)
   f_ptr = c_loc(chunks(1))
   call h5dwrite_f(dset, H5T_NATIVE_DOUBLE, f_ptr, hdf5_err, &
        file_space_id = dspace, mem_space_id = memspace, &
        xfer_prp = plist)
   ! Close the dataspace, memory space, dataset, and group
   call h5sclose_f(dspace, hdf5_err)
   call h5sclose_f(memspace, hdf5_err)
   call h5dclose_f(dset, hdf5_err)
   call h5gclose_f(group_id, hdf5_err)
end do
! Close the transfer property list and the file
call h5pclose_f(plist, hdf5_err)
call h5fclose_f(file_id, hdf5_err)