About Parallel HDF5 Reading for a 3D Dataset

Hello, dear hdf experts,

I am using the parallel HDF5 library to access a three-dimensional dataset
in an HDF5 file on Linux. The dataset has dimensions dim[0] = 1501, dim[1] = 1536,
and dim[2] = 2048. I want to read slabs of the dataset along the first (READ_Z),
second (READ_Y), and third (READ_X) dimensions separately.

However, the performance is quite different in the three cases.

   (1) Reading along the first dimension (#define READ_Z) is very fast. It takes
       about 0.8 seconds to read a 24 * 1536 * 2048 block of unsigned integers using one node.

   (2) Reading along the second dimension (#define READ_Y) is much slower. It takes
       about 15 seconds to read a 1501 * 25 * 2048 block of unsigned integers using one node.

   (3) Reading along the third dimension (#define READ_X) is the slowest. It takes
       about 30 seconds to read a 1501 * 1536 * 64 block of unsigned integers using one node.

The timing for the first case (1) seems reasonable to me; the other two do not, and I
do not know why. I see that the HDF5 library uses MPI-IO (MPI_Type_vector and
MPI_File_set_view) for parallel access, but I do not understand why the performance
differs so much. Does it come from the parallel library, or from my code (some
parameter)? Please check the attached code for details; a boiled-down sketch of the
kind of read I am doing is included below. Thanks.
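
(This is only an illustration of the access pattern, not the attached code; the file
name "test.h5" and the dataset name "data" are placeholders.)

    /* Read one slab of the 1501 x 1536 x 2048 dataset of unsigned ints
     * with parallel HDF5 and collective MPI-IO.
     * Build (names are placeholders): mpicc -DREAD_Z read_slab.c -lhdf5
     */
    #include <hdf5.h>
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        hsize_t start[3] = {0, 0, 0};
    #if defined(READ_Y)
        hsize_t count[3] = {1501, 25, 2048};   /* case (2) */
    #elif defined(READ_X)
        hsize_t count[3] = {1501, 1536, 64};   /* case (3) */
    #else /* READ_Z */
        hsize_t count[3] = {24, 1536, 2048};   /* case (1) */
    #endif

        /* Open the file with the MPI-IO file driver. */
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
        hid_t file = H5Fopen("test.h5", H5F_ACC_RDONLY, fapl);
        hid_t dset = H5Dopen2(file, "data", H5P_DEFAULT);

        /* Select the slab in the file and a matching memory space. */
        hid_t filespace = H5Dget_space(dset);
        H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);
        hid_t memspace = H5Screate_simple(3, count, NULL);

        /* Collective data transfer. */
        hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
        H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);

        unsigned *buf = malloc((size_t)(count[0] * count[1] * count[2]) * sizeof *buf);
        double t0 = MPI_Wtime();
        H5Dread(dset, H5T_NATIVE_UINT, memspace, filespace, dxpl, buf);
        double t1 = MPI_Wtime();
        printf("read took %.2f s\n", t1 - t0);

        free(buf);
        H5Pclose(dxpl); H5Sclose(memspace); H5Sclose(filespace);
        H5Dclose(dset); H5Fclose(file); H5Pclose(fapl);
        MPI_Finalize();
        return 0;
    }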

I am using MPICH2 and parallel HDF5 1.8.5 on Linux. The compiler is mpicc from MPICH2.

I would greatly appreciate your comments and suggestions.

Best regards,

Yongsheng Pan
Postdoctoral Appointee
Advanced Photon Source
Argonne National Laboratory

phdf_test.cpp (8.82 KB)

> Hello, dear hdf experts,

Howdy, fellow ANL-er

> I am using the parallel HDF5 library to access a three-dimensional dataset
> in an HDF5 file on Linux. The dataset has dimensions dim[0] = 1501, dim[1] = 1536,
> and dim[2] = 2048. I want to read slabs of the dataset along the first (READ_Z),
> second (READ_Y), and third (READ_X) dimensions separately.
>
> However, the performance is quite different in the three cases.

Sure. In one order, the bytes are all laid out nice and contiguous, and a reader
can just zip through them. In another order, the access is non-contiguous, which
means collecting many more pieces and parts from across the dataset.
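
To put rough numbers on it (assuming the dataset is unchunked and stored contiguously
in C order, i.e. the last dimension varies fastest on disk, with 4-byte unsigned ints):

   - READ_Z (24 x 1536 x 2048): one contiguous run of 24 * 1536 * 2048 * 4 bytes,
     roughly 300 MB in a single piece.
   - READ_Y (1501 x 25 x 2048): 1501 separate runs of 25 * 2048 * 4 bytes, about
     200 KB each.
   - READ_X (1501 x 1536 x 64): 1501 * 1536, i.e. about 2.3 million separate runs of
     only 64 * 4 = 256 bytes each.

The three cases move a comparable amount of data, but the number of separate pieces
the I/O layer has to gather differs by six orders of magnitude.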

> I am using MPICH2 and parallel HDF5 1.8.5 on Linux. The compiler is mpicc from MPICH2.

Are you perhaps using Argonne's Fusion cluster? I don't know anything
about the APS compute resources, so I can't offer any tuning
suggestions there.

I don't think you should turn off collectives.

I do think you should consider whether there is some way your application, in the
aggregate, could read the entire dataset in a single call. What I mean is: each
process's portion can be decomposed any way you like, but if you think of each
process's decomposition as a puzzle piece, all the puzzle pieces should fit together
to form the full 3D array. Then your MPI-IO library can work some magic. A purely
hypothetical sketch of what I mean is below.
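
Purely as an illustration (reusing the dataset handle and buffer from the sketch in
your first message, and assuming nprocs ranks): rank r could take a contiguous block
of the first dimension so that the ranks together cover the whole 1501 x 1536 x 2048
array in one collective H5Dread.

    /* Hypothetical decomposition: rank r reads planes [lo, hi) of dim 0,
     * so the union of all ranks' hyperslabs is the full dataset.
     * buf must hold count[0] * count[1] * count[2] unsigned ints per rank. */
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    hsize_t dims[3] = {1501, 1536, 2048};
    hsize_t lo = dims[0] * rank / nprocs;
    hsize_t hi = dims[0] * (rank + 1) / nprocs;

    hsize_t start[3] = {lo, 0, 0};
    hsize_t count[3] = {hi - lo, dims[1], dims[2]};

    hid_t filespace = H5Dget_space(dset);
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);
    hid_t memspace = H5Screate_simple(3, count, NULL);

    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);   /* keep it collective */
    H5Dread(dset, H5T_NATIVE_UINT, memspace, filespace, dxpl, buf);

When every rank's selection is part of one big, regular picture and the read is
collective, the MPI-IO layer (ROMIO in MPICH2) can use collective buffering to turn
many small gaps into a few large contiguous reads.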

==rob

···

On Fri, Sep 09, 2011 at 04:31:39PM -0500, Yongsheng Pan wrote:

--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA

Have you tried independent data transfer instead of collective? How many nodes are
reading the file? How is your domain decomposition done? It is a little hard to tell
from a quick look at your code what the domain decomposition strategy is and which
nodes read which data. Also, this will not help with your timing issues, but it seems
that you call MPI_Barrier more often than necessary; a sketch of the transfer-mode
switch and a leaner timed region follows below.
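
Something like this, for instance (a sketch only; it assumes dxpl is the transfer
property list your code passes to H5Dread, and that the dataset and dataspace handles
already exist):

    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);   /* what the code presumably uses now */
    /* H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_INDEPENDENT);  independent, for comparison */

    /* One barrier before the timed region is enough to line the ranks up. */
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    H5Dread(dset, H5T_NATIVE_UINT, memspace, filespace, dxpl, buf);
    double t1 = MPI_Wtime();
    printf("read time on this rank: %.2f s\n", t1 - t0);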

Izaak Beekman

···

===================================
(301)244-9367
UMD-CP Visiting Graduate Student
Aerospace Engineering
ibeekman@umiacs.umd.edu
ibeekman@umd.edu
