HDF5 performance issue

Hello All,

I have been trying to improve the I/O performance of our nuclear physics code, but HDF5 is consistently performing much slower than raw binary I/O, even when I use multiple processors with HDF5 against a single processor doing the binary I/O.

To get some expert opinion, I have written two small programs that do only the I/O. Could someone please have a look at the code and try running it?

When I run these two codes, the binary program writes a 2 GB file (root process only) in about 10 seconds. In contrast, the HDF5 program takes about 40 seconds to write the same 2 GB using 23 processors.

I can attach the code, but the input files that need to be read (64 MB) are too large to attach. Could anyone suggest how I can send these files?


--
Regards,
Nikhil


Hi Nikhil,

No programs were attached.

Have you done an fsync in your binary program to be sure your data are really written to disk? Otherwise you only measure the time it takes to copy your data to the system's file buffers.
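A minimal sketch of the kind of timing Ger describes, assuming a POSIX system; the file name, payload size, and variable names below are placeholders for illustration, not taken from Nikhil's program:

#define _POSIX_C_SOURCE 200809L
/* Hypothetical timing harness: the clock stops only after fflush() + fsync(),
 * so the measurement includes flushing the OS page cache, not just copying
 * data into it. File name and size are made up. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>

int main(void)
{
    const size_t nbytes = 256UL * 1024 * 1024;      /* pretend 256 MB payload */
    char *buf = calloc(nbytes, 1);
    FILE *fp = fopen("binopfile", "wb");
    if (!buf || !fp) return 1;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    fwrite(buf, 1, nbytes, fp);
    fflush(fp);                     /* stdio buffers -> kernel */
    fsync(fileno(fp));              /* kernel page cache -> disk */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    fclose(fp);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("write + fsync: %.2f s\n", secs);
    free(buf);
    return 0;
}

Without the fsync, the 10-second figure for the binary writer may largely reflect memory-copy speed rather than real disk bandwidth.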

Do you write from your 23 processors into a single HDF5 file or multiple files?

Cheers,
Ger

"Nikhil Laghave" <nikhill@iastate.edu> 09/12/08 1:48 AM >>>

Hello All,

I have been trying to improve the IO performance in our nuclear physics code but
HDF5 is consistently performing much slower than binary in terms of IO even when
I use multiple processors in HDF5 as against single processor in HDF5.

To get some expert opinion, I have written 2 small programs that does only the
IO. Can some one please have a look at the code and try running it ?

When I run these 2 codes, binary program writes 2GB file(root only) is about 10
seconds. On the contrary, HDF5 program takes about 40 seconds to write this 2 GB
with 23 processors.

I can put the code as an attachment, but the files required to be read(64 MB)
cannot be attached. Could anyone suggest how can I send these files ?

···

--
Regards,
Nikhil

Regards,
Nikhil

----------------------------------------------------------------------
This mailing list is for HDF software users discussion.
To subscribe to this list, send a message to hdf-forum-subscribe@hdfgroup.org.
To unsubscribe, send a message to hdf-forum-unsubscribe@hdfgroup.org.

----------------------------------------------------------------------
This mailing list is for HDF software users discussion.
To subscribe to this list, send a message to hdf-forum-subscribe@hdfgroup.org.
To unsubscribe, send a message to hdf-forum-unsubscribe@hdfgroup.org.

Hello,

I am sorry I did not attach any files since it wasn't possible to attach a 64
MB file. I am attaching the program for a smaller run.

Hi. I do this all the time too -- but there was no code in your
message.

The folder ptestrun contains the program that reads a binary file "mfdn.smwf001" and writes the same data into another binary file "binopfile". It also uses an HDF5 file, hdfref, to read the lengths read and written by each processor. The program does I/O in exactly the same way our actual code does; the I/O happens as follows (a sketch of the pattern appears after the list).
1. Proc 0 reads its part of the vector. It then reads the portions for the remaining processors and sends them (tag=1) to the respective processors.
2. Proc 0 writes its part of the vector. It then writes the portions for the remaining processors as it receives them (tag=2) from the respective processors.
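To make the bottleneck concrete, here is a minimal sketch of the funneled write path in step 2, assuming one MPI_DOUBLE vector segment per rank; the lengths, datatype, and file name are placeholders rather than the actual mfdn code, and only the tag=2 convention is taken from the description above.

/* Sketch of the rank-0 funneled write: every byte passes through rank 0's
 * single file stream. Per-rank length, datatype and file name are invented. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int len = 1024;                        /* hypothetical per-rank length */
    double *vec = malloc(len * sizeof(double));
    /* ... each rank fills vec with its part of the eigenvector ... */

    if (rank == 0) {
        FILE *fp = fopen("binopfile", "wb");
        fwrite(vec, sizeof(double), len, fp);    /* proc 0 writes its own part */
        for (int src = 1; src < nprocs; src++) {
            /* then receives (tag=2) and writes everyone else's part in turn */
            MPI_Recv(vec, len, MPI_DOUBLE, src, 2, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            fwrite(vec, sizeof(double), len, fp);
        }
        fclose(fp);
    } else {
        MPI_Send(vec, len, MPI_DOUBLE, 0, 2, MPI_COMM_WORLD);
    }

    free(vec);
    MPI_Finalize();
    return 0;
}

However many processors participate, the file is still written through one stream on proc 0, which is the behaviour the PHDF5 version is meant to replace.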

This approach of having rank 0 drive the entire I/O process is going
to give you pretty poor performance, right?

Yes, it should give poor performance. But this is what I am trying to improve by doing parallel I/O using PHDF5. This program basically emulates the I/O in the older version, which I intend to improve.

You're on the right track, using MPI to carry out some coordination
before performing I/O. Just take that a little further.

parallel HDF5 really needs two things to perform well:
- simultaneous accesses from all processes (via hyperslabs)
- collective I/O
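As a rough illustration of those two requirements (a hedged sketch, not Nikhil's actual testrun code), a collective hyperslab write could look like the following; the dataset and file names are taken from the thread, while the lengths and offsets stand in for whatever each rank computes.

/* Every rank opens the same file through the MPI-IO driver, selects only its
 * own hyperslab, and all ranks call H5Dwrite together with a collective
 * transfer property list. Lengths and offsets are placeholders. */
#include <mpi.h>
#include <hdf5.h>

void write_collective(MPI_Comm comm, hsize_t total_len,
                      hsize_t my_offset, hsize_t my_len, const double *buf)
{
    /* 1. File access property list: parallel access through MPI-IO */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, comm, MPI_INFO_NULL);
    hid_t file = H5Fcreate("hdfopfile", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* 2. One shared 1-D dataset covering the whole vector */
    hid_t filespace = H5Screate_simple(1, &total_len, NULL);
    hid_t dset = H5Dcreate(file, "EigenVectors__", H5T_NATIVE_DOUBLE,
                           filespace, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* 3. Each rank selects only its own slice of the file space */
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, &my_offset, NULL,
                        &my_len, NULL);
    hid_t memspace = H5Screate_simple(1, &my_len, NULL);

    /* 4. Collective transfer: all ranks participate in one H5Dwrite */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, buf);

    H5Pclose(dxpl); H5Sclose(memspace); H5Sclose(filespace);
    H5Dclose(dset); H5Fclose(file); H5Pclose(fapl);
}

With every rank contributing to one collective H5Dwrite, the MPI-IO layer can aggregate the 23 per-rank pieces into a few large, contiguous requests instead of 23 independent streams.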

The folder testrun contains the program that writes the output to an HDF5 file. It reads the contents of an HDF5 file "mfdn.smwf001" and writes the dataset named "EigenVectors__" to a new HDF5 file "hdfopfile". The lengths written by the various procs are stored in the dataset "LENGTHS". The I/O happens as follows.

1. Get the dataset dims from the dataset named "EigenVectors__".

OK, you can do this with rank 0. I'd suggest rank 0 broadcast the
result to everyone. You could either use MPI_Bcast or MPI_Scatter.
Use MPI_Bcast if each process has to compute their 'offset' into the
dataset as well. Use MPI_Scatter if all they need is a length.
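A hedged sketch of that step, assuming rank 0 has already read the total dataset length and every rank knows its own local length; the function and variable names are invented for illustration.

/* Rank 0 broadcasts the global length; an exclusive prefix sum over the
 * per-rank lengths then gives each rank the starting offset of its
 * hyperslab. Names are placeholders. */
#include <mpi.h>

void compute_layout(MPI_Comm comm,
                    unsigned long long my_len,
                    unsigned long long *total_len,   /* valid on rank 0 on entry */
                    unsigned long long *my_offset)
{
    int rank;
    MPI_Comm_rank(comm, &rank);

    /* Rank 0 tells everyone how big the whole dataset is. */
    MPI_Bcast(total_len, 1, MPI_UNSIGNED_LONG_LONG, 0, comm);

    /* MPI_Exscan leaves rank 0's result undefined, so set it explicitly. */
    unsigned long long off = 0;
    MPI_Exscan(&my_len, &off, 1, MPI_UNSIGNED_LONG_LONG, MPI_SUM, comm);
    *my_offset = (rank == 0) ? 0 : off;
}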
                
This is a very minor issue. I do this only in the test program, since there is no other way to compute the dimensions. In the main program, I/O is not required to find the dimension. I basically mentioned it so that it becomes easier to understand the code.

2. Create the HDF5 output file accordingly.
3. Read the datasets from "mfdn.smwf001" and write them to "hdfopfile".

Once all processes have the lengths and have computed their offsets, you can create hyperslabs to set up a collective read from mfdn.smwf001 and a collective write to hdfopfile.
This is exactly what I am doing (or rather trying to do). The reads and writes are parallel and issued by all the processors simultaneously. Unfortunately, in my case independent I/O outperforms collective I/O.

Although this is a small run and may not fully utilize PHDF5, maybe someone can tell whether I am making any mistakes that are leading to the slow I/O. I can't send the larger runs as attachments since they are around 64 MB and 2 GB. The program should run on exactly 5 processors.

Maybe your slow performance is because you are using the parallel HDF5 interface to carry out essentially serial I/O? Do follow up with your code.
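One way to follow up on the collective-vs-independent question, going beyond what is said above: check whether HDF5 actually performed a collective transfer, and whether ROMIO's collective buffering is enabled and sized sensibly for the file system. The property and hint names below are real HDF5/ROMIO ones, but the values are placeholders to tune, and H5Pget_mpio_actual_io_mode needs a reasonably recent HDF5 release.

/* Hedged diagnostic sketch: pass ROMIO collective-buffering hints when the
 * file is opened, and confirm after the write that the transfer really went
 * collective. Hint values are placeholders. */
#include <mpi.h>
#include <hdf5.h>
#include <stdio.h>

hid_t make_tuned_fapl(MPI_Comm comm)
{
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "romio_cb_write", "enable");    /* force collective buffering */
    MPI_Info_set(info, "cb_buffer_size", "16777216");  /* 16 MB aggregation buffer */

    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, comm, info);
    MPI_Info_free(&info);
    return fapl;
}

void report_io_mode(hid_t dxpl)    /* call after H5Dwrite, on the same dxpl */
{
    H5D_mpio_actual_io_mode_t mode;
    H5Pget_mpio_actual_io_mode(dxpl, &mode);
    if (mode == H5D_MPIO_NO_COLLECTIVE)
        printf("HDF5 fell back to independent I/O\n");
}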

I have attached the code. Please have a look if you can.

Thanks a lot.

Regards,
Nikhil

testrun.zip (170 KB)

ptestrun.zip (1.07 MB)


Hello,

I am sorry I did not attach any files since it wasn't possible to attach a 64
MB file. I am attaching the program for a smaller run.

Hi. I do this all the time too -- but there was no code in your
message.

The folder ptestrun contains the program that reads a binary file "mfdn.smwf001" and writes the same data into another binary file "binopfile". It also uses an HDF5 file, hdfref, to read the lengths read and written by each processor. The program does I/O in exactly the same way our actual code does. The I/O happens as follows.
1. Proc 0 reads its part of the vector. It then reads the portions for the remaining processors and sends them (tag=1) to the respective processors.
2. Proc 0 writes its part of the vector. It then writes the portions for the remaining processors as it receives them (tag=2) from the respective processors.

This approach of having rank 0 drive the entire I/O process is going
to give you pretty poor performance, right?

You're on the right track, using MPI to carry out some coordination
before performing I/O. Just take that a little further.

parallel HDF5 really needs two things to perform well:
- simultaneous accesses from all processes (via hyperslabs)
- collective I/O

The folder testrun contains the program that writes the output to an HDF5 file. It reads the contents of an HDF5 file "mfdn.smwf001" and writes the dataset named "EigenVectors__" to a new HDF5 file "hdfopfile". The lengths written by the various procs are stored in the dataset "LENGTHS". The I/O happens as follows.

1. Get the dataset dims from the dataset named "EigenVectors__".

OK, you can do this with rank 0. I'd suggest rank 0 broadcast the
result to everyone. You could either use MPI_Bcast or MPI_Scatter.
Use MPI_Bcast if each process has to compute their 'offset' into the
dataset as well. Use MPI_Scatter if all they need is a length.

2. Create the HDF5 output file accordingly.
3. Read the datasets from "mfdn.smwf001" and write them to "hdfopfile".

Once all processes have the lengths and have computed their offsets, you can create hyperslabs to set up a collective read from mfdn.smwf001 and a collective write to hdfopfile.

Although this is a small run and may not fully utilize PHDF5, maybe someone can tell whether I am making any mistakes that are leading to the slow I/O. I can't send the larger runs as attachments since they are around 64 MB and 2 GB.

Maybe your slow performance is because you are using the parallel HDF5 interface to carry out essentially serial I/O? Do follow up with your code.

==rob



--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
A215 0178 EA2D B059 8CDF  B29D F333 664A 4280 315B

----------------------------------------------------------------------
This mailing list is for HDF software users discussion.
To subscribe to this list, send a message to hdf-forum-subscribe@hdfgroup.org.
To unsubscribe, send a message to hdf-forum-unsubscribe@hdfgroup.org.