In short: are there things to know, make sure of, or be aware of in order to get good performance with P-HDF5?
My understanding is that "before" HDF5 / P-HDF5, each process of an MPI code was compelled to write / read sequential (and separate) files: this stressed the file system and performance was poor (to keep it short). My understanding is that P-HDF5 (and MPI-IO) were designed to cope with this problem (MPI-IO is more "low-level"; P-HDF5 is built on top of MPI-IO and offers flexibility and easy ways to structure the file).
To test this I wrote an MPI code. There are N MPI processes. For each MPI process, the data to write are just a 1D array of integers (initialised with the rank of the MPI process). In the MPI / HDF5 file, the goal is to write these data as ordered bunches, one after the other. First, each MPI process writes its own sequential file (so one gets N separate files at the end): the function used to write is "write" (binary). Then, with MPI-IO, each process writes the same bunch of data into one shared file (containing all data, bunch after bunch, from all processes, so one gets only one file at the end): the function used to write is MPI_File_write. Then, I use P-HDF5 to do the same thing as with MPI-IO (so again one gets only one file at the end): the function used to write is H5Dwrite. I expected to get better performance with MPI-IO and P-HDF5 than with the sequential approach. The spirit of this test code is very simple / basic (each MPI process writes its own block of data into the same file, or into separate files in the sequential approach).
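For reference, the MPI-IO part boils down to something like the following (a simplified sketch only, not my actual code: the file name and block size are made up, error checking is omitted, and I show the collective variant MPI_File_write_at_all here, whereas my test uses the independent MPI_File_write):

```cpp
// Sketch of the MPI-IO write path: each rank writes its own contiguous
// bunch at a rank-dependent offset into one shared file.
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, nProcs = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nProcs);

    const MPI_Offset nInts = 1024;        // per-process block (tiny here)
    std::vector<int> data(nInts, rank);   // initialised with the rank

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "tstIO.bin",  // hypothetical file name
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    // Bunch of rank r starts right after the bunches of ranks 0..r-1.
    MPI_Offset offset = (MPI_Offset)rank * nInts * (MPI_Offset)sizeof(int);
    MPI_File_write_at_all(fh, offset, data.data(), (int)nInts, MPI_INT,
                          MPI_STATUS_IGNORE);  // collective variant

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```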
Note: in each case (sequential, MPI-IO, P-HDF5), when I say "write data to file", I mean writing big blocks / bunches of data at once (I do not write data one by one; I write the biggest block possible while staying below 2 GB).
Note : I tried with N = 1, 2, 4, 8, 16.
Note: I generated files (MPI-IO, P-HDF5) whose size scaled from 1 GB to 16 GB (which looks like a "very big" file to me).
Note: I followed the P-HDF5 documentation (use of the H5P_FILE_ACCESS and H5P_DATASET_XFER property lists + hyperslab selection "by chunks").
Note: the file system is GPFS (it was installed by the cluster vendor, so it is supposed to be ready to get performance out of P-HDF5). I am an "application" guy trying to use HDF5, not a sysadmin who would be familiar with the complex details of the file system.
Note : I compiled the HDF5 package like this "./configure --enable-parallel".
Note: I use CentOS + GNU compilers (for both the HDF5 package and my test code) + hdf5-1.8.13.
Note: I use mpic++ (not the h5pxx compiler wrappers; actually I didn't get why HDF5 provides compiler wrappers) to compile my test code. Is this a problem?
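To make the P-HDF5 setup above concrete, here is roughly what my write path looks like (a simplified sketch, not the actual code: the file and dataset names are made up, block sizes are tiny, and error checking is omitted):

```cpp
// Sketch of the P-HDF5 collective write path: one shared dataset,
// each rank writes its bunch via a hyperslab selection.
#include <hdf5.h>
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, nProcs = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nProcs);

    const hsize_t nInts = 1024;           // per-process block (tiny here)
    std::vector<int> data(nInts, rank);   // initialised with the rank

    // H5P_FILE_ACCESS property list: open the file with the MPI-IO driver.
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("tstIO.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    // One dataset holding all the bunches, bunch after bunch.
    hsize_t dims = nInts * (hsize_t)nProcs;
    hid_t filespace = H5Screate_simple(1, &dims, NULL);
    hid_t dset = H5Dcreate2(file, "data", H5T_NATIVE_INT, filespace,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    // Each rank selects the hyperslab corresponding to its own bunch.
    hsize_t start = (hsize_t)rank * nInts, count = nInts;
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, &start, NULL, &count, NULL);
    hid_t memspace = H5Screate_simple(1, &count, NULL);

    // H5P_DATASET_XFER property list: ask for a collective transfer.
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    H5Dwrite(dset, H5T_NATIVE_INT, memspace, filespace, dxpl, data.data());

    H5Pclose(dxpl); H5Sclose(memspace); H5Sclose(filespace);
    H5Dclose(dset); H5Pclose(fapl); H5Fclose(file);
    MPI_Finalize();
    return 0;
}
```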
The problem is: in all cases I always get the same performance (sequential, MPI-IO, P-HDF5): using P-HDF5 or MPI-IO seems useless!? And I don't get why: this does not seem logical to me. I expected to see some improvement when using P-HDF5 / MPI-IO, something like this: http://www.speedup.ch/workshops/w37_2008/HDF5-Tutorial-PDF/PSI-HDF5-PARALLEL.pdf (slide 30). For instance I get:
============================ mpirun -n 8 ./tstIO.exe --dataSize 536870912 ============================
INFO : data block = 536870912 integers per MPI proc X 8 MPI procs = 16384 Mb = 16 Gb
Seq. : write time = 5.2239 sec, read time = 8.5331 sec
MPI-IO : write time = 6.3301 sec, read time = 5.9788 sec
P-HDF5 : write time = 5.9695 sec, read time = 6.1289 sec
============================ mpirun -n 8 ./tstIO.exe --dataSize 1073741824 ============================
INFO : data block = 1073741824 integers per MPI proc X 8 MPI procs = 32768 Mb = 32 Gb
Seq. : write time = 10.426 sec, read time = 14.353 sec
MPI-IO : write time = 11.305 sec, read time = 11.908 sec
P-HDF5 : write time = 10.886 sec, read time = 16.943 sec
I understand I cannot get a clear answer to my question as it is not sharp enough; I post this hoping for some clue to make sense of the behavior I observe. Is the "spirit" of the code (comparing sequential, MPI-IO and P-HDF5 the way I do) simply unable to show P-HDF5 performance? If so, why (a "too simple" data set? should data be gathered on the master side before being written collectively)? How should I change it? Should I look for a problem in the file system? Or somewhere else? Missing option(s) when configuring the HDF5 package? Are there other things I should or could check to be sure I am in a situation where I can get performance out of P-HDF5? Are there HDF5 tools / tutorials / benchmarks (that I could replay) designed to check performance? [Just before sending this mail, I heard about h5perf: I ran h5perf over 4 MPI processes and attached the log.]
Any relevant clue / information would be appreciated. If what I observe is in fact logical, I would just like to understand why, and how / when it is possible to get performance out of P-HDF5.
Thanks for help,
FH
PS: I can provide more information and the code if needed.
h5perf.log (15.1 KB)