behavior of pHDF5 when closing files

Hello,

I have an application that writes a big file using pHDF5. I measured the
time spent in the I/O phase by each process using MPI_Wtime just before the
phase and right after closing everything (H5close). At large scale, and since
my file system and network are not supposed to be able to sustain such a load,
I was expecting some processes to get all the available bandwidth first and
other, slower processes to follow, leading to some variance in the time each
process needs to write its piece of data. But I found that the write times on
all processes were almost the same (extremely low variability, on the order of
0.1 s). Is there a barrier within H5close (or H5Fclose) when using pHDF5 that
forces all participants to synchronize before going on? If so, how can I
retrieve the time an individual process actually spends writing?
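
For reference, here is roughly how I take the measurement (a minimal sketch: the handles, datatype, and collective transfer setup stand in for what my application actually does):

```c
#include <hdf5.h>
#include <mpi.h>
#include <stdio.h>

/* Sketch of the measurement: each process brackets its whole I/O phase
 * (collective write + closing everything) with MPI_Wtime. The handles
 * are assumed to have been set up elsewhere in the application. */
void timed_io_phase(MPI_Comm comm, hid_t file_id, hid_t dset_id,
                    hid_t memspace, hid_t filespace, const double *buf)
{
    int rank;
    MPI_Comm_rank(comm, &rank);

    /* Collective data transfer, as in my application */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);

    double t0 = MPI_Wtime();            /* before the I/O phase */

    H5Dwrite(dset_id, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, buf);

    H5Pclose(dxpl);
    H5Dclose(dset_id);
    H5Sclose(memspace);
    H5Sclose(filespace);
    H5Fclose(file_id);
    H5close();                          /* "closing everything" */

    double t1 = MPI_Wtime();            /* right after closing everything */

    printf("rank %d: I/O phase took %.3f s\n", rank, t1 - t0);
}
```
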
Thank you

Regards

--
Matthieu Dorier
ENS Cachan, antenne de Bretagne
Département informatique et télécommunication
http://perso.eleves.bretagne.ens-cachan.fr/~mdori307/wiki/

Hi Matthieu,

On Mar 16, 2011, at 5:09 AM, Matthieu Dorier wrote:

  There isn't a good way to do this with the current capabilities of the parallel I/O code in the HDF5 library. :-/ However, you could enable MPE logging in your MPI implementation and use that (it will give you information about the MPI-level I/O operations), and some/most HPC systems also provide a way to track I/O operations at the next level down (usually enabled by setting environment variables or MPI info objects).
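
  For example, here is a sketch of passing an MPI info object down to HDF5's MPI-IO file driver. The hint key shown ("cb_buffer_size") is just a standard MPI-IO example; which hints (if any) enable I/O tracking or reporting on your machine is entirely system-specific, so check your MPI and file system documentation.

```c
#include <hdf5.h>
#include <mpi.h>

/* Sketch: pass MPI-IO hints to HDF5's MPI-IO file driver.
 * The hint below is an example only; consult your MPI/file system
 * documentation for hints that enable I/O statistics or tuning. */
hid_t create_with_hints(const char *filename, MPI_Comm comm)
{
    MPI_Info info;
    MPI_Info_create(&info);
    /* Example standard MPI-IO hint: collective buffering size */
    MPI_Info_set(info, "cb_buffer_size", "16777216");

    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, comm, info);   /* info is passed down to MPI_File_open */

    hid_t file = H5Fcreate(filename, H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    H5Pclose(fapl);
    MPI_Info_free(&info);
    return file;
}
```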

  That said, we are working on two projects for the HDF5 library right now that will allow much more information to be extracted from the lower levels of the I/O stack. We're working toward a generic way to stack virtual file drivers (VFDs) on top of each other, and we're also enhancing the "logging" VFD we already have (which only logs serial I/O operations at the moment) so that it extracts much more useful information. Then, once the stackable VFD interface is in place, we'll refactor the logging VFD as a pass-through layer, allowing both serial and parallel I/O operations to be logged. I'm not certain of the completion dates for these projects, but I'll post notices to the forum as we make progress.
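
  In the meantime, here is a sketch of turning on the current logging VFD through a file access property list. Keep in mind it only covers serial I/O for now, and the log file name and flag combination are just examples:

```c
#include <hdf5.h>

/* Sketch: enable the current (serial-only) logging VFD, which records
 * the file-level I/O operations HDF5 performs into a text log. */
hid_t open_with_logging(const char *filename)
{
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);

    /* Log the location and timing of I/O operations. The buffer size
     * argument is only used for "flavor" tracking, so 0 is fine here. */
    H5Pset_fapl_log(fapl, "hdf5_io.log",
                    H5FD_LOG_LOC_IO | H5FD_LOG_TIME_IO, 0);

    hid_t file = H5Fopen(filename, H5F_ACC_RDWR, fapl);
    H5Pclose(fapl);
    return file;
}
```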

  Quincey