Does PHDF5 force all the processes to write the same quantity of information?

Dear all,

I am implementing a parallel I/O solution based on ADIOS in a structured flow solver, in order to replace a CGNS host-slave approach which is not competitive for HPC.

In this context, it seems that using PHDF5 requires each process to write the same quantity of data, or more precisely, that each process has to call the write functions N times. If I am not mistaken, it is due to some synchronous MPI function.
Here is the backtrace of a process while my host, which does not write anything, is waiting in the close function:
#0 0x00007f4ce7e45d05 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007f4ce7e49ab6 in abort () at abort.c:92
#2 0x00007f4ce7e7ed7b in __libc_message (do_abort=2, fmt=0x7f4ce7f67400 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:189
#3 0x00007f4ce7e88bb6 in malloc_printerr (action=3, str=0x7f4ce7f67980 "malloc(): memory corruption (fast)", ptr=<value optimized out>) at malloc.c:6283
#4 0x00007f4ce7e8be78 in _int_malloc (av=0x7f4ce81a11c0, bytes=<value optimized out>) at malloc.c:4308
#5 0x00007f4ce7e8e31e in __libc_malloc (bytes=48) at malloc.c:3660
#6 0x0000000001a792ac in H5FL_blk_malloc ()
#7 0x0000000001a7945d in H5FL_blk_calloc ()
#8 0x0000000001abaf7b in H5O_create ()
#9 0x0000000001a88166 in H5G_obj_create_real ()
#10 0x0000000001a8859f in H5G_obj_create ()
#11 0x0000000001a7cab5 in H5G_create ()
#12 0x0000000001bf9ec0 in H5O_group_create ()
#13 0x0000000001ac0446 in H5O_obj_create ()
#14 0x0000000001ab2076 in H5L_link_cb ()
#15 0x0000000001a8e140 in H5G_traverse_real ()
#16 0x0000000001a8e50b in H5G_traverse ()
#17 0x0000000001aaef08 in H5L_create_real ()
#18 0x0000000001ab2c9a in H5L_link_object ()
#19 0x0000000001a7b396 in H5G_create_named ()
#20 0x0000000001a7f1cb in H5Gcreate1 ()
#21 0x0000000001a05d85 in hw_gopen ()
#22 0x0000000001a04a5c in hw_var ()
#23 0x0000000001a036ae in adios_phdf5_write ()
#24 0x00000000019a5207 in common_adios_write ()
#25 0x00000000019a4786 in adios_write ()

So, does anyone have any idea? Does HDF5 in parallel force all the processes to write the same quantity of data, or did I make a mistake?

Thanks for your help.


--
Mathieu Gontier
skype: mathieu_gontier

please note that "same quantity" and "same number of calls" are not at
all equivalent statements.

I think you need to call H5Sselect_none so the "do nothing" workers
can still participate in this collective routine, even if they have no
i/o to contribute.

http://www.hdfgroup.org/HDF5/doc/RM/RM_H5S.html#Dataspace-SelectNone
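
As a minimal sketch of that pattern (the file name, dataset name, sizes and the rank-0-only writer are assumptions, not taken from the original code), every rank reaches the collective H5Dwrite, but ranks with nothing to write select nothing in both the file and memory dataspaces:

#include <hdf5.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Open the file collectively through the MPI-IO driver. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("example.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* One 1-D dataset of 100 doubles; only rank 0 has data for it. */
    hsize_t dims[1] = {100};
    hid_t filespace = H5Screate_simple(1, dims, NULL);
    hid_t dset = H5Dcreate2(file, "data", H5T_NATIVE_DOUBLE, filespace,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* Request a collective transfer. */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);

    double buf[100];
    hid_t memspace = H5Screate_simple(1, dims, NULL);
    if (rank == 0) {
        for (int i = 0; i < 100; i++) buf[i] = (double)i;
        H5Sselect_all(memspace);
        H5Sselect_all(filespace);
    } else {
        /* "Do nothing" ranks still participate, but with empty selections. */
        H5Sselect_none(memspace);
        H5Sselect_none(filespace);
    }

    /* Collective call: every rank in the communicator must reach it. */
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, buf);

    H5Sclose(memspace);
    H5Pclose(dxpl);
    H5Dclose(dset);
    H5Sclose(filespace);
    H5Pclose(fapl);
    H5Fclose(file);
    MPI_Finalize();
    return 0;
}

With H5FD_MPIO_COLLECTIVE set on the transfer property list, the write behaves like an MPI collective, so the non-writing ranks must still make the call; the empty selections are what keep them from contributing any bytes.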

==rob


On Mon, Sep 12, 2011 at 09:32:43AM +0200, Mathieu Gontier wrote:

In this context, it seems that using PHDF5 requires each process to write
the same quantity of data, or more precisely, that each process has to
call the write functions N times.

--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA

Hi Mathieu,
  One thing to keep in mind also is the difference between working with metadata in the HDF5 file (things like creating objects, extending datasets, adding attributes, etc) and accessing the "raw" data elements in a dataset. Currently, metadata modifications are required to be collective (although we are working toward changing that), but accessing raw data can be collective or independent. The stack trace below is from a metadata modification, but the rest of your text mainly seems to be focusing on writing raw data...
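
A small sketch of that distinction, assuming hypothetical group/dataset names and a trivial layout: the group and dataset creation below are metadata operations and are therefore made on every rank, while the raw-data H5Dwrite uses the default (independent) transfer mode and is made only on rank 0:

#include <hdf5.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("meta.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* Metadata: every rank must make these calls, even ranks that will
     * never write a single element of the dataset. */
    hid_t grp = H5Gcreate2(file, "/flow", H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    hsize_t dims[1] = {8};
    hid_t space = H5Screate_simple(1, dims, NULL);
    hid_t dset = H5Dcreate2(grp, "rho", H5T_NATIVE_DOUBLE, space,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* Raw data: with the default (independent) transfer mode, only the
     * ranks that actually have data need to call H5Dwrite. */
    if (rank == 0) {
        double buf[8] = {0};
        H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);
    }

    H5Dclose(dset);
    H5Sclose(space);
    H5Gclose(grp);
    H5Pclose(fapl);
    H5Fclose(file);
    MPI_Finalize();
    return 0;
}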

  Quincey


On Sep 12, 2011, at 2:32 AM, Mathieu Gontier wrote:

Dear all,

I am implementing a parallel I/O solution based on ADIOS in a structured flow solver, in order to replace a CGNS host-slave approach which is not competitive for HPC.

In this context, it seems that using PHDF5 requires each process to write the same quantity of data, or more precisely, that each process has to call the write functions N times. If I am not mistaken, it is due to some synchronous MPI function.
Here is the backtrace of a process while my host, which does not write anything, is waiting in the close function:
  #0 0x00007f4ce7e45d05 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007f4ce7e49ab6 in abort () at abort.c:92
#2 0x00007f4ce7e7ed7b in __libc_message (do_abort=2, fmt=0x7f4ce7f67400 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:189
#3 0x00007f4ce7e88bb6 in malloc_printerr (action=3, str=0x7f4ce7f67980 "malloc(): memory corruption (fast)", ptr=<value optimized out>) at malloc.c:6283
#4 0x00007f4ce7e8be78 in _int_malloc (av=0x7f4ce81a11c0, bytes=<value optimized out>) at malloc.c:4308
#5 0x00007f4ce7e8e31e in __libc_malloc (bytes=48) at malloc.c:3660
#6 0x0000000001a792ac in H5FL_blk_malloc ()
#7 0x0000000001a7945d in H5FL_blk_calloc ()
#8 0x0000000001abaf7b in H5O_create ()
#9 0x0000000001a88166 in H5G_obj_create_real ()
#10 0x0000000001a8859f in H5G_obj_create ()
#11 0x0000000001a7cab5 in H5G_create ()
#12 0x0000000001bf9ec0 in H5O_group_create ()
#13 0x0000000001ac0446 in H5O_obj_create ()
#14 0x0000000001ab2076 in H5L_link_cb ()
#15 0x0000000001a8e140 in H5G_traverse_real ()
#16 0x0000000001a8e50b in H5G_traverse ()
#17 0x0000000001aaef08 in H5L_create_real ()
#18 0x0000000001ab2c9a in H5L_link_object ()
#19 0x0000000001a7b396 in H5G_create_named ()
#20 0x0000000001a7f1cb in H5Gcreate1 ()
#21 0x0000000001a05d85 in hw_gopen ()
#22 0x0000000001a04a5c in hw_var ()
#23 0x0000000001a036ae in adios_phdf5_write ()
#24 0x00000000019a5207 in common_adios_write ()
#25 0x00000000019a4786 in adios_write ()

So, does anyone have any idea? Does HDF5 in parallel force all the processes to write the same quantity of data, or did I make a mistake?

Thanks for your help.

--

Mathieu Gontier
skype: mathieu_gontier

exactly.

if it helps, you can have different sized domains, too.
e.g. one process can have a count array of 1024,1024,1024 and
another could have 3,3,3. That wouldn't be a problem.

The important fact to remember is that these I/O calls are collective
(once passed in the appropriate property list), so like MPI_BARRIER or
other MPI collectives, all processors in the communicator need to make
the call.
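
Applied to the sub-domain loop described in the quoted message below, one possible sketch (the owner map, domain sizes, one-dataset-per-domain layout and names are assumptions, not the actual solver code) would be:

#include <hdf5.h>
#include <stdio.h>

/* Write every sub-domain with collective calls; non-owners select nothing.
 * `dxpl` is assumed to have been created with H5Pset_dxpl_mpio(dxpl,
 * H5FD_MPIO_COLLECTIVE). */
static void write_subdomains(hid_t file, int rank, int ndomains,
                             const int *owner, const hsize_t *dom_dims,
                             double **dom_data, hid_t dxpl)
{
    for (int d = 0; d < ndomains; d++) {
        char name[64];
        snprintf(name, sizeof(name), "/domain_%03d", d);

        hid_t filespace = H5Screate_simple(1, &dom_dims[d], NULL);
        hid_t dset = H5Dcreate2(file, name, H5T_NATIVE_DOUBLE, filespace,
                                H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
        hid_t memspace = H5Screate_simple(1, &dom_dims[d], NULL);

        if (owner[d] == rank) {
            H5Sselect_all(filespace);      /* this rank owns the block    */
            H5Sselect_all(memspace);
        } else {
            H5Sselect_none(filespace);     /* participate, write nothing  */
            H5Sselect_none(memspace);
        }

        /* Same number of collective H5Dwrite calls on every rank. */
        double dummy = 0.0;
        const void *buf = (owner[d] == rank) ? (const void *)dom_data[d]
                                             : (const void *)&dummy;
        H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, buf);

        H5Sclose(memspace);
        H5Dclose(dset);
        H5Sclose(filespace);
    }
}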

Sometimes if the i/o is always on a certain set of processors,
applications make a sub-communicator and pass that into HDF5.
Probably do not need to worry about that, though.
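
For completeness, a sketch of that sub-communicator idea (the choice of four writer ranks and the file name are arbitrary): split MPI_COMM_WORLD, hand the sub-communicator to H5Pset_fapl_mpio, and only members of that communicator take part in the HDF5 collectives:

#include <hdf5.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int nwriters = 4;   /* assumption: the first 4 ranks do the i/o */
    int color = (rank < nwriters) ? 0 : MPI_UNDEFINED;
    MPI_Comm iocomm;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &iocomm);

    if (iocomm != MPI_COMM_NULL) {
        /* Only members of the sub-communicator touch the file; HDF5's
         * collective calls are then collective over iocomm only. */
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_mpio(fapl, iocomm, MPI_INFO_NULL);
        hid_t file = H5Fcreate("sub.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
        /* ... create groups/datasets and write here ... */
        H5Fclose(file);
        H5Pclose(fapl);
        MPI_Comm_free(&iocomm);
    }

    MPI_Finalize();
    return 0;
}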

==rob


On Mon, Sep 12, 2011 at 05:05:51PM +0200, Mathieu Gontier wrote:

Hi Rob,

Many thanks for your help.

So, in my problem, it is more about the "number of calls" for the
moment. Each process (running a structured solver) is in charge of a
number of sub-domains (different on each process) which have
different sizes.
So, if I understand correctly, what you are suggesting is:
- each process loops over all the sub-domains;
- if the domain is hosted by the process, it has to read it;
- else, it calls H5Sselect_none.
Right?

--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA

Many thanks, Rob.


On 09/12/2011 05:47 PM, Rob Latham wrote:

On Mon, Sep 12, 2011 at 05:05:51PM +0200, Mathieu Gontier wrote:

Hi Rob,

Many thanks for your help.

So, in my problem, it is more about the "number of calls" for the
moment. Each process (running a structured solver) is in charge of a
number of sub-domains (different on each process) which have
different sizes.
So, if I understand correctly, what you are suggesting is:
- each process loops over all the sub-domains;
- if the domain is hosted by the process, it has to read it;
- else, it calls H5Sselect_none.
Right?

exactly.

if it helps, you can have different sized domains, too.
e.g. one process can have a count array of 1024,1024,1024 and
another could have 3,3,3. That wouldn't be a problem.

The important fact to remember is that these I/O calls are collective
(once passed in the appropriate property list), so like MPI_BARRIER or
other MPI collectives, all processors in the communicator need to make
the call.

Sometimes if the i/o is always on a certain set of processors,
applications make a sub-communicator and pass that into HDF5.
Probably do not need to worry about that, though.

==rob

--
Mathieu Gontier
skype: mathieu_gontier