H5Fclose hangs when using parallel HDF5

(To resurrect a variant on a long-dead thread...
http://hdf-forum.184993.n3.nabble.com/H5Fclose-hangs-when-using-parallel-HDF5-td3369047.html)

I have an MPI parallel application that reproducibly hangs inside
H5Fclose with HDF5 1.8.12 on OpenMPI/1.6.5. To the best of my
knowledge, the application always writes HDF5 metadata collectively.
The hang occurs after opening, writing to, and closing two files in
quick succession. I'm quite stuck here, and would appreciate any
thoughts on how to debug this.
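
For concreteness, the access pattern boils down to roughly the sketch
below. This is not the actual ESIO code (the file names, dataset
shape, and sizes are made up), but every HDF5 call shown is issued
collectively, by all ranks, on the same communicator:

/* Sketch only: open, write, and close two files in quick succession. */
#include <mpi.h>
#include <hdf5.h>

static void write_one_file(const char *name, MPI_Comm comm)
{
    int rank;
    MPI_Comm_rank(comm, &rank);

    /* Open the file collectively with the MPI-IO driver */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, comm, MPI_INFO_NULL);
    hid_t file = H5Fcreate(name, H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* Collectively create a small 1-D dataset, one double per rank,
     * sized for the 96 ranks in this job */
    hsize_t dims[1] = {96};
    hid_t space = H5Screate_simple(1, dims, NULL);
    hid_t dset  = H5Dcreate2(file, "data", H5T_NATIVE_DOUBLE, space,
                             H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* Each rank writes its own element using collective I/O */
    hsize_t offset = (hsize_t) rank, count = 1;
    H5Sselect_hyperslab(space, H5S_SELECT_SET, &offset, NULL, &count, NULL);
    hid_t memspace = H5Screate_simple(1, &count, NULL);
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    double value = (double) rank;
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, space, dxpl, &value);

    /* Tear everything down; H5Fclose is where the hang shows up */
    H5Pclose(dxpl);
    H5Sclose(memspace);
    H5Dclose(dset);
    H5Sclose(space);
    H5Pclose(fapl);
    H5Fclose(file);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    write_one_file("sample00000.h5", MPI_COMM_WORLD);
    write_one_file("sample00001.h5", MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}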

Out of 96 total ranks, I see two distinct backtraces at the
time of the hang. On 88 ranks I observe

#6 0x00002b77dcd5f402 in PMPI_Barrier (comm=0x2b7ff40) at pbarrier.c:70
#7 0x00002b77dc279455 in H5FD_mpio_truncate () from /home1/rhys/Stow/hdf5-1.8.12/lib/libhdf5.so.8
#8 0x00002b77dc2729b9 in H5FD_truncate () from /home1/rhys/Stow/hdf5-1.8.12/lib/libhdf5.so.8
#9 0x00002b77dc260559 in H5F_dest () from /home1/rhys/Stow/hdf5-1.8.12/lib/libhdf5.so.8
#10 0x00002b77dc2619bd in H5F_try_close () from /home1/rhys/Stow/hdf5-1.8.12/lib/libhdf5.so.8
#11 0x00002b77dc261cc4 in H5F_close () from /home1/rhys/Stow/hdf5-1.8.12/lib/libhdf5.so.8
#12 0x00002b77dc2d38a2 in H5I_dec_ref () from /home1/rhys/Stow/hdf5-1.8.12/lib/libhdf5.so.8
#13 0x00002b77dc2d3952 in H5I_dec_app_ref () from /home1/rhys/Stow/hdf5-1.8.12/lib/libhdf5.so.8
#14 0x00002b77dc2625da in H5Fclose () from /home1/rhys/Stow/hdf5-1.8.12/lib/libhdf5.so.8
#15 0x00002b77dbd64151 in esio_file_close (h=0x2b79020) at esio.c:730
#16 0x00002b77dbd642eb in esio_file_close_restart (h=0x2b79020, restart_template=0x2686f58 "sample#####.h5", retain_count=32768) at esio.c:770

while on the remaining 8 ranks I see

#6 0x00002b517f4e7402 in PMPI_Barrier (comm=0x32d8640) at pbarrier.c:70
#7 0x00002b518ff4868c in mca_io_romio_dist_MPI_File_close (mpi_fh=0x32faa80) at close.c:62
#8 0x00002b517f4d4b6e in file_destructor (file=0x32d8300) at file/file.c:273
#9 0x00002b517f4d4a89 in opal_obj_run_destructors (object=0x32d8300) at ../opal/class/opal_object.h:448
#10 ompi_file_close (file=0x32f8c20) at file/file.c:146
#11 0x00002b517f501c48 in PMPI_File_close (fh=0x32f8c20) at pfile_close.c:59
#12 0x00002b517ea02454 in H5FD_mpio_close () from /home1/rhys/Stow/hdf5-1.8.12/lib/libhdf5.so.8
#13 0x00002b517e9f8a61 in H5FD_close () from /home1/rhys/Stow/hdf5-1.8.12/lib/libhdf5.so.8
#14 0x00002b517e9e84fb in H5F_dest () from /home1/rhys/Stow/hdf5-1.8.12/lib/libhdf5.so.8
#15 0x00002b517e9e99bd in H5F_try_close () from /home1/rhys/Stow/hdf5-1.8.12/lib/libhdf5.so.8
#16 0x00002b517e9e9cc4 in H5F_close () from /home1/rhys/Stow/hdf5-1.8.12/lib/libhdf5.so.8
#17 0x00002b517ea5b8a2 in H5I_dec_ref () from /home1/rhys/Stow/hdf5-1.8.12/lib/libhdf5.so.8
#18 0x00002b517ea5b952 in H5I_dec_app_ref () from /home1/rhys/Stow/hdf5-1.8.12/lib/libhdf5.so.8
#19 0x00002b517e9ea5da in H5Fclose () from /home1/rhys/Stow/hdf5-1.8.12/lib/libhdf5.so.8
#20 0x00002b517e4ec151 in esio_file_close (h=0x32f9100) at esio.c:730
#21 0x00002b517e4ec2eb in esio_file_close_restart (h=0x32f9100, restart_template=0x262f6d8 "sample#####.h5", retain_count=32768) at esio.c:770

with everything below frame #6 being OpenMPI's MPI_Barrier internals.
The application frames (#15/#16 in the first trace, #20/#21 in the
second) and everything above them (not shown) are identical across
ranks.

The H5FD_truncate call in frame #8 of the first backtrace appears to
originate near src/H5F.c:1140 in the HDF5 1.8.12 source. My
H5pubconf.h does show H5_MPI_FILE_SET_SIZE_BIG defined to 1, which may
matter: it makes it look like the PMPI_Barrier that hangs on 88/96
ranks is the one at src/H5FDmpio.c:2020, while on the remaining 8/96
ranks the hang is in a barrier hidden underneath the MPI_File_close
call at src/H5FDmpio.c:1174.

The final 8/96 ranks sit on a node that does a bit more work when
opening/saving the sequence of two files. Specifically, one rank on
that node is responsible for moving the first file after it is closed
and for copying an auxiliary file into position before it is
collectively opened. Those operations use basic POSIX calls. I have
tried to be careful to flush those changes, and I have added MPI
barriers to prevent the other ranks from rushing ahead.
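
In case it matters, the shape of that extra work is roughly the
sketch below. The helper, file, and directory names are illustrative
(the real code lives in the application, not in ESIO), but the
flush-then-barrier structure matches what I am doing:

#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

/* Copy src to dst with plain stdio/POSIX calls and fsync() the result so
 * the data is on disk before any rank opens it.  (Illustrative helper.) */
static int copy_and_fsync(const char *src, const char *dst)
{
    FILE *in  = fopen(src, "rb");
    FILE *out = fopen(dst, "wb");
    if (!in || !out) return -1;
    char   buf[65536];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0)
        fwrite(buf, 1, n, out);
    fflush(out);
    fsync(fileno(out));
    fclose(out);
    fclose(in);
    return 0;
}

/* Paraphrase of the work done on that node between the two files. */
static void shuffle_files(MPI_Comm comm, int mover_rank)
{
    int rank;
    MPI_Comm_rank(comm, &rank);

    /* Every rank has already called H5Fclose() on the first file */
    MPI_Barrier(comm);

    if (rank == mover_rank) {
        /* Move the just-closed file out of the way... */
        rename("sample00000.h5", "archive/sample00000.h5");
        /* ...and copy an auxiliary file into position for the next open */
        copy_and_fsync("auxiliary.h5", "sample00001.h5");
    }

    /* Keep the other ranks from rushing ahead to the collective H5Fopen() */
    MPI_Barrier(comm);
}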

The same code runs without issue on mpich2/1.4.1 with HDF5 1.8.9. It
has also run on TACC's Lonestar resource, which I believe uses mvapich
and HDF5/1.8.5-p1. I can't check the mvapich version at the moment as
Lonestar is down tonight.

Thanks for your time,
Rhys

Hi Rhys,

Do you have a reproducer program that I can look at?

Thanks,
Mohamad

Do you have a reproducer program that I can look at?

I'll trim down what I have as much as possible, but it may be a week
or more until I find the time. I'll report back either way, and will
include a reproducer if I manage to build one.

In the meantime, comparing...

#8 0x00002b77dc2729b9 in H5FD_truncate () from /home1/rhys/Stow/hdf5-1.8.12/lib/libhdf5.so.8
#9 0x00002b77dc260559 in H5F_dest () from /home1/rhys/Stow/hdf5-1.8.12/lib/libhdf5.so.8
#10 0x00002b77dc2619bd in H5F_try_close () from /home1/rhys/Stow/hdf5-1.8.12/lib/libhdf5.so.8

...vs...

#13 0x00002b517e9f8a61 in H5FD_close () from /home1/rhys/Stow/hdf5-1.8.12/lib/libhdf5.so.8
#14 0x00002b517e9e84fb in H5F_dest () from /home1/rhys/Stow/hdf5-1.8.12/lib/libhdf5.so.8
#15 0x00002b517e9e99bd in H5F_try_close () from /home1/rhys/Stow/hdf5-1.8.12/lib/libhdf5.so.8

...any guesses for why H5F_dest() might take a node-dependent path?

Lastly, am I nuts to be running atop OpenMPI 1.6.5 in the first place?
I ask as http://www.hdfgroup.org/HDF5/release/platforms5.html shows
no OpenMPI coverage in the test matrix.

- Rhys

Well, we don't test OpenMPI, so we can't include it in our supported platforms list.
But I am not aware of any issues that would prevent you from running on top of Open MPI.

Any chance you can upgrade to a newer version of ompi?

In any case, I do not see why you should have different code paths when closing the files.
You said you were working with 2 files at the same time. Are you closing them in the same order on all processes?
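
In other words, something like the sketch below (illustrative only,
not your code) can hang at close time, because H5Fclose() on a file
opened with the MPI-IO driver ends in collective MPI operations on
that file's communicator:

#include <hdf5.h>

/* Illustration only: with two files open at once, every process must
 * close them in the same order, because each H5Fclose() finishes with
 * collective operations on that file's communicator. */
void close_both(hid_t file_a, hid_t file_b, int this_rank_does_extra_io)
{
    if (this_rank_does_extra_io) {
        /* WRONG: these ranks enter file_b's close-time collectives... */
        H5Fclose(file_b);
        H5Fclose(file_a);
    } else {
        /* ...while the rest enter file_a's, and neither group can finish. */
        H5Fclose(file_a);
        H5Fclose(file_b);
    }
}

/* RIGHT: all ranks close in one agreed-upon order, e.g.
 *     H5Fclose(file_a);
 *     H5Fclose(file_b);
 */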

Also make sure all collective operations are called in-order on all processes. (I'm guessing you know this already, but all metadata update operations have to be collective).
Here is the list:
http://www.hdfgroup.org/HDF5/doc/RM/CollectiveCalls.html

If you still can't figure it out, then I will need something to replicate the issue and figure out what's going on.

Thanks,
Mohamad

I would be interested to hear if this problem occurs with MPICH-3.1: the backtraces indicate "stuff" is going on in the attribute cleanup code, and we (the MPICH/ROMIO folks) have had some back and forth with the OpenMPI folks about order of destruction of attributes. I *thought* we had that cleared up several years ago. If you can test with MPICH that will help us understand the problem a bit better.

You can see in one set of backtraces that a subset of the HDF5 processes is trying to resize the file at close, while another set of MPI processes is trying to close the file as part of a "destructor".

==rob

--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA