parallel HDF5: H5Fclose hangs when not using a power of 2 number of processes

Dear hdf-forum members,

I have a problem I am hoping someone can help me with. I have a program
that outputs a 2D-array (contiguous, indexed linearly) using parallel
HDF5. When I choose a number of processors that is not a power of 2
(1,2,4,8,...) H5Fclose() hangs, inexplicably. I'm using HDF5 v.1.8.14,
and OpenMPI 1.7.2, on top of GCC 4.8 with Linux.

Can someone help me pinpoint my mistake?

I have searched the forum, and the first hit [searching for "h5fclose
hangs"] was a user mistake that I didn't make (to the best of my
knowledge). The second didn't go on beyond the initial problem
description, and didn't offer a solution.

Attached is a (maybe insufficiently bare-boned, apologies) demonstrator
program. Strangely, the hang only happens if nx >= 32. The code is
adapted from an HDF5 example program.

The demonstrator is compiled with
h5pcc test.hangs.cpp -DVERBOSE -lstdc++

(On my system, for some strange reason, MPI has been compiled with the
deprecated C++ bindings, so I also need to link -lmpi_cxx; that
shouldn't be necessary for anyone else. I hope that's not the reason for
the hang-ups.)

Thanks in advance for your help!

Wolf Dapp

test.hangs.cpp (6.3 KB)


--

Hi Wolf,

It doesn't hang for me. I get a seg fault with the following traceback.
By the way, I'm using
- gcc 4.9.2
- openmpi 1.8.4
- szip 2.1
- hdf5 1.8.14
on an x86_64 Linux machine.

test.hangs: test.hangs.cpp:121: void writeH5(const char*, double*) [with T = float]: Assertion `status_h5 >= 0' failed.
HDF5-DIAG: Error detected in HDF5 (1.8.14) MPI-process 0:
  #000: H5F.c line 795 in H5Fclose(): decrementing file ID failed
    major: Object atom
    minor: Unable to close file
  #001: H5I.c line 1475 in H5I_dec_app_ref(): can't decrement ID ref count
    major: Object atom
    minor: Unable to decrement reference count
  #002: H5Fint.c line 1259 in H5F_close(): can't close file
    major: File accessibilty
    minor: Unable to close file
  #003: H5Fint.c line 1421 in H5F_try_close(): problems closing file
    major: File accessibilty
    minor: Unable to close file
  #004: H5Fint.c line 861 in H5F_dest(): low level truncate failed
    major: File accessibilty
    minor: Write failed
  #005: H5FD.c line 1908 in H5FD_truncate(): driver truncate request failed
    major: Virtual File Layer
    minor: Can't update object
  #006: H5FDmpio.c line 1982 in H5FD_mpio_truncate(): MPI_File_set_size failed
    major: Internal error (too specific to document in detail)
    minor: Some MPI function failed
  #007: H5FDmpio.c line 1982 in H5FD_mpio_truncate(): MPI_ERR_ARG: invalid argument of some other kind
    major: Internal error (too specific to document in detail)
    minor: MPI Error String
test.hangs: test.hangs.cpp:121: void writeH5(const char*, double*) [with T = float]: Assertion `status_h5 >= 0' failed.


--------------------------------------------------------------------------
mpiexec noticed that process rank 4 with PID 1158 on node node1446 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

If you want, I can try and help debug it; however, I am flat out today, so it'd have to wait till tomorrow. In the meantime, I hope this error helps.

Timothy


--

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5

Some things to watch out for...

Are you by chance accidentally leaving one or more objects in the file open (e.g., did you forget an H5Xclose() call somewhere)? I can't attest to that causing actual hangs in H5Fclose, but I know HDF5 has some logic to detect a possible infinite loop in the sym-link/group structure, for which it sometimes outputs a message along the lines of "…infinite loop detected while closing file 'foo.h5'…". I sometimes wind up calling H5Fget_obj_count just prior to H5Fclose to debug this when it has (occasionally) happened for me.

You say you are running in parallel. Is the file on an actual parallel filesystem? Are you by chance mucking with the filesystem's metadata via calls to stat or mkdir or chdir at any time before or after you create or close the HDF5 file? If so, are you ensuring parallel synchronization via MPI_Barrier before proceeding after such calls?

The core counts you mention are small, so you might be able to raise(SIGSTOP) just before H5Fclose and then attach gdb (or TotalView) to several of the processes to see what's happening. Likewise, you might be able to run valgrind on each process (sending output to separate files) to help debug too.

Sorry I don't have any other ideas. Good luck.

Mark


--

Hi Wolf,

I found the problem in your program. Note that the hang vs the error stack (from Tim's email) is just different behaviors of different MPI implementations or versions. One implementation hangs when a call to MPI_File_set_size() from inside HDF5 is done with different arguments, and the other actually reports the error.

On to the mistake in your program now: HDF5 requires the call to H5Dcreate to be collective. That means not only that all processes have to call it, but also that all of them have to call it with the same arguments. You are creating a chunked dataset with the same chunk dimensions on every process except the last, where you edit the first dimension (nxLocal). This happens here:

if ((nx%iNumOfProc) != 0) {
    nxLocal += 1;
    ixStart = myID*nxLocal;
    if (myID == iNumOfProc-1)
        nxLocal -= (nxLocal*iNumOfProc-nx); // last proc has fewer elements
}

You pass nxLocal to the chunk dimensions here:

chunk_dims[0] = nxLocal;

As long as nx % iNumOfProc is 0, you don't modify nxLocal on the last process, which explains why it works in those situations.

Note that it is OK to read from and write to datasets collectively with different arguments, but you have to create the dataset with the same arguments, including the same chunk dimensions. What you do above causes one process to see a dataset with different chunk sizes in its metadata cache, so at file close time, when the processes flush their metadata caches, one process computes a different file size than the others, and that is what causes the problem.

Makes sense?

Thanks,

Mohamad


--

Hi Mohamad,

thanks for your reply, and thanks for pointing that out. I appreciate
that each processor has a different chunk size, but I wasn't aware this
is a problem.

What would you suggest as a workaround, or solution? The number of
elements per processor /is/ objectively different, and if I simply give
the last process the /same/ chunk_size (without adjusting the file
space), the program crashes violently.

#001: H5Dio.c line 342 in H5D__pre_write(): file selection+offset not
within extent
    major: Dataspace
    minor: Out of range

If I set both dimsf[0] and chunk_dims[0] such that the (padded) data
fits, and each process writes the same chunk (i.e., if I pad the
filespace), then it works, but then the rest of my workflow will break
down, because the file is then not 32x32, but 33x32, with the last
column just zeroes, and in addition the file would be different
depending on how many processes write to it. I suppose I could somehow
resize the filespace to get it back to the proper dimension? However,
I'd probably have the same problem reading the data back in.

Or should I write to the file independently? I suppose I'd pay a hefty
performance price... (in a production run, ~1000 processes write ~20 GB
collectively, repeatedly).

Is there a recommended way how to handle this? The only worked example I
can find for collectively writing different numbers of elements writes
/nothing at all/ on one of the processes (as opposed to a smaller number
of elements than the others):
http://www.hdfgroup.org/ftp/HDF5/examples/misc-examples/coll_test.c

Thanks again for your help!
Wolf


--

Hi Wolf,

It is OK to read/write a different amount of elements/data from each processor. That is not the problem.
The problem is that you cannot have each processor specify a different layout for the dataset on disk. This is the same problem as, for example, having one process say the layout of the dataset is contiguous while another says it is chunked.

The solution is very simple: just don't adjust the chunk size for the dataset on the last process.

I modified the replicator that you provided and attached it to demonstrate how this would work (I didn't do a lot of testing on it, just on my local machine, but it should work fine).

Thanks,
Mohamad

test_hangs.cpp (6.64 KB)


--


Okay, so the only thing you did is to move the H5Pxxxx calls up,
/before/ the H5Sxxxx calls, and give them each the same arguments?

BTW, why shouldn't the line read something like
chunk_dims[0] = (nx%iNumOfProc) ? nx/iNumOfProc+1 : nx/iNumOfProc;
Why does your version still work for np != 2^X even though the chunks
will be too small? (on the other hand, with the above, the added size of
the chunks will be too large, and a chunk size of 1 also seems to work...)

I don't quite understand what this does in general, I guess. Now each
processor has the same chunk size. However, the memspaces and hyperslabs
are still different. Why aren't those calls collective?

Does the chunk size only mean that each process writes the data it owns
in chunks of the given size? If one chunk is not enough it simply writes
a second/third/fourth chunk? And if the data is smaller than the chunk,
it writes whatever it has? Is that how it works?

Thank you very much for your help, Mohamad! Thanks to Mark and Timothy
for their input, too! Much appreciated!

Cheers,
Wolf


--

Hi Wolf,

I think you are confusing the dataset layout on disk with raw data I/O on the dataset itself.
I suggest you go through these; maybe they will clear things up for you:
http://www.hdfgroup.org/HDF5/doc/UG/UG_frame10Datasets.html - section 5.5 for space allocation
http://www.hdfgroup.org/HDF5/doc/UG/UG_frame12Dataspaces.html for raw data I/O on datasets

More documentation and examples are available here:
http://www.hdfgroup.org/HDF5/Tutor/parallel.html

As a rule of thumb for parallel HDF5 users, I suggest that if you don't understand what dataset chunking is and what it does, don't use it since it will probably hurt your performance; use contiguous layout for datasets instead (you can accomplish that by removing the H5Pset calls on the dataset creation property list for H5Dcreate).

Thanks,
Mohamad


--


Does the chunk size only mean that each process writes the data it owns
in chunks of the given size? If one chunk is not enough it simply writes
a second/third/fourth chunk? And if the data is smaller than the chunk,
it writes whatever it has? Is that how it works?

The chunk size is a parameter for the storage of the data. It is a property of a
dataset in the file (and so has to be set to the same value on every processor).
It does not restrict the number of elements that you write but its value can
influence the performance and/or compression efficiency.

Pierre


Hi Mark,

thanks for your reply.

I try to be very careful about closing everything I open, so I think I
can answer your first question with a "no". It also seems unlikely, as
that would occur for 8 processes as well as for 5. There are no errors
when the program terminates (for np = 4, say), or when it deadlocks.

The problem occurs both on an actual parallel file system (GPFS,
Lustre) and on a "normal" filesystem (Btrfs, I believe). I am not making
any system calls, nor do I modify or stat the filesystem (outside of the
file creation that happens via HDF5), as can be seen in the demonstrator
that was attached to the original email.

(Did that file get scrubbed? If it WAS attached to the posting, could
somebody try to run it for, say, np = 3, and see whether the error is
reproducible on their system?)

I have tried looking at the hangup in totalview, and the deadlock
actually occurs within H5Fclose.
H5Fclose -> H5I_dec_app_ref -> H5Fclose -> H5F_try_close -> H5F_dest ->
H5FD_truncate -> H5FD_mpio_truncate -> PMPI_Barrier -> ...

I'm not quite sure how to run valgrind on an MPI-enabled process, but
I'll try to find out.

Again, most errors I can think of would also happen with 2 or 4 or
8 processors, not only with 3, 5, 6, 7, 9, ...

The only obvious difference is that the number of elements written by
each process is different if np != power of 2. In the case that it
actually WORKS, each process writes the exact same number of elements to
the file. But that shouldn't actually be a problem...

Cheers,
Wolf


--

Hi Timothy,

(sorry, I had the digest mode enabled and only saw your message after
replying to Mark)

Thanks for your reply. How many processes did you try to run the program
with? (I'm thinking at least 5, since "process rank 4" threw the error).
Did you try different numbers of processes? Did the error happen with
just a particular number of mpi tasks?

Strange that you get a segfault and I get a deadlock.

The error traceback may indeed be helpful, but unfortunately it is not
meaningful to me. Any help you could give me debugging or understanding
what's going on would be much appreciated. This is not a life-and-death
situation, so tomorrow would be early enough :-)

Cheers,
Wolf


--