Hang on large file read

Greetings

I've been forwarded some info regarding a hang on an HDF5 read from a user of Enzo.
The problem appears when 512 (or more) processes each open a file in serial and independently read a hyperslab (non-overlapping, I believe, but this may not be correct).
I asked for a stack trace and the attached screenshot was sent to me.

My advice to them was to open the file in parallel using the mpio driver rather than in serial, and the problem will probably go away.
But I'd like to ask: can you tell from the stack trace whether this is a known bug?
The problem occurs with 1.8.12 and 1.9.something.

thanks

JB

--
John Biddiscombe, email:biddisco @.at.@ cscs.ch

CSCS, Swiss National Supercomputing Centre | Tel: +41 (91) 610.82.07
Via Trevano 131, 6900 Lugano, Switzerland | Fax: +41 (91) 610.82.82

Hi John,

On Thu, Apr 3, 2014 at 6:36 AM, Biddiscombe, John A. <biddisco@cscs.ch> wrote:

> The problem appears when 512 (or more) processes open a file in serial and
> each independently read a hyperslab (non overlapping, I believe, but this
> may not be correct).

Looking at the stack trace, this is indeed a set of non-overlapping
hyperslabs, as it occurs during what Enzo refers to as
ParallelRootGridIO.
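
To make the access pattern concrete, here is a minimal sketch of how a root grid might be split into non-overlapping hyperslabs, one per process. It is not Enzo's actual decomposition: the grid shape, the process layout, and the names `split_axis` and `slab_for_rank` are assumptions made up for illustration. Each rank ends up with a `(start, count)` pair of the kind that an HDF5 hyperslab selection takes.

```python
def split_axis(n, parts):
    """Split n cells into `parts` contiguous pieces; return (start, count) pairs."""
    base, extra = divmod(n, parts)
    pieces, start = [], 0
    for i in range(parts):
        # The first `extra` pieces get one additional cell each.
        count = base + (1 if i < extra else 0)
        pieces.append((start, count))
        start += count
    return pieces


def slab_for_rank(rank, grid_shape, proc_layout):
    """Return (start, count) tuples for the hyperslab owned by `rank`.

    Ranks are numbered in row-major order over `proc_layout` (an assumption).
    """
    per_axis = [split_axis(n, p) for n, p in zip(grid_shape, proc_layout)]
    # Convert the flat rank into one process coordinate per axis.
    coords = []
    for p in reversed(proc_layout):
        rank, c = divmod(rank, p)
        coords.append(c)
    coords.reverse()
    start = tuple(per_axis[axis][c][0] for axis, c in enumerate(coords))
    count = tuple(per_axis[axis][c][1] for axis, c in enumerate(coords))
    return start, count


# Hypothetical example: a 1024^3 root grid read by 512 processes laid out 8 x 8 x 8.
grid_shape = (1024, 1024, 1024)
proc_layout = (8, 8, 8)
slabs = [slab_for_rank(r, grid_shape, proc_layout) for r in range(512)]
```

Because the slabs tile the grid without overlap, their cell counts add up to the full grid size, and no two ranks share a start offset.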

> My advice to them was to open the file in parallel using the mpio driver
> rather than in serial and the problem will probably go away.

I suspect this would probably require a rather large and somewhat
invasive change to the Enzo codebase, particularly as there are
incompatibilities with MPI-enabled HDF5 in other areas of Enzo.

-Matt

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org
http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org

Darn, the call graph is truncated just as it gets interesting.

Is there a chance that it is not hung, but instead is progressing really, really slowly? For certain data descriptions, HDF5 or MPI-IO might be doing element-at-a-time I/O. Or it might be doing something like data sieving, where it reads a lot of data in order to get to the few items of interest.
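
To illustrate why data sieving can look like a hang, here is a rough back-of-the-envelope model. It assumes a contiguous, row-major 3D dataset and a sieve that reads the single byte range running from the first to the last requested element. Real HDF5 and MPI-IO sieve buffers are bounded in size and behave differently in detail, so the numbers are only indicative. The function name `sieve_overhead` and the example shapes are made up for illustration.

```python
def sieve_overhead(shape, start, count, itemsize=8):
    """Compare bytes requested by a 3D hyperslab with the bytes spanned on disk.

    Assumes a contiguous row-major dataset (last axis varies fastest).
    Returns (requested_bytes, spanned_bytes).
    """
    nz, ny, nx = shape

    def offset(z, y, x):
        # Linear element index of (z, y, x) in row-major order.
        return (z * ny + y) * nx + x

    first = offset(*start)
    last = offset(*(s + c - 1 for s, c in zip(start, count)))
    requested = count[0] * count[1] * count[2] * itemsize
    spanned = (last - first + 1) * itemsize
    return requested, spanned


# Hypothetical case: one 128^3 block out of a 1024^3 dataset of 8-byte values.
requested, spanned = sieve_overhead((1024, 1024, 1024), (0, 0, 0), (128, 128, 128))
print(f"requested {requested / 2**20:.0f} MiB, spanned {spanned / 2**20:.0f} MiB")
```

In this made-up case the spanned range is roughly 60 times larger than the data the process actually wants, which is the kind of amplification that can turn a read into a very long wait when hundreds of processes do it at once.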

==rob

On 04/03/2014 05:36 AM, Biddiscombe, John A. wrote:

> Greetings
>
> I've been forwarded some info regarding a hang on hdf5 read from a user
> of enzo.
>
> The problem appears when 512 (or more) processes open a file in serial
> and each independently read a hyperslab (non overlapping, I believe, but
> this may not be correct).
>
> I asked for a stack trace and the attached screenshot was sent to me.

--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA

What kind of incompatibilities? Are these bugs in the MPI-IO implementation?

==rob

On 04/03/2014 05:44 AM, Matthew Turk wrote:

> On Thu, Apr 3, 2014 at 6:36 AM, Biddiscombe, John A. <biddisco@cscs.ch> wrote:
>
>> My advice to them was to open the file in parallel using the mpio driver
>> rather than in serial and the problem will probably go away.
>
> I suspect this would probably require a rather large and somewhat
> invasive change to the Enzo codebase, particularly as there are
> incompatibilities with MPI-enabled HDF5 in other areas of Enzo.

On Fri, Apr 4, 2014 at 11:04 AM, Rob Latham <robl@mcs.anl.gov> wrote:

> What kind of incompatibilities? Are these bugs in the MPI-IO
> implementation?

No, they're just incompatibilities with it in Enzo. Here's an example
thread from the mailing list.

https://groups.google.com/d/msg/enzo-users/2as4N0iOS5Y/LftkV0oXuHEJ

-Matt

On 04/04/2014 10:08 AM, Matthew Turk wrote:

> No, they're just incompatibilities with it in Enzo. Here's an example
> thread from the mailing list.
>
> https://groups.google.com/d/msg/enzo-users/2as4N0iOS5Y/LftkV0oXuHEJ

Well, that *particular* build error is because of a mishmash of namespaces. No one needs mpicxx.h anyway, so you can add "-DMPICH_SKIP_MPICXX -DOMPI_SKIP_MPICXX" to your MACH_DEFINES or MACH_CPPFLAGS.

Was that seriously the only reason Enzo hasn't looked at using parallel HDF5?

==rob

On Fri, Apr 4, 2014 at 4:34 PM, Rob Latham <robl@mcs.anl.gov> wrote:

> Was that seriously the only reason enzo hasn't looked at using parallel
> hdf5?

To be perfectly candid, the reason Enzo hasn't looked at using
parallel HDF5 is probably because the person who designed the IO
system (who has, very sadly, now passed on) saw no measurable benefit
to it, and structured the IO in such a way that we did not use it.

There is only one time that multiple processors read from the same
file, which is during the initialization of a simulation that utilizes
ParallelRootGridIO. Once that has completed, the grids are decomposed
across processors, and each processor writes to its own file. As it
stands, traversing the leaf nodes in large simulations is already
extremely costly, and unless we address that I do not think moving to
a monolithic file per snapshot is going to provide us with much
benefit, as it would exacerbate that problem. (I would be keen to be
proven wrong on this, however.)

All of that aside, I apologize for jumping in and re-directing the
conversation to Enzo rather than to the issue that John brought up
initially. The enzo-dev mailing list (
https://groups.google.com/forum/#!forum/enzo-dev ) would be a good
place to continue further discussions.

-Matt

Rob, Matthew,

It turns out, as you correctly guessed, that it wasn't actually hanging, just
reading painfully slowly. The test they ran took 2 hours to read, but it did
complete.

Changing the read to use the mpio fapl (file access property list) was a
5-line change to the code, and adding MPICH_SKIP_MPICXX got rid of the
'incompatibilities'. It is rather sad if that was all that was stopping the
use of parallel I/O in the code.

Unfortunately the code did not run any faster, so I also asked them to add a
transfer property list using collective I/O, but I'm told this has not made a
significant difference either.

I guess I'll have to actually look at the code now and see if I can help with
some profiling.

Thanks for the feedback

JB

From: Hdf-forum [mailto:hdf-forum-bounces@lists.hdfgroup.org] On Behalf
Of Matthew Turk
Sent: 04 April 2014 23:06
To: HDF Users Discussion List
Subject: Re: [Hdf-forum] Hang on large file read
