Rob, Matthew,
It turns out, as you correctly guessed, that it wasn't actually hanging, just
reading painfully slowly. The test they ran took 2 hours to read, but it did
complete.
Changing the read to use the mpio fapl was a five-line change to the code, and
adding MPICH_SKIP_MPICXX got rid of the 'incompatibilities'. It is rather
sad if that was all that was stopping the use of parallel IO in the code.
Unfortunately the code did not run any faster, so I also asked them to add a
transfer property list using collective IO, but I'm told this has not made a
significant difference either.
I guess I'll have to actually look at the code now and see if I can help with
some profiling.
Thanks for the feedback
JB
From: Hdf-forum [mailto:hdf-forum-bounces@lists.hdfgroup.org] On Behalf Of Matthew Turk
Sent: 04 April 2014 23:06
To: HDF Users Discussion List
Subject: Re: [Hdf-forum] Hang on large file read
>>> What kind of incompatibilities? Are these bugs in the MPI-IO
>>> implementation?
>>
>>
>> No, they're just incompatibilities with it in Enzo. Here's an
>> example thread from the mailing list.
>>
>> https://groups.google.com/d/msg/enzo-users/2as4N0iOS5Y/LftkV0oXuHEJ
>
>
> Well, that *particular* build error is because of a mishmash of namespaces.
> No one needs mpicxx.h anyway, so you can add "-DMPICH_SKIP_MPICXX
> -DOMPI_SKIP_MPICXX" to your MACH_DEFINES or MACH_CPPFLAGS.
>
> Was that seriously the only reason enzo hasn't looked at using
> parallel hdf5?
To be perfectly candid, the reason Enzo hasn't looked at using parallel HDF5
is probably because the person who designed the IO system (who has, very
sadly, now passed on) saw no measurable benefit to it, and structured the IO
in such a way that we did not use it.
There is only one time that multiple processors read from the same file:
during the initialization of a simulation that uses ParallelRootGridIO. Once
that has completed, the grids are decomposed across processors, and each
processor writes to its own file. As it stands, traversing the leaf nodes in
large simulations is already extremely costly, and unless we address that I
do not think moving to a monolithic file per snapshot is going to provide us
with much benefit, as it would exacerbate that problem. (I would be keen to
be proven wrong on this, however.)
All of that aside, I apologize for jumping in and re-directing the
conversation to Enzo rather than to the issue that John brought up initially.
The enzo-dev mailing list ( https://groups.google.com/forum/#!forum/enzo-dev )
would be a good place to continue further discussions.
-Matt
>
> ==rob
>
>
>>
>> -Matt
>>
>>>
>>>
>>> ==rob
>>>
>>> --
>>> Rob Latham
>>> Mathematics and Computer Science Division
>>> Argonne National Lab, IL USA
>>>
>>> _______________________________________________
>>> Hdf-forum is for HDF software users discussion.
>>> Hdf-forum@lists.hdfgroup.org
>>>
>>> http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
-----Original Message-----
On Fri, Apr 4, 2014 at 4:34 PM, Rob Latham <robl@mcs.anl.gov> wrote:
> On 04/04/2014 10:08 AM, Matthew Turk wrote: