Parallel file access recommendation

Hello HDF users,

I am using HDF5 through NetCDF, and I recently changed my program so that each MPI process writes its data directly to the output file, instead of the master process gathering the results and being the only one doing I/O.

Now I see that my program slows down the file system of the whole HPC cluster considerably, and I don't really know how to handle I/O. The file system is a high-throughput BeeGFS system.

My program uses a hybrid parallelization approach, i.e. work is split into N MPI processes, each of which spawns M worker threads. Currently, I write to the output file from each of the M*N threads, but the writing is guarded by a mutex, so thread-safety shouldn't be a problem. Each write is a complete `open file, write, close file` cycle.

Each write goes to a separate region of the HDF5 file, so no chunks are shared between any two processes. The amount of data written per process is 1/(M*N) of the size of the whole file.
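
The access pattern described above can be sketched roughly as follows. This is a hypothetical minimal example using the raw HDF5 C API with the MPI-IO file driver rather than the NetCDF layer; the file name, dataset name, and sizes are all made up:

```c
/* Sketch (not from the original post): N MPI ranks, each writing its own
 * disjoint slab of one shared dataset. File/dataset names and sizes are
 * illustrative assumptions. */
#include <hdf5.h>
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Open the shared file with the MPI-IO file driver. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("out.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* One 1-D dataset; every rank owns 1/N of it. */
    hsize_t per_rank = 1024;
    hsize_t dims[1]  = { per_rank * (hsize_t)nprocs };
    hid_t fspace = H5Screate_simple(1, dims, NULL);
    hid_t dset   = H5Dcreate2(file, "data", H5T_NATIVE_DOUBLE, fspace,
                              H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* Select this rank's slab: no two ranks overlap. */
    hsize_t start[1] = { (hsize_t)rank * per_rank };
    hsize_t count[1] = { per_rank };
    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);
    hid_t mspace = H5Screate_simple(1, count, NULL);

    double *buf = malloc(per_rank * sizeof *buf);
    for (hsize_t i = 0; i < per_rank; i++)
        buf[i] = (double)rank;

    H5Dwrite(dset, H5T_NATIVE_DOUBLE, mspace, fspace, H5P_DEFAULT, buf);

    free(buf);
    H5Sclose(mspace); H5Sclose(fspace); H5Dclose(dset);
    H5Pclose(fapl);   H5Fclose(file);
    MPI_Finalize();
    return 0;
}
```

Note that in this sketch every rank opens the file once and keeps it open, rather than doing a full open/write/close cycle per write.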

Shouldn't this be exactly how HDF5 + MPI is supposed to be used? What is the `best practice` regarding parallel file access with HDF5?

Thank you and best regards,
Jan Oliver Oelerich

···

--
Dr. Jan Oliver Oelerich
Faculty of Physics and Material Sciences Center
Philipps-Universität Marburg

Addr.: Room 02D35, Hans-Meerwein-Straße 6, 35032 Marburg, Germany
Phone: +49 6421 2822260
Mail : jan.oliver.oelerich@physik.uni-marburg.de
Web : http://academics.oelerich.org

Hi Jan,

···

On May 23, 2017, at 2:46 AM, Jan Oliver Oelerich <jan.oliver.oelerich@physik.uni-marburg.de> wrote:


  Yes, this is probably the correct way to operate, but things generally go much better in this case when collective I/O operations are used. Are you using collective or independent I/O? (Independent is the default.)

  Quincey
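
The switch Quincey refers to is a single property on the dataset transfer property list. A sketch, assuming the file was opened with the MPI-IO driver and HDF5 was built with parallel support; `dset`, `mspace`, and `fspace` are assumed to be valid handles set up elsewhere:

```c
#include <hdf5.h>

/* Sketch: perform one dataset write collectively instead of
 * independently. A collective H5Dwrite must be called by all ranks
 * that opened the file, even ranks with an empty selection. */
static herr_t write_collective(hid_t dset, hid_t mspace, hid_t fspace,
                               const double *buf)
{
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    /* Default is H5FD_MPIO_INDEPENDENT; this asks the ranks to
     * cooperate so MPI-IO can aggregate the requests. */
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    herr_t status = H5Dwrite(dset, H5T_NATIVE_DOUBLE, mspace, fspace,
                             dxpl, buf);
    H5Pclose(dxpl);
    return status;
}
```

When going through NetCDF-4 rather than raw HDF5, the corresponding switch is `nc_var_par_access(ncid, varid, NC_COLLECTIVE)`.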

A year or so back, we changed to BeeGFS as well. There were some issues
getting parallel I/O set up. The first thing you want to do is run the
parallel MPI-IO tests. I believe they can be found here:
https://support.hdfgroup.org/HDF5/Tutor/pprog.html.

This will help you verify whether your cluster has MPI-IO set up correctly. If
that doesn't work, you'll need to get in touch with the management group to
fix it.

Then you need to make sure you are using an HDF5 library that is configured
for parallel I/O.
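
One quick way to check this is a tiny program against the `H5_HAVE_PARALLEL` macro, which `hdf5.h` defines when the library was configured with parallel support (a sketch, compiled with the library you intend to use, e.g. via `h5pcc`):

```c
/* Sketch: report whether the HDF5 library this program is compiled
 * against was built with parallel (MPI-IO) support. */
#include <hdf5.h>
#include <stdio.h>

int main(void)
{
#ifdef H5_HAVE_PARALLEL
    puts("HDF5 was built with parallel (MPI-IO) support");
#else
    puts("HDF5 was NOT built with parallel support");
#endif
    unsigned maj, min, rel;
    H5get_libversion(&maj, &min, &rel);
    printf("library version %u.%u.%u\n", maj, min, rel);
    return 0;
}
```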

I know there aren't a lot of specifics here, but it took me about two weeks
of convincing before my cluster management group realized that things
weren't working quite right. Once everything was set up, I was able to
generate and write about 40 GB of data in around two minutes.

···

On Tue, May 23, 2017 at 8:18 AM, Quincey Koziol <koziol@lbl.gov> wrote:

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5