H5Fcreate on Parallel HDF5: H5F_ACC_TRUNC vs. H5F_ACC_EXCL

Is H5F_ACC_TRUNC the right create option for pre-existing files with parallel HDF5 when I want to overwrite the existing file?

Profiling shows that one of the most costly actions when running my application is closing out the file. The top "hot call path" includes the H5FD_mpio_truncate method, and the comments on that function indicate that keeping track of EOF is a costly operation over MPI. Looking through the code, it appears that if I don't use H5F_ACC_TRUNC, the overhead of H5FD_mpio_truncate is avoided.

I've used H5F_ACC_TRUNC in serial codes without problems, and it is the option used in several of the parallel example applications.
Is it better to just delete the pre-existing file and then create the file using H5F_ACC_EXCL?
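
Roughly, I mean something like this (untested sketch; "out.h5" is just a placeholder name and error checking is omitted):

/* assumes #include <mpi.h> and <hdf5.h>, and that MPI_Init has already been called */
int rank;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if (rank == 0)
    MPI_File_delete("out.h5", MPI_INFO_NULL);   /* ignore the error if the file doesn't exist */
MPI_Barrier(MPI_COMM_WORLD);                    /* everyone waits until the old file is gone */

hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
hid_t file = H5Fcreate("out.h5", H5F_ACC_EXCL, H5P_DEFAULT, fapl);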

Am I missing something here?

Jarom Nelson

Hi Jarom,

H5F_ACC_TRUNC does not have anything to do with calling mpio_truncate at file close. You should use that flag when creating the file if you want to overwrite an existing one.
What this mode does at file open is call MPI_File_set_size() with a size of 0, which essentially empties the file.

Now for the file close performance issue, there could be several causes. One of them could be the truncate issue. We actually have a patch that avoids truncating the file at file close by modifying the file format to store both the EOA and the EOF. This does not work with the 1.8 release. Unfortunately it won't be in the 1.10.0 release either, because there are other issues in the library that have to be resolved before it can be merged in. I do highly anticipate that it will be in 1.10.1, though. For now you can test whether this is the actual cause by using this development branch of HDF5:
https://svn.hdfgroup.org/hdf5/features/avoid_truncate/
and set H5Pset_avoid_truncate(fcpl, H5F_AVOID_TRUNCATE_ALL); on the file creation property list.
Again, this is a development branch (not production), so don't keep HDF5 files generated with it.
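
For example, roughly (a sketch against that branch; "out.h5" is just a placeholder and error checking is omitted):

/* assumes #include <mpi.h> and <hdf5.h>, and that MPI_Init has already been called */
hid_t fcpl = H5Pcreate(H5P_FILE_CREATE);               /* file *creation* property list */
H5Pset_avoid_truncate(fcpl, H5F_AVOID_TRUNCATE_ALL);   /* only exists on the avoid_truncate branch */

hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);               /* the usual MPI-IO file access property list */
H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);

hid_t file = H5Fcreate("out.h5", H5F_ACC_TRUNC, fcpl, fapl);
/* ... create groups/datasets, write ... */
H5Fclose(file);
H5Pclose(fapl);
H5Pclose(fcpl);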

The second issue (more likely the cause of the bad performance, I believe) could be the cost of writing out metadata at file close. If your application generates a significant amount of HDF5 metadata, writing it out at file close is very costly with the current 1.8 release; it is not done in a parallel-file-system-friendly manner. In 1.10 we improved this by adding an option that lets users have the metadata writes at file close issued in one collective MPI write call. To use that feature, you can install the HDF5 trunk version and set this property on the file access property list:
H5Pset_coll_metadata_write(fapl_id, true);
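
In other words, something like this (sketch; "out.h5" is a placeholder and error checking is omitted):

/* assumes #include <mpi.h> and <hdf5.h>, and that MPI_Init has already been called */
hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
H5Pset_coll_metadata_write(fapl, 1);   /* trunk/1.10 only: flush metadata collectively at close */
hid_t file = H5Fcreate("out.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);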

You can get HDF5 trunk here:

https://svn.hdfgroup.org/hdf5/trunk/

This feature will be in the 1.10.0 release that should be out in the next month or so.

Thanks,

Mohamad

···

Thanks for your help. I'll attempt to build my application using your avoid_truncate branch and see if it helps with the truncation issue. I may ping for assistance here, since my initial attempts to build HDF5 from the avoid_truncate branch are running into some problems.

Regarding the metadata issue, I don't have any extra metadata beyond what is required for creating datasets and groups; I wouldn't call this a "significant amount" of metadata. We are talking tens of datasets in a handful of groups, mostly written only by rank 0. The bulk of the data written is in one array distributed across ranks and written out in hyperslabs in parallel. Though, since all the dataset and group creation calls are collective, it may add up to more metadata than I expect.

Should I be concerned about the collective calls to create groups and datasets generating a large metadata overhead and slowing down the file write and close? Or does the metadata write only become a significant slowdown when the application generates a large amount of extra metadata?

Jarom

···

Testing out the H5Pset_avoid_truncate call, I get the following error when I attempt to set this property on my file access property list:

HDF5-DIAG: Error detected in HDF5 (1.9.233) MPI-process 0:
  #000: H5Pfcpl.c line 1422 in H5Pset_avoid_truncate(): can't find object for ID
    major: Object atom
    minor: Unable to find atom information (already closed?)
  #001: H5Pint.c line 3789 in H5P_object_verify(): property list is not a member of the class
    major: Property lists
    minor: Unable to register new atom

Attached is my simple test program that produces the above error.

Despite the error, my application ran to completion and the file generated appears to work correctly, though it is a very simple test case. However, I suspect that the property is not being set correctly, because the change does not seem to improve the time it takes to close the HDF5 file. Comparisons using my non-toy program show the time to write and close the file increasing from ~9 seconds at 64 ranks to ~12 seconds at 128 ranks.

Note, I'm cautiously optimistic that I built the library correctly from the given branch. The avoid_truncate checkout from svn didn't match the build instructions for a release, so I ended up doing the following:

### install autoconf version 2.69
./autogen.sh
CC=$(which mpicc) ./configure --enable-parallel --with-zlib
make ### this failed complaining that libtool didn’t have any targets configured, or some similar error message
./config.status
./config.lt
make ### this now worked, I think
make check
make install
make check-install

After that, the new library linked OK with my test code, and it appears to run despite the error. TBD whether it is actually working correctly.

Jarom

h5g_parallel.cpp (4.49 KB)

···

Tens of datasets and groups is a reasonable amount of metadata to see benefits from the collective metadata write option. If the datasets are chunked, that means more metadata is generated, too. I would try the collective write feature and see if the file close speeds up; it does so significantly in most scenarios we have tested.

Mohamad

···

Hi Jarom,

You are using the file access property list for the H5Pset_avoid_truncate call.
This requires a file creation property list.
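i.e., something like this (sketch):

hid_t fcpl = H5Pcreate(H5P_FILE_CREATE);              /* not H5P_FILE_ACCESS */
H5Pset_avoid_truncate(fcpl, H5F_AVOID_TRUNCATE_ALL);
/* then pass fcpl as the third argument of H5Fcreate() and keep the MPI-IO fapl as the fourth */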

Thanks,
Mohamad

···

OK, property list type corrected in the attached. With the correct property list type, I'm getting an infinite loop when closing the library (error stack attached) if I set the avoid_truncate flag. The same code with H5P_DEFAULT works fine.

I’ll try the collective metadata write next. Data is contiguous, but is written out as separate hyperslabs across ranks.
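
For reference, the write pattern has roughly this shape (illustrative sketch only, not the actual application code; rank, nranks, file, and buf are assumed to exist, and the collective transfer property shown is just one option):

/* each rank writes its own contiguous block of one shared 1-D dataset */
hsize_t block    = 1024;                        /* illustrative per-rank block size */
hsize_t dims[1]  = { (hsize_t)nranks * block };
hsize_t start[1] = { (hsize_t)rank * block };
hsize_t count[1] = { block };

hid_t filespace = H5Screate_simple(1, dims, NULL);
hid_t dset = H5Dcreate(file, "data", H5T_NATIVE_DOUBLE, filespace,
                       H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
hid_t memspace = H5Screate_simple(1, count, NULL);
H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);

hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);   /* independent I/O would simply drop this dxpl */
H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, buf);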
Thanks!

Jarom

h5g_parallel.cpp (4.58 KB)

avoid_truncate_infinite_loop_closing.txt (1.22 KB)

···

I tried your program with avoid truncate and it works fine for me.
I am using MPICH 3.2 though.

Were you getting any other errors reported from the library, other than the trace that you provided?

Mohamad

···

The full output is attached. There are two instances of the infinite loop error with slightly different call stacks.

No other errors.
MVAPICH2 2.1
GCC 4.6.1

Jarom

avoid_truncate_infinite_loop_closing.txt (2.95 KB)
