Improving performance of collective object definitions in phdf5

Hi,

I have a serious performance issue using phdf5 to write a lot of 1D float array data on clusters once the number of processors exceeds about 96.

I profiled the code, and it shows that most of the MPI time is spent in H5Dcreate.

The writing itself (done independently) is pretty quick. Are there any ways to speed up the collective object definitions?

Ideally ones that don't involve tailoring settings to a specific cluster.

Here is the function that is slow to finish (and often hangs, possibly from running out of memory) on more than ~96 processors:

herr_t ASDF_define_waveforms(hid_t loc_id, int num_waveforms, int nsamples,
                            long long int start_time, double sampling_rate,
                            char *event_name, char **waveform_names,
                            int *data_id) {
  int i;
  char char_sampling_rate[32];
  char char_start_time[21]; // large enough for any 64-bit integer in decimal

  // Convert the numeric metadata to decimal strings.
  snprintf(char_start_time, sizeof(char_start_time), "%lld", start_time);
  snprintf(char_sampling_rate,
           sizeof(char_sampling_rate), "%1.7f", sampling_rate);

  for (i = 0; i < num_waveforms; ++i) {
    //CHK_H5(groups[i] = H5Gcreate(loc_id, waveform_names[i],
    // H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT));

    hid_t space_id, dcpl;
    hsize_t dims[1] = {nsamples}; // Length of waveform
    hsize_t maxdims[1] = {H5S_UNLIMITED};

    CHK_H5(space_id = H5Screate_simple(1, dims, maxdims));
    CHK_H5(dcpl = H5Pcreate(H5P_DATASET_CREATE));
    CHK_H5(H5Pset_chunk(dcpl, 1, dims));

    CHK_H5(data_id[i] = H5Dcreate(loc_id, waveform_names[i], H5T_IEEE_F32LE, space_id,
                                  H5P_DEFAULT, dcpl, H5P_DEFAULT));

    CHK_H5(ASDF_write_string_attribute(data_id[i], "event_id",
                                       event_name));
    CHK_H5(ASDF_write_double_attribute(data_id[i], "sampling_rate",
                                       sampling_rate));
    CHK_H5(ASDF_write_integer_attribute(data_id[i], "starttime",
                                       start_time));

    CHK_H5(H5Pclose(dcpl));
    CHK_H5(H5Sclose(space_id));
  }
  return 0; // Success
}

It is called from Fortran code inside 3 nested do loops like this:

do k = 1, mysize
  do j = 1, num_stations_rank(k)
    do i = 1, 3
      call ASDF_define_waveforms(...)
    enddo
  enddo
enddo

So when mysize > 96 this adds up to a pretty large number of calls. Any help is appreciated.

Thanks,
James

Hi James,

How many datasets are you creating in total (i.e. what is num_waveforms)?
See this message from Elena last week on how to resolve a performance problem when creating a large number of objects:

"
Try using the H5Pset_libver_bounds function (see https://www.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetLibverBounds), passing H5F_LIBVER_LATEST for the second and third arguments, to set up a file access property list, and then use that access property list when opening an existing file or creating a new one.

Here is a C code snippet:

fapl_id = H5Pcreate (H5P_FILE_ACCESS);
H5Pset_libver_bounds (fapl_id, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST);
file_id = H5Fcreate(filename, H5F_ACC_TRUNC, H5P_DEFAULT, fapl_id);

By default, the HDF5 library uses the earliest version of the file format when creating groups. The indexing structure used by that version has a known deficiency when working with a large number (>50K) of objects in a group. The issue was addressed in HDF5 1.8, but it requires an application to "turn on" the latest file format.

The performance implications of the latest file format are not well documented. The HDF Group is aware of the issue and will be addressing it in upcoming releases.
"

I do suspect that there is another issue here too. You might be triggering evictions from the metadata cache due to the number of datasets being created, and the evictions are causing bad I/O performance. Could you try running your program with this HDF5 branch and see if you get any improvement:
https://svn.hdfgroup.org/hdf5/features/phdf5_metadata_opt/
(This is a development branch based off HDF5 trunk; it requires recent versions of autotools to run ./autogen.sh before you can configure it.)
If you have trouble building this, ping me off-list and we can work things out.

Thanks,
Mohamad


-----Original Message-----
From: Hdf-forum [mailto:hdf-forum-bounces@lists.hdfgroup.org] On Behalf Of James A. Smith
Sent: Friday, December 04, 2015 4:09 PM
To: hdf-forum@lists.hdfgroup.org
Subject: [Hdf-forum] Improving performance of collective object definitions in phdf5

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5

Hi Mohamad,

num_waveforms is around 2000 (defined on, say, 384 processors, that means roughly 785,000 waveform definitions across all processors). We would like to get this code to scale to a num_waveforms of 300,000.

I will give your suggestions a shot. Thanks!

James


On Dec 7, 2015, at 3:26 PM, Mohamad Chaarawi <chaarawi@hdfgroup.org> wrote:
