Hi,
I have a serious performance issue using parallel HDF5 (phdf5) to write a lot of 1D float array data on clusters when the number of processors exceeds about 96.
I profiled the code and it shows that most of the MPI time is spent on H5Dcreate.
The writing itself (independent mode) is pretty quick. Are there any ways to speed up the collective object definition, ideally ones that don't involve tailoring settings to a specific cluster?
Here is the function that is slow to finish (and often hangs, possibly from running out of memory) on more than ~96 processors:
herr_t ASDF_define_waveforms(hid_t loc_id, int num_waveforms, int nsamples,
                             long long int start_time, double sampling_rate,
                             char *event_name, char **waveform_names,
                             int *data_id) {
  int i;
  char char_sampling_rate[16];
  char char_start_time[24]; /* large enough for any 64-bit value */

  // Convert to decimal strings.
  snprintf(char_start_time, sizeof(char_start_time), "%lld", start_time);
  snprintf(char_sampling_rate, sizeof(char_sampling_rate), "%1.7f",
           sampling_rate);

  for (i = 0; i < num_waveforms; ++i) {
    //CHK_H5(groups[i] = H5Gcreate(loc_id, waveform_names[i],
    //                             H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT));
    hid_t space_id, dcpl;
    hsize_t dims[1] = {(hsize_t)nsamples}; /* length of the waveform */
    hsize_t maxdims[1] = {H5S_UNLIMITED};

    CHK_H5(space_id = H5Screate_simple(1, dims, maxdims));
    CHK_H5(dcpl = H5Pcreate(H5P_DATASET_CREATE));
    CHK_H5(H5Pset_chunk(dcpl, 1, dims));
    CHK_H5(data_id[i] = H5Dcreate(loc_id, waveform_names[i], H5T_IEEE_F32LE,
                                  space_id, H5P_DEFAULT, dcpl, H5P_DEFAULT));
    CHK_H5(ASDF_write_string_attribute(data_id[i], "event_id", event_name));
    CHK_H5(ASDF_write_double_attribute(data_id[i], "sampling_rate",
                                       sampling_rate));
    CHK_H5(ASDF_write_integer_attribute(data_id[i], "starttime", start_time));
    CHK_H5(H5Pclose(dcpl));
    CHK_H5(H5Sclose(space_id));
  }
  return 0; // success
}
The function is called from Fortran code in three nested do loops like this:
do k = 1, mysize
  do j = 1, num_stations_rank(k)
    do i = 1, 3
      call ASDF_define_waveforms(...)
    enddo
  enddo
enddo
So when mysize > 96 this adds up to a very large number of collective H5Dcreate calls. Any help is appreciated.
Thanks,
James