Speeding up HDF5 write operation

Hi Quincey,

It is mentioned below that the implementation may be getting serialized
because the datasets are not regularly sized. However, I am not clear on
what a non-regular hyperslab is. I looked up one of the HDF5 documents and
it says:

"A regular hyperslab is a hyperslab that generated inside an HDF5 program with
only one H5Sselect_hyperslab routine call for the selected data space."

According to this definition, my dataspace is regular, since I use only one
H5Sselect_hyperslab call in each process. The size of the dataspace is
different for each process, however. Could this be the problem? Parallel
HDF5 is not showing any improvement in speed even as the number of
processors increases, hence I think the implementation is getting serialized.

My program follows the important condition for collective I/O, i.e., the
hyperslab selection in each process must be regular, and all hyperslab
selections must fall within one chunk.
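
For concreteness, here is a minimal sketch of the selection and write
pattern I am describing. This is NOT the actual phdfwrite from
hdfmodule.f; the dataset name, total size, and variable names are made
up for illustration:

    ! Sketch: one h5sselect_hyperslab_f call per process (a "regular"
    ! selection), with a per-process count, followed by a collective
    ! write. Assumes file_id was opened with an MPI-IO file access
    ! property list. Note that every process in the file's communicator
    ! must make the h5dwrite_f call when the transfer is collective.
    subroutine write_slab(file_id, offset, count, buf)
      use hdf5
      implicit none
      integer(hid_t),   intent(in) :: file_id
      integer(hsize_t), intent(in) :: offset(1), count(1)
      real,             intent(in) :: buf(*)
      integer(hid_t)   :: filespace, memspace, dset_id, dxpl_id
      integer(hsize_t) :: total(1)
      integer          :: hdferr

      total(1) = 265875   ! total dataset size (example value)

      call h5screate_simple_f(1, total, filespace, hdferr)
      call h5dcreate_f(file_id, 'data', H5T_NATIVE_REAL, filespace, &
                       dset_id, hdferr)

      ! Exactly one hyperslab call in this process; the selection is
      ! still regular even though count(1) differs between processes.
      call h5sselect_hyperslab_f(filespace, H5S_SELECT_SET_F, offset, &
                                 count, hdferr)
      call h5screate_simple_f(1, count, memspace, hdferr)

      ! Request collective transfer on the dataset transfer plist.
      call h5pcreate_f(H5P_DATASET_XFER_F, dxpl_id, hdferr)
      call h5pset_dxpl_mpio_f(dxpl_id, H5FD_MPIO_COLLECTIVE_F, hdferr)

      call h5dwrite_f(dset_id, H5T_NATIVE_REAL, buf, count, hdferr, &
                      mem_space_id=memspace, file_space_id=filespace, &
                      xfer_prp=dxpl_id)

      call h5pclose_f(dxpl_id, hdferr)
      call h5sclose_f(memspace, hdferr)
      call h5sclose_f(filespace, hdferr)
      call h5dclose_f(dset_id, hdferr)
    end subroutine write_slab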

Thanks and Regards,
Nikhil

Hi Nikhil,

> Hi,
>
> Sorry about that. It's attached this time.

  OK, I took a look at your section of code and although it's doing
parallel writes, they may be getting serialized somewhere under HDF5
by the MPI implementation due to the [apparently] non-regular pattern
you are writing. It's also very likely that you are writing too small
an amount of data to see much benefit from parallel I/O.
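
  As a quick sanity check, you could also time just the write call and
watch how it scales with the process count and the data size. A minimal
sketch (the subroutine and its argument names are made up here, not
taken from your code):

    ! Sketch: time only the h5dwrite_f call; report the maximum across
    ! ranks, since the slowest process bounds a collective write.
    subroutine timed_write(dset_id, buf, count, memspace, filespace, &
                           dxpl_id)
      use hdf5
      use mpi
      implicit none
      integer(hid_t),   intent(in) :: dset_id, memspace, filespace, dxpl_id
      integer(hsize_t), intent(in) :: count(1)
      real,             intent(in) :: buf(*)
      integer          :: hdferr, mpierr, rank
      double precision :: t0, t1, dt, tmax

      call MPI_Comm_rank(MPI_COMM_WORLD, rank, mpierr)

      ! Synchronize first, so we measure the write itself rather than
      ! earlier load imbalance.
      call MPI_Barrier(MPI_COMM_WORLD, mpierr)
      t0 = MPI_Wtime()
      call h5dwrite_f(dset_id, H5T_NATIVE_REAL, buf, count, hdferr, &
                      mem_space_id=memspace, file_space_id=filespace, &
                      xfer_prp=dxpl_id)
      t1 = MPI_Wtime()

      dt = t1 - t0
      call MPI_Reduce(dt, tmax, 1, MPI_DOUBLE_PRECISION, MPI_MAX, 0, &
                      MPI_COMM_WORLD, mpierr)
      if (rank == 0) print *, 'write time (s):', tmax
    end subroutine timed_write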

  Quincey

> Regards,
> Nikhil
>
>> Hi Nikhil,
>>
>>
>>> Hi,
>>>
>>> Thanks for your reply.
>>>
>>> I am attaching part of my code that does the parallel write.
>>> Points to notice are:
>>>
>>> 1. For 'nprocs' processors, there are 'nend' diagonal processors
>>> that are actually doing the write, where:
>>>
>>> nprocs = nend * (nend+1) / 2
>>>
>>> 2. The subroutine for the parallel write, 'phdfwrite', is in the
>>> file hdfmodule.f.
>>>
>>> 3. This subroutine is called only by the 'nend' diagonal processors.
>>>
>>> Please find attached the source files.
>>
>> There was no attachment on your message.
>>
>> Quincey
>>
>>> I also notice that for 265875 real numbers, there is no speed
>>> difference even between INDEPENDENT and COLLECTIVE I/O. Is this
>>> because of the small size of the array? Also, do you find anything
>>> I may be doing that reduces the speed?
>>>
>>> Best Regards,
>>> Nikhil
>>>
>>>> Hi Nikhil,
>>>>
>>>>
>>>>> Hi All,
>>>>>
>>>>> I am writing an HDF5 file in parallel. But to my surprise, the
>>>>> performance of the parallel write is no better than that of the
>>>>> serial binary write. To write 265875 real numbers, my HDF5 write
>>>>> takes about 0.1 seconds, whereas the serial binary write takes
>>>>> around 0.07 seconds. This is surprising, as parallel should be at
>>>>> least as fast as serial, if not faster.
>>>>>
>>>>> Can anybody give me any suggestions as to what can be done to
>>>>> noticeably speed up this write operation?
>>>>
>>>> Hmm, are you using collective or independent parallel I/O? Also,
>>>> that's a pretty small dataset, so you are not likely to see much
>>>> difference either way.
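>>>>
>>>> (For reference, collective vs. independent is selected on the
>>>> dataset transfer property list passed to the write call; a minimal
>>>> sketch, with 'dxpl_id' and the other variable names made up:)
>>>>
>>>>     ! Sketch: the transfer property list selects the parallel
>>>>     ! I/O mode.
>>>>     call h5pcreate_f(H5P_DATASET_XFER_F, dxpl_id, hdferr)
>>>>     call h5pset_dxpl_mpio_f(dxpl_id, H5FD_MPIO_COLLECTIVE_F, hdferr)
>>>>     ! ...or, to compare:
>>>>     ! call h5pset_dxpl_mpio_f(dxpl_id, H5FD_MPIO_INDEPENDENT_F, hdferr)
>>>>     call h5dwrite_f(dset_id, H5T_NATIVE_REAL, buf, dims, hdferr, &
>>>>                     xfer_prp=dxpl_id)
>>>>     call h5pclose_f(dxpl_id, hdferr)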
>>>>
>>>>> Will the performance of HDF5 writes be better than binary for very
>>>>> large arrays?
>>>>
>>>> Our goal is to make HDF5 writes equivalent to binary for large raw
>>>> data I/O operations, but to also make the files produced
>>>> self-describing, portable, etc.
>>>>
>>>>> If not, how can I achieve any substantial speedup?
>>>>
>>>> This is a very hard question to answer without more details... :-)
>>>>
>>>> Quincey
>>>>
>>>>>
>>>>>
>>>>> Regards,
>>>>> Nikhil
>>>
>>>
>>> Regards,
>>> Nikhil
>>
>
>
> Regards,
> Nikhil
> <parallel.f>
> <hdfmodule.f>


Regards,
Nikhil

···

On Jul 10, 2008, at 2:34 PM, Nikhil Laghave wrote:
>> On Jul 10, 2008, at 2:07 PM, Nikhil Laghave wrote:
>>>> On Jul 9, 2008, at 6:39 PM, Nikhil Laghave wrote:

----------------------------------------------------------------------
This mailing list is for HDF software users discussion.
To subscribe to this list, send a message to hdf-forum-subscribe@hdfgroup.org.
To unsubscribe, send a message to hdf-forum-unsubscribe@hdfgroup.org.