Speeding up HDF5 write operation

Hi,

Yes, the selections in the file and memory dataspaces are both the same
rank and dimensions. I am basically writing a very large vector, so the
rank is always 1 and I use the same dimension for both the memory and
file dataspaces.
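
In sketch form, the write each processor does looks roughly like this
(placeholder names and values, not the literal phdfwrite code):

  ! Rough shape of the phdfwrite selection (all names/values here are
  ! placeholders): a rank-1 hyperslab of 'mycount' reals at 'myoffset'
  ! in the file, with a rank-1 memory dataspace of the same dimension.
  use hdf5
  integer(hsize_t) :: dims(1), offset(1)
  integer(hid_t)   :: filespace, memspace, dset_id, plist_id
  integer          :: hdferr

  dims(1)   = mycount      ! number of elements this processor writes
  offset(1) = myoffset     ! this processor's starting index in the file

  call h5screate_simple_f(1, dims, memspace, hdferr)  ! rank-1 memory space
  call h5dget_space_f(dset_id, filespace, hdferr)     ! rank-1 file space
  call h5sselect_hyperslab_f(filespace, H5S_SELECT_SET_F, &
                             offset, dims, hdferr)
  call h5dwrite_f(dset_id, H5T_NATIVE_DOUBLE, buf, dims, hdferr, &
                  mem_space_id=memspace, file_space_id=filespace, &
                  xfer_prp=plist_id)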

Is there something else I need to check?

How can I find out exactly where the problem lies?

Regards,
Nikhil

Hi Nikhil,

On Aug 5, 2008, at 12:21 PM, Nikhil Laghave wrote:

> Hi,
>
> While doing parallel writes, if the size of the data being written by
> each processor is not the same, can it lead to the operation getting
> serialized by the MPI implementation under HDF5?

  This probably shouldn't matter; the HDF5 library should just create
an MPI file view that incorporates the different sizes.
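
  (In case it helps to double-check the transfer mode, here is a
minimal sketch of setting up a collective transfer property list for
h5dwrite_f; 'plist_id' is a placeholder name:)

  use hdf5
  integer(hid_t) :: plist_id
  integer        :: hdferr

  ! Request collective MPI-IO transfer; with independent transfer
  ! (H5FD_MPIO_INDEPENDENT_F, the default) each process writes on
  ! its own instead of through one coordinated file view.
  call h5pcreate_f(H5P_DATASET_XFER_F, plist_id, hdferr)
  call h5pset_dxpl_mpio_f(plist_id, H5FD_MPIO_COLLECTIVE_F, hdferr)
  ! ... pass plist_id as the xfer_prp argument of h5dwrite_f ...
  call h5pclose_f(plist_id, hdferr)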

> After looking at all possible reasons that may be slowing my write
> operation down, I now think that this may be the reason.

  Are the selections in the memory dataspaces you are using the same
rank and dimensions as the file dataspace selections?
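
  (One quick way to check is to compare the two selections directly; a
sketch, where 'memspace' and 'filespace' stand for whatever dataspace
IDs you pass to h5dwrite_f:)

  integer           :: mrank, frank, hdferr
  integer(hssize_t) :: mpts, fpts

  call h5sget_simple_extent_ndims_f(memspace, mrank, hdferr)
  call h5sget_simple_extent_ndims_f(filespace, frank, hdferr)
  call h5sget_select_npoints_f(memspace, mpts, hdferr)   ! selected points
  call h5sget_select_npoints_f(filespace, fpts, hdferr)
  if (mrank /= frank .or. mpts /= fpts) &
    write (*,*) 'selection mismatch: ranks', mrank, frank, &
                ', points', mpts, fpts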

  Quincey

> Regards,
> Nikhil
>
>> Hi Nikhil,
>>
>> On Jul 10, 2008, at 2:34 PM, Nikhil Laghave wrote:
>>
>>> Hi,
>>>
>>> Sorry about that. It's attached this time.
>>
>> OK, I took a look at your section of code, and although it's doing
>> parallel writes, they may be getting serialized somewhere under HDF5
>> by the MPI implementation due to the [apparently] non-regular pattern
>> you are writing. It's also very likely that you are writing too small
>> an amount of data to see much benefit from parallel I/O.
>>
>> Quincey
>>
>>
>>> Regards,
>>> Nikhil
>>>
>>>> Hi Nikhil,
>>>>
>>>> On Jul 10, 2008, at 2:07 PM, Nikhil Laghave wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Thanks for your reply.
>>>>>
>>>>> I am attaching the part of my code that does the parallel write.
>>>>> Points to notice:
>>>>>
>>>>> 1. For 'nprocs' processors, there are 'nend' diagonal processors
>>>>> that actually do the write, where:
>>>>>
>>>>> nprocs = nend * (nend + 1) / 2
>>>>>
>>>>> (so, for example, nend = 3 diagonal writers means nprocs = 6
>>>>> processors in total).
>>>>>
>>>>> 2. The subroutine for the parallel write, 'phdfwrite', is in the
>>>>> file hdfmodule.f.
>>>>>
>>>>> 3. This subroutine is called only by the 'nend' diagonal processors.
>>>>> Please find attached the source files.
>>>>
>>>> There was no attachment on your message.
>>>>
>>>> Quincey
>>>>
>>>>> I also notice that for 265875 real numbers, there is no speed
>>>>> difference even between INDEPENDENT and COLLECTIVE I/O. Is this
>>>>> because of the small size of the array? Also, do you see anything
>>>>> I may be doing that reduces the speed?
>>>>>
>>>>> Best Regards,
>>>>> Nikhil
>>>>>
>>>>>> Hi Nikhil,
>>>>>>
>>>>>> On Jul 9, 2008, at 6:39 PM, Nikhil Laghave wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I am writing an HDF5 file in parallel, but to my surprise, the
>>>>>>> performance of the parallel write isn't better than that of the
>>>>>>> serial binary write operation. To write 265875 real numbers, my
>>>>>>> HDF5 write takes about 0.1 seconds, whereas the serial binary
>>>>>>> operation takes around 0.07 seconds. This is surprising, as
>>>>>>> parallel should be at least as fast as serial, if not faster.
>>>>>>>
>>>>>>> Can anybody give me any suggestions as to what can be done to
>>>>>>> noticeably speed up this write operation?
>>>>>>
>>>>>> Hmm, are you using collective or independent parallel I/O?
>>>>>> Also, that's a pretty small dataset, so you are not likely to see
>>>>>> much difference either way.
>>>>>>
>>>>>>> Will the performance of the HDF5 write be better than binary for
>>>>>>> very large arrays?
>>>>>>
>>>>>> Our goal is to make HDF5 writes equivalent to binary writes for
>>>>>> large raw data I/O operations, while also making the files
>>>>>> produced self-describing, portable, etc.
>>>>>>
>>>>>>> If not, how can I get any substantial speedup?
>>>>>>
>>>>>> This is a very hard question to answer without more details... :-)
>>>>>>
>>>>>> Quincey
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> Nikhil
>>>>
>>>
>>>
>>> <parallel.f>
>>> <hdfmodule.f>


Hi Nikhil,

On Aug 5, 2008, at 2:38 PM, Nikhil Laghave wrote:

> Hi,
>
> Yes, the selections in the file and memory dataspaces are both the same
> rank and dimensions. I am basically writing a very large vector, so the
> rank is always 1 and I use the same dimension for both the memory and
> file dataspaces.
>
> Is there something else I need to check?
>
> How can I find out exactly where the problem lies?
>
> Regards,
> Nikhil

  Hmm, I'm running out of obvious things to check. :-/ If you've got
the time and inclination, you could try digging into the HDF5 code to
pin down more precisely what's going on with your code's use of HDF5.
I might also suggest using the 'Jumpshot' tool (from MPE, distributed
with MPICH) to look at the MPI communication going on, which might
help reveal underlying issues.

  Sorry I can't be of more help here, but perhaps someone else in the community has more time right now...
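
  One concrete thing you could do first: bracket the write call with
MPI_Wtime on every rank and reduce the maximum, along these lines (a
sketch; 'myrank' and the communicator stand in for your code's own):

  use mpi
  double precision :: t0, t1, tw, twmax
  integer          :: ierr

  call MPI_Barrier(MPI_COMM_WORLD, ierr)  ! line the ranks up first
  t0 = MPI_Wtime()
  ! ---- the h5dwrite_f call under test goes here ----
  t1 = MPI_Wtime()
  tw = t1 - t0
  call MPI_Reduce(tw, twmax, 1, MPI_DOUBLE_PRECISION, MPI_MAX, 0, &
                  MPI_COMM_WORLD, ierr)
  if (myrank == 0) write (*,*) 'max write time (s):', twmax

If the maximum write time is small but the overall time isn't, the
cost is probably in file open/close or metadata operations rather
than in the raw write itself.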

  Quincey

