One dataset per process

VictorSV · August 11, 2015, 3:20pm

Hello all,

I'm new to this forum, so apologies in advance if this is a duplicate question.

I've read this post written in the year 2009:

http://hdf-forum.184993.n3.nabble.com/hdf-forum-One-dataset-per-process-tt194128.html

It discusses the poor performance of the "one dataset per process"
approach. I have now tried this approach myself and also got poor results.

Are these comments still valid?

I want to write a partitioned mesh where each processor has only its own
view of the mesh; there is no global mesh concept. What's the best approach
to do this? Is there a standard for keeping the data grouped per processor
inside the HDF5 file hierarchy (groups, datasets, etc.)?

Thanks in advance,
Víctor.

Mohamad_Chaarawi · August 11, 2015, 3:40pm

Hi Victor,

Creating datasets (or any other object) is a collective operation, so all processes must call H5Dcreate() for every dataset.
Accessing raw data (H5Dwrite/H5Dread) on each dataset, however, can be independent or collective. You can therefore create all the datasets collectively, one for each process, and then have each process access its own dataset independently.
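
As a rough sketch of the "create collectively, write independently" pattern described above: because every process has to issue the same H5Dcreate() calls, each one needs the same list of dataset names and sizes. The helper below only builds that list in plain C and makes no HDF5 or MPI calls. The `dataset_plan` struct, the `build_creation_plan` function, and the `rank_%04d` naming scheme are all hypothetical, and the per-rank counts are assumed to be known on every process already (for example, gathered beforehand).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record: the name and size every rank would use when
 * creating the dataset that belongs to one particular rank. */
typedef struct {
    char   name[32];  /* e.g. "rank_0002" (naming scheme is an assumption) */
    size_t nelems;    /* number of elements owned by that rank */
} dataset_plan;

/*
 * Fill plan[0..nprocs-1] from the per-rank element counts.
 * counts[] is assumed to be identical on every rank, so every rank ends
 * up with the same plan, in the same order.
 * Returns 0 on success, -1 if a generated name does not fit.
 */
int build_creation_plan(const size_t *counts, int nprocs, dataset_plan *plan)
{
    assert(counts != NULL && plan != NULL && nprocs > 0);
    for (int r = 0; r < nprocs; r++) {
        int n = snprintf(plan[r].name, sizeof plan[r].name, "rank_%04d", r);
        if (n < 0 || (size_t)n >= sizeof plan[r].name)
            return -1;
        plan[r].nelems = counts[r];
    }
    return 0;
}
```

Each process would then loop over the whole plan and create every dataset, and afterwards write only to the entry at its own rank index using an independent transfer.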

Bad performance comes from how much data each process accesses per call. If each process reads or writes only a small amount of data in its dataset in every H5Dwrite/H5Dread, then bad performance is expected. In that case I suggest you check whether accessing one big dataset collectively is an option for your application.
But if the data size per access is large, then performance shouldn’t be bad.
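
For the "one big dataset" alternative, each process needs to know where its slice starts inside the shared dataset. A minimal sketch of that bookkeeping, assuming the per-rank element counts are already known on every process: the starting offset of rank r is the sum of the counts of all lower ranks (an exclusive prefix sum). The function name `hyperslab_offsets` is hypothetical, and the code is plain C with no HDF5 or MPI calls.

```c
#include <stddef.h>

/*
 * Exclusive prefix sum over per-rank element counts.
 * On return, offsets[r] is the index in the shared 1-D dataset where
 * rank r's slice begins. The return value is the total length the
 * shared dataset would need.
 */
size_t hyperslab_offsets(const size_t *counts, int nprocs, size_t *offsets)
{
    size_t total = 0;
    for (int r = 0; r < nprocs; r++) {
        offsets[r] = total;   /* everything owned by lower ranks comes first */
        total += counts[r];
    }
    return total;
}
```

In a real application the offset and count for the calling rank would typically become the start and count of a hyperslab selection on the file dataspace, followed by a single collective H5Dwrite; those calls are not shown here.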

Thanks,
Mohamad

VictorSV · August 11, 2015, 3:45pm

Thank you Mohamad,

Yes, I am actually writing small amounts of data. I'm going to try with
larger ones and see the results.

Best regards,
Víctor.
