Store data in hierarchy, does it impact the performance?

Hi Quincey,

Here I attached my code for the benchmark.

Thank you

regards,
Elisa

Code.txt (24.6 KB)

--- On Tue, 9/14/10, Quincey Koziol <koziol@hdfgroup.org> wrote:

From: Quincey Koziol <koziol@hdfgroup.org>
Subject: Re: [Hdf-forum] Store data in hierarchy, does it impact the performance?
To: "HDF Users Discussion List" <hdf-forum@hdfgroup.org>
Date: Tuesday, September 14, 2010, 6:23 PM

Hi Elisa,

On Sep 15, 2010, at 12:09 AM, elisa sibarani wrote:

> Hi Quincey,
>
> Here I have attached an example HDF5 file with a hierarchy, a file with no hierarchy, and the benchmark result.

Actually, I meant source code for the benchmark. Can you send that?

Quincey

> Thank you for the reply.
>
> Regards,
> Elisa MS

--- On Tue, 9/14/10, Quincey Koziol <koziol@hdfgroup.org> wrote:

From: Quincey Koziol <koziol@hdfgroup.org>
Subject: Re: [Hdf-forum] Store data in hierarchy, does it impact the performance?
To: "HDF Users Discussion List" <hdf-forum@hdfgroup.org>
Date: Tuesday, September 14, 2010, 5:47 PM

Hi Elisa,

On Sep 14, 2010, at 4:43 PM, elisa sibarani wrote:

> Hi All,
>
> I really need your help or ideas. Do I need to store data in a hierarchy if I want to use HDF5? When I ran a small benchmark, performance decreased when the data was stored in datasets directly under the 'root' group rather than in a hierarchy. Is there any reason behind that result?
>
> Please, I really need to know the reason for this.

 Hmm, I don't have any good reason why this should be so. Do you have a small benchmark that demonstrates the issue?

 Quincey

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org

<BenchmarkResult.jpg><HDF_NoHierarchy.jpg><HDF_WithHierarchy.jpg>

Hi Elisa,

> Hi Quincey,
>
> Here I attached my code for the benchmark.

  Hmm, I think this is in C#, yes? I don't have a good way to run this (on my Mac) unfortunately. Also, it's fairly complex - can you boil it down to some simpler C routines that concisely show the cases where you are seeing the performance difference?

  Quincey


Hi Quincey,

Yes, this is in C#. There are two parts to it:
1. HDF with Hierarchy
It consists of writing to and retrieving from the file:
a. Writing to the file
First I check whether a group exists, using the function H5Exists. Because the hierarchy has more than one level, this check is repeated once per level. If the group exists, I just open it; if it does not, I create a new one. After that I create a dataset using the routine H5D.create and write to it using H5D.write, as follows:

-----------------------------------------------------------------------------------------------------------------------------------
// Create the dataset at the full hierarchical path, then write the struct array to it.
dataSetId = H5D.create(fileId, "/" + circuitid + "/" + groupname[0] + "/" + groupname[1] + "/" + groupname[2] + "/" + groupname[3], typeId1, spaceId);
H5D.write(dataSetId, typeId1, new H5Array<struct1>(dset_data));
-----------------------------------------------------------------------------------------------------------------------------------
I also write an attribute to the dataset using H5A.write, but before that I have to check whether it already exists.
b. Retrieving data from the file. Again, it checks whether each group exists, and finally opens the dataset with H5D.open and reads the content with H5D.read:
-----------------------------------------------------------------------------------------------------------------------------------
// Open the dataset by its full hierarchical path and read it back.
dataSetId = H5D.open(fileId, "/" + circuitid + "/" + groupname[0] + "/" + groupname[1] + "/" + groupname[2] + "/" + ij + ":00:00");
H5D.read(dataSetId, typeId1, new H5Array<struct1>(readDataBack));
-----------------------------------------------------------------------------------------------------------------------------------
2. HDF with No Hierarchy
It also consists of two processes:
a. Writing to the file. The process does not have to check whether any group exists, because there is no hierarchy in this file; it goes directly to creating a dataset:
-----------------------------------------------------------------------------------------------------------------------------------
// Flat layout: all path components go into a single dataset name under the root group.
string dsetname = circuitid + "_" + groupname[0] + "_" + groupname[1] + "_" + groupname[2] + "_" + groupname[3];
dataSetId = H5D.create(fileId, dsetname, typeId1, spaceId);
H5D.write(dataSetId, typeId1, new H5Array<struct1>(dset_data));
-----------------------------------------------------------------------------------------------------------------------------------
b. Retrieving from the file. This also opens the dataset directly, without having to check whether any group exists:
-----------------------------------------------------------------------------------------------------------------------------------
dataSetId = H5D.open(fileId, circuitid + "_" + groupname[0] + "_" + groupname[1] + "_" + groupname[2] + "_" + ij + ":00:00");
H5D.read(dataSetId, typeId1, new H5Array<struct1>(readDataBack));
-----------------------------------------------------------------------------------------------------------------------------------
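
To make the two layouts concrete, here is a small plain-C sketch of the naming only; it makes no HDF5 calls. The helper names (`join_parts`, `hier_path`, `flat_name`, `group_prefix`) are hypothetical and not part of any HDF5 API. `group_prefix` gives the group path at a given depth, i.e. one of the paths the hierarchical variant has to check before the dataset is created, while the flat variant puts every dataset name straight under the root group.

```c
#include <stddef.h>
#include <string.h>

/* Join n components with sep; if lead is non-zero, sep is also put in front.
 * Returns 0 on success, -1 if the buffer is too small. (Hypothetical helper.) */
static int join_parts(char *out, size_t cap, const char *const *parts,
                      size_t n, char sep, int lead)
{
    size_t used = 0;
    if (cap == 0)
        return -1;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(parts[i]);
        size_t need = len + ((i > 0 || lead) ? 1 : 0);
        if (used + need + 1 > cap)      /* +1 for the terminating NUL */
            return -1;
        if (i > 0 || lead)
            out[used++] = sep;
        memcpy(out + used, parts[i], len);
        used += len;
        out[used] = '\0';
    }
    return 0;
}

/* Hierarchical layout: "/circuit/g0/g1/g2/g3". */
int hier_path(char *out, size_t cap, const char *const *parts, size_t n)
{
    return join_parts(out, cap, parts, n, '/', 1);
}

/* Flat layout: "circuit_g0_g1_g2_g3", one name directly under the root group. */
int flat_name(char *out, size_t cap, const char *const *parts, size_t n)
{
    return join_parts(out, cap, parts, n, '_', 0);
}

/* Path of the group made of the first `depth` components: one of the
 * prefixes that must exist before the dataset can be created. */
int group_prefix(char *out, size_t cap, const char *const *parts, size_t depth)
{
    return join_parts(out, cap, parts, depth, '/', 1);
}
```

With made-up components {"c1", "2010", "09", "14", "05:00:00"}, `hier_path` yields `/c1/2010/09/14/05:00:00`, `flat_name` yields `c1_2010_09_14_05:00:00`, and `group_prefix` with depth 2 yields `/c1/2010`.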

So I cannot find the reason why the No Hierarchy file takes more time than the Hierarchy file. It has no need to check whether each group exists, or to create any group, yet it is slower than the hierarchy for both insert and retrieve.

I hope my explanation gives you an idea of what I'm asking.

Thanks again for replying.

Regards,
Elisa

--- On Wed, 9/15/10, Quincey Koziol <koziol@hdfgroup.org> wrote:

From: Quincey Koziol <koziol@hdfgroup.org>
Subject: Re: [Hdf-forum] Store data in hierarchy, does it impact the performance?
To: "HDF Users Discussion List" <hdf-forum@hdfgroup.org>
Date: Wednesday, September 15, 2010, 7:11 AM

Hi Elisa,

On Sep 15, 2010, at 12:46 AM, elisa sibarani wrote:

> Hi Quincey,
>
> Here I attached my code for the benchmark.

Hmm, I think this is in C#, yes? I don't have a good way to run this (on my Mac) unfortunately. Also, it's fairly complex - can you boil it down to some simpler C routines that concisely show the cases where you are seeing the performance difference?

Quincey


Hi Elisa,

On Sep 16, 2010, at 5:32 AM, elisa sibarani wrote:

> So I cannot find the reason why the No Hierarchy file takes more time than the Hierarchy file. It has no need to check whether each group exists, or to create any group, yet it is slower than the hierarchy for both insert and retrieve.
>
> I hope my explanation gives you an idea of what I'm asking.

  Looks like a reasonable set of actions. Unfortunately, I can't think of any reason why the Hierarchy is faster. Again, simple C programs that explicitly show the difference would probably enable others to give you more concrete feedback.

  Quincey
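
A stripped-down C benchmark along those lines could start from a timing wrapper like the one below. This is only a sketch under stated assumptions: `bench_fn`, `count_call` and `time_reps` are made-up names, the callback here just counts its invocations, and the real HDF5 calls for each layout (not shown) would go into callbacks of the same shape. `timespec_get` requires a C11 library and reports wall-clock time.

```c
#include <time.h>

/* Shape of one benchmark step; ctx carries whatever state the step needs. */
typedef void (*bench_fn)(void *ctx);

/* Placeholder workload: counts how often it was called.
 * A real test would do the dataset create/write (or open/read) here. */
void count_call(void *ctx)
{
    ++*(int *)ctx;
}

/* Call fn(ctx) reps times and return the elapsed wall-clock time in seconds. */
double time_reps(bench_fn fn, void *ctx, int reps)
{
    struct timespec t0, t1;
    timespec_get(&t0, TIME_UTC);
    for (int i = 0; i < reps; i++)
        fn(ctx);
    timespec_get(&t1, TIME_UTC);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Timing the hierarchical and the flat variant with the same `reps` and the same data, each against a fresh file, should make the difference easier for others to reproduce.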
