Anupam Gopal
Energy Application & Systems Engineering
GE Energy
T 518-385-4586
F 518-385-5703
E anupam.gopal@ge.com
http://www.gepsec.com
General Electric International, Inc.
-----Original Message-----
From: Quincey Koziol [mailto:koziol@hdfgroup.org]
Sent: Tuesday, June 03, 2008 9:21 AM
To: Gopal, Anupam (GE Infra, Energy)
Cc: hdf-forum@hdfgroup.org
Subject: Re: [hdf-forum] H5 read function timing discrepancy
Hi Anupam,
On Jun 3, 2008, at 8:08 AM, Gopal, Anupam (GE Infra, Energy) wrote:
Yes, that's a good point; let me try that. Just to let you know, I
write data to the XY plane, and each plane is chunked. So for this
dataset of (3151*5*5162), I write 5162 times, each time adding a chunk
of 3151*5. That was about writing to the dataset. While retrieving the
data, I retrieve a single line in the Z direction, that is, an array
of size (1*5162). I assume this means I am accessing data which is
scattered across 5162 chunks, right? But the real question is that I
am doing the same for each of these datasets, so if there is a delay
in reading, it should show up in both cases.
However, yesterday I tried plotting the initial size of the dataset
against the time it takes to retrieve the data, and it seems that as
the initial size of the dataset increases, the time taken to retrieve
the same amount of data also increases proportionately.
I agree with an earlier comment in this thread: it's probably
related to your chunk sizes. What are the dimensions of the chunks in
each of these datasets? Since HDF5 [generally] accesses entire chunks
at a time (i.e. bringing each chunk that contains requested elements
into memory and extracting the necessary elements from it), if the
chunks are different sizes, then the I/O times will be different.
Quincey
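To put rough numbers on Quincey's point, here is a minimal Python sketch that counts how many chunks a contiguous hyperslab touches and how many elements would be brought into memory if every touched chunk is loaded whole. The chunk shapes (3151, 5, 1) and (130, 5, 1) are assumptions inferred from the "one XY plane per chunk" description, the read is assumed to start at offset 0, and the function names are hypothetical. This is plain arithmetic, not an HDF5 call.

```python
from math import prod


def chunks_touched(start, count, chunk):
    """Count the chunks intersected by a contiguous hyperslab.

    start, count and chunk are per-axis tuples: selection offset,
    selection extent, and chunk shape.
    """
    per_axis = [
        (s + c - 1) // k - s // k + 1  # last chunk index - first chunk index + 1
        for s, c, k in zip(start, count, chunk)
    ]
    return prod(per_axis)


def elements_loaded(start, count, chunk):
    """Elements brought into memory if each touched chunk is read whole."""
    return chunks_touched(start, count, chunk) * prod(chunk)


# One line of 5162 elements along Z, starting at the origin (assumed offset).
start, count = (0, 0, 0), (1, 1, 5162)

# Assumed chunk shapes: one full XY plane per chunk in each dataset.
large_chunk = (3151, 5, 1)
small_chunk = (130, 5, 1)

n_chunks = chunks_touched(start, count, large_chunk)
large_loaded = elements_loaded(start, count, large_chunk)
small_loaded = elements_loaded(start, count, small_chunk)

print(n_chunks)      # 5162 chunks touched in either dataset
print(large_loaded)  # 81327310 elements for the (3151*5*5162) dataset
print(small_loaded)  # 3355300 elements for the (130*5*12404) dataset
```

Under these assumptions the same 5162-element read touches the same number of chunks in both datasets, but pulls roughly 24 times more elements into memory from the larger one.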
From: Ray Burkholder [mailto:ray@oneunified.net]
Sent: Monday, June 02, 2008 5:45 PM
To: hdf-forum@hdfgroup.org
Subject: RE: [hdf-forum] H5 read function timing discrepancy
I don't know if it has anything to do with it, but it depends upon
how things are stored and which points your hyperslab retrieves. If
your hyperslab retrieves points which are scattered across a number of
different compressed chunks, each one of those chunks needs to be
decompressed before the data can be accessed.
Hence, one way to confirm whether it is a compression thing or
something else is to remove the compression, if you can.
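The cost of pulling one value out of a compressed chunk can be illustrated without HDF5 at all. The sketch below assumes zlib compression and 8-byte floats, neither of which is stated in this thread, and uses hypothetical helper names. It only shows that fetching a single element means inflating the whole chunk first.

```python
import struct
import zlib

ITEMSIZE = 8  # assumed element size: 64-bit float


def make_compressed_chunk(n_elements):
    """Build a zlib-compressed buffer standing in for one stored chunk."""
    values = [float(i) for i in range(n_elements)]
    raw = struct.pack("<%dd" % n_elements, *values)
    return zlib.compress(raw)


def read_one_element(compressed_chunk, index):
    """Return (value, inflated size in bytes) for a single element.

    The entire chunk is decompressed even though only one value is needed.
    """
    raw = zlib.decompress(compressed_chunk)
    value = struct.unpack_from("<d", raw, index * ITEMSIZE)[0]
    return value, len(raw)


# One chunk the size of a 3151*5 plane (assumed chunk layout).
chunk = make_compressed_chunk(3151 * 5)
value, inflated_bytes = read_one_element(chunk, 7)
print(value, inflated_bytes)  # 7.0 126040
```

Repeating this for every chunk a hyperslab touches is why larger chunks can make the same small read slower when compression is on.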
If you are running this on Windows with VS, you could use the
profiling utility to see in which routine most of the processing time
is spent. That may help to track down the culprit.
I think Unix/Linux have profilers of one fashion or another as well.
And which version of HDF5 are you using?
From: Gopal, Anupam (GE Infra, Energy) [mailto:anupam.gopal@ge.com]
Sent: Monday, June 02, 2008 18:30
To: Ray Burkholder; hdf-forum@hdfgroup.org
Subject: RE: [hdf-forum] H5 read function timing discrepancy
Thanks for your reply. To answer your question: no, I haven't. I did
not see any point in doing that, as both are generated using the same
code and hence have identical properties. If compression or
chunking makes one of them slower to read, then it should have the
same effect on the other one. The reason one is 10 times bigger than
the other is that it has 10 times more data. It has nothing to do
with compression. Any other ideas?
From: Ray Burkholder [mailto:ray@oneunified.net]
Sent: Monday, June 02, 2008 5:17 PM
To: hdf-forum@hdfgroup.org
Subject: RE: [hdf-forum] H5 read function timing discrepancy
Have you tried taking compression off and comparing that way? One
set appears to be 10x bigger than the other, which is approx. your
time difference. Perhaps decompression, depending upon how you've
chunked and distributed values, may be the culprit.
From: Gopal, Anupam (GE Infra, Energy) [mailto:anupam.gopal@ge.com]
Sent: Monday, June 02, 2008 18:09
To: hdf-forum@hdfgroup.org
Subject: [hdf-forum] H5 read function timing discrepancy
Hi,
I have observed some anomalies in the HDF5 read functionality for
which I do not have any explanation. I was hoping one of you might
have an answer to my question or has experienced this before.
Here it goes:
I have two datasets. They are identical (chunking, compression,
allocation time, dimension, data type). The only difference is in the
dimension sizes: one of them is (130*5*12404) and the other one is
(3151*5*5162). I read the same hyperslab (same offset and number of
elements) from each. But one takes on average 450 milliseconds and the
other one 6000 milliseconds. Any idea why?
Please let me know if this makes sense to anyone!
Regards,
Anupam Gopal
--
Scanned for viruses & dangerous content at One Unified and is
believed
to be clean.
----------------------------------------------------------------------
This mailing list is for HDF software users discussion.
To subscribe to this list, send a message to
hdf-forum-subscribe@hdfgroup.org.
To unsubscribe, send a message to hdf-forum-unsubscribe@hdfgroup.org.
----------------------------------------------------------------------