Thanks Elena.
I still have some more questions. I am trying to optimize my datasets
for faster access. My observation is that the time to access the
complete data increases linearly with the number of datasets. Say one
file has only 1 dataset in each group and another has 10 datasets in
each group. I wrote a program to read just 1 dataset from each group in
both files. The first file, which contains only 1 dataset per group,
takes much less time than the file with 10 datasets per group. The
second file is 2-3 times slower than the first (3 seconds vs. 9
seconds), even though both runs read the same amount of data (the
dataset I read from each file contains the same data).
I have repacked the file so that the chunk size equals the dimensions
of each dataset.
This is not what I expected. Since the structure of an HDF5 file is
similar to a Unix file system, the number of datasets should not affect
the access time as long as you are reading the same amount of data,
given that datasets are accessed via pointers. What is it that I am
doing wrong? How can I maximize my reading performance?
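To sanity-check my intuition, here is a toy cost model of the two files. All the constants (seek cost, throughput, the logarithmic lookup assumption) are my own guesses for illustration, not measured HDF5 internals; the real behavior also depends on the metadata cache and file fragmentation. The point is only that if locating a dataset inside a group gets more expensive as the group holds more datasets, total read time grows even though the payload is identical:

```python
import math

SEEK_S = 0.005    # assumed cost of one non-contiguous metadata read
DATA_BPS = 200e6  # assumed sequential raw-data throughput (bytes/s)

def est_read_time(n_groups, dsets_per_group, bytes_read_per_group):
    """Estimated time to read one dataset from each of n_groups groups.

    Assumption (mine, not HDF5's documented behavior): locating a
    dataset inside a group costs one seek per lookup level, and the
    number of levels grows logarithmically with the datasets per group.
    """
    lookup_levels = 1 + math.ceil(math.log2(max(dsets_per_group, 2)))
    per_group = lookup_levels * SEEK_S + bytes_read_per_group / DATA_BPS
    return n_groups * per_group

# Same payload read in both cases; only the group layout differs.
t_file1 = est_read_time(300, 1, 1_000_000)   # 1 dataset per group
t_file2 = est_read_time(300, 10, 1_000_000)  # 10 datasets per group
```

Under these assumed constants the model lands in the same ballpark as my measurement (file 2 roughly 2x slower for the same bytes read), which at least makes per-dataset metadata lookup a plausible suspect.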
My data looks as follows: each group has 5 levels of market data.
The different formats I have tried:
File 1: has level 1 in each group. (Fastest, but only provides level 1.)
File 2: has 5 datasets, one per level, within each group. (This is the
one scaling linearly. Slower than the file 1 format; even if I read
only level 1, it is still very slow compared to file 1.)
File 3: has all 5 levels in 1 dataset (spread horizontally).
My read access pattern: I have to read either only level 1 or all 5
levels together.
I am thinking of 2 different files: one with the level-1 dataset and
the other with all the datasets. I feel this is quite inefficient, and
I would like to keep all the data in a single file. Do you have any
suggestions?
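To make the file 3 trade-off concrete without any HDF5 at all, here is a stdlib-only sketch of the two layouts (my own toy buffers, no h5py). With all five levels interleaved in one record per tick, extracting level 1 alone means striding across a buffer five times larger than the data I actually want, whereas a level-1-only dataset is one contiguous read:

```python
import struct

N_TICKS = 4
N_LEVELS = 5

# Interleaved "file 3" layout: each record holds all 5 levels for one tick.
records = [[float(t * 10 + lvl) for lvl in range(1, N_LEVELS + 1)]
           for t in range(N_TICKS)]
interleaved = b"".join(struct.pack("5d", *r) for r in records)

# Contiguous "file 1" layout: level 1 stored on its own.
level1_only = struct.pack(f"{N_TICKS}d", *(r[0] for r in records))

# Level 1 from the interleaved buffer: one strided unpack per tick,
# touching 5x more bytes than the values we keep.
stride = struct.calcsize("5d")
lvl1_from_interleaved = [
    struct.unpack_from("d", interleaved, t * stride)[0]
    for t in range(N_TICKS)
]

# Level 1 from the contiguous buffer: a single sequential unpack.
lvl1_contiguous = list(struct.unpack(f"{N_TICKS}d", level1_only))

assert lvl1_from_interleaved == lvl1_contiguous
```

So if level-1-only reads dominate, one pragmatic option might be to keep both a level-1-only dataset and the all-levels dataset in the same file (duplicating the level-1 data), rather than splitting into two files; I am not sure whether the duplication is acceptable for my data volumes.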
Alok Jadhav
GAT IT
···
From: hdf-forum-bounces@hdfgroup.org
[mailto:hdf-forum-bounces@hdfgroup.org] On Behalf Of Elena Pourmal
Sent: Monday, August 27, 2012 8:45 PM
To: HDF Users Discussion List
Subject: Re: [Hdf-forum] why changing the format had adverse effect
Hi Alok,
Please try to run h5stat tool
(http://www.hdfgroup.org/HDF5/doc/RM/Tools.html#Tools-Stat) to see how
space is allocated in the file for raw data and HDF5 metadata.
Elena
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Elena Pourmal The HDF Group http://hdfgroup.org
1800 So. Oak St., Suite 203, Champaign IL 61820
217.531.6112
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
On Aug 26, 2012, at 9:15 PM, alokjadhav wrote:
Hi,
could someone comment on this? I am still not sure why the new format
with fewer elements is taking so much more storage space. One more
observation:
Format 1 has around 300 groups, each with 2 datasets -> 600 datasets
total.
Format 2 has around 200 groups, each with 11 datasets -> 2200 datasets
total.
In format 1 each dataset is a double array, whereas in format 2 each
dataset is a compound type (doubles and ints mixed).
What is the overhead of a compound datatype vs. a double array? Can
having 2200 datasets instead of 600 double the size of the HDF5 file? I
am basically converting horizontal data into vertical data with more
datasets.
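My back-of-envelope arithmetic, using an assumed fixed per-dataset metadata cost (object header, group B-tree/heap entries, chunk index). The 2 KB figure is purely illustrative, not a measured HDF5 value; h5stat on the actual files would give the real numbers:

```python
# Assumed fixed metadata cost per dataset (object header, B-tree/heap
# entries, chunk index). 2048 bytes is an illustrative guess, NOT a
# measured HDF5 figure -- run h5stat on the real file for actual values.
OVERHEAD_PER_DATASET = 2048  # bytes, assumption

fmt1_meta = 600 * OVERHEAD_PER_DATASET    # format 1: 600 datasets
fmt2_meta = 2200 * OVERHEAD_PER_DATASET   # format 2: 2200 datasets
```

Under that assumption the per-dataset metadata alone grows from about 1.2 MB to about 4.4 MB, so whether it can double the whole file depends on how large the raw data is relative to that. The compound type itself should only add a datatype message per dataset, though mixing doubles and ints can also cost alignment padding inside each record.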
Regards,
Alok
--
View this message in context:
http://hdf-forum.184993.n3.nabble.com/why-changing-the-format-had-advers
e-effect-tp4025330p4025344.html
Sent from the hdf-forum mailing list archive at Nabble.com.
_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org