How to best organize this data

Tuan · February 6, 2010, 2:08am

Hi all,
I'm a newbie to HDF5. I need to record measurements from N stations as
time series data, so at each time instant I have N data elements.
My question is how I should organize this data for the best storage, in
terms of memory savings and data access.
Here is what I am considering:

- a single dataset, organized as a 2-dimensional array, N-by-unlimited.
- a single dataset, organized as a 2-dimensional array, unlimited-by-N.
- N datasets, each a vector of unlimited size storing the time
series for one station.

What is your suggestion?

Thank you,
Tuan

miller86 · February 6, 2010, 5:37pm

Hello Tuan,

Part of the answer to your question, I think, depends on what you hope to
be able to do 'easily' downstream (i.e. with the HDF5 library itself, as
opposed to your own custom code on top of HDF5) in post-processing
software such as a viz tool, and on how your data acquisition scenario
could change over time. For example, do you expect to add and remove
'stations' over the course of the data collection process? If so, then
making the 'N' dimension 'unlimited' as well (I think HDF5 can have more
than one unlimited dimension) might be useful. Do 2D hyperslab requests
for data make sense in any post-processing scenario? I mean, can you
imagine selecting a subset of stations and a subset of times to read
data from, and do you expect that to be a common operation and/or one
that requires high performance? If so, a 2D dataset of some kind is
probably going to be more useful to you than a collection of 1D
datasets. With a collection of 1D datasets you would have to write more
custom software.
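To make the 2D hyperslab idea concrete, here is a minimal sketch using h5py (a Python binding for HDF5; the thread itself does not name a language). The dataset name `readings`, the 5-station by 100-sample size, and the chunk shape are all made up for illustration, and the file is kept in memory so nothing is written to disk.

```python
import h5py
import numpy as np

# Hypothetical sizes: 5 stations, 100 time steps, filled with dummy values.
data = np.arange(5 * 100, dtype="f8").reshape(5, 100)

# driver="core" with backing_store=False keeps the demo file in memory only.
with h5py.File("hyperslab_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset(
        "readings", data=data, maxshape=(5, None), chunks=(5, 50)
    )

    # One 2D hyperslab request: stations 1..2, time steps 10..19.
    block = dset[1:3, 10:20]
    block_shape = block.shape
    corner = float(block[0, 0])  # value at station 1, time step 10
```

With a single 2D dataset the subset comes back from one slicing call. With N separate 1D datasets, the same request would need a loop over the selected stations plus code to stack the results.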

How much total data do you expect to collect, and do you intend to be
able to move only 'some' of it around between platforms? I mean, do you
expect all data over all time to fit reasonably into a single HDF5 file,
and would such a file be of a 'convenient' or 'manageable' size? If not,
then you might want to consider storing the data in multiple HDF5 files.
In that case, I might make each 'station' a separate file.

Overall, barring some of the more exotic issues I mention above, I would
go with the first option you describe, N x unlimited.
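As a rough sketch of what the N x unlimited layout could look like, the following uses h5py (an assumption; the same idea applies to the C API through a chunked dataset with an unlimited maximum dimension). The station count, the dataset name `readings`, the chunk length of 1024 samples, and the helper `append_samples` are hypothetical choices for illustration, not tuned recommendations.

```python
import h5py
import numpy as np

N_STATIONS = 4  # hypothetical number of stations


def append_samples(dset, samples):
    """Append one or more time steps; samples has shape (N,) or (N, k)."""
    samples = np.asarray(samples, dtype=dset.dtype)
    if samples.ndim == 1:
        samples = samples[:, np.newaxis]  # one time step becomes shape (N, 1)
    old_len = dset.shape[1]
    # Grow the time axis, then write the new columns at the end.
    dset.resize(old_len + samples.shape[1], axis=1)
    dset[:, old_len:] = samples


# In-memory file so the sketch leaves nothing on disk.
with h5py.File("stations_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset(
        "readings",
        shape=(N_STATIONS, 0),            # no samples yet
        maxshape=(N_STATIONS, None),      # time axis is unlimited
        dtype="f8",
        chunks=(N_STATIONS, 1024),        # assumed chunk shape
    )
    append_samples(dset, [1.0, 2.0, 3.0, 4.0])           # one time step
    append_samples(dset, np.arange(8.0).reshape(4, 2))   # two more time steps
    final_shape = dset.shape
    first_station = dset[0, :]  # full time series of station 0
```

An extendable dataset has to use chunked storage, and the chunk shape affects both file size and read speed, so it is worth experimenting with values that match how the data will be read back (whole time series per station versus all stations at a given time).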

Good luck.

Mark



--
Mark C. Miller, Lawrence Livermore National Laboratory
