hdf5 and data indexing

magawake · June 18, 2009, 11:59am

Currently we store historical research data at our engineering lab.
The data is formatted this way on the UNIX filesystem:

$YYYY/$MM/$DD/traversal.tsv.gz

Each compressed traversal.tsv is about 3 GB, and we have about 500 of
these files for each day.

For example:
2009/01/01/traversal.tsv.gz
2009/01/02/traversal.tsv.gz
2009/01/03/traversal.tsv.gz
2009/01/04/traversal.tsv.gz
2009/01/05/traversal.tsv.gz

I am using HDF5 groups to recreate the same Unix filesystem structure
in my HDF5 file. Will I be able to access the data quickly if I add
some sort of index, instead of reading the entire HDF5 file?
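As a rough illustration of the layout described above, the $YYYY/$MM/$DD directories can be mirrored as nested HDF5 group paths, so a given day's data is addressed directly by a path such as /2009/01/01 rather than located by scanning. This is a minimal Python sketch; the dataset name `traversal` and the helper functions are hypothetical and do not come from the thread.

```python
from __future__ import annotations

from datetime import date, timedelta

# Hypothetical name for the dataset stored in each day's group; the thread
# does not say what the author actually called it.
DATASET_NAME = "traversal"


def day_group(d: date) -> str:
    """Return the HDF5 group path mirroring the $YYYY/$MM/$DD directory."""
    return d.strftime("/%Y/%m/%d")


def dataset_path(d: date) -> str:
    """Return the full path of the (hypothetical) per-day dataset."""
    return f"{day_group(d)}/{DATASET_NAME}"


def paths_between(start: date, end: date) -> list[str]:
    """Return dataset paths for every day from start to end, inclusive."""
    n_days = (end - start).days
    return [dataset_path(start + timedelta(days=i)) for i in range(n_days + 1)]


if __name__ == "__main__":
    for p in paths_between(date(2009, 1, 1), date(2009, 1, 5)):
        print(p)
```

Because the path is computed from the date, the group hierarchy itself acts as the index: no separate lookup table is needed to find one day's object.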

----------------------------------------------------------------------
This mailing list is for HDF software users discussion.
To subscribe to this list, send a message to hdf-forum-subscribe@hdfgroup.org.
To unsubscribe, send a message to hdf-forum-unsubscribe@hdfgroup.org.

Quincey_Koziol · June 18, 2009, 12:48pm

Hi Mag,

I'm a little fuzzy about what you are asking here, but each object in a group can be located and accessed without reading in the entire HDF5 file. Is that what you mean?
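To make this point concrete, here is a minimal sketch using the h5py Python bindings (an assumption; the thread names no particular API). Opening a dataset by its group path reads only that object's metadata, and slicing a chunked dataset reads only the chunks that overlap the slice. The one-dimensional integer arrays and the `build_demo_file`/`read_slice` helpers are hypothetical stand-ins for the real TSV contents.

```python
import os
import tempfile

import h5py
import numpy as np


def build_demo_file(path: str) -> None:
    """Write a small demo file using the /YYYY/MM/DD/traversal layout.

    The integer arrays are placeholder data, not the real TSV contents.
    """
    with h5py.File(path, "w") as f:
        for day in ("01", "02", "03"):
            # h5py creates the intermediate 2009/01/<day> groups automatically.
            f.create_dataset(
                f"2009/01/{day}/traversal",
                data=np.arange(100_000, dtype=np.int64),
                chunks=True,         # chunked layout enables partial I/O
                compression="gzip",  # compression is applied per chunk
            )


def read_slice(path: str, obj: str, start: int, stop: int) -> np.ndarray:
    """Open one dataset by its path and read only elements [start, stop)."""
    with h5py.File(path, "r") as f:
        dset = f[obj]            # path lookup: reads object metadata only
        return dset[start:stop]  # reads just the chunk(s) overlapping the slice


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.h5")
        build_demo_file(path)
        print(read_slice(path, "/2009/01/02/traversal", 10, 15))
```

The other days' datasets are never touched by `read_slice`, which is the behavior described above: one object is located and read without loading the whole file.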

Quincey

mike.jackson · June 18, 2009, 1:14pm

You can also get a list of the members of each group, which would help you figure out how many traversal.tsv.gz objects there are for a certain day. Does that help?
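This suggestion can be sketched the same way, again assuming the h5py bindings. Iterating over a group yields its member names and `len()` gives its member count, so counting a day's objects reads only group metadata, not dataset contents. The `traversal_NNN` member names and the helper functions below are hypothetical.

```python
import os
import tempfile

import h5py


def build_demo_file(path: str) -> None:
    """Write a demo file with several hypothetical objects per day group."""
    with h5py.File(path, "w") as f:
        for day, n_objects in (("01", 3), ("02", 2)):
            day_group = f.create_group(f"2009/01/{day}")
            for i in range(n_objects):
                # Hypothetical member names; the thread does not specify any.
                day_group.create_dataset(f"traversal_{i:03d}", data=[i])


def members(path: str, group: str) -> list:
    """Return the sorted names of the direct members of one group."""
    with h5py.File(path, "r") as f:
        return sorted(f[group].keys())


def count_per_day(path: str, month_group: str) -> dict:
    """Map each day group under month_group to its number of members."""
    with h5py.File(path, "r") as f:
        month = f[month_group]
        # Iterating a group yields member names; len() counts its members.
        return {day: len(month[day]) for day in month}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.h5")
        build_demo_file(path)
        print(members(path, "/2009/01/01"))
        print(count_per_day(path, "/2009/01"))
```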

---
Mike Jackson www.bluequartz.net

magawake · June 18, 2009, 11:50pm

Thanks for the responses.

Quincey,

That's exactly what I was asking for. I will give it a try.
