HDF5 Circular Database

I am interested in using HDF5 to manage sensor data within a continuous
Circular Database/File. I wish to define a database of a fixed size to
manage a finite amount of historical data. When the database file is full
(i.e., reaches the defined capacity), I would like to begin overwriting the
oldest data within the file. This is for an application where I only care
about the most recent data over a specific duration, with obvious
constraints on the amount of storage available.

Does HDF5 have such a capability, or is there a recommended approach
anyone can suggest?

Best Regards,
Kirk Harrison

You should be able to do that pretty easily with HDF5.

If you are absolutely certain your datasets will never, ever change in
size, you could create an 'empty' database by going through and creating
N datasets (H5Dcreate) of the desired size (H5Screate_simple) but not
actually writing anything to any of the datasets.

Then, as time evolves, you pick a particular dataset to open (H5Dopen),
write to it (writing afresh if the dataset has yet to be written to, or
overwriting what's already there if it has; it makes no difference to the
application, which just calls H5Dwrite), and H5Dclose it.

If you think you might want to be able to vary dataset size over time,
use 'chunked' datasets (H5Pset_chunk) instead of the default
(contiguous). If you need to maintain other tidbits of information about
the datasets, such as time of acquisition or sensor # (whatever), and that
data is 'small' (< 16 KB), attach attributes (H5Acreate) to your datasets
and overwrite those attributes as you would datasets (H5Aopen, H5Awrite,
H5Aclose).
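
A rough sketch of that recipe in C (HDF5 1.8 API) might look like the
following; the file name, dataset names, slot count, and rotation
bookkeeping are illustrative assumptions, not anything HDF5 prescribes:

/* Sketch: pre-create N fixed-size datasets once, then overwrite them
 * in rotation. Names, sizes, and slot selection are hypothetical. */
#include "hdf5.h"
#include <stdio.h>

#define N_SLOTS   8      /* assumed number of datasets in the ring */
#define N_SAMPLES 1024   /* assumed samples per dataset            */

int main(void)
{
    hid_t file = H5Fcreate("ring.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hsize_t dims[1] = {N_SAMPLES};
    hid_t space = H5Screate_simple(1, dims, NULL);
    char name[32];

    /* Create the 'empty' database: N datasets, nothing written yet. */
    for (int i = 0; i < N_SLOTS; i++) {
        snprintf(name, sizeof name, "slot%03d", i);
        hid_t d = H5Dcreate(file, name, H5T_NATIVE_DOUBLE, space,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
        H5Dclose(d);
    }

    /* Later, as data arrives: open slot (write_count % N_SLOTS) and
     * (over)write it -- the same H5Dwrite call either way. */
    double buf[N_SAMPLES] = {0};
    unsigned long write_count = 0;   /* maintained by the application */
    snprintf(name, sizeof name, "slot%03lu", write_count % N_SLOTS);
    hid_t dset = H5Dopen(file, name, H5P_DEFAULT);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);

    /* Small per-slot metadata as an attribute, overwritten each pass. */
    double t_acq = 0.0;              /* e.g., time of acquisition */
    hid_t ascal = H5Screate(H5S_SCALAR);
    hid_t attr = H5Aexists(dset, "t_acq") > 0
                   ? H5Aopen(dset, "t_acq", H5P_DEFAULT)
                   : H5Acreate(dset, "t_acq", H5T_NATIVE_DOUBLE, ascal,
                               H5P_DEFAULT, H5P_DEFAULT);
    H5Awrite(attr, H5T_NATIVE_DOUBLE, &t_acq);

    H5Aclose(attr); H5Sclose(ascal); H5Dclose(dset);
    H5Sclose(space); H5Fclose(file);
    return 0;
}

(Error checking is omitted for brevity; real code should test every hid_t
against a negative return.)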

Mark


--
Mark C. Miller, Lawrence Livermore National Laboratory
================!!LLNL BUSINESS ONLY!!================
miller86@llnl.gov urgent: miller86@pager.llnl.gov
T:8-6 (925)-423-5901 M/W/Th:7-12,2-7 (530)-753-851

Mark,

I am new to HDF5 and still working my way through the Tutorials. It looks
promising thus far, but I have been concerned about the Circular Database
implementation. The dataset size will be static, based upon the time
duration for which I want to provide data lookup and the data output rate
of the sensors. I suppose what I need to figure out then, based on your
approach, is how to "seek" to the appropriate location (record) within the
dataset for continued writing of the data. This is probably where your
suggestion of adding an attribute (time of acquisition) comes into play.

Thanks for the reassurance and the tips,
Kirk


Well, I had envisioned your 'buffer' as being a collection of datasets.

You could just have a single dataset that is the 'buffer' and then you'd
have to use hyperslabs or selections to write to just a portion of that
dataset (as Quincey already mentioned).
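
For a concrete sketch of that single-dataset variant: a write that runs
past the end of the buffer can be split into two hyperslab writes. The
capacity, head position, and helper function here are illustrative
assumptions:

/* Sketch: write 'count' doubles from 'buf' into a 1-D circular dataset
 * of CAPACITY elements, starting at logical position 'head'. A write
 * that runs past the end wraps by looping a second time. */
#include "hdf5.h"

#define CAPACITY 100000   /* assumed fixed size of the ring dataset */

static void ring_write(hid_t dset, hsize_t head,
                       const double *buf, hsize_t count)
{
    hid_t fspace = H5Dget_space(dset);   /* 1-D file dataspace */
    hsize_t start, n, done = 0;

    while (done < count) {
        start = (head + done) % CAPACITY;
        n = count - done;
        if (start + n > CAPACITY)        /* clip at the end; wrap next pass */
            n = CAPACITY - start;

        H5Sselect_hyperslab(fspace, H5S_SELECT_SET, &start, NULL, &n, NULL);

        hid_t mspace = H5Screate_simple(1, &n, NULL);  /* n elems of buf */
        H5Dwrite(dset, H5T_NATIVE_DOUBLE, mspace, fspace, H5P_DEFAULT,
                 buf + done);
        H5Sclose(mspace);
        done += n;
    }
    H5Sclose(fspace);
}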

HTH

Mark


Hi Kirk,


  You probably want to use a hyperslab or point selection for writing to your dataset in the file (H5Sselect_hyperslab() or H5Sselect_elements()).
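
For scattered records (e.g., patching isolated gaps), the point-selection
route could look like this minimal sketch, assuming a 1-D dataset of
doubles and made-up record indices:

/* Sketch: overwrite three scattered records with a point selection. */
#include "hdf5.h"

static void write_points(hid_t dset)
{
    hsize_t coords[3] = {7, 42, 99};    /* hypothetical record indices */
    double  vals[3]   = {1.0, 2.0, 3.0};

    hid_t fspace = H5Dget_space(dset);
    H5Sselect_elements(fspace, H5S_SELECT_SET, 3, coords);

    hsize_t n = 3;
    hid_t mspace = H5Screate_simple(1, &n, NULL);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, mspace, fspace, H5P_DEFAULT, vals);

    H5Sclose(mspace);
    H5Sclose(fspace);
}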

  Quincey


Mark and Quincey,

Thanks! I will look into Hyperslabs as well. I finally located a reference
under HDF5 Advanced Topics.
I have multiple streams of time series data that result from different types
of processing from the system. The data differs such that I will probably
try several approaches with each stream in an attempt to optimize
performance. In the past I have manually programmed this type of binary
file-based solution and am eager to see what capability and performance I
can get out of HDF5 for this type of domain. (I also have an associate
independently evaluating MySQL for comparison.)

Kirk


To account for possible gaps in the data stream, I need a way of indexing
blocks of data within the (single) dataset that I write the data to. (I
elected to use a fixed contiguous dataset approach, as opposed to a
dynamically sized one using chunks, so that I can better manage the disk
space and circular buffer.)

I am in the process of setting up a (dynamic/chunked) indexing dataset to
access the dataset used to capture the data stream. What I envision is each
record in the index table containing elements such as:
- Start_time
- Stop_time
- Num_Records
- Reference (??? See question 3 below)
Each index record would be used to describe a region in the continuous
dataset used to capture the streamed data (which would further be used by a
client to set up hyperslabs to request specific groups of data).

I am still in the process of learning about HDF5 Links. I was thinking I
might be able to simply have the index table contain soft links to the
stream dataset, possibly with properties (Start_time, Stop_time,
Num_Records, etc.).

With all of this being said:
1) Is there a better way to do this within HDF5 (i.e., some built-in
capability to index in this fashion which I have yet to discover)?
2) Can links even be placed in a table like this (pointing to a specific
record in a dataset)?
3) What is the recommended mechanism for "referencing" a particular record
within a dataset?

Kirk


-----Original Message-----
From: Mark Miller [mailto:miller86@llnl.gov]
Sent: Friday, March 26, 2010 3:14 PM
To: Kirk Harrison
Subject: RE: [Hdf-forum] HDF5 Circular Database

If you encounter serious performance issues at the I/O level, I'd be
interested to know and may have some suggestions for improvement if you
do.

Mark


Hi Kirk,


  I think the answer to all three questions is: you should use a dataset region reference for this purpose (http://www.hdfgroup.org/HDF5/doc/RM/RM_H5R.html#Reference-Create).
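
A minimal sketch of an index record built that way, assuming HDF5 1.8 and
a stream dataset named "stream"; the compound fields mirror the list above,
but the layout and helpers are my own construction:

/* Sketch: one index record holding a dataset region reference that
 * describes a block of the stream dataset. */
#include "hdf5.h"

typedef struct {
    double          start_time;
    double          stop_time;
    unsigned long   num_records;
    hdset_reg_ref_t reference;   /* region reference into the stream */
} index_rec_t;

static hid_t make_index_type(void)
{
    hid_t t = H5Tcreate(H5T_COMPOUND, sizeof(index_rec_t));
    H5Tinsert(t, "Start_time",  HOFFSET(index_rec_t, start_time),
              H5T_NATIVE_DOUBLE);
    H5Tinsert(t, "Stop_time",   HOFFSET(index_rec_t, stop_time),
              H5T_NATIVE_DOUBLE);
    H5Tinsert(t, "Num_Records", HOFFSET(index_rec_t, num_records),
              H5T_NATIVE_ULONG);
    H5Tinsert(t, "Reference",   HOFFSET(index_rec_t, reference),
              H5T_STD_REF_DSETREG);
    return t;
}

/* Build a record covering [start, start+count) of the stream dataset. */
static index_rec_t make_record(hid_t file, hid_t stream_dset,
                               double t0, double t1,
                               hsize_t start, hsize_t count)
{
    index_rec_t rec = { t0, t1, (unsigned long)count, {0} };

    hid_t fspace = H5Dget_space(stream_dset);
    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, &start, NULL, &count, NULL);

    /* The reference captures both the dataset and the selected region. */
    H5Rcreate(&rec.reference, file, "stream", H5R_DATASET_REGION, fspace);

    H5Sclose(fspace);
    return rec;
}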

  Quincey


Quincey,

Thanks for the tip. A quick read of "HDF5 Dataset Region References" does
look promising.

Would you say the main benefit of Region References is more direct (i.e.,
efficient) construction of the related Hyperslabs upon data retrieval?
Perhaps versus saving a start/stop element number within the index element
and having to build a Hyperslab region from that information alone?

Kirk


Hi Kirk,


  Yes, definitely.
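
As a sketch of that retrieval side (against the HDF5 1.8 C API; the buffer
handling is an assumption), the stored reference hands back both the
dataset and the exact selection, so the client never rebuilds the
hyperslab from start/stop numbers by hand:

/* Sketch: read back the region an index record refers to. */
#include "hdf5.h"
#include <stdlib.h>

static double *read_region(hid_t file, const hdset_reg_ref_t *ref,
                           hssize_t *npoints_out)
{
    /* Recover the dataset and the exact selection from the reference. */
    hid_t dset   = H5Rdereference(file, H5R_DATASET_REGION, ref);
    hid_t fspace = H5Rget_region(file, H5R_DATASET_REGION, ref);

    hssize_t n = H5Sget_select_npoints(fspace);
    double *buf = malloc((size_t)n * sizeof *buf);

    hsize_t dims[1] = {(hsize_t)n};
    hid_t mspace = H5Screate_simple(1, dims, NULL);
    H5Dread(dset, H5T_NATIVE_DOUBLE, mspace, fspace, H5P_DEFAULT, buf);

    H5Sclose(mspace); H5Sclose(fspace); H5Dclose(dset);
    *npoints_out = n;
    return buf;
}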

  Quincey
