How to best save 2D data with time information

Hi all,
   I am running a simulation in which I need to keep track of time information
and 2D/3D data at each time step (I may have more than one array). My
question is: what is the best way to store such data? Should I keep two
separate datasets, one to store the time and one to store the 2D/3D data, or can
I combine them into a special kind of dataset (one I don't know about yet)?
   Thanks a lot,

Tuan

Hi Tuan,

  Why don't you put all datasets that belong to a specific time into a group, one group per time step, and attach the time information (physical time in seconds, as a float) as an attribute of this group?
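
A minimal sketch of the one-group-per-time-step layout, assuming Python with h5py and NumPy (the thread does not name a language binding). The group names (`step_000000`), the dataset names (`density`, `voltage`) and the helper `write_snapshot` are hypothetical, chosen only for illustration; the file lives in memory so nothing is written to disk.

```python
import h5py
import numpy as np

def write_snapshot(f, step, time_s, arrays):
    """Store all arrays of one time step in one group; the time goes on the group."""
    grp = f.create_group(f"step_{step:06d}")   # zero-padded so names sort by step
    grp.attrs["time"] = float(time_s)          # physical time in seconds
    for name, data in arrays.items():
        grp.create_dataset(name, data=data)
    return grp

# In-memory file (core driver, no backing store) so the sketch leaves no file behind.
with h5py.File("snapshots.h5", "w", driver="core", backing_store=False) as f:
    for step in range(3):
        write_snapshot(f, step, 0.5 * step, {
            "density": np.full((4, 4), step, dtype="f8"),   # a 2D array
            "voltage": np.zeros((4, 4, 2)),                 # a 3D array
        })
    # Read back the time of every snapshot by visiting the groups in name order.
    times = [f[name].attrs["time"] for name in f]
    shape = f["step_000002/density"].shape
```

Reading the times this way touches every group once, which may become slow with very many groups.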

    Werner

--
___________________________________________________________________________
Dr. Werner Benger Visualization Research
Laboratory for Creative Arts and Technology (LCAT)
Center for Computation & Technology at Louisiana State University (CCT/LSU)
211 Johnston Hall, Baton Rouge, Louisiana 70803
Tel.: +1 225 578 4809 Fax.: +1 225 578-5362

Hi Dr. Werner,
   I'm simulating cells. In that case, one group is a snapshot of the system
at a single time point. I will therefore have tens of thousands of such
groups in a file, or maybe multiple files, each containing thousands of
groups. I also want to generate a video from these snapshots using IDL.
Would your suggestion still be a reasonable approach, or should I do it a
different way? Thank you!

Bests,
Tuan

Here is one idea: I would discretize the time dimension (bin the time values) and store the data in a single multidimensional dataset, with the outermost dimension being time. The primary data goes in the inner dimension(s), one layer of which could hold the time variable (as float32) if you really need that much time precision. If you don't need time stored at a granularity finer than your bin width, you will save a lot of space and gain computational efficiency.
     Storing data this way would also allow very efficient compression, and it lends itself to video animation later in whatever environment you choose. Just some thoughts.
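
A rough sketch of this binned layout, assuming Python with h5py and a fixed bin width. The dataset name `field`, the attribute names `t0` and `dt`, and the gzip settings are illustrative assumptions, not something prescribed in the thread. With a fixed bin width, the time of frame `i` can be recomputed as `t0 + i * dt`, so no per-frame time needs to be stored.

```python
import h5py
import numpy as np

n_steps, ny, nx = 100, 32, 32
dt = 0.01   # assumed fixed bin width in seconds

with h5py.File("binned.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset(
        "field", shape=(n_steps, ny, nx), dtype="f4",
        chunks=(1, ny, nx),                    # one frame per chunk
        compression="gzip", compression_opts=4,
    )
    dset.attrs["t0"] = 0.0                     # time of the first frame
    dset.attrs["dt"] = dt                      # bin width
    for i in range(n_steps):
        dset[i] = np.full((ny, nx), i, dtype="f4")   # stand-in for simulation output

    # Reading one frame only touches the chunk that holds it.
    frame = dset[42]
    t_42 = dset.attrs["t0"] + 42 * dset.attrs["dt"]
```

Chunking by frame keeps frame-by-frame reads cheap, which suits generating a video; other access patterns may want a different chunk shape.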
Cheers,
Joe Glassy

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org

Hi Tuan,

   With that many time steps, you might want to organize the time hierarchically, for example groups of a hundred time groups each, so that 100 x 100 time groups cover 10,000 time steps. It's probably inefficient to have 10,000 or more time steps in the same group, though I don't have experience (yet) with that scenario. A group per time step would also be inefficient if all your datasets per time step are pretty small. In that case it might be better to use a multidimensional dataset with one extendible dimension, that dimension being time, so that you can append new data as it arrives.
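
The two-level grouping could be sketched as follows, assuming Python with h5py. The fan-out of 100 comes from the 100 x 100 example; the path scheme `block_XXXX/step_XXXXXX` and the helper `step_path` are hypothetical names used only for illustration.

```python
import h5py
import numpy as np

STEPS_PER_BLOCK = 100   # fan-out taken from the 100 x 100 example

def step_path(step):
    """Map a step index to a two-level path such as 'block_0042/step_004217'."""
    return f"block_{step // STEPS_PER_BLOCK:04d}/step_{step:06d}"

with h5py.File("hier.h5", "w", driver="core", backing_store=False) as f:
    for step in (0, 99, 100, 4217):
        grp = f.require_group(step_path(step))   # also creates the parent block group
        grp.attrs["time"] = 0.001 * step         # made-up physical time
        grp.create_dataset("state", data=np.zeros((8, 8)))

    blocks = sorted(f.keys())
    in_block_0 = sorted(f["block_0000"].keys())
```

Whether this is actually faster than a flat list of groups would need to be measured for the file sizes at hand.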

  I don't use IDL, so I don't know what constraints IDL places on the HDF5 layout. If IDL is your primary target, it might be best to investigate which data layout IDL handles best.

    Werner

Hi,

What about having one large (chunked?) array with dimensions, say, N x N x Nt, where Nt is the number of time steps? Keep a separate linear array with the times in it, and use the indices to connect each timestamp to the corresponding hyperslab/slice in the large array.
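
A small sketch of this layout, assuming Python with h5py. The dataset names `data` and `times`, the helper `append`, and the chunk shapes are illustrative assumptions. Both datasets are created empty along the time axis and grown with `resize`, so frames can be appended as the simulation produces them.

```python
import h5py
import numpy as np

N = 16

with h5py.File("stack.h5", "w", driver="core", backing_store=False) as f:
    # N x N x Nt array, unlimited along the last (time) axis, one slice per chunk.
    data = f.create_dataset("data", shape=(N, N, 0), maxshape=(N, N, None),
                            chunks=(N, N, 1), dtype="f8")
    # Separate linear array of timestamps, indexed the same way as the slices.
    times = f.create_dataset("times", shape=(0,), maxshape=(None,),
                             chunks=(1024,), dtype="f8")

    def append(t, frame):
        """Grow both datasets by one and write the new timestamp and slice."""
        k = times.shape[0]
        times.resize((k + 1,))
        data.resize((N, N, k + 1))
        times[k] = t
        data[:, :, k] = frame

    for k in range(5):
        append(0.25 * k, np.full((N, N), k, dtype="f8"))

    # Use the time array to find the slice closest to t = 0.8.
    all_times = times[...]
    idx = int(np.argmin(np.abs(all_times - 0.8)))
    frame = data[:, :, idx]
```

The shared index `idx` is the only link between a timestamp and its slice, so the two datasets have to be extended together.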

Cheers
Paul

Hi Werner,
   I've just successfully created an HDF5 file with multiple groups and
multiple datasets. I have another question: what is the best way to attach
the time information (or maybe other information) to each dataset?

Tuan

Hi Tuan,

  If you have multiple datasets for the same time, it would be better to attach the time information to the common group that contains them.

A double-valued attribute called "time" would do in the simplest case. If you need a more advanced specification of time, for instance units on the time scale, you could use a named datatype for the time in which such global properties are defined. This named type would best go in a group independent of the time groups, for instance a group without a time attribute, or the group that contains the time groups.

You might also want some kind of "reverse lookup" for each dataset's name, like a table of the time values at which this dataset is available, in case this varies and not all datasets are defined at all times. This could be done with another group that has a subgroup for each dataset and symbolic links to the actual data, or with a dataset that provides the same information as a table. Symbolic links are more elegant, though: I don't think it's possible to make a dataset containing symbolic links, at most object references, and that's not the same.
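
One possible reading of the "reverse lookup" idea, sketched in Python with h5py and using HDF5 soft links as the symbolic links. The group names `steps` and `by_name` and the dataset names `voltage` and `calcium` are hypothetical; in this made-up example `calcium` is only written at even steps.

```python
import h5py
import numpy as np

with h5py.File("lookup.h5", "w", driver="core", backing_store=False) as f:
    # Time groups, each carrying a double-valued "time" attribute.
    for step in range(4):
        grp = f.create_group(f"steps/step_{step:06d}")
        grp.attrs["time"] = 0.1 * step
        grp.create_dataset("voltage", data=np.zeros(4))
        if step % 2 == 0:
            grp.create_dataset("calcium", data=np.ones(4))

    # Reverse lookup: /by_name/<dataset>/<step> is a soft link to the real data.
    for step_name, grp in f["steps"].items():
        for ds_name in grp:
            index = f.require_group(f"by_name/{ds_name}")
            index[step_name] = h5py.SoftLink(f"/steps/{step_name}/{ds_name}")

    # Which steps have a "calcium" dataset, and what does one link resolve to?
    calcium_steps = sorted(f["by_name/calcium"].keys())
    linked = f["by_name/calcium/step_000002"][...]
```

The links store only a path, so they add little to the file size, but they dangle if the target group is renamed or removed.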

     Werner

Dr. Werner,
  Thanks a lot for your advice. Right now, each HDF file has some groups,
and each group has 2 datasets, both corresponding to the same time step. So,
based on your suggestion, I think the attributes (holding the time step
information) should be attached to the group.
However, I want to be able to read the time information quickly into an
array, so I'm thinking of putting the time points into an array stored as an
attribute of the root group. So, if the array is a[...] and each group
has 10 datasets, then
a[1] is the time for dataset1 in group 1
a[2] is the time for dataset2 in group 1
...
a[11] is the time for dataset1 in group 2

Do you think that would be fine? Also, is there a limit on the size of the
data stored in an attribute, or at least a good threshold?

Thanks,
Tuan

Hi Tuan,

  It might be more efficient to store such a time array as a dataset in the root group rather than as an attribute. Datasets don't have size limits, whereas attributes are limited because they are supposed to be small. I'm not sure what the limit is; it could be something like 64k.

This dataset might be a one-dimensional array of a compound type containing the floating-point value of the time and a string with the corresponding group name. That way you can read the array quickly and access the group associated with each time, independent of the naming convention used for the group names; they could even be random combinations of letters. Still, it might be good to treat this time dataset as a "cache" for the attributes stored in the groups, so generating it could also be a postprocessing step that scans the groups with time attributes in the file. This might be more efficient than re-creating or appending to the time dataset each time a new dataset is added to the file, but that would need to be explored in practice. Iterating over groups is also pretty fast, even for large files, though how well it performs will depend on your use case.
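
The "cache" idea could look roughly like this in Python with h5py, as a postprocessing pass over the groups. The dataset name `time_index`, the helper `build_time_index`, and the fixed 32-byte group-name field are assumptions made for this sketch; the group names are deliberately arbitrary to show that the index does not rely on a naming convention.

```python
import h5py
import numpy as np

# Compound record: the time value plus the name of the group that holds the data.
index_dtype = np.dtype([("time", "f8"), ("group", "S32")])

def build_time_index(f):
    """Scan top-level groups with a 'time' attribute and (re)write the index dataset."""
    rows = [(grp.attrs["time"], name.encode())
            for name, grp in f.items()
            if isinstance(grp, h5py.Group) and "time" in grp.attrs]
    table = np.array(sorted(rows), dtype=index_dtype)   # sorted by time
    if "time_index" in f:
        del f["time_index"]                             # the index is only a cache
    return f.create_dataset("time_index", data=table)

with h5py.File("cache.h5", "w", driver="core", backing_store=False) as f:
    for name, t in [("xq", 2.0), ("ab", 0.5), ("mm", 1.0)]:
        g = f.create_group(name)
        g.attrs["time"] = t
        g.create_dataset("state", data=np.zeros(3))

    build_time_index(f)

    # One read gives all times and the group each one belongs to.
    index = f["time_index"][...]
    ordered = [row["group"].decode() for row in index]
    first_shape = f[ordered[0]]["state"].shape
```

Because the group attributes remain the source of truth, the index can be deleted and rebuilt at any time.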

   Werner

Thanks a lot Dr. Werner. This is very helpful.

Merry Christmas,
Tuan

···

On Fri, Dec 16, 2011 at 3:23 AM, Werner Benger <werner@cct.lsu.edu> wrote:

**
Hi Tuan,

it might be more efficient to formulate such a time-array as a dataset in
the root group rather than an attribute. Datasets don't have size limits,
attributes have limitations as they are supposed to be small. Not sure how
much it is, could be some 64k limitation.

This dataset might be a one-dimensional array of a compound structure,
containing the floating-point value of the time and a string containing the
corresponding group name. That way you can read this array quickly and
access the group associated with it, independent on which naming convention
is used for the group's name. Could even be some random combination of
letters. Still, might be good to see this time-dataset as a "cache" for
attributes that are stored in the group, so generating this time-dataset
could also be a postprocessing step when scanning the groups in the file
with time attributes. This might be more efficient than re-creating and
appending this time-dataset when each new data set is added to the file,
but this would need to be explore in practice. Iterating over groups is
also pretty fast, even for large files, but depends how good/bad it will be
in your use case.

  Werner

On Fri, 16 Dec 2011 00:44:52 -0600, Hoang Trong Minh Tuan < > hoangtrongminhtuan@gmail.com> wrote:

Dr. Werner,
  Thanks a lot for your advice. Right now, each HDF file has some groups,
and each group has 2 datasets, both corresponding to the same time step. So,
based on your suggestion, I think the attributes (holding the time step
information) should be attached to the group.
However, I want to read the time information quickly into an array, so I'm
thinking of putting the time points into an array stored as an attribute of
the root group. So, if the array is a[...] and each group has 10 datasets,
then
a[1] is the time for dataset1 in group 1
a[2] is the time for dataset2 in group 1
...
a[11] is the time for dataset1 in group 2

Do you think that should be fine? Also, is there a limit on the size of the
data contained in an attribute, or at least a good threshold?

Thanks,
Tuan

On Thu, Dec 15, 2011 at 12:54 PM, Werner Benger <werner@cct.lsu.edu> wrote:

Hi Tuan,

if you have multiple datasets for the same time, then it would be better
to attach the time information to the common group that contains them.

Using a double-valued attribute called "time" would do in the simplest
case. If you need a more advanced specification of time, for instance
units on the time scale, you could use a named type for this time unit
where such global properties are defined. This named type would best go in
a group independent of those time groups, for instance a group without a
time attribute, or the group which contains those time groups.
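
As a rough illustration, assuming Python with h5py and NumPy (not used in
this thread), the simplest case could look like the sketch below. The
function names, the "time_type" name and the "unit" attribute are made up
for the example, and attaching the unit to a committed (named) type is only
one possible reading of the suggestion above:

```python
import h5py
import numpy as np


def write_snapshot(parent, name, time, arrays):
    """Create one group per time step and attach the physical time to it."""
    grp = parent.create_group(name)
    grp.attrs["time"] = np.float64(time)  # double-valued attribute called "time"
    for dname, data in arrays.items():
        grp.create_dataset(dname, data=np.asarray(data))
    return grp


def declare_time_unit(parent, unit="s", type_name="time_type"):
    """Commit a named float64 type in the parent group and record the time unit on it."""
    parent[type_name] = np.dtype("f8")  # assigning a dtype commits a named type
    parent[type_name].attrs["unit"] = unit  # hypothetical attribute name
    return parent[type_name]
```

All datasets of one snapshot then share a single "time" value through their
common group, and the unit is stored once rather than per time step.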

Possibly you might also want some "reverse lookup" for each dataset's
name, like a table listing the time values at which this dataset is
available, in case this changes and not all datasets are defined at all
times. This could be done with another group containing a subgroup for each
dataset, with symbolic links to the actual data, or via some dataset that
provides the same information as a table. Symbolic links are more elegant,
though. I don't think it's possible to make a dataset containing symbolic
links, only object references at most, and that's not the same.
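
The symbolic-link variant could be sketched as follows, assuming Python with
h5py (not used in this thread). The function name and the "by_dataset" group
name are invented for the example; only top-level groups with a "time"
attribute are treated as time groups:

```python
import h5py


def build_reverse_lookup(f, lookup_name="by_dataset"):
    """For every dataset name, collect soft links to each time group that has it."""
    lookup = f.require_group(lookup_name)
    # Collect the time groups first so the file is not modified while iterating.
    time_groups = [
        (name, grp)
        for name, grp in f.items()
        if name != lookup_name and isinstance(grp, h5py.Group) and "time" in grp.attrs
    ]
    for gname, grp in time_groups:
        for dname, obj in grp.items():
            if isinstance(obj, h5py.Dataset):
                sub = lookup.require_group(dname)  # one subgroup per dataset name
                if gname not in sub:
                    sub[gname] = h5py.SoftLink(obj.name)  # link, not a copy
    return lookup
```

Listing the keys of, say, lookup["density"] then gives the time groups in
which that dataset exists, and following a link opens the original data.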

    Werner

On Thu, 15 Dec 2011 10:28:28 -0600, Hoang Trong Minh Tuan <hoangtrongminhtuan@gmail.com> wrote:

Hi Werner,
   I've just successfully created an HDF5 file with multiple groups and
multiple datasets. I have another question: what is the best way to attach
the time information (or maybe some other information) to each dataset?

Tuan

On Thu, Dec 8, 2011 at 2:31 PM, Werner Benger <werner@cct.lsu.edu> wrote:

Hi Tuan,

  with that many time steps, you might want to organize the time
hierarchically, like having groups of a hundred time groups each, so that
100 x 100 time groups cover the 10,000 timesteps. It's probably inefficient
to have 10,000 timesteps or more in the same group, though I don't have
experience (yet) with that scenario. It would also be inefficient if all
your datasets per time step are pretty small. In that case it might be
better to use a multidimensional dataset with one varying (extendible)
dimension, this dimension being the time, such that you can append data as
new time steps arrive.
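
The appendable-dataset alternative could be sketched like this, assuming
Python with h5py and NumPy (not used in this thread). The function names,
the "_time" suffix and the chunk sizes are choices made only for the
example:

```python
import h5py
import numpy as np


def create_series(parent, name, frame_shape, dtype="f8"):
    """Create a [T, ...] dataset with unlimited T, plus a matching 1-D time dataset."""
    frame_shape = tuple(frame_shape)
    data = parent.create_dataset(
        name,
        shape=(0,) + frame_shape,         # no frames yet
        maxshape=(None,) + frame_shape,   # time axis may grow without bound
        chunks=(1,) + frame_shape,        # one frame per chunk
        dtype=dtype,
    )
    times = parent.create_dataset(
        name + "_time", shape=(0,), maxshape=(None,), chunks=(1024,), dtype="f8"
    )
    return data, times


def append_frame(data, times, t, frame):
    """Grow both datasets by one step and write the new frame and its time."""
    n = data.shape[0]
    data.resize(n + 1, axis=0)
    times.resize(n + 1, axis=0)
    data[n] = frame
    times[n] = t
    return n  # index of the frame just written
```

With this layout all time values can be read back in one call, and a single
frame is read as data[i] without touching the rest of the file.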

I don't use IDL, so I don't know which constraints IDL would impose on
the HDF5 layout. If IDL is your primary target, it might be best to
investigate which data layout IDL handles best.

   Werner

Hi Tuan,

Please take a look at the netCDF-4 solution; this should give you a good idea of how to implement it. Or, if you would like a pure HDF5 solution, look at H5DS (the HDF5 Dimension Scales API), which was developed for the kind of situation you want to implement.

In practice, I always implement it in this way (and yes, I also prefer to have "time" as a dataset, not an attribute):

    /Group1/dataset1[T-dim, Y-dim, X-dim]    <== note that the dimensions are in C order
            dataset2[T-dim, Y1-dim, X1-dim]  <== Y1-dim can differ from Y-dim, X1-dim from X-dim
            time[T-dim]
    /Group2/dataset3[T1-dim, Y-dim, X-dim]
            dataset4[T1-dim, Y-dim, X1-dim]
            time[T1-dim]
    ...
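
A minimal sketch of this layout with dimension scales, assuming Python with
h5py 2.10 or later and NumPy (my choice for the example, not something used
in this thread). The function name and the axis labels are made up; "time"
is stored as a dataset in the same group and attached to the first
dimension:

```python
import h5py
import numpy as np


def create_with_time_scale(group, name, times, ny, nx):
    """Create name[T, Y, X] in group and attach the group's time[T] as a dimension scale."""
    times = np.asarray(times, dtype="f8")
    if "time" in group:
        time = group["time"]            # reuse the time axis shared within this group
    else:
        time = group.create_dataset("time", data=times)
        time.make_scale("time")         # mark the dataset as a dimension scale
    data = group.create_dataset(name, shape=(len(times), ny, nx), dtype="f8")
    for axis, label in enumerate(("time", "y", "x")):
        data.dims[axis].label = label   # C order: slowest-varying dimension first
    data.dims[0].attach_scale(time)
    return data
```

Calling it twice on the same group, for example for dataset1 and dataset2
with different Y and X sizes, gives both datasets the same time[T-dim] axis.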

Greetings, Richard

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org