dumping particle trajectories

Pierre_de_Buyl1 · March 24, 2011, 7:35pm

Hello,

I would like to make an additional suggestion.

With some colleagues, we set out to devise a specification for how an HDF5 file
should be laid out for data from particle-based simulations. The specification
is called H5MD and is found here: http://research.colberg.org/projects/molsim/

This is, for now, only a specification and not a library, but I think that it
provides a good basis for molecular simulations while being useful to other
kinds of simulations.

To handle a varying number of particles, it is possible to store the data in a
[T][N][D] dataset (T is the number of timesteps, N the number of particles and D
the number of spatial dimensions) in which a chunk size is defined along the
particle-wise axis. That way, you can take N to be N_max, the maximum number of
particles, and the space taken on disk will be zero for the chunks that are
never written to.
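
To put rough numbers on the point above, here is a small sketch in plain C (no HDF5 calls). It assumes, as described above, that a chunk only takes up disk space once something has been written into it, and it ignores compression and the chunk index itself. The function names and the chunk shape [chunk_t][chunk_n][dims] are made up for this illustration and are not part of H5MD.

```c
#include <assert.h>
#include <stddef.h>

/* Number of chunks needed to cover `count` elements along one axis. */
static size_t chunks_covering(size_t count, size_t chunk)
{
    assert(chunk > 0);
    return (count + chunk - 1) / chunk;
}

/* Raw chunk storage (bytes) touched when `steps` timesteps are written and
 * only the first `n_used` of the N_max particle slots are ever written to.
 * N_max itself does not appear: untouched chunks are assumed to cost nothing. */
static size_t allocated_bytes(size_t steps, size_t n_used,
                              size_t chunk_t, size_t chunk_n,
                              size_t dims, size_t elem_size)
{
    size_t n_chunks = chunks_covering(steps, chunk_t)
                    * chunks_covering(n_used, chunk_n);
    return n_chunks * chunk_t * chunk_n * dims * elem_size;
}
```

With 1000 timesteps, 100 live particles, chunks of 100 x 64 x 3 doubles, this gives 20 chunks (about 3 MB), whatever value is chosen for N_max.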

I hope it helps and welcome comments!

Pierre de Buyl

···

Wed, 16 Mar 2011 07:15:00 -0700
Yngve,
especially if the number of particles might change over time, using 1D
arrays might be more appropriate, possibly combined with index lookup arrays
that allow identifying particles from T0 to T1 and vice versa. I'm using
such a 1D layout for particles and particle trajectories as part of my
F5 library, here is a coding example on how to write particle positions
with some fields given on them (it's all HDF5-based):
http://svn.origo.ethz.ch/wsvn/f5/doc/Particles_8c-example.html
It's inefficient only for very few particles, because the overhead of
the metadata structure is then more prominent, but for millions
of particles it should work well. I haven't tried this structure yet
with a million timesteps, which would lead to a million groups then.
I would assume HDF5 is able to handle such a situation well, but
it could make sense to bundle groups of similar timesteps hierarchically,
too.
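
One way to picture the hierarchical bundling mentioned above is to derive the group path of a timestep from its number, so that no single parent group holds a million children. The sketch below is plain C and only builds the path string; the "bundle_"/"step_" naming and the function name are invented for this illustration and are not the F5 layout.

```c
#include <assert.h>
#include <stdio.h>

/* Write the group path for `step` into `buf`, placing `per_bundle`
 * consecutive timesteps under one parent group.
 * Returns 0 on success, -1 if the buffer is too small. */
static int step_group_path(char *buf, size_t len,
                           unsigned long step, unsigned long per_bundle)
{
    assert(per_bundle > 0);
    int n = snprintf(buf, len, "/bundle_%06lu/step_%09lu",
                     step / per_bundle, step);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

With 1000 steps per bundle, a million timesteps end up in 1000 parent groups of 1000 children each.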

On Wed, 16 Mar 2011 08:29:27 -0500, Yngve Inntjore Levinsen :

Yes of course Francesc, I was thinking float = half of 64bit instead of 4x 8bit. I was thinking that
it might be beneficial to keep the size in powers of 2, so that is why I chose 1024 and not 1000. I keep
it as a variable so I can easily change it.
Werner, I was thinking that I should eventually move to a sequence of 1D arrays, but it requires
slightly more rewriting. The number of lines I have to write depends on whether or not the particle is
still alive. I am starting out with an equal number of particles, but have no means to know if I need to
write the position of a given particle 0 times or one million times. Typically I have something like 1
million timesteps, but I do not write down trajectories all the time (when I do depends on the Monte
Carlo, so there is no way to know in advance).
Ideally I would've written all analysis into the code itself so I didn't have to write the trajectories
all the time (I have not made this choice!), but that requires too much work for me to handle at the
moment. Using HDF5 will reduce the storage space needed by about a factor 6 from my estimates, improve
precision, and significantly reduce CPU hours needed as well. This is already a great improvement!
Cheers,
Yngve
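
The buffering discussed in this thread (collect a block of rows in memory, then hand them over in one write instead of one write per row) could look roughly like the sketch below. It is plain C with no HDF5 calls: the actual write is a callback, where a real program would extend the dataset and write one hyperslab. All names, the six-column row layout, and the counting callback are assumptions made for the illustration; only the 1024-row buffer size comes from the message above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ROW_WIDTH   6     /* assumed row layout: x, y, z, px, py, pz */
#define BUFFER_ROWS 1024  /* rows kept in memory between writes */

/* Called with a block of rows and the row index where the block starts. */
typedef void (*flush_fn)(const double *rows, size_t n_rows,
                         size_t first_row, void *ctx);

typedef struct {
    double   data[BUFFER_ROWS][ROW_WIDTH];
    size_t   used;     /* rows currently buffered */
    size_t   written;  /* rows already handed to the writer */
    flush_fn flush;
    void    *ctx;
} row_buffer;

static void row_buffer_init(row_buffer *b, flush_fn flush, void *ctx)
{
    b->used = 0;
    b->written = 0;
    b->flush = flush;
    b->ctx = ctx;
}

/* Hand all buffered rows to the writer; call once more at the end of the run. */
static void row_buffer_flush(row_buffer *b)
{
    if (b->used == 0)
        return;
    b->flush(&b->data[0][0], b->used, b->written, b->ctx);
    b->written += b->used;
    b->used = 0;
}

/* Append one row; triggers a write only when the buffer is full. */
static void row_buffer_push(row_buffer *b, const double row[ROW_WIDTH])
{
    assert(b->used < BUFFER_ROWS);
    memcpy(b->data[b->used], row, sizeof(double) * ROW_WIDTH);
    if (++b->used == BUFFER_ROWS)
        row_buffer_flush(b);
}

/* Stand-in writer that only counts what it receives. */
typedef struct { size_t calls; size_t rows; } flush_stats;

static void count_flush(const double *rows, size_t n_rows,
                        size_t first_row, void *ctx)
{
    flush_stats *s = (flush_stats *)ctx;
    (void)rows;
    (void)first_row;
    s->calls += 1;
    s->rows  += n_rows;
}
```

Pushing 2500 rows through this buffer results in three writes (1024, 1024 and, after the final flush, 452 rows) instead of 2500.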
On Wednesday 16 March 2011 02:09:36 PM Werner Benger wrote:

Hi,
what's the reason for using a 2D extendable dataset instead of a sequence
of 1D arrays in a group, using one group per time step? How many particles
and time steps do you have typically? I assume in your case the number of
particles is constant over time?
Cheers,
Werner
On Wed, 16 Mar 2011 03:52:10 -0500, Yngve Inntjore Levinsen wrote:
> Dear hierarchical people,
>
> I have currently converted a piece of code from using a simple ascii
> format for output into using HDF5. What the code does is at every
> iteration dumping some information about particle
> energy/trajectory/position to the ascii file (this is a particle
> tracking code).
>
> Initially I then did the same with the HDF5 library, having an unlimited
> row dimension in a 2D array and using h5dextend_f to extend by one
> element each time and writing a hyperslab of one row to the file. As
> some (perhaps most) of you might have guessed or know already, this was
> a rather bad idea. The file (without compression) was about the same
> size as the ascii file (but obviously with higher precision), and
> reading the file in subsequent analysis was at least an order of
> magnitude slower.
>
> I then realized that I probably needed to write less frequently and
> rather keeping a semi-large hyperslab in memory. I chose a hyperslab of
> 1000 rows, but otherwise using the same procedure. This seems to be both
> fast and with compression creating quite a bit smaller file. I tried
> even larger slabs, but did not see any speed improvement in my initial
> testing.
>
> My question really was just if there are some recommended ways to do
> this? I would imagine I am not the first who wants to use HDF5 in this
> way, dumping some data at every iteration of a given simulation, without
> having to keep it all in memory until the end?
>
> Thanks for all explanations/suggestions/experiences related to this
> problem you can provide me so I can make the best design choices in my
> program!
>
> Cheers,
> Yngve

-----------------------------------------------------------
Pierre de Buyl
Physique des Systèmes Complexes et Mécanique Statistique - Université Libre de Bruxelles
Chemical Physics Theory Group - University of Toronto
web: http://homepages.ulb.ac.be/~pdebuyl/
-----------------------------------------------------------

Mark_Howison · March 24, 2011, 7:51pm

.... and I'll throw in one more suggestion, the H5Part library:

http://vis.lbl.gov/Research/H5Part/

which allows you to quickly and easily dump out particle data into an HDF5 file.

The data model is the same one Werner suggested: each timestep has its
own group, and the particles are stored as 1D arrays within those
groups. You can have different numbers of particles in each timestep.

For each iteration, you would do something like:

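/* Open the file once. The open call and flag names may differ between
   H5Part releases, so check them against your headers. */
file = H5PartOpenFile("particles.h5", H5_O_WRONLY, MPI_COMM_WORLD);

/* nsteps is a placeholder for the number of steps you dump */
for (i = 0; i < nsteps; i++) {
    H5PartSetStep(file, i);                   /* one group per timestep */
    H5PartSetNumParticles(file, nparticles);  /* may change between steps */
    H5PartWriteDataFloat64(file, "x", x);
    H5PartWriteDataFloat64(file, "y", y);
    H5PartWriteDataFloat64(file, "z", z);
    H5PartWriteDataFloat64(file, "px", px);
    H5PartWriteDataFloat64(file, "py", py);
    H5PartWriteDataFloat64(file, "pz", pz);
}

H5PartCloseFile(file);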

Hope that helps,
Mark

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org

Yngve_Inntjore_Levi1 · March 25, 2011, 2:33pm

Hi,

Thanks to both of you for very interesting suggestions which I am quite sure I wouldn't have found on my own! I'll look into both of them as soon as I have time.

There is another issue (this is "stupidity" of the code, not HDF5): it actually only tracks 64 particles at a time. I think it was due to some kind of memory limitation back in the day (the code is still F77 for the most part). That means that I don't actually have all particles of a given timestep in the simulation at the same time, so I get... well, many dimensions quite quickly.

I'll keep you posted about my outcome!

Cheers,
Yngve

werner · March 25, 2011, 2:45pm

Hi Yngve,

this situation seems to call for a hyperslab of 64 particles each.
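
As a rough sketch of what such a selection could look like for a [T][N][D] dataset: the start and count arrays for batch number `batch` at timestep `step` can be computed as below and then passed to a hyperslab selection (H5Sselect_hyperslab in the HDF5 C API). The code is plain C without HDF5 calls; `extent_t` stands in for HDF5's hsize_t, and the function name is invented for this illustration.

```c
#include <assert.h>

#define BATCH 64  /* particles tracked at a time, as described by Yngve */

typedef unsigned long long extent_t;  /* stand-in for hsize_t */

/* Fill start/count for one batch of particles at one timestep.
 * The last batch may be shorter than BATCH.
 * Returns 0 on success, -1 if the batch lies beyond n_particles. */
static int batch_selection(extent_t step, extent_t batch,
                           extent_t n_particles, extent_t dims,
                           extent_t start[3], extent_t count[3])
{
    assert(dims > 0);
    extent_t first = batch * BATCH;
    if (first >= n_particles)
        return -1;
    extent_t remaining = n_particles - first;

    start[0] = step;
    start[1] = first;
    start[2] = 0;
    count[0] = 1;
    count[1] = remaining < BATCH ? remaining : BATCH;
    count[2] = dims;
    return 0;
}
```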

It becomes more complex though if the number of particles changes over
time and you need to store connectivity information across timesteps
(which particles split up, which ones merge). Basically such functionality
would be supported optionally in the F5 model.
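
For the simplest case, where particles only appear or disappear and each one carries a persistent id, a lookup array between two timesteps can be built as in the sketch below. This is only an illustration in plain C, not how F5 stores connectivity: the function name and the per-particle id arrays are assumptions, splits and merges would need a one-to-many structure, and the quadratic search is only suitable for small particle counts.

```c
#include <assert.h>
#include <stddef.h>

/* For each particle in `ids_from`, store its index in `ids_to`,
 * or -1 if it does not exist at that timestep.
 * Calling it with the arguments swapped gives the reverse lookup. */
static void build_lookup(const long *ids_from, size_t n_from,
                         const long *ids_to, size_t n_to,
                         long *map)
{
    assert(ids_from != NULL && ids_to != NULL && map != NULL);
    for (size_t i = 0; i < n_from; i++) {
        map[i] = -1;
        for (size_t j = 0; j < n_to; j++) {
            if (ids_to[j] == ids_from[i]) {
                map[i] = (long)j;
                break;
            }
        }
    }
}
```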

Werner

--
___________________________________________________________________________
Dr. Werner Benger Visualization Research
Laboratory for Creative Arts and Technology (LCAT)
Center for Computation & Technology at Louisiana State University (CCT/LSU)
211 Johnston Hall, Baton Rouge, Louisiana 70803
Tel.: +1 225 578 4809 Fax.: +1 225 578-5362