Chunk cache

Hello,
I'm working on a project for storing some scientific data with HDF5 in
chunked datasets. A few days ago I started testing different settings
for the chunk cache, but I see no difference in program performance.
I have done the following:
1. I created a test file containing one dataset of size 10000x5000 with
double data type.
2. In a separate application I open the file and call H5Pset_cache with
different parameters (sketched after this list).
3. After reading the data N (e.g. 10000) times in small blocks (e.g. 20x30,
50x50, or so) at random places in the dataset, there is no difference in
execution time between the different cache settings.
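
For concreteness, a minimal sketch of steps 2 and 3 (file and dataset names
and the cache values are placeholders, not my exact code; assuming HDF5 1.8).
Note that H5Pset_cache only takes effect through the file access property
list that is passed to H5Fopen:

#include <stdlib.h>
#include "hdf5.h"

int main(void)
{
    /* Cache settings go on the file access property list used by H5Fopen.
     * In HDF5 1.8 the mdc_nelmts argument is ignored; the raw-data chunk
     * cache is controlled by rdcc_nslots, rdcc_nbytes and rdcc_w0. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_cache(fapl, 0, 521, 64 * 1024 * 1024, 0.75);

    hid_t file   = H5Fopen("test.h5", H5F_ACC_RDONLY, fapl);
    hid_t dset   = H5Dopen2(file, "/data", H5P_DEFAULT); /* 10000x5000 doubles */
    hid_t fspace = H5Dget_space(dset);

    hsize_t count[2] = {50, 50};                  /* one 50x50 block per read */
    hid_t   mspace   = H5Screate_simple(2, count, NULL);
    double *buf      = malloc(50 * 50 * sizeof(double));

    for (int i = 0; i < 10000; i++) {             /* N reads at random places */
        hsize_t start[2] = { rand() % (10000 - 50), rand() % (5000 - 50) };
        H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);
        H5Dread(dset, H5T_NATIVE_DOUBLE, mspace, fspace, H5P_DEFAULT, buf);
    }

    free(buf);
    H5Sclose(mspace);
    H5Sclose(fspace);
    H5Dclose(dset);
    H5Fclose(file);
    H5Pclose(fapl);
    return 0;
}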

I have checked the documentation and the h5pmem.c example, but so far I
can't figure out what may be wrong.

I would appreciate any help or examples regarding the chunk cache.

Thanks in advance.

Hello,

Your dataset is quite small (400 MB), so the OS can keep all file
blocks in memory. Hence you will not see much difference. Try one that
does not fit in memory.
The chunk shape is an important performance factor for very large
datasets. You should match it as closely as possible to the common
access patterns of your data.

Cheers,
Ger

Dear list,

We have some use cases (in robotics and medical instrumentation) in which
we want to use HDF5 to communicate compound data structures between
different sub-systems. The idea is to use the "memory driver" (H5FD_CORE)
with standard HDF5 file operations to write a compound data structure on
one device, to send over the raw buffer to another device, where the
compound data structure is read from the received buffer. (Our use cases
target "fast" communication with relatively "small" data structures, at
least in the context of many HDF5 applications such as HPC.)

Some questions I have in this context:
- Where can we find exactly how many bytes the in-memory HDF5 compound data
  structure occupies?
- How can we make sure that all HDF5 data (raw data + metadata) is stored
  contiguously?
- Do we have to use one buffer for the raw data and one for the metadata?
- How do we make sure that sending the raw buffer over does not lead to
  problems with differences between, say, little endian and big endian
  systems?
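
To make these questions concrete, here is a minimal sketch of the round trip
we have in mind, assuming HDF5 >= 1.8.9 (the struct and all names are
hypothetical): H5Fget_file_image reports the exact byte count of the
in-memory image and copies it out as one contiguous buffer (metadata and raw
data together), and fixed little-endian file datatypes let the library do the
conversion automatically on a big-endian receiver:

#include <stdlib.h>
#include "hdf5.h"
#include "hdf5_hl.h"

typedef struct { double x, y; int id; } sample_t;

int main(void)
{
    /* Sender: purely in-memory file (H5FD_CORE, no backing store). */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_core(fapl, 1 << 16, 0);
    hid_t file = H5Fcreate("unused_name.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* File-side compound type with fixed little-endian members, so a
     * big-endian receiver gets automatic conversion on H5Dread. */
    hid_t ftype = H5Tcreate(H5T_COMPOUND, 20);
    H5Tinsert(ftype, "x",  0,  H5T_IEEE_F64LE);
    H5Tinsert(ftype, "y",  8,  H5T_IEEE_F64LE);
    H5Tinsert(ftype, "id", 16, H5T_STD_I32LE);

    /* Memory-side type matching the native struct layout. */
    hid_t mtype = H5Tcreate(H5T_COMPOUND, sizeof(sample_t));
    H5Tinsert(mtype, "x",  HOFFSET(sample_t, x),  H5T_NATIVE_DOUBLE);
    H5Tinsert(mtype, "y",  HOFFSET(sample_t, y),  H5T_NATIVE_DOUBLE);
    H5Tinsert(mtype, "id", HOFFSET(sample_t, id), H5T_NATIVE_INT);

    hsize_t  one   = 1;
    hid_t    space = H5Screate_simple(1, &one, NULL);
    hid_t    dset  = H5Dcreate2(file, "sample", ftype, space,
                                H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    sample_t rec   = {1.0, 2.0, 42};
    H5Dwrite(dset, mtype, H5S_ALL, H5S_ALL, H5P_DEFAULT, &rec);
    H5Fflush(file, H5F_SCOPE_GLOBAL);

    /* Exact size in bytes of the complete, contiguous image. */
    ssize_t size = H5Fget_file_image(file, NULL, 0);
    void   *buf  = malloc((size_t)size);
    H5Fget_file_image(file, buf, (size_t)size);
    /* ... send buf (size bytes) to the other device ... */

    /* Receiver: open the image in place, no copy; we free buf ourselves. */
    hid_t rfile = H5LTopen_file_image(buf, (size_t)size,
                                      H5LT_FILE_IMAGE_DONT_COPY |
                                      H5LT_FILE_IMAGE_DONT_RELEASE);
    /* ... H5Dopen2(rfile, "sample", H5P_DEFAULT) and read as usual ... */
    H5Fclose(rfile);
    free(buf);

    H5Dclose(dset);
    H5Sclose(space);
    H5Tclose(mtype);
    H5Tclose(ftype);
    H5Fclose(file);
    H5Pclose(fapl);
    return 0;
}

If this pattern holds, one buffer suffices for raw data plus metadata, which
would answer the contiguity question as well.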

We have already experimented with the H5FDdsm project
  <https://hpcforge.org/projects/h5fddsm/>
but this is based on a full MPI middleware, which is often too big and slow
for some of our "realtime" use cases. The examples that come with this
project "work", but the documentation is not very clear about how exactly
they solve the above-mentioned problems of ours. (I am convinced that their
code _does_ solve our problem; I just can't find how...)

Any information or pointers to code snippets are highly appreciated! Thanks!

Best regards,

Herman Bruyninckx

Hi, Herman!

  I think the HDF5 file format specification [1] may help you.

  By the way, what processors (PIC, DSP, ARM, etc.) and compilers (CCS
PIC-C, RTAI, etc.) do your robotics/medical instruments (will) use?
Also, how do your devices communicate (I2C, CAN, SPI, TCP/IP, etc.)?

  I'm quite interested in hearing HDF5 success stories from real-time
embedded systems such as the one you describe in your use case.

  Regards,

[1] http://www.hdfgroup.org/HDF5/doc/H5.format.html

--
HDF: Software that Powers Science

On Wed, 21 Aug 2013, H. Joe Lee wrote:

> Hi, Herman!
>
> I think the HDF5 file format specification [1] may help you.

I thought that too, but I got lost in the huge amount of detail in that
documentation :-(

> By the way, what processors (PIC, DSP, ARM, etc.) and compilers (CCS
> PIC-C, RTAI, etc.) do your robotics/medical instruments (will) use?
> Also, how do your devices communicate (I2C, CAN, SPI, TCP/IP, etc.)?

We want to use a multitude of processor architectures (starting with PC,
ARM and FPGA) and a multitude of communication "middleware" (starting with
TCP/IP, UDP and EtherCAT).

> I'm quite interested in hearing HDF5 success stories from real-time
> embedded systems such as the one you describe in your use case.

Me too :-)

You have to pay attention to the chunk shape and access pattern.
E.g. a 3-dim dataset with shape [1000,1000,1000] can be accessed in many
ways: successive xy-planes, xz-planes or yz-planes. The chunk shape
can be defined such that a certain pattern is favoured. Say you always
access successive xz-planes; then it makes sense to define a chunk shape
[c1,1,c3]. However, if you sometimes also access xy-planes, such a chunk
shape is very bad, because you need 1000 chunks to get all data along the
y-axis, and it is questionable whether you have sufficient memory to keep
all those chunks (to serve the next xy-plane without rereading them).
So if you have different access patterns, the chunk shape has to be a
compromise such that all patterns can be serviced reasonably well.
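
For illustration, a minimal sketch of creating such a compromise chunk shape
(names and numbers are just an example, not the attached program): a
[50,50,50] chunk of doubles is about 1 MB, and any axis-aligned plane then
touches at most 20x20 chunks, regardless of its orientation.

#include "hdf5.h"

int main(void)
{
    hsize_t dims[3]  = {1000, 1000, 1000};
    hsize_t chunk[3] = {50, 50, 50};  /* 50*50*50 doubles = ~1 MB per chunk */

    hid_t file  = H5Fcreate("cube.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(3, dims, NULL);

    /* Chunking is a dataset creation property. */
    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 3, chunk);

    hid_t dset = H5Dcreate2(file, "cube", H5T_IEEE_F64LE, space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);

    H5Dclose(dset);
    H5Pclose(dcpl);
    H5Sclose(space);
    H5Fclose(file);
    return 0;
}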

I attach a C++ test program I wrote some time ago to test various access
patterns. I hope it is clear enough.

Ger

tHDF5.cc (13.3 KB)

Hello Ger,
thank you for the reply. We noticed that the original dataset was indeed
small enough to fit into memory, so we performed new experiments with a
single 8 GB dataset. The RAM on the test workstation is 4 GB.

Unfortunately the results are the same. There is no difference in execution
time or I/O operations with different values for mdc_nelmts, rdcc_nslots,
rdcc_nbytes and rdcc_w0.

If needed, I can post the source code for the example program and some test
results.
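
In case it is useful for the discussion, one variant we could also try is the
per-dataset chunk cache (a sketch, assuming HDF5 >= 1.8.3; file and dataset
names are placeholders): H5Pset_chunk_cache on the dataset access property
list overrides the file-level H5Pset_cache values for that one dataset.

#include "hdf5.h"

int main(void)
{
    hid_t file = H5Fopen("test8gb.h5", H5F_ACC_RDONLY, H5P_DEFAULT);

    /* Per-dataset cache: 12421 hash slots and 512 MB of chunks; rdcc_w0
     * biases eviction toward chunks that have been fully read. */
    hid_t dapl = H5Pcreate(H5P_DATASET_ACCESS);
    H5Pset_chunk_cache(dapl, 12421, (size_t)512 * 1024 * 1024, 0.75);

    hid_t dset = H5Dopen2(file, "/data", dapl);

    /* ... random reads as in the test program ... */

    H5Dclose(dset);
    H5Pclose(dapl);
    H5Fclose(file);
    return 0;
}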