Hi all,
I have read about some performance issues, and it is said that a large number of chunks can grow the B-tree, which eventually decreases performance. Also, the default chunk cache is the smaller of 1 MB or 512 chunks.
My situation is that my simulation program will produce a large amount of data, let's say 1-2 GB. So, I want to use HDF5 with chunking and SZIP compression. To achieve good enough performance, may I ask what you would recommend for (1) chunk size and (2) cache size? The data is written to a SAS hard drive. Please also let me know of any other issues I should be aware of.
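For reference, this is roughly what I have in mind for creating the dataset (just a sketch; the file name, dataset name, chunk dimensions, and the SZIP block size of 16 pixels per block are placeholders, not values I am confident in):

#include <hdf5.h>

int main()
{
    // 2-D dataset, chunked and SZIP-compressed (nearest-neighbour coding).
    hsize_t dims[2]       = {700, 150000};
    hsize_t chunk_dims[2] = {256, 256};     // placeholder chunk shape

    hid_t space = H5Screate_simple(2, dims, NULL);
    hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 2, chunk_dims);
    H5Pset_szip(dcpl, H5_SZIP_NN_OPTION_MASK, 16);

    hid_t file = H5Fcreate("result.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t dset = H5Dcreate2(file, "/data", H5T_NATIVE_DOUBLE, space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);

    H5Dclose(dset); H5Pclose(dcpl); H5Sclose(space); H5Fclose(file);
    return 0;
}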
// for instance I configure:
nr_loaded_chunks = static_cast<size_t>( (maxdim_ / minchdim + 10) * 1.2 );
bytes = static_cast<hsize_t>( 1.2 * sT * maxdim_ * maxchdim );
with minchdim and maxchdim taken from 128, 256, and 512, and sT the size of the storage type in bytes.
I played around with various chunk dimensions and changed the file access properties (the multiplier 1.2).
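To make it concrete, this is roughly how those two numbers end up in the file access property list before the file is created or opened (a sketch; make_fapl is just a helper name for this example, and 0.75 is simply the default preemption value, not something I tuned):

#include <hdf5.h>

// Builds a file access property list carrying the chunk cache settings.
// The second argument of H5Pset_cache (metadata cache elements) is
// ignored by current HDF5 versions.
hid_t make_fapl(size_t nr_loaded_chunks, size_t bytes)
{
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_cache(fapl, 0, nr_loaded_chunks, bytes, 0.75);
    return fapl;
}

// hid_t file = H5Fcreate("result.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);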
The problem is that, although I am able to optimize the access for a particular dataset size, e.g. 700 x 150k, the performance breaks down for a dataset of a different size; it can get much worse even for a smaller dataset, e.g. 500 x 70k.
The other problem I have is this: if I created a dataset and closed the file with an optimized set of access properties, how can I retrieve them to use when opening the file again?
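Right now, when I reopen the file, I end up supplying the values again by hand, roughly like this (a sketch; open_with_cache is just an illustrative helper, and as far as I can tell only creation properties such as the chunk shape are stored in the file, not the access properties):

#include <hdf5.h>

// Per-dataset chunk cache via a dataset access property list; this
// overrides the file-level H5Pset_cache values for this dataset only.
hid_t open_with_cache(hid_t file, const char *name,
                      size_t nslots, size_t nbytes)
{
    hid_t dapl = H5Pcreate(H5P_DATASET_ACCESS);
    H5Pset_chunk_cache(dapl, nslots, nbytes, H5D_CHUNK_CACHE_W0_DEFAULT);
    hid_t dset = H5Dopen2(file, name, dapl);
    H5Pclose(dapl);
    return dset;
}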