Hi all,
I have read about some performance issues, and it is said that a large number of chunks can grow the B-tree, which eventually decreases performance. Also, the default chunk cache is the smaller of 1 MB or 512 chunks.
My situation is that my simulation program will produce a large amount of data, let's say 1-2 GB. So, I want to use HDF5 with chunking and SZIP compression. To achieve good enough performance, may I ask what you would recommend for (1) chunk size and (2) cache size? The data is written to a SAS hard drive. Please also let me know of any other issues I should be aware of.
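For reference, this is roughly what I have in mind for creating the dataset (just a sketch; the file name, dataset name, chunk dimensions, and the SZIP block size of 16 pixels per block are placeholders, not values I am confident in):

#include <hdf5.h>

int main()
{
    // 2-D dataset, chunked and SZIP-compressed (nearest-neighbour coding).
    hsize_t dims[2]       = {700, 150000};
    hsize_t chunk_dims[2] = {256, 256};     // placeholder chunk shape

    hid_t space = H5Screate_simple(2, dims, NULL);
    hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 2, chunk_dims);
    H5Pset_szip(dcpl, H5_SZIP_NN_OPTION_MASK, 16);

    hid_t file = H5Fcreate("result.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t dset = H5Dcreate2(file, "/data", H5T_NATIVE_DOUBLE, space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);

    H5Dclose(dset); H5Pclose(dcpl); H5Sclose(space); H5Fclose(file);
    return 0;
}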
// for instance I configure:
nr_loaded_chunks = static_cast<size_t>( (maxdim_ / minchdim + 10) * 1.2 );
bytes = static_cast<hsize_t>( 1.2 * sT * maxdim_ * maxchdim );
with minchdim and maxchdim taken from 128, 256, and 512, and sT the size of the storage type in bytes.
I played around with various chunk dimensions and changed the file access properties (the multiplier 1.2).
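To make it concrete, this is roughly how those two numbers end up in the file access property list before the file is created or opened (a sketch; make_fapl is just a helper name for this example, and 0.75 is simply the default preemption value, not something I tuned):

#include <hdf5.h>

// Builds a file access property list carrying the chunk cache settings.
// The second argument of H5Pset_cache (metadata cache elements) is
// ignored by current HDF5 versions.
hid_t make_fapl(size_t nr_loaded_chunks, size_t bytes)
{
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_cache(fapl, 0, nr_loaded_chunks, bytes, 0.75);
    return fapl;
}

// hid_t file = H5Fcreate("result.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);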
The problem is that, although I am able to optimize the access for a particular dataset size, e.g. 700 x 150k, the performance breaks down for a dataset of a different size; it can get much worse even for a smaller dataset, e.g. 500 x 70k.
The other problem I have is this: if I created a dataset and closed the file with an optimized set of access properties, how can I retrieve them to use when opening the file again?
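Right now, when I reopen the file, I end up supplying the values again by hand, roughly like this (a sketch; open_with_cache is just an illustrative helper, and as far as I can tell only creation properties such as the chunk shape are stored in the file, not the access properties):

#include <hdf5.h>

// Per-dataset chunk cache via a dataset access property list; this
// overrides the file-level H5Pset_cache values for this dataset only.
hid_t open_with_cache(hid_t file, const char *name,
                      size_t nslots, size_t nbytes)
{
    hid_t dapl = H5Pcreate(H5P_DATASET_ACCESS);
    H5Pset_chunk_cache(dapl, nslots, nbytes, H5D_CHUNK_CACHE_W0_DEFAULT);
    hid_t dset = H5Dopen2(file, name, dapl);
    H5Pclose(dapl);
    return dset;
}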