Read Performance of chunked data set.

Eryk · April 28, 2009, 3:59pm

Dear list members,

I'm trying to tune the performance of a chunked HDF5 dataset.

The HDF5 file contains just this one dataset.

The file access properties I use are:
H5::FileAccPropList caccprop;
caccprop.setCache(0 , 1e7 , 1e10 , 1.);

The dataset properties are:
DataSpace mspace1 with maximum dimensions 10k x 150k
H5::DSetCreatPropList cparms;
hsize_t chunk_dims[2] = { 2048, 2048 };
cparms.setChunk( 2, chunk_dims );
cparms.setSzip( H5_SZIP_NN_OPTION_MASK, 16 );
H5::DataSet dataset = file.createDataSet( "profileSpectra",
      H5::PredType::NATIVE_INT, mspace1, cparms );
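To see how the access pattern interacts with this layout, here is a small sketch of the chunk-grid arithmetic for a 10k x 150k dataset stored in 2048 x 2048 chunks. chunks_along, ChunkGrid and chunk_grid are hypothetical helpers for illustration, not part of the HDF5 API.

```cpp
#include <cassert>
#include <cstdint>

// Number of chunks needed to cover `extent` elements with chunks of `chunk`.
std::uint64_t chunks_along(std::uint64_t extent, std::uint64_t chunk) {
    assert(chunk > 0);
    return (extent + chunk - 1) / chunk;  // ceiling division
}

// Shape of the chunk grid of a 2-D dataset (hypothetical helper type).
struct ChunkGrid {
    std::uint64_t rows;  // chunks along dimension 0
    std::uint64_t cols;  // chunks along dimension 1
    std::uint64_t total() const { return rows * cols; }
};

// nrows x ncols dataset split into crows x ccols chunks.
ChunkGrid chunk_grid(std::uint64_t nrows, std::uint64_t ncols,
                     std::uint64_t crows, std::uint64_t ccols) {
    return ChunkGrid{chunks_along(nrows, crows), chunks_along(ncols, ccols)};
}
```

For this dataset the grid is 5 x 74 = 370 chunks. A full-length column (count {10000, 1}) crosses 5 of them and a full-length row crosses 74. Because the data is compressed, each touched chunk has to be read and decompressed in full unless it is already in the chunk cache.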

I fill the dataset with 10k arrays of length 150k.
Writing and compressing the data takes 65 s (about 1.08 min), and I am
more than happy with this performance.

However, reading the data column-wise and row-wise is much slower.
The file access properties for reading are the same as above
(caccprop.setCache(0, 1e7, 1e10, 1.)).
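As a rough check of these cache settings, the sketch below works out how many uncompressed 2048 x 2048 int chunks fit into rdcc_nbytes = 1e10, and whether that value fits in size_t at all. H5Pset_cache, which setCache wraps, takes rdcc_nbytes as a size_t; on a build with a 32-bit size_t, 1e10 is out of range, and converting an out-of-range double to an unsigned integer is undefined behavior in C++. The 4-byte int size and the helper names are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Bytes in one uncompressed chunk of rows x cols elements.
// Assumption: NATIVE_INT is a 4-byte int on this platform.
std::uint64_t chunk_bytes(std::uint64_t rows, std::uint64_t cols,
                          std::uint64_t elem_bytes = 4) {
    assert(rows > 0 && cols > 0 && elem_bytes > 0);
    return rows * cols * elem_bytes;
}

// How many whole chunks fit in a raw-data chunk cache of cache_bytes
// (the rdcc_nbytes argument of setCache).
std::uint64_t chunks_in_cache(std::uint64_t cache_bytes, std::uint64_t one_chunk) {
    assert(one_chunk > 0);
    return cache_bytes / one_chunk;
}

// True if a byte count written as a double literal (e.g. 1e10) can be
// represented in size_t, the type setCache expects for rdcc_nbytes.
bool fits_in_size_t(double bytes) {
    return bytes >= 0.0 &&
           bytes < static_cast<double>(std::numeric_limits<std::size_t>::max());
}
```

On a 64-bit build, 1e10 bytes hold 596 chunks of 16 MiB (16,777,216 bytes) each.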

Reading 150k column vectors of length 10k: 914 s (15.23 min)

Reading 10k row vectors of length 150k: 644 s (10.73 min)
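To put these timings on a common scale, the sketch below converts them to effective throughput over the uncompressed data (10k x 150k 4-byte ints, about 6 GB), assuming each pass reads every element exactly once. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Uncompressed size of the dataset in bytes (4-byte ints assumed).
std::uint64_t raw_bytes(std::uint64_t nrows, std::uint64_t ncols,
                        std::uint64_t elem_bytes = 4) {
    return nrows * ncols * elem_bytes;
}

// Effective throughput in MB/s (1 MB = 1e6 bytes) for `bytes` moved in `seconds`.
double mb_per_s(std::uint64_t bytes, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(bytes) / 1e6 / seconds;
}

// Average wall time per read() call in milliseconds.
double ms_per_call(double seconds, std::uint64_t calls) {
    assert(calls > 0);
    return seconds * 1000.0 / static_cast<double>(calls);
}
```

That gives roughly 92 MB/s for the write versus roughly 6.6 MB/s for the column pass and 9.3 MB/s for the row pass, or about 6.1 ms per single-column read() call and 64.4 ms per single-row call.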

The loop that reads the columns looks like this:

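const hsize_t nrrow = 10000;            // number of rows
const hsize_t nrcol = 150000;           // number of columns
int column[10000];                      // buffer for the column to be read
H5::DataSpace filespace = dataset.getSpace();
hsize_t mdims[1] = { nrrow };
H5::DataSpace mspace2( 1, mdims );      // memory space for one column
for( hsize_t i = 0; i < nrcol; ++i )    // read all 150k columns
{
      hsize_t offset[2] = { 0, i };
      hsize_t count[2] = { nrrow, 1 };
      /*
       * Define hyperslab and read.
       */
      filespace.selectHyperslab( H5S_SELECT_SET, count, offset );
      dataset.read( column, H5::PredType::NATIVE_INT, mspace2, filespace );
}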
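One common alternative, not something tried in this post, is to read a full-height block of columns whose width matches the chunk width (2048) in a single read() call and then pick the individual columns out of the memory buffer, so each chunk is decompressed once per block. The sketch below only computes the block hyperslabs and does the in-memory extraction; the HDF5 calls are left out, and Slab, column_blocks and extract_column are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One 2-D hyperslab: the offset/count pair that would go to selectHyperslab.
struct Slab {
    std::uint64_t offset[2];
    std::uint64_t count[2];
};

// Split an nrows x ncols dataset into full-height column blocks whose width
// equals the chunk width, so every block covers whole columns of chunks.
std::vector<Slab> column_blocks(std::uint64_t nrows, std::uint64_t ncols,
                                std::uint64_t chunk_cols) {
    assert(chunk_cols > 0);
    std::vector<Slab> blocks;
    for (std::uint64_t c0 = 0; c0 < ncols; c0 += chunk_cols) {
        std::uint64_t width = std::min(chunk_cols, ncols - c0);  // last block may be narrower
        blocks.push_back(Slab{{0, c0}, {nrows, width}});
    }
    return blocks;
}

// Copy column j out of a row-major block buffer of nrows x width ints.
void extract_column(const std::vector<int>& block, std::uint64_t nrows,
                    std::uint64_t width, std::uint64_t j, std::vector<int>& out) {
    assert(j < width && block.size() == nrows * width);
    out.resize(nrows);
    for (std::uint64_t r = 0; r < nrows; ++r)
        out[r] = block[r * width + j];
}
```

Each Slab's offset and count would be copied into hsize_t arrays for filespace.selectHyperslab( H5S_SELECT_SET, count, offset ), together with a 2-D memory DataSpace of nrows x width. For this dataset that is 74 reads of up to about 82 MB each (10000 x 2048 x 4 bytes) instead of 150000 single-column reads.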

Judging by the memory usage of my program, the chunks seem to be kept
in the cache while writing (about 1 GB of memory in use), but not
while reading.
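One possible explanation is that the chunk cache actually in effect during the read holds fewer chunks than a single column touches (10000 / 2048 rounded up = 5). The toy model below, a plain LRU cache that ignores HDF5's hash slots (rdcc_nelmts) and the w0 preemption weight, counts how many chunk loads the column pass needs for a given cache capacity. ToyChunkCache and column_pass_loads are hypothetical names; this is not HDF5's real cache algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>

// Toy LRU cache of chunk ids; counts loads (read + decompress) on misses.
class ToyChunkCache {
public:
    explicit ToyChunkCache(std::size_t capacity) : capacity_(capacity) {}

    // Touch a chunk; returns true on a miss (the chunk had to be loaded).
    bool touch(std::uint64_t chunk_id) {
        auto it = where_.find(chunk_id);
        if (it != where_.end()) {
            order_.splice(order_.begin(), order_, it->second);  // mark most recent
            return false;
        }
        if (capacity_ == 0) return true;   // nothing can be cached
        if (order_.size() == capacity_) {  // evict least recently used chunk
            where_.erase(order_.back());
            order_.pop_back();
        }
        order_.push_front(chunk_id);
        where_[chunk_id] = order_.begin();
        return true;
    }

private:
    std::size_t capacity_;
    std::list<std::uint64_t> order_;  // front = most recently used
    std::unordered_map<std::uint64_t, std::list<std::uint64_t>::iterator> where_;
};

// Chunk loads when reading an nrows x ncols dataset one full column at a
// time, with crows x ccols chunks and room for `capacity` chunks in the cache.
std::uint64_t column_pass_loads(std::uint64_t nrows, std::uint64_t ncols,
                                std::uint64_t crows, std::uint64_t ccols,
                                std::size_t capacity) {
    const std::uint64_t grid_rows = (nrows + crows - 1) / crows;
    const std::uint64_t grid_cols = (ncols + ccols - 1) / ccols;
    ToyChunkCache cache(capacity);
    std::uint64_t loads = 0;
    for (std::uint64_t c = 0; c < ncols; ++c)          // one read() per column
        for (std::uint64_t r = 0; r < grid_rows; ++r)  // chunks that column crosses
            if (cache.touch(r * grid_cols + c / ccols)) ++loads;
    return loads;
}
```

For this layout the model gives 370 loads (every chunk exactly once) when at least 5 chunks fit, but 750,000 loads (5 per column) when only 4 fit, because LRU evicts each chunk just before it is needed again.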

I suspect that the read performance could be tuned with
H5::DSetMemXferPropList, but I have no idea how to configure it.

Could you please suggest how to improve the read performance?

Thank you
Regards
Eryk


--
Witold Eryk Wolski

Heidmark str 5
D-28329 Bremen
tel.: 04215261837

----------------------------------------------------------------------
This mailing list is for HDF software users discussion.
