Reading compressed 4D data with SDreaddata

sudipta_sarkar · March 29, 2013, 3:53pm

Dear All,
I noticed that reading a 4D dataset compressed with GZIP level 4, line by line
with SDreaddata, takes a woefully long time. If I uncompress the same data
with hrepack and then read it again the same way, it is read in a snap!
Has anyone else come across this issue, and is there a way to speed up
reading compressed datasets that I may be missing?
Thanks

epourmal · March 31, 2013, 4:20pm

Hello,

If I understand correctly, you are reading an SDS row by row.

If the SDS is not chunked, the HDF4 library has to read the whole dataset, uncompress it, and then return the requested subset (e.g., one row) on every call. If you read N rows, all of this is done N times. Instead, try reading the whole dataset into an application buffer with a single call; the application then retrieves each row from that buffer.
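As a rough illustration of retrieving a row from the application buffer, here is a minimal sketch in plain C. It assumes the buffer was filled by a single SDreaddata call covering the whole SDS, that the data are stored in C (row-major) order with the last dimension varying fastest, and that the elements are float. The helper names row_offset_4d and copy_row_4d are hypothetical and are not part of HDF4.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Offset, in elements, of element [i0][i1][i2][0] in a row-major 4D array
 * whose dimension sizes are dims[0..3] (last dimension varies fastest). */
static size_t row_offset_4d(const size_t dims[4],
                            size_t i0, size_t i1, size_t i2)
{
    return ((i0 * dims[1] + i1) * dims[2] + i2) * dims[3];
}

/* Copy one row (dims[3] elements along the last dimension) out of the
 * full in-memory buffer. The float element type is an assumption; use
 * the type that matches the SDS. */
static void copy_row_4d(const float *full, const size_t dims[4],
                        size_t i0, size_t i1, size_t i2, float *row_out)
{
    assert(full != NULL && row_out != NULL);
    assert(i0 < dims[0] && i1 < dims[1] && i2 < dims[2]);
    memcpy(row_out, full + row_offset_4d(dims, i0, i1, i2),
           dims[3] * sizeof(float));
}
```

With this approach the compressed data are read and uncompressed once, and each row afterwards costs only a memory copy. It does require that the whole uncompressed dataset fits in memory.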

When the SDS is chunked, performance can be poor for the same reason: HDF4 uncompresses the same chunk(s) again and again if they are not in the cache. In that case, you can try tuning the application by setting the HDF4 chunk cache parameters with the SDsetchunkcache function: http://www.hdfgroup.org/release4/doc/RefMan_html/RM_Section_II_SD.html#wp516520

To find out whether an SDS is chunked, call the SDgetchunkinfo function: http://www.hdfgroup.org/release4/doc/RefMan_html/RM_Section_II_SD.html#wp442153

[ Based on your description, I doubt that the dataset in question is chunked: by default, the HDF4 chunk cache holds a whole row of chunks, so performance shouldn't be bad at all... but just in case. ]
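If the SDS does turn out to be chunked, a cache size can be estimated from the dimension and chunk sizes. The sketch below is plain C and assumes that "a whole row of chunks" means every chunk along the fastest-changing (last) dimension. The helper names chunks_along and row_of_chunks are hypothetical, and int32_t stands in for the HDF4 int32 type so that the example compiles without the HDF4 headers.

```c
#include <assert.h>
#include <stdint.h>

/* Number of chunks needed to cover dim_len elements with chunks of
 * chunk_len elements; rounds up to count a partial last chunk. */
static int32_t chunks_along(int32_t dim_len, int32_t chunk_len)
{
    assert(dim_len > 0 && chunk_len > 0);
    return (dim_len + chunk_len - 1) / chunk_len;
}

/* Cache size, in chunks, that holds one row of chunks, i.e. all chunks
 * along the last (fastest-changing) dimension of a rank-dimensional SDS. */
static int32_t row_of_chunks(const int32_t dims[],
                             const int32_t chunk_dims[], int rank)
{
    assert(rank > 0);
    return chunks_along(dims[rank - 1], chunk_dims[rank - 1]);
}
```

The result would be the value to pass as the maxcache argument of SDsetchunkcache. The dimension sizes come from SDgetinfo and the chunk sizes from SDgetchunkinfo. If the access pattern crosses more chunks than one row of them, a larger value may be needed.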

Elena

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Elena Pourmal The HDF Group http://hdfgroup.org
1800 So. Oak St., Suite 203, Champaign IL 61820
217.531.6112
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
