1-D Datasets: Will Chunking Help?

Hello,

I'm a newcomer to the HDF5 world. I have compared the performance of our
existing binary file I/O against HDF5, and I'm seeing modest speed
improvements with HDF5.

The next step for me is to experiment with advanced HDF5 topics like
chunking and compression. Based on what I read in the HDF5 documentation,
chunking can come in handy when one knows the access patterns of a
dataset ahead of time. In my case, my data consists entirely of
one-dimensional, double-precision floating-point arrays. Most of these
arrays will be the same size, but some of them will be considerably smaller
than the rest. For any given read, I need to read a single 1-D array in its
entirety. Given this scenario, I feel I wouldn't gain any performance
improvement by using chunking.

Is my analysis correct? If not, please help me understand how chunking would
help in my case.

Appreciate your help,
MDH.

Welcome to HDF!

If you would like to use compression, you will need to enable chunking. The advantage, of course, is that you'll need less space on disk. If performance is critical, though, you'll want to test what effect chunking (and different compression filters) has. Compression requires a certain amount of CPU overhead, but you may see performance gains overall because less disk I/O will be needed. (On the other hand, you may need more seeks, because the chunks for a given dataset may not be stored contiguously/sequentially.) As they say, "your mileage may vary," so try out some different options and see what works best.
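The space-versus-CPU trade-off can be previewed without HDF5 at all. HDF5's gzip (deflate) filter is built on zlib, so the sketch below splits a 1-D float64 array into fixed-size chunks and deflates each one with Python's `zlib`, reporting raw bytes, stored bytes and compression time. It is only a model: the function names, chunk size and test signal are hypothetical, and it says nothing about seeks or HDF5's chunk cache, so real numbers still need an actual HDF5 benchmark.

```python
import array
import time
import zlib


def compress_in_chunks(values, chunk_elements, level=6):
    """Split a 1-D float64 array into fixed-size chunks and deflate each one.

    Returns (compressed_chunks, raw_bytes, stored_bytes, seconds).
    """
    raw = array.array("d", values)  # "d" = C double, normally 8 bytes
    data = raw.tobytes()
    step = chunk_elements * raw.itemsize  # chunk size in bytes
    start = time.perf_counter()
    # Each chunk is compressed independently, as a per-chunk filter would do.
    chunks = [zlib.compress(data[i:i + step], level)
              for i in range(0, len(data), step)]
    elapsed = time.perf_counter() - start
    return chunks, len(data), sum(len(c) for c in chunks), elapsed


def decompress_chunks(chunks):
    """Reassemble the full array: a whole-array read must inflate every chunk."""
    out = array.array("d")
    for c in chunks:
        out.frombytes(zlib.decompress(c))
    return out


if __name__ == "__main__":
    # Hypothetical, highly repetitive signal; real data may compress far less.
    signal = [float(i % 100) for i in range(1_000_000)]
    chunks, raw, stored, secs = compress_in_chunks(signal, 65_536)
    print(f"{len(chunks)} chunks: {raw} raw bytes -> {stored} stored bytes "
          f"({secs:.3f} s to compress)")
    assert list(decompress_chunks(chunks)) == signal
```

The ratio and timing depend heavily on the data, which is the "your mileage may vary" point. For the real test, h5py's `create_dataset` accepts `chunks=` and `compression="gzip"` arguments, so the same arrays can be written both ways and timed.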

Regards,
John Readey
HDFGroup


From: mdhlogins <mdhlogins@gmail.com>
Reply-To: "hdf-forum@lists.hdfgroup.org" <hdf-forum@lists.hdfgroup.org>
Date: Monday, October 20, 2014 at 12:18 AM
To: "hdf-forum@lists.hdfgroup.org" <hdf-forum@lists.hdfgroup.org>
Subject: [Hdf-forum] 1-D Datasets: Will Chunking Help?
