I am an HDF5 newbie. Sorry if this is a stupid question.
I know HDF5 supports compression, but does it support random access into compressed data?
For example, suppose a compressed dataset has 3 million samples and I want to read only 100 of them (1,000,000 ~ 1,000,099), not the entire 3 million. How can HDF5 quickly identify the logical chunk(s) containing those 100 samples and decompress just those?
Compression is only supported with so-called chunked dataset layouts. That means your dataset is broken up into chunks (tiles) of a size that you determine at creation time. HDF5 internally maintains a chunk index that allows it to quickly retrieve (and compress/decompress) just the chunk(s) affected by an I/O operation.
For the subtleties of chunking, you should check out Elena's talk.
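
For instance, here is a minimal sketch of your scenario using h5py (the file name, dataset name, and the 10,000-sample chunk size are arbitrary choices for illustration):

import numpy as np
import h5py

# Create a chunked dataset with gzip compression. The chunk size
# (10,000 samples here) is fixed at creation time.
with h5py.File("samples.h5", "w") as f:
    f.create_dataset("samples",
                     data=np.arange(3_000_000, dtype=np.float64),
                     chunks=(10_000,),
                     compression="gzip")

# Read only samples 1,000,000 through 1,000,099. HDF5 consults its
# chunk index, locates the single chunk covering this range, and
# decompresses just that chunk rather than the whole dataset.
with h5py.File("samples.h5", "r") as f:
    subset = f["samples"][1_000_000:1_000_100]

With a 10,000-sample chunk size, that 100-sample read touches exactly one chunk, so only about 10,000 samples are decompressed instead of 3 million.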