I am an HDF5 newbie. Sorry if this is a stupid question.
I know HDF5 supports compression, but does it support random access into compressed data?
For example, suppose a compressed dataset has 3 million samples and I want to read only 100 of them (1,000,000 ~ 1,000,099), not the entire 3 million. How can HDF5 quickly identify the logical chunk(s) containing those 100 samples and decompress just those?
Compression is only supported with so-called chunked dataset layouts. That means your dataset is broken up into chunks (tiles) of a size that you determine at creation time. HDF5 internally maintains a chunk index that allows it to quickly retrieve (and compress/decompress) just the chunk(s) affected by an I/O operation.
For the subtleties of chunking, you should check out Elena's talk.
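
For instance, here is a minimal sketch of your scenario using h5py (the file name, dataset name, and the 10,000-sample chunk size are arbitrary choices for illustration):

import numpy as np
import h5py

# Create a chunked dataset with gzip compression. The chunk size
# (10,000 samples here) is fixed at creation time.
with h5py.File("samples.h5", "w") as f:
    f.create_dataset("samples",
                     data=np.arange(3_000_000, dtype=np.float64),
                     chunks=(10_000,),
                     compression="gzip")

# Read only samples 1,000,000 through 1,000,099. HDF5 consults its
# chunk index, locates the single chunk covering this range, and
# decompresses just that chunk rather than the whole dataset.
with h5py.File("samples.h5", "r") as f:
    subset = f["samples"][1_000_000:1_000_100]

With a 10,000-sample chunk size, that 100-sample read touches exactly one chunk, so only about 10,000 samples are decompressed instead of 3 million.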