What do you want to see in "HDF5 2.0"?


Actually, we are almost there after new indexing for chunked datasets was introduced in 1.10.0.

The current APIs and programming model still require a call to H5Pset_chunk, but this call could be omitted if there is only one chunk (i.e., contiguous storage). Then compression could also be used on a “contiguous” dataset.

I couldn’t convince Quincey to introduce this change in 1.10.0 and the rest is history.




  • Set filters with ID code strings, not numbers.
  • Set filter parameters with string keyword arguments.
  • Official registry for filter ID code strings. The current registry is a good start, if made official for existing ID code strings, not just the numbers.
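
As a rough illustration of the first two bullets, here is a minimal Python sketch of how a string name plus keyword arguments could be translated into the numeric filter ID and cd_values array that H5Pset_filter takes today. The function filter_spec and the keyword name "level" are hypothetical, made up for this example; only the numeric IDs 1 (deflate) and 2 (shuffle) come from the HDF5 library.

```python
# Hypothetical sketch, not an HDF5 or h5py API.
# Numeric IDs of two built-in filters, as defined by the HDF5 library.
H5Z_FILTER_DEFLATE = 1
H5Z_FILTER_SHUFFLE = 2


def _gzip_params(level):
    """Turn the (assumed) keyword 'level' into deflate's single cd_value."""
    if not 0 <= level <= 9:
        raise ValueError("gzip level must be between 0 and 9")
    return (level,)


def _shuffle_params():
    """Shuffle takes no user-supplied parameters."""
    return ()


# Filter name -> (numeric ID, function that packs keyword arguments).
_FILTERS = {
    "gzip": (H5Z_FILTER_DEFLATE, _gzip_params),
    "shuffle": (H5Z_FILTER_SHUFFLE, _shuffle_params),
}


def filter_spec(name, **params):
    """Return (filter_id, cd_values) for a filter name and keyword arguments."""
    try:
        filter_id, pack = _FILTERS[name]
    except KeyError:
        raise ValueError(f"unknown filter name: {name!r}") from None
    return filter_id, pack(**params)
```

With this, filter_spec("gzip", level=6) yields the pair (1, (6,)), which is what a caller would hand to the existing numeric interface. An unknown keyword fails with a TypeError instead of being silently ignored.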


The use of contiguous storage is well entrenched in the HDF5 universe. It has certain advantages, such as plain simplicity and optimal subset access on local storage. I would prefer that support for contiguous storage is sustained. If compression is desired, just go to chunked storage, as intended by design.


Why should contiguous storage not just be a simple case of chunked storage? Contiguous storage is basically just chunked storage with a single chunk and no filters, no?


Could you give an example of an ID code string?


I think I understand and agree with the spirit of this request. It's much easier to remember “gzip” or maybe “lzma2” as the identifier for a filter than “032105”. That said, can’t this already be achieved by adding a layer on top of the existing interface that keeps a mapping between strings and numbers? I don’t think the table would ever get so large that a linear search of it would have a negative performance impact. And, I honestly don’t think this needs to wait for an HDF5-2.0 or for THG to implement it to make it happen. It may already be implemented somewhere in the world :wink:
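
To make the idea concrete, such a layer could be as small as the Python sketch below: a short table scanned linearly in both directions. The function names and the case-insensitive matching are my own assumptions, and the table is deliberately incomplete; the numeric IDs shown are the ones the library and the public filter registry use for these filters.

```python
# Hypothetical sketch of a thin name layer; not part of HDF5.
# (name, numeric filter ID) pairs. The list is deliberately incomplete.
_FILTER_TABLE = [
    ("gzip", 1),
    ("shuffle", 2),
    ("n-bit", 5),
    ("scale-offset", 6),
    ("BZIP2", 307),
]


def filter_id_from_name(name):
    """Linear search by name; case-insensitive matching is an assumption."""
    wanted = name.casefold()
    for known, filter_id in _FILTER_TABLE:
        if known.casefold() == wanted:
            return filter_id
    raise KeyError(name)


def filter_name_from_id(filter_id):
    """Reverse lookup, also a linear search."""
    for known, known_id in _FILTER_TABLE:
        if known_id == filter_id:
            return known
    raise KeyError(filter_id)
```

A table of a few dozen entries scanned this way costs next to nothing compared with any actual I/O.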


“Filter ID code strings”. That is a mouthful, sorry, but I was trying to be complete in more than one way. I am referring to the “Name” column in the HDF5 registry, in addition to simple names for the built-in filters. Here are a few examples.

gzip, n-bit, scale-offset, shuffle, BZIP2, LPC-Rice


You mean like h5py? https://docs.h5py.org/en/stable/high/dataset.html#filter-pipeline

dset = f.create_dataset("zipped", (100, 100), compression="gzip")


Yes, like that, but build the name registry into the HDF5 core library, so that standard filter names are available and guaranteed, outside of python.