What do you want to see in "HDF5 2.0"?


#21

Actually, we are almost there, since the new chunk indexing schemes were introduced in 1.10.0.

The current APIs and programming model still require calling H5Pset_chunk, but this call could be omitted when there is only one chunk (i.e., contiguous storage). Compression could then also be used on a “contiguous” dataset.
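To make the “one chunk” idea concrete: a chunked dataset is split into a grid with ceil(dim / chunk_dim) chunks along each axis, so choosing chunk dimensions equal to the dataset dimensions gives exactly one chunk. The stdlib-only sketch below just does that arithmetic; `chunk_grid` and `chunk_count` are hypothetical helpers, not part of any HDF5 API.

```python
from math import prod


def chunk_grid(shape, chunks):
    """Chunks per axis when a dataset of `shape` is split into `chunks`.

    Edge chunks may be only partly filled, hence the ceiling division.
    """
    if len(shape) != len(chunks):
        raise ValueError("shape and chunks must have the same rank")
    if any(c <= 0 for c in chunks):
        raise ValueError("chunk dimensions must be positive")
    return tuple((n + c - 1) // c for n, c in zip(shape, chunks))


def chunk_count(shape, chunks):
    """Total number of chunks; 1 means the layout is effectively contiguous."""
    return prod(chunk_grid(shape, chunks))


print(chunk_count((100, 100), (10, 10)))    # 100 chunks of 10 x 10
print(chunk_count((100, 100), (100, 100)))  # 1: a single chunk spanning the dataset
```

In h5py the single-chunk, compressed case can already be requested as `f.create_dataset(name, shape, chunks=shape, compression="gzip")`; the difference proposed here is that the explicit chunk setting would no longer be needed. The 1.10 file format also includes a dedicated single-chunk index for this layout.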

I couldn’t convince Quincey to introduce this change in 1.10.0, and the rest is history.

Elena


#22

Requests:

  • Set filters with ID code strings, not numbers.
  • Set filter parameters with string keyword arguments.
  • Official registry for filter ID code strings. The current registry is a good start, if it is made official for the existing ID code strings and not just the numbers.
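The keyword-argument request could work roughly like this: each filter publishes an ordered list of named parameters, and a front end packs keyword arguments into the unsigned-integer `cd_values` array that `H5Pset_filter` takes. This is a sketch under assumptions. Only the “gzip” entry mirrors real HDF5 behaviour (the deflate filter takes one client value, the compression level); the default of 6 is zlib’s default level, chosen here for illustration. The “hypothetical-codec” entry, the table, and `pack_cd_values` are all invented for the example.

```python
# Hypothetical parameter specs: filter name -> ordered (keyword, default) pairs.
PARAM_SPECS = {
    "gzip": (("level", 6),),                                  # deflate: one value, the level
    "shuffle": (),                                            # no client values
    "hypothetical-codec": (("level", 3), ("threads", 1)),     # invented for illustration
}


def pack_cd_values(name, **kwargs):
    """Turn keyword arguments into the ordered list of unsigned ints
    that would be passed to H5Pset_filter as cd_values."""
    try:
        spec = PARAM_SPECS[name.lower()]
    except KeyError:
        raise KeyError(f"unknown filter name: {name!r}") from None
    unexpected = set(kwargs) - {key for key, _ in spec}
    if unexpected:
        raise TypeError(f"{name}: unexpected parameters {sorted(unexpected)}")
    values = []
    for key, default in spec:
        value = kwargs.get(key, default)
        if not isinstance(value, int) or value < 0:
            raise ValueError(f"{name}.{key} must be a non-negative int")
        values.append(value)
    return values


print(pack_cd_values("gzip", level=9))                     # [9]
print(pack_cd_values("hypothetical-codec", threads=4))     # [3, 4]
```

Keeping the order in the spec, rather than in the caller’s code, is what lets callers name parameters instead of remembering their positions.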

#23

The use of contiguous storage is well entrenched in the HDF5 universe. It has certain advantages, such as plain simplicity and optimal subset access on local storage. I would prefer that support for contiguous storage be sustained. If compression is desired, just switch to chunked storage, as intended by design.
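The subset-access point comes down to layout arithmetic. In contiguous storage a row-major array occupies one byte range, so the offset of any element is a closed-form function of its index, and a block of full-width rows collapses into a single read. The stdlib sketch below computes the byte ranges needed for a 2-D hyperslab, merging ranges that touch; `element_offset` and `hyperslab_ranges` are hypothetical helpers that illustrate the idea, not how the HDF5 library schedules its I/O.

```python
def element_offset(index, shape, itemsize):
    """Byte offset of `index` in a row-major (C-order) contiguous array."""
    offset = 0
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError(f"index {index} out of bounds for shape {shape}")
        offset = offset * n + i
    return offset * itemsize


def hyperslab_ranges(start, count, shape, itemsize):
    """Byte ranges (offset, length) covering a 2-D block of `count`
    elements at `start`; ranges that touch are merged into one read."""
    (r0, c0), (nr, nc) = start, count
    if nr <= 0 or nc <= 0 or r0 + nr > shape[0] or c0 + nc > shape[1]:
        raise IndexError("hyperslab does not fit inside the dataset")
    ranges = []
    for r in range(r0, r0 + nr):
        off = element_offset((r, c0), shape, itemsize)
        length = nc * itemsize
        if ranges and sum(ranges[-1]) == off:
            # This row starts where the previous range ends: extend it.
            ranges[-1] = (ranges[-1][0], ranges[-1][1] + length)
        else:
            ranges.append((off, length))
    return ranges


# 4 x 5 dataset of 8-byte values.
print(hyperslab_ranges((1, 0), (2, 5), (4, 5), 8))  # full-width rows: one range
print(hyperslab_ranges((0, 1), (3, 2), (4, 5), 8))  # column block: one range per row
```

With chunked storage the same request has to touch every chunk that overlaps the block, and with compression each of those chunks must be read and decompressed in full.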


#24

Why should contiguous storage not just be a simple case of chunked storage? Contiguous storage is basically just chunked storage with a single chunk and no filters, no?


#25

Could you give an example of an ID code string?


#26

I think I understand and agree with the spirit of this request. It’s much easier to remember “gzip” or maybe “lzma2” as the identifier for a filter than “032105”. That said, can’t this already be achieved by adding a layer on top of the existing interface that keeps a mapping between strings and numbers? I don’t think the table would ever get so large that a linear search of it would have a negative performance impact. And I honestly don’t think this needs to wait for HDF5 2.0, or for THG to implement it, to make it happen. It may already be implemented somewhere in the world 😉
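A minimal sketch of that mapping layer, assuming a small hand-maintained table searched linearly. The IDs for the built-in filters (deflate 1, shuffle 2, Fletcher32 3, SZIP 4, N-Bit 5, Scale-Offset 6) follow the `H5Z_FILTER_*` constants; the BZIP2 value (307) is taken from the registered-filter list and should be checked against it. The table layout, the aliases, and the names `filter_id` and `filter_name` are hypothetical.

```python
# Hypothetical name table: (filter ID, accepted names). The first name is canonical.
FILTER_TABLE = [
    (1, ("gzip", "deflate")),
    (2, ("shuffle",)),
    (3, ("fletcher32",)),
    (4, ("szip",)),
    (5, ("n-bit", "nbit")),
    (6, ("scale-offset", "scaleoffset")),
    (307, ("bzip2",)),
]


def filter_id(name):
    """Numeric filter ID for a name; case-insensitive linear search."""
    key = name.strip().lower()
    for fid, names in FILTER_TABLE:
        if key in names:
            return fid
    raise KeyError(f"no registered filter named {name!r}")


def filter_name(fid):
    """Reverse lookup: canonical name for a numeric filter ID."""
    for known_id, names in FILTER_TABLE:
        if known_id == fid:
            return names[0]
    raise KeyError(f"unknown filter ID {fid}")


print(filter_id("BZIP2"))   # 307
print(filter_name(6))       # scale-offset
```

At a few dozen entries the linear scan costs nothing measurable next to the I/O it precedes; the harder part is agreeing on the names, which is what the next posts turn to.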


#27

“Filter ID code strings”. That is a mouthful, sorry, but I was trying to be complete in more than one way. I am referring to the “Name” column in the HDF5 registry, in addition to simple names for the built-in filters. Here are a few examples.

gzip, n-bit, scale-offset, shuffle, BZIP2, LPC-Rice


#28

You mean like h5py? https://docs.h5py.org/en/stable/high/dataset.html#filter-pipeline

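```python
# f is an open, writable h5py.File; the filter is named "gzip" rather than given a numeric ID
dset = f.create_dataset("zipped", (100, 100), compression="gzip")
```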

#29

Yes, like that, but build the name registry into the HDF5 core library, so that standard filter names are available and guaranteed outside of Python.