Comparing HSDS and local h5 files with hsdiff fails: Chunks not allowed for scalar datasets

Hi,

I'm comparing two identical files: one uploaded to my local drive (Jupyter Lab), and the same file after it was loaded into HSDS.

hsdiff reports many differences even though the datasets haven't been modified. It prints many lines such as:
dataset: </GEOSCIENCE/Data/{ff1a431e-6cef-466a-9e0d-dd2cb665830e}/Visible> and </GEOSCIENCE/Data/{ff1a431e-6cef-466a-9e0d-dd2cb665830e}/Visible>

It then eventually fails with: ValueError: Chunks not allowed for scalar datasets.

Could you point me to documentation on how to use this tool properly? I was hoping to implement partial uploads of data differences, plus some version control, from a local desktop client using HSDS and h5pyd. The full error is below:

```
dataset: </GEOSCIENCE/Data/{ff1a431e-6cef-466a-9e0d-dd2cb665830e}/Visible> and </GEOSCIENCE/Data/{ff1a431e-6cef-466a-9e0d-dd2cb665830e}/Visible>
dataset: </GEOSCIENCE/Data/{ff7a193c-ea77-437c-a6dc-379c73fa75b9}/Visible> and </GEOSCIENCE/Data/{ff7a193c-ea77-437c-a6dc-379c73fa75b9}/Visible>
Traceback (most recent call last):
  File "/opt/conda/bin/hsdiff", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.9/site-packages/h5pyd/_apps/hsdiff.py", line 559, in main
    rc = diff_file(fin, fout, verbose=verbose, nodata=nodata, noattr=noattr, quiet=quiet)
  File "/opt/conda/lib/python3.9/site-packages/h5pyd/_apps/hsdiff.py", line 364, in diff_file
    fin.visititems(object_diff_helper)
  File "/opt/conda/lib/python3.9/site-packages/h5py/_hl/group.py", line 612, in visititems
    return h5o.visit(self.id, proxy)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 355, in h5py.h5o.visit
  File "h5py/h5o.pyx", line 302, in h5py.h5o.cb_obj_simple
  File "/opt/conda/lib/python3.9/site-packages/h5py/_hl/group.py", line 611, in proxy
    return func(name, self[name])
  File "/opt/conda/lib/python3.9/site-packages/h5pyd/_apps/hsdiff.py", line 352, in object_diff_helper
    diff_dataset(obj, ctx)
  File "/opt/conda/lib/python3.9/site-packages/h5pyd/_apps/hsdiff.py", line 303, in diff_dataset
    it = ChunkIterator(src)
  File "/opt/conda/lib/python3.9/site-packages/h5pyd/_apps/chunkiter.py", line 81, in __init__
    self._layout = guess_chunk(self._shape, None, dset.dtype.itemsize)
  File "/opt/conda/lib/python3.9/site-packages/h5pyd/_apps/chunkiter.py", line 35, in guess_chunk
    raise ValueError("Chunks not allowed for scalar datasets.")
ValueError: Chunks not allowed for scalar datasets.
```
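The traceback shows hsdiff building a ChunkIterator for a dataset whose shape is `()`, and guess_chunk refuses to compute a chunk layout for a scalar. The sketch below is not hsdiff's code and not necessarily how the h5pyd fix works; it only illustrates a comparison loop that reads scalar datasets whole with the `()` selection and walks other datasets in row blocks, then writes back only the blocks that differ (the partial-upload idea from the question). The helpers `iter_selections`, `changed_selections`, `push_changes` and the `rows_per_block` parameter are hypothetical. It assumes both objects expose `.shape`, `.dtype` and NumPy-style indexing, as NumPy arrays and h5py Datasets do; h5pyd aims to mirror the h5py Dataset API, but that is untested here.

```python
import numpy as np


def iter_selections(shape, rows_per_block=1024):
    """Yield selections that together cover a dataset of the given shape.

    A scalar dataset (shape == ()) has no chunk layout, so it gets a single
    empty-tuple selection, i.e. dset[()] reads the whole value.
    """
    if rows_per_block < 1:
        raise ValueError("rows_per_block must be at least 1")
    if len(shape) == 0:
        yield ()
        return
    for start in range(0, shape[0], rows_per_block):
        stop = min(start + rows_per_block, shape[0])
        # Selecting only along the first axis implies the full extent of the rest.
        yield (slice(start, stop),)


def changed_selections(local, remote, rows_per_block=1024):
    """Return the selections where `local` and `remote` hold different values."""
    if local.shape != remote.shape or local.dtype != remote.dtype:
        raise ValueError("shape or dtype differs; handle that case separately")
    return [
        sel
        for sel in iter_selections(local.shape, rows_per_block)
        if not np.array_equal(local[sel], remote[sel])
    ]


def push_changes(local, remote, rows_per_block=1024):
    """Write only the differing selections from `local` into `remote`."""
    changed = changed_selections(local, remote, rows_per_block)
    for sel in changed:
        remote[sel] = local[sel]
    return changed
```

With files, the idea would be to call `push_changes(local_file[name], remote_file[name])` with the remote `h5pyd.File` opened in "r+" mode. Each comparison still reads the remote block, so this saves upload bandwidth but not download. Also note that `np.array_equal` treats NaN as unequal to NaN, so float datasets containing NaNs would always be reported as changed.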

Thanks for reporting this!
I think some recent changes in h5pyd caused a regression in the hsdiff utility.
I've checked what should be a fix into the h5pyd master branch. Can you give it a try? If you are using HDF Lab, just run this as your first cell:

```
import sys
!{sys.executable} -m pip install git+https://git@github.com/HDFGroup/h5pyd --upgrade
```

Let me know how this works with your files. Even with this fix, I suspect there are some areas where hsdiff will still report differences, for example with object reference datasets. But let's see how it goes.

Hi John,

Thanks for the quick reply!

Yes, upgrading h5pyd fixed this issue. The process now runs without errors.

Awesome! Let us know if you run into any other issues.