Build an unlimited 1D dataset with h5py

Hi,
While the C++ interface allows it, the Python h5py interface does not seem to allow the creation of an unlimited 1D array.
Moreover, all the examples of unlimited datasets are given with at least two dimensions, like maxshape = (None, 3).
I could use (None, 1), but that still declares a two-dimensional dataset.
I cannot use this workaround because it does not match the layout required by the VTK format (VTK File Formats - VTK documentation).
Did I miss something? Is there another way to declare an unlimited 1D array?
Thank you in advance for any help you can give me.

Did you try maxshape=(None,)?

Yes.

parent_add_child = parent.create_dataset(name=child, dtype='i8', shape=(1),
                                         maxshape=(None), chunks=True)
parent_add_child.resize(2)
>> RuntimeError: Unable to set dataset extent (dimension cannot exceed the existing maximal size (new: 2 max: 1))

I think I understand the general logic (since it works in 2D with maxshape=(None, 1)).

I think maxshape is misinterpreted in h5py's dataset.py, line 64:

tmp_shape = maxshape if maxshape is not None else shape

Since maxshape=(None) is equivalent to maxshape=None (undefined), the shape value is used instead.

>>> m=(None)
>>> m is None
True
>>> m=(None,1)
>>> m is None
False
>>> m=[None]
>>> m is None
False
>>> m=[None,1]
>>> m is None
False
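
As a rough illustration of that fallback, the quoted line can be wrapped in a small standalone function. This is only a sketch: pick_shape is a made-up name, it is not part of h5py, and the real code does more around that line.

```python
def pick_shape(shape, maxshape):
    """Hypothetical wrapper around the line quoted from dataset.py."""
    # Same expression as the quoted line: fall back to shape when maxshape is None.
    tmp_shape = maxshape if maxshape is not None else shape
    return tmp_shape


# (None) is just None in parentheses, so the fallback to shape kicks in.
print(pick_shape((1,), (None)))     # (1,)
# A list or a two-element tuple is a real container, so it is kept as-is.
print(pick_shape((1,), [None]))     # [None]
print(pick_shape((1,), (None, 1)))  # (None, 1)
```

With maxshape=(None), the function cannot tell the argument apart from "no maxshape given", which matches the "max: 1" reported in the error message.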

The documentation (book, examples, etc.) never addresses the case of an unlimited 1D array, hence my question here. ;(

The solution could be this, after looking at the h5py module code:

parent_add_child = parent.create_dataset(name=child, dtype='i8', shape=(1),
                                         maxshape=(h5py.UNLIMITED), chunks=True)
parent_add_child.resize(2)

I will check whether this solution still works once I propagate it through my code.

Thanks, @ajelenak

Hi @jacques-bernard.leki,

The code below:

import h5py


with h5py.File('test.h5', mode='w') as h5f:
    h5f.create_dataset(
        name='child', dtype='i8', shape=(2,), maxshape=(None,), chunks=True)

produces a 1D dataset with unlimited size:

$ h5dump -p test.h5
HDF5 "test.h5" {
GROUP "/" {
   DATASET "child" {
      DATATYPE  H5T_STD_I64LE
      DATASPACE  SIMPLE { ( 2 ) / ( H5S_UNLIMITED ) }
      STORAGE_LAYOUT {
         CHUNKED ( 2 )
         SIZE 0
      }
      FILTERS {
         NONE
      }
      FILLVALUE {
         FILL_TIME H5D_FILL_TIME_ALLOC
         VALUE  H5D_FILL_VALUE_DEFAULT
      }
      ALLOCATION_TIME {
         H5D_ALLOC_TIME_INCR
      }
      DATA {
      (0): 0, 0
      }
   }
}
}

Hi @ajelenak,

Oops!

Indeed, my mistake is a Python one:

(2,) and (None,)

are not the same as

(2) and (None)

If I modify your example:

>>> with h5py.File('test.h5', mode='w') as h5f:
...     dset = h5f.create_dataset(
...         name='child', dtype='i8', shape=(1), maxshape=(None), chunks=True)
...     dset.resize([4])
...     dset[3] = 2

it fails:

RuntimeError: Unable to set dataset extent (dimension cannot exceed the existing maximal size (new: 4 max: 1))

But this:

>>> with h5py.File('test.h5', mode='w') as h5f:
...     dset = h5f.create_dataset(
...         name='child', dtype='i8', shape=(1,), maxshape=(None,), chunks=True)
...     dset.resize([4])
...     dset[3] = 2

works!

This is a Python subtlety that I didn’t know about, and it trapped me.
I was complaining that None and (None) were the same thing, whereas with the trailing comma, (None,) is no longer the same as None. Well spotted.
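
The difference comes from Python syntax rather than from h5py: parentheses only group an expression, and it is the comma that builds a tuple. A quick check in plain Python, assuming nothing beyond the standard interpreter (the variable names are only for illustration):

```python
grouped = (None)   # parentheses only: still plain None
single = (None,)   # trailing comma: a one-element tuple
bare = None,       # the comma alone is enough, parentheses are optional

print(grouped is None)        # True
print(type(single).__name__)  # tuple
print(len(single))            # 1
print(bare == single)         # True
```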

I’m really sorry, I didn’t understand your comment before.

Thank you for your help.

PS: This also explains why, when I passed what I thought was a single-element tuple without the comma, I got an error saying the object was not iterable. I couldn’t make sense of it at the time. I find it tricky that Python silently treats a parenthesized single element as the element itself rather than as a tuple!
Many thanks for clearing this up.
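
The "not iterable" message mentioned in the PS can be reproduced without h5py. The sketch below also shows one possible way to guard against it in your own code; as_shape_tuple is a hypothetical helper, not an h5py function.

```python
def as_shape_tuple(value):
    """Hypothetical helper: turn a bare int into a one-element tuple."""
    if isinstance(value, int):
        return (value,)
    return tuple(value)


try:
    tuple((2))  # (2) is the int 2, which cannot be iterated
except TypeError as exc:
    message = str(exc)
    print(message)

print(as_shape_tuple((2)))   # (2,)
print(as_shape_tuple((2,)))  # (2,)
print(as_shape_tuple([4]))   # (4,)
```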