Traceback (most recent call last):
File "./create_hdf5.py", line 14, in main
ds = h5f.create_dataset('/var2', dtype=dt, data=(buf))
File "/python3.6/site-packages/h5py/_hl/group.py", line 148, in create_dataset
dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
File "/python3.6/site-packages/h5py/_hl/dataset.py", line 89, in make_new_dset
tid = h5t.py_create(dtype, logical=1)
File “h5py/h5t.pyx”, line 1629, in h5py.h5t.py_create
File “h5py/h5t.pyx”, line 1653, in h5py.h5t.py_create
File “h5py/h5t.pyx”, line 1680, in h5py.h5t.py_create
File “h5py/h5t.pyx”, line 1586, in h5py.h5t._c_compound
File “h5py/h5t.pyx”, line 1653, in h5py.h5t.py_create
File “h5py/h5t.pyx”, line 1685, in h5py.h5t.py_create
File “h5py/h5t.pyx”, line 1477, in h5py.h5t._c_array
File “h5py/_objects.pyx”, line 54, in h5py._objects.with_phil.wrapper
File “h5py/_objects.pyx”, line 55, in h5py._objects.with_phil.wrapper
File “h5py/h5t.pyx”, line 330, in h5py.h5t.array_create
ValueError: Zero-sized dimension specified (zero-sized dimension specified)
When an h5py exception comes with a two-part message like this one, it typically means the error was raised by the HDF5 library itself. I get the following for your example with h5py code from the master branch:
HDF5-DIAG: Error detected in HDF5 (1.13.1) thread 0:
#000: /Users/ajelenak/Documents/h5py/hdf5/src/H5Tarray.c line 102 in H5Tarray_create2(): zero-sized dimension specified
major: Invalid arguments to routine
minor: Bad value
Traceback (most recent call last):
File "/Users/ajelenak/Documents/h5py/trt.py", line 9, in <module>
ds = f.create_dataset('/var2', dtype=dt, data=buf)
File "/Users/ajelenak/Documents/h5py/h5py/h5py/_hl/group.py", line 161, in create_dataset
dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
File "/Users/ajelenak/Documents/h5py/h5py/h5py/_hl/dataset.py", line 88, in make_new_dset
tid = h5t.py_create(dtype, logical=1)
File "h5py/h5t.pyx", line 1663, in h5py.h5t.py_create
File "h5py/h5t.pyx", line 1687, in h5py.h5t.py_create
File "h5py/h5t.pyx", line 1714, in h5py.h5t.py_create
File "h5py/h5t.pyx", line 1620, in h5py.h5t._c_compound
File "h5py/h5t.pyx", line 1687, in h5py.h5t.py_create
File "h5py/h5t.pyx", line 1719, in h5py.h5t.py_create
File "h5py/h5t.pyx", line 1511, in h5py.h5t._c_array
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5t.pyx", line 347, in h5py.h5t.array_create
ValueError: Zero-sized dimension specified (zero-sized dimension specified)
It is in fact possible to create a compound dataset with a zero-sized dimension. It could be that your code has an issue or, less likely, that h5py has a bug.
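For instance, the following h5py sketch creates a compound dataset whose dataspace has a zero-sized dimension; this succeeds because the zero belongs to the dataset's shape, not to an array member inside the compound type (the file name is illustrative):

```python
import numpy as np
import h5py

# compound type with two scalar members
dt = np.dtype([('m1', '<i4'), ('m2', '<f4')])

with h5py.File('zero.h5', 'w') as f:
    # a zero-sized dataset shape is accepted; only zero-sized array
    # members *inside* the compound type are rejected by HDF5
    ds = f.create_dataset('/var2', shape=(0,), dtype=dt)
    print(ds.shape)   # (0,)
```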
Here is a Python script that demonstrates such a compound dataset using HDFql (I do not know much about h5py, sorry):
# import HDFql module (make sure it can be found by the Python interpreter)
import HDFql
# create an HDF5 file named 'test.h5' and use (i.e. open) it
HDFql.execute("CREATE AND USE FILE test.h5")
# create a compound dataset with zero-sized dimension
HDFql.execute("CREATE DATASET dset AS COMPOUND(m1 AS INT, m2 AS FLOAT)(0)")
After running this script, you should have a file named test.h5 containing a compound dataset named dset. When running h5dump on it, the output is as follows:
I was actually hoping for the following from h5dump. In this example, there are two two-dimensional arrays in the compound data type; the second has a zero-sized dimension. I am not familiar with HDFql. Could you try to create such an example?
Just tried with HDFql and, unfortunately, it does not seem possible to create a member (of a compound dataset) with a zero-sized dimension. The following error message is returned when trying to do this:
HDF5-DIAG: Error detected in HDF5 (1.8.22) thread 0:
#000: H5Tarray.c line 126 in H5Tarray_create2(): zero-sized dimension specified
major: Invalid arguments to routine
minor: Bad value
HDF5-DIAG: Error detected in HDF5 (1.8.22) thread 0:
#000: H5Tcompound.c line 350 in H5Tinsert(): not a datatype
major: Invalid arguments to routine
minor: Inappropriate type
There is probably a reason why a dataset can have a zero-sized dimension while a member (of a compound dataset) cannot, but I am not sure what it could be.
In my case, a simulation program creates a file containing many 2D datasets organized into groups. The size of the second dimension of these datasets can be larger than or equal to zero. The simulation output file is then used in a subsequent ML application, where all datasets are read into NumPy arrays or torch tensors. I would like to change the storage layout to use compound data types, i.e. one compound data type per group.
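The current layout can be sketched like this (group and dataset names are illustrative): one group of 2D datasets per simulation step, with the second dimension possibly zero, read back wholesale into NumPy arrays:

```python
import numpy as np
import h5py

# write: one group per simulation step; the second dimension of a
# dataset may be zero when the simulation produced no data for it
with h5py.File('sim.h5', 'w') as f:
    g = f.create_group('step_0')
    g.create_dataset('coords', data=np.zeros((3, 7)))
    g.create_dataset('edges', shape=(2, 0), dtype='<i8')

# read: every dataset in the group becomes a NumPy array
with h5py.File('sim.h5', 'r') as f:
    arrays = {name: f['step_0'][name][...] for name in f['step_0']}

print({k: v.shape for k, v in arrays.items()})
```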
Without knowing what kind of data those datasets store and what the physical meaning of their dimensions is, it is difficult to offer any suggestion.
The HDF5 array datatype cannot allow zero-sized dimensions because the number of bytes each dataset element occupies is critical information, and an array datatype cannot be resized after creation.
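This asymmetry is easy to reproduce from h5py directly: NumPy accepts a compound dtype whose array member has a zero-sized dimension, but converting it to an HDF5 datatype fails, because an element of such a type would occupy zero bytes (a sketch; the field name `a` is illustrative):

```python
import numpy as np
import h5py

# NumPy accepts a zero-sized sub-array member in a compound dtype...
dt = np.dtype([('a', '<f8', (2, 0))])

# ...but HDF5 cannot build the corresponding array datatype, since
# the fixed element size must be known when the datatype is created
try:
    h5py.h5t.py_create(dt, logical=1)
    print('accepted')
except ValueError as err:
    print('rejected:', err)
```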
Here is an example output of a compound data type. In this case, no array has a zero-sized dimension. In other cases, some arrays may have a zero-sized dimension; for example, edge_index_3d_u can be of dimension size [2][0]. Whether an array has zero-sized dimensions is determined by the simulation. The structure of the compound type remains the same for all datasets, except for the second dimension size of each array.
Thanks for sharing your output's format. It is quite unconventional to me. I'd turn each of the H5T_ARRAY fields into a separate HDF5 dataset (/0/edge_index_3d_u, /0/edge_index_3d_v, etc.) because this lets you resize the datasets based on the simulation results. And h5py gives you a NumPy array for each of the datasets. It seems you have an ephemeral need for the data in HDF5 between simulation and ML training.
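A minimal sketch of that layout, assuming the group and field names from the thread (the chunk size is an arbitrary choice): the second dimension starts at zero and is grown in place once data becomes available.

```python
import numpy as np
import h5py

with h5py.File('train.h5', 'w') as f:
    g = f.create_group('0')
    # maxshape=(2, None) makes the second dimension resizable,
    # which in turn requires chunked storage
    u = g.create_dataset('edge_index_3d_u', shape=(2, 0),
                         maxshape=(2, None), chunks=(2, 64), dtype='<i8')
    # once the simulation produces data, grow the dataset in place
    u.resize((2, 5))
    u[...] = np.arange(10).reshape(2, 5)
    print(u.shape)   # (2, 5)
```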
Storing the data with zero-sized dimensions as separate datasets would require adding lots of checks and would increase the total number of datasets in a file. I prefer not to add such overhead. Maybe I should submit a feature request in the HDF5 GitHub repo?
@wkliao, from your example train.h5, it looks like your simulation generates arrays of many different sizes. I suggest that the natural way to organize this in HDF5 is to use HDF5 groups containing datasets, not compound data types containing arrays. Groups and datasets support the zero-sized arrays that you need.
My program has already been using the group/dataset approach.
Switching to a compound data type was suggested by @steven in this discussion because it can improve I/O performance for me.