First post, so please excuse my ignorance. I can’t seem to find an answer to my problem through searching. Can someone steer me in the right direction?
I have a simple script that runs in parallel through mpi4py. Each rank creates an array of random length (between 2 and 10 elements) filled with random integers in the range (1,100). Because the script runs in parallel (launched from the terminal using mpirun -np ...), each rank holds its own random-length array. I then create an HDF5 file with one group per rank and one dataset per group. I set each dataset's shape to the maximum of all the ranks' array lengths. Each dataset then gets its data assigned to the first length indices, and the rest is left as zeros. Finally, I resize the datasets so that the padding zeros are no longer kept.
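For reference, here is a minimal serial sketch of the pad-then-resize pattern described above, without MPI. It is not the attached demo_h5.py: the file name serial_demo.hdf5, the fixed max_length of 10, the seed, and the inclusive 1 to 100 range are all assumptions made for illustration, and the in-memory "core" driver is used only so that nothing is written to disk.

```python
import h5py
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for repeatability

# Random-length array (2 to 10 elements) of random integers (assumed 1 to 100).
length = int(rng.integers(2, 11))
data = rng.integers(1, 101, size=length, dtype=np.int32)
max_length = 10  # stand-in for the maximum length across all ranks

# In-memory file: driver="core" with backing_store=False never touches disk.
with h5py.File("serial_demo.hdf5", "w", driver="core", backing_store=False) as f:
    group = f.create_group("rank0")
    # maxshape=(None,) makes the dataset resizable (and therefore chunked).
    dset = group.create_dataset(
        "data", shape=(max_length,), maxshape=(None,), dtype="i4"
    )
    dset[:length] = data      # the remaining entries keep the fill value 0
    dset.resize((length,))    # shrink the dataset to drop the padding
    trimmed = dset[...]

print(length, data, trimmed)
```

In a single process the resize behaves as expected and the padding is gone, which suggests the difference shows up only when the file is opened in parallel.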
Prior to closing the file, I print out the rank number, max data length, length of array on rank, the data array, and the dataset (after being resized). The output is shown below:
user123:~> mpirun -np 4 python3 demo_h5.py
rank  max_length  length  data                       dataset
3     8           8       [69 87 84 17 25 17 9 70]   [69 87 84 17 25 17 9 70]
2     8           8       [69 70 1 70 89 71 15 84]   [69 70 1 70 89 71 15 84]
0     8           4       [55 45 50 12]              [55 45 50 12]
1     8           5       [23 23 23 39 78]           [23 23 23 39 78]
The Issue
I then inspect the saved file with h5dump to check whether the datasets match what was printed above. Note that the rank 0 dataset should have a shape of (4,) but does not; it is still padded with zeros. My Python script is attached at the bottom of this post for reference, along with my version info.
Help?
I must be missing something. I’m attempting to write varying length arrays in parallel. I don’t want them padded. Surely there’s a straightforward way to do this.
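One thing that may be worth checking, which cannot be confirmed without the script contents: the h5py parallel documentation says that anything modifying the structure or metadata of a file (creating groups or datasets, resizing) must be done collectively, meaning every rank makes the same calls with the same arguments. If each rank resizes only its own dataset, that rule is broken. Below is a sketch of one way around it that avoids padding and resizing altogether: gather all lengths first, then have every rank create every dataset at its exact size. The function names write_rank_datasets and run_parallel are hypothetical, as are the seeding and the inclusive 1 to 100 range.

```python
import h5py
import numpy as np


def write_rank_datasets(f, all_lengths, rank, data):
    """Create one exactly sized dataset per rank, then write this rank's data."""
    # Every caller loops over every rank with identical arguments, so the
    # create_group / create_dataset calls are the same on all processes.
    dsets = [
        f.create_group(f"rank{r}").create_dataset("data", shape=(n,), dtype="i4")
        for r, n in enumerate(all_lengths)
    ]
    # Only the raw data write differs from rank to rank.
    dsets[rank][:] = data


def run_parallel(path):
    """Hypothetical driver; needs mpi4py and an MPI-enabled h5py build."""
    from mpi4py import MPI  # imported here so the helper above works without MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    rng = np.random.default_rng(rank)  # per-rank seed, an assumption
    length = int(rng.integers(2, 11))
    data = rng.integers(1, 101, size=length, dtype=np.int32)
    all_lengths = comm.allgather(length)  # same list on every rank
    with h5py.File(path, "w", driver="mpio", comm=comm) as f:
        write_rank_datasets(f, all_lengths, rank, data)
```

A script that calls run_parallel("parallel_test.hdf5") and is launched under mpirun would exercise the parallel path. Because each dataset is created at its final size, no resize call is needed and no maxshape has to be set.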
user123:~> h5dump parallel_test.hdf5
HDF5 "parallel_test.hdf5" {
GROUP "/" {
   GROUP "rank0" {
      DATASET "data" {
         DATATYPE H5T_STD_I32LE
         DATASPACE SIMPLE { ( 8 ) / ( H5S_UNLIMITED ) }
         DATA {
         (0): 55, 45, 50, 12, 0, 0, 0, 0
         }
      }
   }
   GROUP "rank1" {
      DATASET "data" {
         DATATYPE H5T_STD_I32LE
         DATASPACE SIMPLE { ( 5 ) / ( H5S_UNLIMITED ) }
         DATA {
         (0): 23, 23, 23, 39, 78
         }
      }
   }
   GROUP "rank2" {
      DATASET "data" {
         DATATYPE H5T_STD_I32LE
         DATASPACE SIMPLE { ( 8 ) / ( H5S_UNLIMITED ) }
         DATA {
         (0): 69, 70, 1, 70, 89, 71, 15, 84
         }
      }
   }
   GROUP "rank3" {
      DATASET "data" {
         DATATYPE H5T_STD_I32LE
         DATASPACE SIMPLE { ( 8 ) / ( H5S_UNLIMITED ) }
         DATA {
         (0): 69, 87, 84, 17, 25, 17, 9, 70
         }
      }
   }
}
}
>>> print(h5py.version.info)
Summary of the h5py configuration
<--------------------------------->
h5py 3.10.0
HDF5 1.12.0
Python 3.9.12 (main, May 6 2022, 16:05:36)
[GCC 8.5.0 20210514 (Red Hat 8.5.0-4)]
sys.platform linux
sys.maxsize 9223372036854775807
numpy 1.26.1
cython (built with) 0.29.36
numpy (built against) 1.19.3
HDF5 (built against) 1.12.0
demo_h5.py (820 Bytes)