Sharing dataspace identifiers between multiple datasets

I wrote code that creates two extendible datasets, and corresponding code that reads them (only the call statements are shown below).

As you can see, dataspace identifiers are shared between the two datasets, simply to reduce the number of identifiers in use.

Specifically, in creating the two extendible datasets,
(a) a dataspace identifier (“space_unlimited”),
(b) a property list (“prop_chunk”),
(c) a dataspace identifier (“space_written”), and
(d) a dataspace identifier (“space_extend”)
are shared, and in reading the two extendible datasets,
(e) a dataspace identifier (“space_mem”) and
(f) a dataspace identifier (“space_slab”)
are shared.

So far the code seems to work as I expect, but I’m not confident about sharing dataspace identifiers in this way.
Could you let me know whether my code is correct? If not, what would be the proper way to write it?

(1) creating two extendible datasets

CALL h5Pcreate_f(H5P_DATASET_CREATE_F, prop_chunk, hdferr)
CALL h5Pset_chunk_f(prop_chunk, 1, dims, hdferr)
CALL h5Screate_simple_f(1, dims, space_unlimited, hdferr, [H5S_UNLIMITED_F])
CALL h5Dcreate_f(grp_trans, 'a1', H5T_IEEE_F64LE, space_unlimited, dset_a1, hdferr, prop_chunk)
CALL h5Dcreate_f(grp_trans, 'a2', H5T_IEEE_F64LE, space_unlimited, dset_a2, hdferr, prop_chunk)
CALL h5Sclose_f(space_unlimited, hdferr)
CALL h5Pclose_f(prop_chunk, hdferr)

CALL h5Screate_simple_f(1, [size_written], space_written, hdferr)

CALL h5Dset_extent_f(dset_a1, [size], hdferr)
CALL h5Dset_extent_f(dset_a2, [size], hdferr)
CALL h5Dget_space_f(dset_a1, space_extend, hdferr)
CALL h5Dget_space_f(dset_a2, space_extend, hdferr)

      CALL h5Sselect_hyperslab_f(space_extend, H5S_SELECT_SET_F, [offset], [1_HID_T], hdferr, BLOCK=[size_written])

      CALL h5Dwrite_f(dset_a1, H5T_IEEE_F64LE, a1, [size_written], hdferr, space_written, space_extend)
      CALL h5Dwrite_f(dset_a2, H5T_IEEE_F64LE, a2, [size_written], hdferr, space_written, space_extend)

      CALL h5Sclose_f(space_extend, hdferr)

(2) reading the two extendible datasets

CALL h5Dopen_f(grp_trans, 'a1', data_a1, hdferr)
CALL h5Dopen_f(grp_trans, 'a2', data_a2, hdferr)
CALL h5Dget_space_f(data_a1, space_slab, hdferr)
CALL h5Dget_space_f(data_a2, space_slab, hdferr)

CALL h5Screate_simple_f(1, [block_read], space_mem, hdferr)

   CALL h5Sselect_hyperslab_f(space_slab, H5S_SELECT_SET_F, [offset], [1_HID_T], hdferr, [1_HID_T], [block_read])
   CALL h5Dread_f(data_a1, H5T_IEEE_F64LE, a1, [block_read], hdferr, space_mem, space_slab)
   CALL h5Dread_f(data_a2, H5T_IEEE_F64LE, a2, [block_read], hdferr, space_mem, space_slab)

A dataspace is just a piece of metadata that has no reference to any particular dataset or in-memory representation of a dataset value. Think of it as a rectilinear wire-frame or lattice.
You can use it as a “template” to create datasets by specifying what kind of elements you’d like to attach to the sites or nodes of that lattice. You can also use it to coordinate I/O between arrays in memory and/or datasets in HDF5 files. For that you can select or highlight certain sites or nodes, e.g., in a hyperslab selection, to define the scope of the operation. In other words, you can manipulate dataspaces and selections (patterns!) on their own. If, in your code, there is an opportunity to reuse dataspaces (handles), by all means consider doing so.
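As a concrete illustration of the “template” use, here is a minimal sketch (with a hypothetical location identifier loc_id, and error checking omitted). One dataspace handle creates both datasets, and is closed as soon as the datasets exist, because each dataset stores its own copy of the extent:

INTEGER(HID_T)   :: loc_id, space_id, dset_x, dset_y
INTEGER(HSIZE_T) :: dims(1)
INTEGER          :: hdferr

dims = [100_HSIZE_T]

! One dataspace handle serves as the template for both datasets.
CALL h5Screate_simple_f(1, dims, space_id, hdferr)
CALL h5Dcreate_f(loc_id, 'x', H5T_IEEE_F64LE, space_id, dset_x, hdferr)
CALL h5Dcreate_f(loc_id, 'y', H5T_IEEE_F64LE, space_id, dset_y, hdferr)

! Each dataset stores its own copy of the extent, so the shared
! handle can be closed now without affecting the datasets.
CALL h5Sclose_f(space_id, hdferr)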

Of course, as with anything shared, you make assumptions about the state of that space (its extent, a selection, etc.) and the “associated” datasets or in-memory structures. It’s easy to forget those assumptions, and they may not be obvious to someone else modifying your code. This can lead to obscure bugs that are hard and time-consuming to find. From a performance perspective, unless you are dealing with complex selections on large dataspaces, the overhead of maintaining multiple dataspaces and handles is negligible (as long as you don’t create memory leaks by not closing handles).
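That last caveat applies to the posted code: h5Dget_space_f returns a new dataspace handle on every call, so storing both results in one variable (space_extend in (1), space_slab in (2)) overwrites and leaks the first handle. The writes still work only because the two datasets happen to have identical extents. A sketch of a leak-free variant of step (1), using one handle per dataset (the names space_a1 and space_a2 are hypothetical):

INTEGER(HID_T) :: space_a1, space_a2

! h5Dget_space_f returns a NEW handle each time; keep one per dataset
! so that both can be closed.
CALL h5Dget_space_f(dset_a1, space_a1, hdferr)
CALL h5Dget_space_f(dset_a2, space_a2, hdferr)

CALL h5Sselect_hyperslab_f(space_a1, H5S_SELECT_SET_F, [offset], [1_HSIZE_T], hdferr, BLOCK=[size_written])
CALL h5Sselect_hyperslab_f(space_a2, H5S_SELECT_SET_F, [offset], [1_HSIZE_T], hdferr, BLOCK=[size_written])

CALL h5Dwrite_f(dset_a1, H5T_IEEE_F64LE, a1, [size_written], hdferr, space_written, space_a1)
CALL h5Dwrite_f(dset_a2, H5T_IEEE_F64LE, a2, [size_written], hdferr, space_written, space_a2)

CALL h5Sclose_f(space_a1, hdferr)
CALL h5Sclose_f(space_a2, hdferr)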
