The NROWS attribute and H5TB interface


I'm writing this because I'd like to know your position on the NROWS
attribute in the H5TB High Level interface.

In the current H5TB interface, NROWS is only set during an
H5TBdelete_record() operation. Other routines that change the number
of rows, H5TBappend_records() for one, do not update this attribute.
That routine (and others) calls H5TBget_table_info() to get the number
of rows in a table; H5TBget_table_info() looks first at NROWS and, if
the attribute exists, honors it and returns its value. So, if you call
H5TBdelete_record() and then H5TBappend_records(), the NROWS counter
goes stale, causing problems in subsequent operations (appends or
further deletions).
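
To make the failure mode concrete, here is a toy model of the logic
described above. It is not HDF5 code: ToyTable and the toy_* functions
are hypothetical stand-ins, and it assumes that a delete shrinks the
dataset and writes NROWS, while an append sizes the dataset from
whatever the row-count lookup returns and never touches NROWS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a table; NOT the HDF5 API. */
typedef struct {
    size_t extent;     /* real number of rows in the dataset      */
    bool   has_nrows;  /* whether the NROWS attribute exists      */
    size_t nrows_attr; /* value stored in NROWS, when it exists   */
} ToyTable;

/* Models the lookup: trust NROWS when present, else the extent. */
size_t toy_get_nrows(const ToyTable *t)
{
    return t->has_nrows ? t->nrows_attr : t->extent;
}

/* Models a delete: the only operation that writes NROWS. */
void toy_delete(ToyTable *t, size_t n)
{
    size_t rows = toy_get_nrows(t);
    assert(n <= rows);
    t->extent = rows - n;
    t->has_nrows = true;
    t->nrows_attr = rows - n;
}

/* Models an append: resizes from the looked-up row count,
 * but leaves NROWS untouched, so the counter goes stale. */
void toy_append(ToyTable *t, size_t n)
{
    t->extent = toy_get_nrows(t) + n;
}
```

Starting from a 3-row table with no NROWS, one toy_delete() of a single
row followed by two single-row toy_append() calls leaves the extent at
3 rather than 4: the second append still sees the stale count of 2 and
lands on top of the row written by the first append.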

For example, the attached program produces this output:

$ ./bug-NROWS
$ h5ls -rd bug-NROWS.h5
/streams Group
/streams/time Dataset {2/Inf}
        (0) {1, 0.1, 0.1}, {3, 0.1, 0.3}

when it should be:

$ h5ls -rd bug-NROWS.h5
/streams Group
/streams/time Dataset {3/Inf}
        (0) {1, 0.1, 0.1}, {2, 0.1, 0.2}, {3, 0.1, 0.3}

[I'm using HDF5 1.8.1 here, but after some ocular inspection I'd say
that the problem remains in HDF5 1.8.2]

My guess is that the NROWS attribute dates from the pre-1.8.x HDF5
series, where H5Dset_extent() did not work properly.
However, now that H5Dset_extent() works as intended in HDF5 1.8.x, an
elegant solution to the problem would be to get rid of the NROWS
attribute completely. In fact, the PyTables library does not trust
NROWS anymore, although it tries to keep NROWS up to date, mainly for
HDF5-HL compatibility (which, funnily enough, only contributed to its
incompatibility, due to the way NROWS is treated in H5TB :-/).

If removing NROWS is not viable (for some reason that I can't think of
now), H5TB should be fixed in order to honor the NROWS where

Finally, now that I'm suggesting getting rid of NROWS, what about the
other attributes in H5TB, like FIELD_N_NAME and FIELD_N_FILL, whose
information is accessible through plain HDF5 calls (mmh, not sure now
about the fill value)? Wouldn't it be better if this meta-info were not
duplicated for table objects, avoiding sync'ing problems like the NROWS
one? Are there people out there trusting NROWS, FIELD_N_NAME or
FIELD_N_FILL? Incidentally, PyTables is still using FIELD_N_FILL, but
I hope I can get rid of it if these attributes finally disappear.


bug-NROWS.cpp (2.36 KB)


Francesc Alted