Hi HDF5-users,
I am new to HDF5 but an experienced C++ programmer. Having worked with many
mature open-source libraries, I have noted a few things about the HDF5 C++ API -
please correct me where I am wrong.
I am aware that workarounds exist for all the issues I raise; I am
simply trying to point out, from experience, areas where I believe the current
HDF5 C++ API clashes with expectations and with certain ideal(ized) design
philosophies (IMHO).
But before I start, please let me express my great appreciation for HDF5
as a scalable, cross-platform, open-source standard for large-volume
computational data storage and transfer, and my gratitude for making it
available as a free download.
ISSUE 1: Excessive/Inappropriate use of TRY-CATCH
hdf5-utils.cpp (1.98 KB)
hdf5-utils.h (2.11 KB)
-----------------------------------------------
We are forced to use try-catch blocks like if-else blocks - there is a
conspicuous absence of query functions for checking whether a group or
dataset exists. Instead, we have to call openGroup() or openDataSet()
and trap the exception if the call fails.
There are a few issues that this creates:
1. It forces an alternate programming approach on otherwise conventional, more
meaningful and readable code - exceptions are no longer just for exceptional
situations (e.g., does the absence of a dataset in an existence query really
constitute an exception, or just an expected failure case when the producer
and consumer of the data happen to be different?)
2. It disallows the use of compiler flags like -fno-exceptions in g++ because
the library depends on exceptions to guarantee correctness. Exceptions
cause the compiler to include a heavier runtime in the linked executable
and can have performance implications even if the actual code doesn't use
these features (what the compiler can infer about our code from its static
analysis is limited). Therefore, by including HDF5 to store partial
calculation results in my nested loops, I am forced to switch
from -fno-exceptions to -fexceptions and risk introducing "excess baggage"
which I could previously shed. This has a cascade effect on my whole codebase.
In a nutshell, introducing HDF5 into my code has caused a minor
architecture change to my whole code.
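For reference, the 1.8.x C API does expose H5Lexists(), so an exception-free
existence query can be built today - a minimal sketch (the wrapper name and
the error-handling policy are my own, untested):

```cpp
#include <string>
#include <H5Cpp.h>   // HDF5 C++ API; also pulls in the C API

// Sketch of an exception-free existence query built on the C-level
// H5Lexists(). Returns true only if a link called "name" exists
// directly under the given group.
bool linkExists(const H5::Group& group, const std::string& name)
{
    htri_t ret = H5Lexists(group.getId(), name.c_str(), H5P_DEFAULT);
    return ret > 0;   // 0 = absent, negative = error; both treated as "no"
}
```

Something like this in the C++ API itself would let HDF5-using code read like
ordinary if-else code again.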
ISSUE 2: ACCESSORS AND QUERIES FOR OBJECT (TYPES)
----------------------------------------------
1. The v1.6.x API allowed querying the type of an object. This allows
switch-case blocks to take an action on each sub-item of a group
depending on the case. For example, it is easy to write a (graphical) HDF5
file-browser with such an API. IIUC, with v1.8.x, some functions like
CommonFG::getObjtypeByName() are deprecated, so achieving the above example
use case now involves a whole series of try-catch blocks, each trying to
open a different possible type. For example,
  try { Group subgroup = group.openGroup(name); /* do something */ }
  catch (Exception const& ex) { /* not a group */ }
  try { DataSet ds = group.openDataSet(name); /* do something */ }
  catch (Exception const& ex) { /* not a dataset */ }
Here, even if openGroup() succeeds, the openDataSet() attempt will still be
performed unless we use extra flags and if() conditions, possibly with goto
statements. An equivalent switch-case block is more readable and encloses a
logical unit of code that performs one well-defined function, namely
branching of control.
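For comparison, here is roughly what the 1.6-style interface allowed
(a sketch from my reading of the documentation, untested):

```cpp
// Browsing a group's children with one switch, 1.6-style
// (these CommonFG index-based queries are deprecated in 1.8):
for (hsize_t i = 0; i < group.getNumObjs(); ++i) {
    H5std_string name = group.getObjnameByIdx(i);
    switch (group.getObjTypeByIdx(i)) {        // returns H5G_obj_t
        case H5G_GROUP:   /* recurse into subgroup */ break;
        case H5G_DATASET: /* display the dataset   */ break;
        case H5G_TYPE:    /* named datatype        */ break;
        default:          break;
    }
}
```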
2. To address the above case, it might make sense to introduce distinct
iterators as in the STL, for example Group::group_iterator,
Group::dataset_iterator, DataSet::attribute_iterator (?).
These iterators would obviate the need to manually apply filters to identify
each child of a parent group. If there is a need to identify just the
datasets at the current level, Group::dataset_iterator would help.
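To illustrate, usage of such an iterator might look like the following
(purely hypothetical - none of these names exist in the API today):

```cpp
// Hypothetical API sketch: visit only the datasets directly under "group",
// with no manual type filtering.
for (Group::dataset_iterator it = group.dataset_begin();
     it != group.dataset_end(); ++it) {
    DataSet ds = *it;   // the iterator guarantees each child is a DataSet
    /* process ds */
}
```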
ISSUE 3: WRITE API FOR DATASETS
-------------------------------------
1. Once a DataSet object is instantiated with a DataType and a DataSpace, the
common-case write would normally involve the same datatype with which the
dataset was created. Why do we need to restate it during write()?
Understandably, this helps with conversions (I don't know much about
HDF5 conversions). If that is the reason, ideally there should *also* be a
write() member function that takes just one parameter - the pointer to the
data buffer - because all other information, including the DataType, is
inferable from the dataset object. As a beginner I was perplexed until I
came across the "conversions" keyword.
The same goes for read(): if the on-file DataType is not convertible to the
DataType of the DataSet object on which read() is being called, then that
would constitute an exception.
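Until then, a thin wrapper can approximate the single-argument form - a
sketch (the name writeAll is my own, untested, and note the caveat in the
comment):

```cpp
#include <H5Cpp.h>

// Sketch: infer the memory datatype from the dataset itself.
// Caveat: DataSet::getDataType() returns the *file* datatype; using it as
// the memory type is only safe when the two layouts coincide - a careful
// version would map it to the corresponding NATIVE_* type first.
void writeAll(H5::DataSet& ds, const void* buffer)
{
    ds.write(buffer, ds.getDataType());
}
```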
2. Writing strings, currently, is a little involved. There could be
convenience functions named writeString() - or even just write() - that take
one string argument. A beginner is faced with questions about fixed-length vs
variable-length vs character-array strings (with or without the trailing '\0'?).
3. Similarly, writing single integers or floats could be supported through
functions named writeInt(), writeFloat(), writeUInt() etc., which would be
useful for attributes and would hide PredType::NATIVE_INT from a beginner.
I imagine NATIVE_<TYPE> is commonly used, so such convenience functions
would allow rapid development without a large learning curve before first use.
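For context, here is roughly what writing one string attribute takes today -
a sketch (untested; the variable-length choice and the attribute name are
mine), i.e. the boilerplate a writeString() helper could hide:

```cpp
#include <string>
#include <H5Cpp.h>

// Attach a single std::string attribute to an existing dataset.
void writeStringAttr(H5::DataSet& dataset,
                     const std::string& attrName,
                     const std::string& value)
{
    H5::StrType strType(H5::PredType::C_S1, H5T_VARIABLE); // variable-length
    H5::DataSpace scalarSpace(H5S_SCALAR);                 // one value
    H5::Attribute attr =
        dataset.createAttribute(attrName, strType, scalarSpace);
    attr.write(strType, value);
}
```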
4. Using type-traits<> template techniques along with partial
specialization, as in the STL and Boost libraries, it is possible to write
short, simple code providing one polymorphic function, say
template<typename T>
void writeAtom(H5::Group & g, T const& t, string const& name);
to write different common atomic types like float, int, string etc. To
illustrate this I am attaching .h and .cpp files where the functions
{write,read}_hdf5_scalar_attribute() are implemented in this way.
ISSUE 4: STANDARD API for COMPLEX TYPES
---------------------------------------
It is quite common to use complex<float> or complex<double> in mathematical
calculations, so it would be nice to have predefined datatypes for these.
Since Fortran, C99 and C++ all support complex numbers up to long double
precision at the language level, HDF5 support would make life so much easier.
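In the meantime a compound type works as a stand-in - a sketch (untested;
the member names "r"/"i" are just a common convention, and the layout of
two contiguous doubles, real part first, is assumed):

```cpp
#include <complex>
#include <H5Cpp.h>

// Sketch: a compound datatype matching std::complex<double>, assuming the
// usual layout of two contiguous doubles with the real part first.
H5::CompType makeComplexDoubleType()
{
    H5::CompType t(sizeof(std::complex<double>));
    t.insertMember("r", 0,              H5::PredType::NATIVE_DOUBLE);
    t.insertMember("i", sizeof(double), H5::PredType::NATIVE_DOUBLE);
    return t;
}
```

A predefined type in the library would also fix the member names, so files
written by different programs would agree on the convention.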
ISSUE 5: H5File API
------------------------
1. Is there a requirement for CommonFG to be a base class at all? Couldn't
all the included operations be collapsed into just the Group class? To do
this with a file object, one would retrieve the root group using
file.openGroup("/") and then work simply with groups. To annotate the H5File
itself with meta-info, a separate API could be provided. Class hierarchies
should represent meaningful relationships between parents and progeny; the
root group in a file is not the file itself, and CommonFG is required only
when we mix up the two definitions (IIUC, IMHO).
2. The H5File constructor supports some H5F_ACC_? parameters with which
H5File::open() fails, and this is not documented in the Doxygen-generated
API reference. It forces me to place a whole bunch of code inside a
try-catch block simply because the H5File object must now be created inside
the block instead of via an open() member function - and it is therefore
visible only inside the try-catch block!
IMHO, H5File should follow a model similar to ifstream and ofstream for its
open() and close() functions: while a constructor performs an open(), the
latter can also be performed separately with the same H5F_ACC_? flags.
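To make the asymmetry concrete, this is my understanding of the current
behavior (a sketch from reading the headers, untested):

```cpp
#include <H5Cpp.h>

void demonstrateAsymmetry()
{
    // The creation flags work only through the constructor...
    H5::H5File created("results.h5", H5F_ACC_TRUNC);   // create/truncate: OK

    // ...while the separate open call accepts only access flags on an
    // existing file - unlike ifstream::open(), which mirrors its constructor:
    H5::H5File f;
    f.openFile("results.h5", H5F_ACC_RDWR);            // OK
    // f.openFile("results.h5", H5F_ACC_TRUNC);        // fails: create-only flag
}
```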
Thanks,
Manoj Rajagopalan
PhD Candidate, EECS (CSE)
University of Michigan, Ann Arbor