Currently, we are building a library on top of the parallel version of the HDF5 library. This library makes extensive use of the data transform functionality, which lets us apply a unit conversion when, for example, writing a physical quantity to disk. For example, a user may hold a velocity in memory in meters per second but want it stored on disk in feet per second.
While looking into the code, I noticed a clear distinction between the case where a data transform is set and the case where it is not; at least, that is my impression. When no data transform is set, the code uses the optimized parallel I/O routines, provided the other conditions (such as no data type conversion) are also satisfied. If I understand the code correctly, this means that even a trivial data transform (expression = “x”) prevents the code from using these optimized parallel I/O routines.
So my question is: is there a way to use these optimized parallel I/O routines when the expression is “x”? If not, it may be worthwhile to extend the code for this case, for example by making H5Pset_data_transform() smarter about trivial expressions or by adding a function to unset a previously set data transform. In a small test, I observed a significant performance difference between the optimized and the non-optimized routines.