Some of the datasets I have contain compound types with vlen fields. When I read these datasets, HDF5 creates new conversion 'paths' to convert between the file types and memory types involved. HDF5 caches these paths (see struct H5T_g defined in H5T.c).
I've finally traced a memory 'leak' in my application to the unbounded growth of the conversion path cache. HDF5 treats vlen types as different if they come from different files, so I get a new set of conversion paths for every file I open, even if the types are actually identical.
That would be fine, except that I can't find a way to get rid of the cached paths when I close a file. There is a function for removing paths, H5Tunregister(pers, name, src_id, dst_id, func), but it does not work for compound types because of the way the pers parameter is handled. If I pass H5T_PERS_HARD, no compound type conversions are removed, because H5T_path_find() sets H5T_path_t.is_hard to false when it falls back on the compound->compound soft conversion and generates a new path. Alternatively, if I pass H5T_PERS_DONTCARE or H5T_PERS_SOFT, H5Tunregister() removes the default compound->compound soft conversion, and then I can't read any more datasets because the library can't create conversion paths for them.
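To make the dilemma concrete, here is a toy model of the behaviour I describe above. It is only a sketch: every toy_* name is made up for illustration, none of this is HDF5's actual code, and it assumes wildcards for everything except pers. It just shows why neither pers value does what I want.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, NOT HDF5 internals. */
typedef enum { TOY_PERS_DONTCARE = -1, TOY_PERS_HARD = 0, TOY_PERS_SOFT = 1 } toy_pers_t;

#define TOY_MAX_PATHS 16

typedef struct {
    int  file_id;  /* stands in for "which file the vlen type came from" */
    bool is_hard;  /* false when the path was built from a soft converter */
} toy_path_t;

typedef struct {
    bool       have_soft_compound;  /* default compound->compound soft converter present? */
    toy_path_t paths[TOY_MAX_PATHS];
    size_t     npaths;
} toy_table_t;

void toy_init(toy_table_t *t)
{
    assert(t != NULL);
    t->have_soft_compound = true;
    t->npaths = 0;
}

/* Find a cached path for this file, or build one from the soft converter.
 * Returns false when no path exists and none can be created. */
bool toy_path_find(toy_table_t *t, int file_id)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->npaths; i++)
        if (t->paths[i].file_id == file_id)
            return true;
    if (!t->have_soft_compound || t->npaths == TOY_MAX_PATHS)
        return false;
    /* Built by falling back on a soft converter, so the path is not "hard". */
    t->paths[t->npaths++] = (toy_path_t){ file_id, false };
    return true;
}

/* Remove everything matching `pers`; returns how many cached paths were dropped. */
size_t toy_unregister(toy_table_t *t, toy_pers_t pers)
{
    assert(t != NULL);
    size_t kept = 0, removed = 0;
    if (pers != TOY_PERS_HARD)
        t->have_soft_compound = false;  /* the default soft converter goes too */
    for (size_t i = 0; i < t->npaths; i++) {
        bool skip = (pers == TOY_PERS_SOFT && t->paths[i].is_hard) ||
                    (pers == TOY_PERS_HARD && !t->paths[i].is_hard);
        if (skip)
            t->paths[kept++] = t->paths[i];
        else
            removed++;
    }
    t->npaths = kept;
    return removed;
}
```

In this model, calling toy_path_find() for three different file ids leaves three cached paths. toy_unregister() with TOY_PERS_HARD removes none of them, while TOY_PERS_SOFT (or TOY_PERS_DONTCARE) empties the cache but also makes every later toy_path_find() for a new file fail.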
Incidentally, I also discovered that the type comparison function determines file identity using pointers that are left dangling when a file is closed, which both complicated my minimal reproduction of the problem and undermines the file identity check. (When the same allocation is re-used from the free list for a different file, types can compare as the same even though they come from different files, which is contrary to the intent of the code.)
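The pointer re-use effect can be sketched in isolation. Again, this is a toy model with invented names (toy_file_t, toy_vlen_type_t and so on), not HDF5's allocator or comparison code. It uses a static pool so that comparing the stale pointer is well defined in C, and it assumes a simple last-in, first-out free list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, NOT HDF5 internals. */
typedef struct toy_file {
    int              serial;     /* unique per open, so we can tell files apart */
    struct toy_file *next_free;  /* free-list link while the slot is unused */
} toy_file_t;

#define TOY_POOL_SIZE 4
static toy_file_t  toy_pool[TOY_POOL_SIZE];
static size_t      toy_pool_used = 0;
static toy_file_t *toy_free_list = NULL;
static int         toy_next_serial = 1;

/* "Open" a file: take a slot from the free list if possible, else from the pool. */
toy_file_t *toy_file_open(void)
{
    toy_file_t *f;
    if (toy_free_list != NULL) {
        f = toy_free_list;
        toy_free_list = f->next_free;
    } else {
        assert(toy_pool_used < TOY_POOL_SIZE);
        f = &toy_pool[toy_pool_used++];
    }
    f->serial = toy_next_serial++;
    f->next_free = NULL;
    return f;
}

/* "Close" a file: the slot goes back on the free list for re-use. */
void toy_file_close(toy_file_t *f)
{
    assert(f != NULL);
    f->next_free = toy_free_list;
    toy_free_list = f;
}

/* A type that remembers its file by pointer; the pointer is never cleared. */
typedef struct {
    const toy_file_t *file;         /* dangles after toy_file_close() */
    int               file_serial;  /* serial of the file at creation time */
} toy_vlen_type_t;

toy_vlen_type_t toy_type_from_file(const toy_file_t *f)
{
    assert(f != NULL);
    toy_vlen_type_t t = { f, f->serial };
    return t;
}

/* File identity decided purely by pointer equality. */
bool toy_same_file(const toy_vlen_type_t *a, const toy_vlen_type_t *b)
{
    return a->file == b->file;
}
```

If a type is created from one file, that file is closed, and a second file is then opened, the second file gets the same slot. toy_same_file() then reports the two types as coming from the same file even though their file_serial values differ.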
I have attached a small program that reproduces the problem. It takes one argument, which is a path at which it can write a temporary file. Running it requires a custom build of HDF5 so that the test program can read the size of the path table. (Alternatively, you can comment out the relevant parts of the test program and inspect H5T_g.npaths with a debugger.)
Has anyone else encountered this problem or found a solution for it? I am already patching my own HDF5 builds to get Unicode file name support on Windows, so if I have to make code changes it's not the end of the world.
Thanks,
Matthew Xavier
Hdf5TypePathLeak.c (8.29 KB)