We have been doing some cross-testing with other formats recently and noticed that reading attributes is extremely slow. We’ve been using HDF5 as a file format for 15+ years (eman2.org), and most of our performance testing was done long ago. In a typical HDF5 file we put anywhere from 1,000 to 100,000 data sets, and each data set has 20-100 small attributes associated with it. Most of these are simple numbers; a few are strings. We use the HDF5 library directly from C, so we have very precise control over what we are doing.
Anyway, in our test we read all of the attributes from all of the data sets in the file, but none of the actual data. I then ran this through valgrind to see where the time was being spent. 98% of the time is within the HDF5 library, and two-thirds of that is spent in H5A_open_by_idx(). Digging deeper, it appears that every call to this function calls H5A_build_compact_table() and then releases the table, and this accounts for pretty much the entire problem.
My interpretation is that every time I open a single attribute, the library reads in and builds the entire index of attributes (including a qsort and other work), finds the single element I asked for, then releases the whole thing again. I looked for workarounds in the library, but even if I used the H5A iteration routine, I would still have to call open on each attribute.
After reading the forums here a bit, I found that others have had performance issues with attributes starting with 1.8, but also that 1.8 added an alternative dense storage scheme using B-trees, which should improve things, so I added:
H5Pset_libver_bounds(accprop, H5F_LIBVER_18, H5F_LIBVER_LATEST);
to my code. Alas, reading a file written with this option set actually took 5x longer than reading one written without it. I also tried updating the HDF5 library to 1.10 (we usually use the latest patch release of 1.8), but that didn’t help either.
Is there anything I can do? My goal is to read the entire set of attributes, so I don’t care whether I open by name, by index, or by any other method. I’m just trying to avoid the silly overhead of reading and building the table for every single attribute read.
Thanks in advance for any suggestions at all!