I’m reporting a potential bug. The situation is a bit complicated, and I’m also short of time to investigate further.
I have a program that updates a large number of files (~10,000) in each run; each file contains a few simple datasets created with the lz4, shuffle, and fletcher32 filters. Originally, h5py and hdf5plugin were built against hdf5 1.10. It’s been working fine for over a year under heavy updates, which essentially extend some datasets with new records and also overwrite some existing records. The files are opened with “swmr_mode” set to True.
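A minimal reproducer along these lines might help narrow things down. The sketch below is hypothetical. It assumes one 1-D float64 dataset named `records` per file (the real layout isn’t described here), a chunk size of 64, and ten in-place overwrites plus 50 appended records per file per round. The helper names `create_file`, `update_file`, `is_corrupted`, and `run` are illustrative and are not part of h5py or hdf5plugin. If hdf5plugin isn’t installed, the script falls back to no compression, which doubles as a control run without lz4.

```python
import os
import tempfile

import h5py
import numpy as np

try:
    import hdf5plugin  # importing registers the LZ4 filter with HDF5
    LZ4_FILTER = hdf5plugin.LZ4()
except ImportError:
    LZ4_FILTER = {}  # control run: same workflow, no LZ4


def create_file(path, n_records):
    """Hypothetical helper: one resizable, chunked dataset with shuffle + fletcher32 (+ LZ4)."""
    with h5py.File(path, "w", libver="latest") as f:  # SWMR requires libver="latest"
        f.create_dataset(
            "records",  # assumed dataset name
            data=np.arange(n_records, dtype="f8"),
            maxshape=(None,),
            chunks=(64,),
            shuffle=True,
            fletcher32=True,
            **LZ4_FILTER,
        )


def update_file(path, expected, n_new, rng):
    """Hypothetical helper: overwrite a few existing records, then append new ones, in SWMR mode."""
    with h5py.File(path, "a", libver="latest") as f:
        f.swmr_mode = True
        ds = f["records"]
        old = ds.shape[0]
        for i in rng.choice(old, size=min(10, old), replace=False):
            value = rng.random()
            ds[int(i)] = value  # in-place update of an existing record
            expected[i] = value
        new = rng.random(n_new)
        ds.resize((old + n_new,))
        ds[old:] = new  # newly appended records
        ds.flush()
    return np.concatenate([expected, new])


def is_corrupted(path, expected):
    """Hypothetical helper: reopen read-only and compare every record with the in-memory copy."""
    try:
        with h5py.File(path, "r") as f:
            data = f["records"][:]
    except OSError:  # e.g. a fletcher32 checksum or filter failure on read
        return True
    return data.shape != expected.shape or bool(np.any(data != expected))


def run(n_files=10_000, n_rounds=3, n_records=1000, n_new=50, seed=0):
    """Return (round, filename) pairs whose contents no longer match expectations."""
    rng = np.random.default_rng(seed)
    bad = []
    with tempfile.TemporaryDirectory() as tmp:
        names = [f"f{i:05d}.h5" for i in range(n_files)]
        expected = {}
        for name in names:
            create_file(os.path.join(tmp, name), n_records)
            expected[name] = np.arange(n_records, dtype="f8")
        for rnd in range(n_rounds):
            for name in names:
                path = os.path.join(tmp, name)
                expected[name] = update_file(path, expected[name], n_new, rng)
                if is_corrupted(path, expected[name]):
                    bad.append((rnd, name))
    return bad
```

Running `run()` once per environment (for example hdf5plugin 2.2.0 vs. 3.3.1), both at the ~10,000-file scale and at a few hundred files, could help separate the version hypothesis from the file-count hypothesis. An empty list means no mismatch was detected in that run.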
Lately, I set up a new environment with hdf5 1.12, h5py 3.7.0, and hdf5plugin 3.3.1. The program works most of the time, but occasionally it corrupts some of the existing records it updates, while the newly appended records are fine. This happens randomly, in different files at different times. If I fix the errors and rerun, the program does not fail at the same place. I originally thought this was due to the hdf5 library version change, so I downgraded to hdf5 1.10 while keeping the same h5py and hdf5plugin. The same random corruption persisted. After many days of frustration and fruitless bug hunting in my own programs, I gave up and downgraded h5py to the version used in the old environment. That didn’t solve the problem either. Finally, I downgraded hdf5plugin to 2.2.0, the version I used in the old environment. The problem disappeared, and everything has stayed fine for a few weeks, even after I reinstalled h5py 3.7.0.
To recap, it seems that a bug was introduced somewhere after hdf5plugin 2.2.0, in the lz4 filter in particular. (I had to abandon blosc at version 2.2.0 because it sometimes corrupted files, especially after a Python script hit a programming error.) Currently, the only good combination I know of is hdf5 1.10, h5py 3.7.0, and hdf5plugin 2.2.0. I suspect that hdf5 1.12 is also fine, but I haven’t tested it.
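Since the findings hinge on exact version combinations, it may help to capture them from inside each environment rather than from memory. A small sketch, assuming Python 3.8+ for `importlib.metadata`; `environment_versions` is a hypothetical helper name, not an h5py or hdf5plugin API:

```python
import importlib.metadata

import h5py


def environment_versions():
    """Hypothetical helper: collect the versions that make up one tested combination."""
    try:
        plugin = importlib.metadata.version("hdf5plugin")
    except importlib.metadata.PackageNotFoundError:
        plugin = "not installed"
    return {
        "hdf5": h5py.version.hdf5_version,  # HDF5 library h5py is linked against
        "h5py": h5py.version.version,
        "hdf5plugin": plugin,
    }


if __name__ == "__main__":
    for name, version in environment_versions().items():
        print(f"{name}: {version}")
```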
I know this may not be very helpful for fixing the bug, but I’m reasonably sure that a bug contributed to the issues outlined above. One thing whose significance I’m unsure of is that the original files were created with an older version of hdf5plugin (2.2.0 in this case) but are updated in the new environments. Another observation is that an almost identical program, which updates far fewer files (a few hundred in each run), has never experienced the same problem with any of the software combinations mentioned above. Touching a large number of files seems to be a trigger.