Checking about corruption issues with the HDF5 core driver

I am writing string data (S75) to datasets, using the core driver with LZF compression. I am using Python with h5py 3.15 and HDF5 2.0.0.
Occasionally I see that a chunk in the .h5 file overwrites another chunk, so when the overwritten chunk is read back, the LZF filter complains. I am not doing anything fancy: I open the file with the core driver, write the data using ds[index] = some_data, and then the file is flushed back to disk. The data does get moved around depending on the order that needs to be maintained.

An example of offset and sizes for some chunks:

chunk_index: 8 dtype: |S75
offset: 3333932 size: 70786
chunk_index: 9 dtype: |S75
offset: 3404718 size: 73350
chunk_index: 10 dtype: |S75
offset: 3478068 size: 72137
chunk_index: 11 dtype: |S75
offset: 3550205 size: 70829
chunk_index: 12 dtype: |S75
offset: 3550205 size: 37129

Chunks 11 and 12 have the same offset but different sizes; reading chunk 11 fails, but reading chunk 12 works. I have not been able to reproduce this outside of the jobs. It happens once in a while as part of jobs that write to the .h5 file.

I wanted to check whether something like this is possible as a valid scenario? I think not. What could be the possible reasons for it?
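For reference, the overlap can be confirmed mechanically from the (offset, size) pairs in the listing above by sorting on offset and checking that each chunk ends before the next begins. A minimal sketch in plain Python (the pairs are copied from the listing; this is just an illustration, not the tool I used to dump them):

```python
# Chunk allocations copied from the listing above (chunk_index: (offset, size)).
chunks = {
    8: (3333932, 70786),
    9: (3404718, 73350),
    10: (3478068, 72137),
    11: (3550205, 70829),
    12: (3550205, 37129),
}

def find_overlaps(chunks):
    """Return (index_a, index_b) pairs whose byte ranges intersect."""
    items = sorted(chunks.items(), key=lambda kv: kv[1][0])
    overlaps = []
    for (ia, (oa, sa)), (ib, (ob, sb)) in zip(items, items[1:]):
        if ob < oa + sa:  # next chunk starts before the previous one ends
            overlaps.append((ia, ib))
    return overlaps

print(find_overlaps(chunks))  # → [(11, 12)]
```

Note that chunks 8 through 11 are exactly contiguous (each starts where the previous one ends), which is what makes the 11/12 collision stand out.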

Hi,

Is it possible for you to give us a reproducer in C?

We have not been able to reproduce this, but we have seen memory-related errors in the jobs where it happens. We actually captured the data that was written by the job and tried to replay it outside the job, but that did not reproduce it either.

That definitely doesn’t sound like a valid or expected scenario. Two chunks ending up with the same offset (but different sizes) strongly suggests some kind of corruption rather than normal HDF5 behavior.

A few things come to mind based on your setup:

  • The core driver keeps everything in memory and only flushes at the end, so if there’s any issue with how memory is being managed (especially with resizing or reallocation), it could lead to inconsistencies like overlapping chunk addresses.

  • Since you mentioned data being “moved around depending upon order,” I’d double-check whether there’s any implicit dataset resizing, chunk reallocation, or rewriting happening that might interact badly with compression.

  • LZF itself is pretty simple and is usually not the cause; more likely it is just exposing the corruption when decompression fails.

  • Also worth checking: are there any concurrent writes (even indirectly, like multiprocessing or reused file handles)? HDF5 isn’t thread/process safe unless explicitly configured, and that can lead to exactly this kind of intermittent corruption.

The fact that you can’t reproduce it easily makes me think it could be timing-related or dependent on memory pressure.

One thing I’ve found helpful in tricky cases like this is isolating patterns or edge cases (chunk sizes, write order, dataset resizing, etc.) and writing small stress scripts that randomize those parameters across many layouts to try to provoke the rare failure.

If you haven’t already, you might also try:

  • Disabling compression temporarily to see if the issue persists

  • Writing with the default (sec2) driver instead of core

  • Enabling HDF5 debug logs (if possible)

  • Verifying file integrity with h5dump or h5check after write
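Along the same lines, it can help to dump every chunk’s (offset, size) right after a write so a collision is caught immediately rather than at read time. A rough sketch of such a helper, assuming an h5py version whose low-level dataset id exposes `get_num_chunks()` / `get_chunk_info()` (h5py ≥ 2.10 on HDF5 ≥ 1.10.5; the function is duck-typed so it only relies on those two methods):

```python
def chunk_locations(dset):
    """Yield (chunk_index, byte_offset, size) for each allocated chunk.

    `dset` is expected to behave like an h5py Dataset: its low-level
    `dset.id` must provide get_num_chunks() and get_chunk_info(i),
    where get_chunk_info returns an object with .byte_offset and .size
    attributes (h5py's StoreInfo).
    """
    for i in range(dset.id.get_num_chunks()):
        info = dset.id.get_chunk_info(i)
        yield i, info.byte_offset, info.size

# Hypothetical usage (requires h5py and a real file; names are placeholders):
#   import h5py
#   with h5py.File("data.h5", "r") as f:
#       for i, off, size in chunk_locations(f["my_dataset"]):
#           print(i, off, size)
```

Feeding the resulting (offset, size) pairs into a simple overlap check after each flush would let you pinpoint exactly which write first produces the collision.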

If chunk offsets are truly colliding, that is almost certainly a bug or misuse somewhere; HDF5 should never assign the same address to two different chunks in a valid file.

The algorithm that writes the data deals with chunked datasets, and explicit resizing of datasets is definitely done, depending on the data to be written. Under some conditions the data may also get rewritten.