Reading from an HDF5 file using h5py fails

Hello,

I suddenly started having trouble reading data from a relatively large HDF5 file (156 GB). In the past, I could read the file without issues, but now I get the following error raised from h5py:

OSError: Can't read data (file read failed: 
time = Thu Jun  9 10:40:52 2022, 
filename = '../path/to/my_file.h5', 
file descriptor = 10, 
errno = 5, 
error message = 'Input/output error', 
buf = 0x7f77b1f422e0, 
total read size = 12139376, 
bytes this sub-read = 12139376, 
bytes actually read = 18446744073709551615, 
offset = 42224603136)

I’m only getting this error when I try to read certain datasets. Other datasets load fine.
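
To pin down exactly which datasets are affected, something like the following could be used. It is only a minimal sketch: the function name `find_unreadable_datasets` is made up for illustration, and it assumes that reading each dataset in slabs along its first axis is enough to trigger the error and that one slab fits in memory.

```python
import h5py


def find_unreadable_datasets(path):
    """Try to read every dataset in the file; return (name, error) pairs for failures."""
    failures = []

    def visit(name, obj):
        if not isinstance(obj, h5py.Dataset):
            return None  # groups are only traversed, not read
        shape = obj.shape
        if shape is None:
            return None  # empty (null) dataspace, nothing to read
        try:
            if len(shape) == 0:
                obj[()]  # scalar dataset
            else:
                # Read in slabs along the first axis to keep memory bounded.
                # Assumption: chunk-sized (or 1024-row) slabs are small enough.
                step = obj.chunks[0] if obj.chunks else 1024
                for start in range(0, shape[0], step):
                    obj[start:start + step]
        except OSError as exc:
            failures.append((name, str(exc)))
        return None  # returning None lets visititems continue

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return failures


# Example (path taken from the error message above):
# for name, err in find_unreadable_datasets("../path/to/my_file.h5"):
#     print(name, err.splitlines()[0])
```

Comparing the list of failing datasets between runs, and against a freshly restored backup, might show whether the damage is stable or spreading.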

I found this thread where someone was facing a similar error, but they realised that it only affected them when reading a file from a network location. In my case, the file is stored locally on an SSD (ext4 filesystem).

Is it possible that my file is corrupt? Do you have any suggestions on what I could check or try?

Thanks for your help!

Here’s another case, but again only when using network storage.

It seems that in my case the error was caused by a corrupt file. I replaced my file with a backup version and now I’m not getting the error anymore. Hopefully it stays that way.

Coming back after a few months…

Recently this error came back, and now it seems to recur. I keep “solving” the problem by replacing the corrupt(?) HDF5 file with a backup version, but after a while the problem returns.

I observe the error on the same machine as before, so it’s possible that the drive has a hardware problem.
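
One way to separate a drive problem from an HDF5 problem might be to read the file's raw bytes without going through HDF5 at all. The sketch below is an assumption-laden illustration, not a definitive test: `scan_for_read_errors` is a hypothetical helper, it relies on `os.pread` (available on Unix-like systems such as the ext4 machine described here), and blocks served from the OS page cache may not touch the device. For reference, errno 5 in the error message is `EIO`, and the reported "bytes actually read" value is 2^64 - 1, which is -1 printed as an unsigned 64-bit integer, consistent with the underlying read call itself failing.

```python
import os


def scan_for_read_errors(path, block_size=4 * 1024 * 1024):
    """Read the file block by block; return (offset, length, errno) for blocks that fail."""
    bad_blocks = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            length = min(block_size, size - offset)
            try:
                os.pread(fd, length, offset)  # data is discarded; only errors matter
            except OSError as exc:
                bad_blocks.append((offset, length, exc.errno))
            offset += length
    finally:
        os.close(fd)
    return bad_blocks


# Example (path taken from the error message above):
# for offset, length, err in scan_for_read_errors("../path/to/my_file.h5"):
#     print(f"read of {length} bytes at offset {offset} failed with errno {err}")
```

If blocks around offset 42224603136 fail here as well, the error is happening below HDF5, which would point towards the filesystem or the drive rather than the file format.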

Previously, I could only find posts about this problem where the data was stored on network drives. This post describes the same problem with data stored on an NTFS SSD (Windows system).

Can you tell us a little more about the environment? OS? HDF5/h5py version?
What’s the output of

h5stat --enable-error-stack <HDF5 file>

on one of your “troublemakers”? What happens if you run this in a loop (100 or 1000 times)? Does it fail randomly?

G.
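
The repeated run suggested above could be scripted roughly as follows. This is a sketch under assumptions: `run_repeatedly` is a hypothetical helper, and it assumes that `h5stat` is on the PATH and signals a failure through a non-zero exit status.

```python
import subprocess


def run_repeatedly(cmd, runs=100):
    """Run cmd `runs` times; return (run_index, returncode, stderr) for each failed run."""
    failures = []
    for i in range(runs):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append((i, result.returncode, result.stderr))
    return failures


# Example (path taken from the error message above):
# failures = run_repeatedly(
#     ["h5stat", "--enable-error-stack", "../path/to/my_file.h5"], runs=100
# )
# print(f"{len(failures)} of 100 runs failed")
```

Failures on every run would suggest persistent damage at a fixed location, while failures on only some runs would suggest an intermittent I/O problem.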