I suddenly started having trouble reading data from a relatively large HDF5 file (156 GB). In the past I could read the file without issues, but now h5py raises the following error:
OSError: Can't read data (file read failed:
time = Thu Jun 9 10:40:52 2022,
filename = '../path/to/my_file.h5',
file descriptor = 10,
errno = 5,
error message = 'Input/output error',
buf = 0x7f77b1f422e0,
total read size = 12139376,
bytes this sub-read = 12139376,
bytes actually read = 18446744073709551615,
offset = 42224603136)
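Two fields in this message can be decoded with the Python standard library alone. The sketch below is only an illustration using the values copied from the error above; the variable names are made up for the example. It looks up the symbolic name of errno 5 and reinterprets the huge "bytes actually read" value as a signed 64-bit integer.

```python
import errno
import os

# Values copied from the error message above.
reported_errno = 5
bytes_actually_read = 18446744073709551615

# Look up the symbolic name and the OS description of the error number.
errno_name = errno.errorcode[reported_errno]
errno_text = os.strerror(reported_errno)

# Reinterpret the unsigned 64-bit count as a signed 64-bit integer.
if bytes_actually_read >= 2**63:
    as_signed = bytes_actually_read - 2**64
else:
    as_signed = bytes_actually_read

print(errno_name, "-", errno_text)
print("bytes actually read, as signed 64-bit:", as_signed)
```

The count comes out as -1, the usual return value of a failed read call, and errno 5 is EIO. Together these suggest that the operating system reported the failure and the HDF5 library passed it on.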
I’m only getting this error when I try to read certain datasets. Other datasets load fine.
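To list exactly which datasets are affected, one option is to walk the file and try to read every dataset in slices. This is a minimal sketch, assuming h5py is installed; the helper names `read_in_slices` and `find_unreadable_datasets` and the `rows_per_read` parameter are hypothetical and chosen only for this example.

```python
import h5py


def read_in_slices(dset, rows_per_read=1024):
    """Read a dataset piece by piece so it never has to fit in memory at once."""
    if dset.shape is None:
        return  # empty (null) dataspace: nothing to read
    if dset.ndim == 0:
        dset[()]  # scalar dataset
        return
    for start in range(0, dset.shape[0], rows_per_read):
        dset[start:start + rows_per_read]


def find_unreadable_datasets(path):
    """Return (dataset name, error message) for every dataset that fails to read."""
    failures = []

    def visit(name, obj):
        if isinstance(obj, h5py.Dataset):
            try:
                read_in_slices(obj)
            except OSError as exc:
                failures.append((name, str(exc)))
        return None  # returning None lets visititems continue

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return failures
```

Calling `find_unreadable_datasets("../path/to/my_file.h5")` should return an empty list for a healthy file. On a 156 GB file a full pass can take a long time.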
I found this thread where someone faced a similar error, but they realised that it only affected them when reading a file from a network location. In my case, the file is stored locally on an SSD (ext4 filesystem).
Is it possible that my file is corrupt? Do you have any suggestions on what I could check or try?
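One check that does not involve HDF5 at all is to read the raw file block by block and record where the operating system itself refuses to return data. The sketch below uses only the standard library; the function name `scan_for_read_errors` and the 16 MiB block size are arbitrary choices for illustration. Blocks already held in the page cache may be served from memory, so a clean pass is not conclusive proof that the drive is healthy.

```python
import os


def scan_for_read_errors(path, block_size=16 * 1024 * 1024):
    """Read the file block by block; return (offset, errno) for blocks the OS cannot read."""
    bad_blocks = []
    file_size = os.path.getsize(path)
    # buffering=0 gives unbuffered reads, so each read maps to one OS-level call.
    with open(path, "rb", buffering=0) as f:
        for offset in range(0, file_size, block_size):
            try:
                f.seek(offset)
                f.read(block_size)
            except OSError as exc:
                bad_blocks.append((offset, exc.errno))
    return bad_blocks
```

If a failing block covers the offset from the error message (42224603136), the problem most likely sits below HDF5, in the filesystem or the drive, and the file's internal structure is not the cause.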
Recently this error came back, and now it keeps recurring. I keep “solving” the problem by replacing the corrupt(?) HDF5 file with a backup copy, but after a while the problem returns.
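Since a backup copy exists, comparing checksums right after restoring it and again later would show whether the bytes on disk actually change over time. This is a minimal sketch using only `hashlib`; the function names and the block size are hypothetical. If the drive cannot read part of the file, the hashing itself will raise `OSError`, which is informative too.

```python
import hashlib


def sha256_of_file(path, block_size=8 * 1024 * 1024):
    """Hash a file in blocks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def files_match(path_a, path_b):
    """Return True when both files have the same SHA-256 digest."""
    return sha256_of_file(path_a) == sha256_of_file(path_b)
```

The comparison only makes sense if the working copy is opened read-only in between, because any legitimate write to the file also changes its digest.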
I observe the error on the same machine as before, so it’s possible that the drive has a hardware problem.
Previously, the only posts I could find online about this problem involved data stored on network drives. This post describes the same problem with data stored on an NTFS-formatted SSD (Windows system).