Error: Read failed

Hello everyone,

We use the HDF5 format to store large amounts of simulation results.
Our .h5 files can contain many groups (folders) and many datasets (records).

When reading the results with our post-processing software, the error shown below occurs.
The error is not 100% reproducible and occurs seemingly at random: sometimes everything works, sometimes 99 out of 100 attempts fail.
We compile the C/C++ sources from the HDF Group website with MSVC 14.1 and use the resulting libraries in our C/C++ application.

Any help or hints on a solution would be appreciated.
Many thanks!

HDF5-DIAG: Error detected in HDF5 (1.8.16) thread 0:
  #000: D:\Third_Party\CMake-hdf5-1.8.16d\hdf5-1.8.16\src\H5O.c line 657 in H5Oget_info_by_name(): object not found
    major: Symbol table
    minor: Object not found
  #001: D:\Third_Party\CMake-hdf5-1.8.16d\hdf5-1.8.16\src\H5Gloc.c line 747 in H5G_loc_info(): can't find object
    major: Symbol table
    minor: Object not found
  #002: D:\Third_Party\CMake-hdf5-1.8.16d\hdf5-1.8.16\src\H5Gtraverse.c line 861 in H5G_traverse(): internal path traversal failed
    major: Symbol table
    minor: Object not found
  #003: D:\Third_Party\CMake-hdf5-1.8.16d\hdf5-1.8.16\src\H5Gtraverse.c line 641 in H5G_traverse_real(): traversal operator failed
    major: Symbol table
    minor: Callback failed
  #004: D:\Third_Party\CMake-hdf5-1.8.16d\hdf5-1.8.16\src\H5Gloc.c line 702 in H5G_loc_info_cb(): can't get object info
    major: Symbol table
    minor: Can't get value
  #005: D:\Third_Party\CMake-hdf5-1.8.16d\hdf5-1.8.16\src\H5O.c line 2792 in H5O_get_info(): unable to load object header
    major: Object header
    minor: Unable to protect metadata
  #006: D:\Third_Party\CMake-hdf5-1.8.16d\hdf5-1.8.16\src\H5O.c line 1685 in H5O_protect(): unable to load object header
    major: Object header
    minor: Unable to protect metadata
  #007: D:\Third_Party\CMake-hdf5-1.8.16d\hdf5-1.8.16\src\H5AC.c line 1262 in H5AC_protect(): H5C_protect() failed.
    major: Object cache
    minor: Unable to protect metadata
  #008: D:\Third_Party\CMake-hdf5-1.8.16d\hdf5-1.8.16\src\H5C.c line 3574 in H5C_protect(): can't load entry
    major: Object cache
    minor: Unable to load metadata into cache
  #009: D:\Third_Party\CMake-hdf5-1.8.16d\hdf5-1.8.16\src\H5C.c line 7954 in H5C_load_entry(): unable to load entry
    major: Object cache
    minor: Unable to load metadata into cache
  #010: D:\Third_Party\CMake-hdf5-1.8.16d\hdf5-1.8.16\src\H5Ocache.c line 328 in H5O_load(): unable to read object header data
    major: Object header
    minor: Read failed
  #011: D:\Third_Party\CMake-hdf5-1.8.16d\hdf5-1.8.16\src\H5Fio.c line 120 in H5F_block_read(): read through metadata accumulator failed
    major: Low-level I/O
    minor: Read failed
  #012: D:\Third_Party\CMake-hdf5-1.8.16d\hdf5-1.8.16\src\H5Faccum.c line 214 in H5F__accum_read(): driver read request failed
    major: Low-level I/O
    minor: Read failed
  #013: D:\Third_Party\CMake-hdf5-1.8.16d\hdf5-1.8.16\src\H5FDint.c line 211 in H5FD_read(): driver read request failed
    major: Virtual File Layer
    minor: Read failed
  #014: D:\Third_Party\CMake-hdf5-1.8.16d\hdf5-1.8.16\src\H5FDsec2.c line 707 in H5FD_sec2_read(): file read failed: time = Thu Mar 17 08:50:36 2022
, filename = 'D:/projects/results.h5', file descriptor = 3, errno = 22, error message = 'Invalid argument', buf = 000002185665CFF8, total read size = 328, bytes this sub-read = 328, bytes actually read = 18446744073709551615, offset = 81288
    major: Low-level I/O
    minor: Read failed

Is there a particular reason why you are using HDF5 1.8.16? The latest 1.8.x release is 1.8.22.

Are you saying that this happens with the same file, i.e., most of the time it reads w/o issues, but there are occasional read failures?

Best, G.

After we upgraded to a newer version, filenames containing non-ASCII characters no longer worked, so we went back to a version that did not have this problem. However, this did not solve the reading problem.

The error is not file, hardware, or user specific. It tends to occur more often with large files.

On what kind of file system/storage do your files live? G.

We run our software on Windows 10 PCs with the standard NTFS 3.1 file system.

When these errors occur, what do the HDF5 path names of the objects you are trying to read look like?

Are you using UTF-8 encoded link names?

Do your HDF5 files contain external links?
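
If you are not sure, a small iteration over the links would report both things. A rough sketch with the 1.8 C API (error checking omitted):

    #include "hdf5.h"
    #include <stdio.h>

    /* H5Literate callback: print each link's name, whether the name is
     * stored as UTF-8, and whether the link is hard, soft, or external. */
    static herr_t print_link(hid_t group, const char *name,
                             const H5L_info_t *info, void *op_data)
    {
        (void)group; (void)op_data;
        printf("%-40s  cset=%s  type=%s\n", name,
               info->cset == H5T_CSET_UTF8     ? "UTF-8"    : "ASCII",
               info->type == H5L_TYPE_EXTERNAL ? "external" :
               info->type == H5L_TYPE_SOFT     ? "soft"     : "hard");
        return 0;   /* 0 = continue iterating */
    }

    /* Lists only the links directly below the root group; H5Lvisit()
     * with the same callback would walk the whole file recursively. */
    static void list_root_links(hid_t file)
    {
        H5Literate(file, H5_INDEX_NAME, H5_ITER_NATIVE, NULL,
                   print_link, NULL);
    }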

Our path names to data sets look like the example in the attached screenshot.

I could provide you with an example file, but I would not like to post it in a public forum.

We intentionally used only ASCII characters for the links.
We do not use external links.

Would you please contact our Helpdesk for us to get a sample file?

In the meantime, a few more questions:

  • How big is a typical file? (1 GB, 10 GB, 100 GB?)
  • You are seeing this behavior on different machines running Windows, right? Have you seen this behavior in non-Windows environments?
  • Does the behavior occur with random objects, or with particular objects (path names)?

Thanks, G.

The usual file size is between 1 GB and 15 GB.

It occurs on different Windows machines. There is no Linux version of our software. We have tested it only on Windows.

The error seems to occur randomly, even with the same file or data set.
At least we have not found any particular pattern, for example regarding the path.

For files that fit into available RAM, could you try to reproduce the error with the core VFD? Instead of passing H5P_DEFAULT as the file access property list, create a file access property list via hid_t fapl = H5Pcreate(H5P_FILE_ACCESS) and set H5Pset_fapl_core(fapl, 4194304, 1). That way the file system would be out of the way, and we’d get another data point. OK?
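
In code, the change would look roughly like this (sketch only; the file name is taken from your error message, and error checking is kept to a minimum):

    #include "hdf5.h"
    #include <stdio.h>

    int main(void)
    {
        /* File access property list with the core (in-memory) VFD:
         * 4 MiB allocation increment, backing store enabled. */
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        if (fapl < 0 || H5Pset_fapl_core(fapl, 4194304, 1) < 0) {
            fprintf(stderr, "could not set up the core VFD\n");
            return 1;
        }

        /* The whole file is read into memory on open; all further
         * reads are served from RAM, bypassing the file system. */
        hid_t file = H5Fopen("D:/projects/results.h5", H5F_ACC_RDONLY, fapl);
        if (file < 0) {
            fprintf(stderr, "H5Fopen failed\n");
            H5Pclose(fapl);
            return 1;
        }

        /* ... run the usual reads here ... */

        H5Fclose(file);
        H5Pclose(fapl);
        return 0;
    }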

Best, G.

I have tried it out. It actually seems to work. The error no longer occurs, even with large files.
As expected, the memory requirements of the application become much larger.
Do you see any way to get it to work without loading the entire file?

Thank you for trying this & finally some good news. I think we still want to get to the bottom of the original problem, but let’s see if we can find a tentative solution.

  • What portion of the data in a file do you usually read? (<30%, <60%, most of it?)
    • Are you only reading data, or are you also making updates or writing new datasets?
  • Do your files have the same profile (the same structure and layout)? In other words, are the HDF5 path names and attribute names predictable, rather than discovering them at runtime?
  • Are the data that you are reading compressed or otherwise (HDF5-)filtered?

Best, G.

Our routines read only small amounts of data per request, often probably only about 1% of the data. Sometimes there are many small requests to the .h5 file in a row. We have wondered before whether this could cause problems.

We have separate read and write interfaces. The problem occurs only when reading; whenever it appears, the HDF5 file has been opened for reading only.

The files all follow the same basic structure, but not every file contains all possible data.
The reader always checks which data is present. Failed lookup attempts can occur, which is expected and handled.
The names of paths and attributes are predictable; they are not random.
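
To illustrate, the presence check is conceptually something like the following sketch (not our actual code; the path name is only an example). H5Lexists() is called for each prefix of the path, because asking for the full path at once fails when an intermediate group is missing:

    #include "hdf5.h"

    /* Returns 1 if every component of 'path' exists, 0 if one is missing,
     * a negative value on error.  Example call:
     *     path_exists(file, "/results/run_001/pressure")   (example path) */
    static htri_t path_exists(hid_t file, const char *path)
    {
        char   partial[1024];
        size_t len = 0;
        const char *p = path;

        while (*p) {
            /* Append the next component, including its leading '/'. */
            do {
                if (len + 1 >= sizeof(partial))
                    return -1;                  /* path longer than buffer */
                partial[len++] = *p++;
            } while (*p && *p != '/');
            partial[len] = '\0';

            htri_t exists = H5Lexists(file, partial, H5P_DEFAULT);
            if (exists <= 0)
                return exists;                  /* missing (0) or error (<0) */
        }
        return 1;
    }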

On the topic of filtering, I have to ask around; I will send the information as soon as I know more.

We use the zlib compression that is part of the HDF5 download package.
I sent an example file to the support e-mail address. Maybe you can see there whether something is off regarding the compression.

One additional remark:
Sometimes even the very first read attempt on a data set fails. These data sets, which are read at the beginning, are always present and have an identical structure.

Are you using a 64-bit version of Windows? Is the executable a 64-bit or a 32-bit binary?

Does the problem occur only with a specific layout? For example, your screenshot shows a dataset w/ compact layout, where there is no compression. Or does it occur only with datasets where you use (zlib) compression?

We only have a 64-bit version of the software.

I checked the miscellaneous data set information: filters, compression, and fill value are always NONE.
I haven’t programmed this part myself, so I was not sure.

The storage layout is COMPACT, CHUNKED, or CONTIGUOUS. However, I remember read problems with all of these layouts.
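
For reference, this is roughly how the layout, filter, and fill-value information can be queried (sketch; the dataset path is just a placeholder):

    #include "hdf5.h"
    #include <stdio.h>

    static void print_storage_info(hid_t file, const char *dset_path)
    {
        hid_t dset = H5Dopen2(file, dset_path, H5P_DEFAULT);
        hid_t dcpl = H5Dget_create_plist(dset);

        switch (H5Pget_layout(dcpl)) {
            case H5D_COMPACT:    printf("layout: COMPACT\n");    break;
            case H5D_CONTIGUOUS: printf("layout: CONTIGUOUS\n"); break;
            case H5D_CHUNKED:    printf("layout: CHUNKED\n");    break;
            default:             printf("layout: unknown\n");    break;
        }

        int nfilters = H5Pget_nfilters(dcpl);
        if (nfilters == 0)
            printf("filters: NONE\n");
        for (int i = 0; i < nfilters; i++) {
            char     name[64];
            unsigned flags;
            size_t   cd_nelmts = 0;
            H5Z_filter_t id = H5Pget_filter2(dcpl, (unsigned)i, &flags,
                                             &cd_nelmts, NULL,
                                             sizeof(name), name, NULL);
            printf("filter %d: id=%d (%s)\n", i, (int)id, name);
        }

        H5D_fill_value_t fv;
        H5Pfill_value_defined(dcpl, &fv);
        printf("fill value: %s\n",
               fv == H5D_FILL_VALUE_UNDEFINED ? "NONE" : "defined");

        H5Pclose(dcpl);
        H5Dclose(dset);
    }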

I just checked w/ our help desk, but it appears that we haven’t received any file samples. Would you mind sending it again to help@hdfgroup.org? Thanks, G.

OK, I got the sample file. I took a quick peek and can’t see anything unusual.

Just to be sure: Your NTFS file system is mounted locally, i.e., we are not talking about a Windows SMB share, right?

Your description sounds like a race condition, where the file descriptor gets clobbered.
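
If your reader is multithreaded and the HDF5 library was not built with thread safety enabled, the application has to make sure that no two threads are inside the library at the same time, e.g. by serializing all HDF5 calls behind one lock. A sketch of what I mean (the wrapper is made up; Windows API used for the lock):

    #include <windows.h>
    #include "hdf5.h"

    static CRITICAL_SECTION g_hdf5_lock;   /* protects all HDF5 calls */

    void hdf5_lock_init(void) { InitializeCriticalSection(&g_hdf5_lock); }

    /* Example wrapper: every entry into the HDF5 library goes through
     * the same critical section. */
    herr_t locked_read(hid_t dset, hid_t memtype, void *buf)
    {
        EnterCriticalSection(&g_hdf5_lock);
        herr_t status = H5Dread(dset, memtype, H5S_ALL, H5S_ALL,
                                H5P_DEFAULT, buf);
        LeaveCriticalSection(&g_hdf5_lock);
        return status;
    }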

G.

We use the software only on ordinary Windows PCs. The HDF5 files are read from the local hard drive, and we do not use anything unusual. I cannot rule out that our corporate IT department applies some special settings, but I think it is unlikely.

Since the problem occurs for all users, our suspicion so far has been that it has something to do with our software, for example the options set when opening data sets. Can you think of anything that could cause such behavior?