H5clear error: h5tools_fopen

I am trying to use the h5clear utility to fix an HDF5 file that won’t open in the HDFView utility.
The file was corrupted during an interruption in a long write operation.

When I tried to use h5clear, I received the error:

  • h5clear error: h5tools_fopen

Can anyone advise how I can troubleshoot or fix this?

Thanks.

Paul, what’s the error that you are getting from HDFView? What does h5dump -pBH <filename> return? h5clear performs very few specific tasks, none of which has anything to do with file corruption. G.

Hi Gerd,

From h5dump: h5dump error: unable to open file "test.h5"
From HDFView: Error opening file test.h5
From HDFCompass:

INFO    pydap.request:39 > Opening file:///C:\Users\HIAPRC\Documents\CH149\One_off_code\hdf5_check\test.h5.dds
DEBUG   hdf_compass.opendap_model.model.can_handle:61 > able to handle file:///C:\Users\HIAPRC\Documents\CH149\One_off_code\hdf5_check\test.h5? no
DEBUG   hdf_compass.asc_model.model.can_handle:74 > able to handle file:///C:\Users\HIAPRC\Documents\CH149\One_off_code\hdf5_check\test.h5? no, missing .asc extension
Traceback (most recent call last):
  File "c:\users\john\hdf-compass\hdf_compass\compass_viewer\frame.py", line 199, in on_file_open
  File "c:\users\john\hdf-compass\hdf_compass\compass_viewer\frame.py", line 223, in open_url
  File "c:\users\john\hdf-compass\hdf_compass\compass_viewer\viewer.py", line 175, in can_open_store
  File "c:\users\john\hdf-compass\hdf_compass\bag_model\model.py", line 88, in can_handle
  File "site-packages\hydroffice\bag\base.py", line 31, in is_bag
  File "site-packages\h5py\_hl\files.py", line 272, in __init__
  File "site-packages\h5py\_hl\files.py", line 92, in make_fid
  File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (C:\aroot\work\h5py\_objects.c:2587)
  File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (C:\aroot\work\h5py\_objects.c:2546)
  File "h5py\h5f.pyx", line 76, in h5py.h5f.open (C:\aroot\work\h5py\h5f.c:1821)
IOError: Unable to open file (Bad object header version number)

Could you please send the output of

h5dump --enable-error-stack 2 -pBH <filename> 

?

The HDFCompass output IOError: Unable to open file (Bad object header version number) suggests a corrupt object header. If that were true it’d be time for h5debug.

h5dump was unable to open the file.

(py38) PS C:\Users\HIAPRC\Documents\CH149\One_off_code\hdf5_check> h5dump --enable-error-stack 2 -pBH test.h5
h5dump error: unable to open file "2"

I tried some other options with h5dump and the error: h5dump error: unable to open file "test.h5" was raised every time.

I then tried h5debug, which unfortunately was also unable to open the file.

(py38) PS C:\Users\HIAPRC\Documents\CH149\One_off_code\hdf5_check> h5debug test.h5
cannot open file

Thanks.

My mistake. I meant

h5dump --enable-error-stack=2 -pBH <filename> 

Also, what about

od -c <filename> | head -n 50

G.

Sorry, I am on Windows and could not find a way to produce an octal dump of the file.
Instead I used the PowerShell command: get-content test.h5 -TotalCount 50 | format-hex > hex_dump_test.txt
If this doesn’t work for you, I will search more thoroughly for a way to get the dump in octal format.

hex_dump_test.txt (123.3 KB)

Here is the output for h5dump --enable-error-stack=2 -pBH test.h5 > h5dump_output.txt

h5dump_output.txt (8.9 KB)

Thanks
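For readers on Windows without od, the first bytes of a file can also be dumped with a few lines of Python. This is a minimal sketch, not an od replacement: the function names hex_dump and dump_file_head are made up for illustration, and the default of 800 bytes is an assumption chosen to match 50 lines of 16 bytes each.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex values, and printable ASCII, one row per `width` bytes."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)


def dump_file_head(path: str, nbytes: int = 800) -> str:
    """Read the first `nbytes` of a file in binary mode and return a hex dump."""
    with open(path, "rb") as fh:
        return hex_dump(fh.read(nbytes))
```

Calling print(dump_file_head("test.h5")) prints the dump. Opening the file in binary mode ("rb") matters here, since the goal is to see the raw bytes rather than decoded text.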

Thanks for that. The file looks pretty messed up. The binary dump shows, and the h5dump error stack confirms, that there is no root group to be found. That’s the A (alpha) of an HDF5 file, and without it you don’t have a canonical thread to pick up. There are potentially a few loose ends scattered around the file, which you might be able to pick up by searching for OHDR, HEAP, etc., but this is a costly and uncertain proposition. I’m not sure whether the root group was never written or whether it got overwritten (by another process?), since there are unusually many zeros at the beginning of the file (following the signature).

How did you get to this state in the first place? G.
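The search for OHDR, HEAP, etc. mentioned above could be sketched roughly as follows. This is only an illustration: the function name find_signatures and the chunk size are assumptions, and the signature list is a small subset of the 4-byte markers defined in the HDF5 file format specification (OHDR for version 2 object headers, HEAP for local heaps, TREE for version 1 B-tree nodes, SNOD for symbol table nodes, GCOL for global heap collections).

```python
SIGNATURES = (b"OHDR", b"HEAP", b"TREE", b"SNOD", b"GCOL")


def find_signatures(path, signatures=SIGNATURES, chunk_size=1 << 20):
    """Return {signature: [byte offsets]} for every match found in the file."""
    hits = {sig: [] for sig in signatures}
    # Keep this many trailing bytes so a signature split across two reads is still found.
    overlap = max(len(sig) for sig in signatures) - 1
    with open(path, "rb") as fh:
        base = 0   # file offset of buf[0]
        buf = b""
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            buf += chunk
            for sig in signatures:
                pos = buf.find(sig)
                while pos != -1:
                    hits[sig].append(base + pos)
                    pos = buf.find(sig, pos + 1)
            keep = buf[-overlap:] if overlap else b""
            base += len(buf) - len(keep)
            buf = keep
    # Drop any offset reported twice because of the overlap window.
    return {sig: sorted(set(offsets)) for sig, offsets in hits.items()}
```

A hit is only a candidate: the same four bytes can occur by chance inside raw dataset values, and version 1 object headers carry no signature at all, so an empty OHDR list does not by itself mean that no object headers survived.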

I work in a building which suffers from power outages a few times a year.
I wanted to try and simulate a power outage while an HDF5 file was being written.
I wrote a Python program that would create large datasets of 2M rows.
I stepped through the program, letting it create three datasets, and then killed program execution while it was creating the fourth.
I tried it again today and was unable to duplicate the results. I could open the file but the write operation that I interrupted resulted in no dataset or groups being written.
I guess my original issue was just a fluke.

Thanks for your assistance.

If you have any more questions I am happy to help.

Here is the program if you are curious:

import h5py
import numpy as np

def test_corrupted_h5():

    arry = np.random.random((2000000, 100))

    # write one dataset and close it
    with h5py.File("test.h5", "w") as f_store:
        group = f_store.create_group("149999/2000/VRM_DATA")
        group.create_dataset("my_data", data=arry)

    with h5py.File("test.h5", "a") as f_store:
        group = f_store.create_group("/149999/2001/VRM_DATA")
        group.create_dataset("my_data", data=arry)

    with h5py.File("test.h5", "a") as f_store:
        group = f_store.create_group("/149999/2002/VRM_DATA")
        group.create_dataset("my_data", data=arry)

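    with h5py.File("test.h5", "a") as f_store:
        group = f_store.create_group("/149999/2003/VRM_DATA")
        group.create_dataset("my_data", data=arry)

# run the test when the file is executed as a script
if __name__ == "__main__":
    test_corrupted_h5()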

I am in a similar situation. An HDF5 file got damaged, presumably because the software writing it was killed in the middle of a write. h5dump cannot open the file. I ran od -c <filename> | head -n 50, and it gave:

0000000 211 H D F \r \n 032 \n \0 \0 \0 \0 \0 \b \b \0
0000020 004 \0 020 \0 001 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
0000040 377 377 377 377 377 377 377 377 \0 \b \0 \0 \0 \0 \0 \0
0000060 377 377 377 377 377 377 377 377 \0 \0 \0 \0 \0 \0 \0 \0
0000100 ` \0 \0 \0 \0 \0 \0 \0 001 \0 \0 \0 \0 \0 \0 \0
0000120 210 \0 \0 \0 \0 \0 \0 \0 250 002 \0 \0 \0 \0 \0 \0
0000140 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
*
The lines after the *, at offsets 0013460-0014700, contain a variety of non-\0 bytes.

Is there a possibility that the file can be recovered, or would it be easier to just create it anew and discard the damaged file entirely?

Thank you.
Best wishes.
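One way to read a dump like the one above is to decode the superblock fields by hand. The sketch below does that for the simplest case only, assuming a version 0 superblock at offset 0 with 8-byte offsets, laid out as described in the HDF5 file format specification. The function name parse_superblock_v0 and the dictionary keys are made up for illustration.

```python
import struct

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def parse_superblock_v0(head: bytes) -> dict:
    """Decode a version 0 superblock from the first 96 bytes of a file."""
    if len(head) < 96:
        raise ValueError("need at least 96 bytes")
    if head[:8] != HDF5_SIGNATURE:
        raise ValueError("HDF5 signature not found at offset 0")
    version, size_of_offsets = head[8], head[13]
    if version != 0 or size_of_offsets != 8:
        raise ValueError("sketch only handles superblock version 0 with 8-byte offsets")
    leaf_k, internal_k, flags = struct.unpack_from("<HHI", head, 16)
    base, free_space, eof, driver = struct.unpack_from("<4Q", head, 24)
    # Root group symbol table entry starts at byte 56.
    name_offset, root_ohdr, cache_type = struct.unpack_from("<QQI", head, 56)
    info = {
        "group_leaf_k": leaf_k,
        "group_internal_k": internal_k,
        "consistency_flags": flags,
        "base_address": base,
        "free_space_address": free_space,
        "eof_address": eof,
        "driver_info_address": driver,
        "root_link_name_offset": name_offset,
        "root_object_header_address": root_ohdr,
        "root_cache_type": cache_type,
    }
    if cache_type == 1:
        # Cache type 1 stores the root group's B-tree and local heap addresses.
        btree, heap = struct.unpack_from("<QQ", head, 80)
        info["root_btree_address"] = btree
        info["root_heap_address"] = heap
    return info
```

If the dump above is transcribed correctly, this decoding gives an end-of-file address of 0x800 (2048) and a root group object header address of 0x60 (96), with the B-tree at 0x88 and the local heap at 0x2a8. The dump shows only zeros from offset 0140 (96) up to 0013460, so under this reading the root group metadata the superblock points to was never written to disk, which resembles the situation described earlier in the thread.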