Requesting update on blocker bug HDFFV-10300


#1

Please see https://jira.hdfgroup.org/browse/HDFFV-10300

And also the original mailing list thread here: HDF lib incompatible with HDF file spec? (slightly mangled in the Discourse transition).

Frustratingly, it’s still not possible for outsiders to comment on bugs in JIRA, so I’m posting here.

Can someone give an update on what happened with this bug? It was reported in 2017 and marked as Priority: Blocker, but there has been no activity on it since then.

From what I can see, @Markus found a serious issue in the HDF5 library that makes it corrupt files that were not written by the HDF5 library itself, possibly because the library assumes it wrote the file and therefore makes overly liberal assumptions about the physical file layout.

I had a look at the file Markus provided (sizeoptimized.h5), and from what I can see this is a valid HDF5 file. It passes checks using h5check:

[estan@newton hdf5bug]$ h5check-inst/bin/h5check -v2 sizeoptimized.h5 
VERBOSE is true:verbose # = 2

VALIDATING sizeoptimized.h5 according to library version 1.8.0 

FOUND super block signature
VALIDATING the super block at physical address 0...
Validating version 0/1 superblock...
INITIALIZING filters ...
VALIDATING the object header at logical address 96...
VALIDATING version 1 object header...
Version 1 object header encountered
VALIDATING the local heap at logical address 184...
FOUND local heap signature.
VALIDATING version 1 btree at logical address 136...
FOUND version 1 btree signature.
VALIDATING the Symbol table node at logical address 304...
FOUND Symbol table node signature.
VALIDATING the object header at logical address 432...
VALIDATING version 1 object header...
Version 1 object header encountered
VALIDATING the object header at logical address 720...
VALIDATING version 1 object header...
Version 1 object header encountered
VALIDATING the object header at logical address 992...
VALIDATING version 1 object header...
Version 1 object header encountered
No non-compliance errors found
[estan@newton hdf5bug]$

It’s possible to list the file’s contents:

[estan@newton hdf5bug]$ h5ls -r sizeoptimized.h5 
/                        Group
/test1                   Dataset {10/Inf}
/test2                   Dataset {1}
/test3                   Dataset {10/Inf}
[estan@newton hdf5bug]$

And dump out a dataset, say /test1:

[estan@newton hdf5bug]$ h5dump -d /test1 sizeoptimized.h5 
HDF5 "sizeoptimized.h5" {
DATASET "/test1" {
   DATATYPE  H5T_COMPOUND {
      H5T_IEEE_F32LE "valuef";
      H5T_IEEE_F64LE "valued";
   }
   DATASPACE  SIMPLE { ( 10 ) / ( H5S_UNLIMITED ) }
   DATA {
   (0): {
         0,
         0
      }, {
         0,
         0
      },
   (2): {
         0,
         0
      }, {
         0,
         0
      },
   (4): {
         0,
         0
      }, {
         0,
         0
      },
   (6): {
         0,
         0
      }, {
         0,
         0
      },
   (8): {
         0,
         0
      }, {
         0,
         0
      }
   }
}
}
[estan@newton hdf5bug]$

I’ve also browsed around the file using h5debug, and I can’t see anything suspicious, though the tool is not very convenient and I didn’t check every single number.

So this file, produced by @Markus’s embedded code, looks like a valid HDF5 file.

However, run the following program on it, which simply uses the HDF5 library to add another compound dataset /test4 to the file, and the file gets silently corrupted:

/*
 * Adds a /test4 compound dataset to the file given on command line.
 */
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

#include <hdf5.h>

typedef struct {
    short v1;
    float v2;
} sensor_t;

int main(int argc, char *argv[]) {
    hid_t file = H5Fopen(argv[1], H5F_ACC_RDWR, H5P_DEFAULT);
    assert(file >= 0);

    hid_t memtype = H5Tcreate(H5T_COMPOUND, sizeof(sensor_t));
    assert(H5Tinsert(memtype, "v1", HOFFSET(sensor_t, v1), H5T_NATIVE_SHORT) >= 0);
    assert(H5Tinsert(memtype, "v2", HOFFSET(sensor_t, v2), H5T_NATIVE_FLOAT) >= 0);

    hid_t filetype = H5Tcreate(H5T_COMPOUND, 6);
    assert(H5Tinsert(filetype, "v1", 0, H5T_STD_I16LE) >= 0);
    assert(H5Tinsert(filetype, "v2", 2, H5T_IEEE_F32LE) >= 0);

    hsize_t dims[1] = {1};
    hsize_t max_dims[1] = {H5S_UNLIMITED};
    hid_t space = H5Screate_simple(1, dims, max_dims);
    assert(space >= 0);

    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    assert(dcpl >= 0);
    hsize_t chunk[1] = {6};
    assert(H5Pset_chunk(dcpl, 1, chunk) >= 0);

    hid_t dset = H5Dcreate(file, "/test4", filetype, space, H5P_DEFAULT, dcpl, H5P_DEFAULT);
    assert(dset >= 0);

    sensor_t data[1];
    data[0].v1 = 1;
    data[0].v2 = 2.0;
    assert(H5Dwrite(dset, memtype, H5S_ALL, H5S_ALL, H5P_DEFAULT, data) >= 0);

    assert(H5Dclose(dset) >= 0);
    assert(H5Pclose(dcpl) >= 0);
    assert(H5Sclose(space) >= 0);
    assert(H5Tclose(filetype) >= 0);
    assert(H5Tclose(memtype) >= 0);
    assert(H5Fclose(file) >= 0);

    return 0;
}
[estan@newton hdf5bug]$ gcc -Lhdf5-inst/lib -o add_dataset -Ihdf5-inst/include add_dataset.c -lhdf5
[estan@newton hdf5bug]$ ./add_dataset sizeoptimized.h5 
[estan@newton hdf5bug]$ h5check-inst/bin/h5check -v2 sizeoptimized.h5   
VERBOSE is true:verbose # = 2

VALIDATING sizeoptimized.h5 according to library version 1.8.0 

FOUND super block signature
VALIDATING the super block at physical address 0...
Validating version 0/1 superblock...
INITIALIZING filters ...
VALIDATING the object header at logical address 96...
VALIDATING version 1 object header...
Version 1 object header encountered
VALIDATING the local heap at logical address 184...
FOUND local heap signature.
VALIDATING version 1 btree at logical address 136...
FOUND version 1 btree signature.
VALIDATING the Symbol table node at logical address 304...
FOUND Symbol table node signature.
VALIDATING the object header at logical address 432...
VALIDATING version 1 object header...
***Error***
Object Header:corrupt object header at addr 681
Object Header:corrupt object header at addr 674
Object Header:corrupt object header at addr 667
Object Header:corrupt object header at addr 660
Object Header:corrupt object header at addr 653
Object Header:corrupt object header at addr 646
Object Header:corrupt object header at addr 639
Object Header:corrupt object header at addr 632
Object Header:corrupt object header at addr 625
Object Header:corrupt object header at addr 618
Object Header:corrupt object header at addr 611
Object Header:corrupt object header at addr 604
Object Header:corrupt object header at addr 597
Object Header:corrupt object header at addr 590
Object Header:corrupt object header at addr 583
Object Header:corrupt object header at addr 576
Object Header:corrupt object header at addr 569
Object Header:corrupt object header at addr 562
Object Header:corrupt object header at addr 555
Object Header:corrupt object header at addr 548
Object Header:corrupt object header at addr 541
Object Header:corrupt object header at addr 534
Object Header:corrupt object header at addr 527
Object Header:corrupt object header at addr 520
Object Header:corrupt object header at addr 513
Object Header:corrupt object header at addr 506
Object Header:corrupt object header at addr 499
Object Header:corrupt object header at addr 492
Object Header:corrupt object header at addr 485
Object Header:corrupt object header at addr 478
Object Header:corrupt object header at addr 471
Version 1 Object Header:Bad version number at addr 432; Value decoded: 32
***End of Error messages***
***Error***
Errors found when decoding message at addr 1208
Dataspace Message v.1:Corrupt flags at addr 1210
***End of Error messages***
VALIDATING the object header at logical address 720...
VALIDATING version 1 object header...
Version 1 object header encountered
VALIDATING the object header at logical address 992...
VALIDATING version 1 object header...
Version 1 object header encountered
VALIDATING the object header at logical address 1280...
VALIDATING version 1 object header...
Version 1 object header encountered
VALIDATING version 1 btree at logical address 1552...
FOUND version 1 btree signature.
Non-compliance errors found
[estan@newton hdf5bug]$

The above corruption does not happen if this program is run against a file that was produced by the HDF5 library itself.

(In his own testing, @Markus used HDFView, but the above program is the minimal equivalent.)

Could someone from the HDF Group please look at this bug? The last comment in JIRA mentioned that it would be brought up in an SE meeting on 16 October 2017, but it has been silent since then.

In my tests above, I was using HDF5 1.10.5 and h5check 2.0.1, both compiled from Git.

I’m surprised that this issue has not been given more attention, since it’s a silent data loss bug. I would say it effectively prevents people from implementing their own HDF5 writers, since they now have to fear that what they write will be destroyed if the file is later extended using the official HDF5 library.

@Barbara_Jones @epourmal @koziol


#2

I added the following debug printout to the code that flushes the metadata cache:

diff --git a/src/H5C.c b/src/H5C.c
index 288f3db7a1..71ec201d73 100644
--- a/src/H5C.c
+++ b/src/H5C.c
@@ -5841,6 +5841,7 @@ H5C__flush_ring(H5F_t *f, H5C_ring_t ring, unsigned flags)
                     protected_entries++;
                 } /* end if */
                 else {
+                    printf("flushing %s to addr %ld size %ld\n", entry_ptr->type->name, entry_ptr->addr, entry_ptr->size);
                     if(H5C__flush_single_entry(f, entry_ptr, (flags | H5C__DURING_FLUSH_FLAG)) < 0)
                         HGOTO_ERROR(H5E_CACHE, H5E_CANTFLUSH, FAIL, "Can't flush entry")

The result when running my test program:

[estan@newton hdf5bug]$ ./add_dataset sizeoptimized.h5 
flushing Superblock to addr 0 size 96
flushing v1 B-tree to addr 136 size 544
flushing local heap prefix to addr 184 size 120
flushing Symbol table node to addr 304 size 328
flushing object header to addr 1280 size 272
flushing v1 B-tree to addr 1552 size 2096
flushing Superblock to addr 0 size 96
[estan@newton hdf5bug]$

As you can see, the write-out of the v1 B-tree of size 544 to address 136 (which I believe is the B-tree for the root group’s symbol table) will overwrite the pre-existing local heap of size 120 at address 184. I’ve verified that the flush actually results in a write (H5FD_sec2_write gets called).

I haven’t yet figured out why that B-tree cache entry ended up with an addr/size that overlaps a pre-existing piece of data. Perhaps someone familiar with the code can have a look?


#3

Elvis,

I apologize. The issue fell through the cracks due to our limited bandwidth. We have to work on our customers’ projects to keep HDF5 going, and we are not able to address JIRA issues unrelated to those projects in a timely manner.

The investigation you and Markus provided is very helpful. I’ll try to find resources to look into the issue, but cannot promise.

Elena


#4

Alright, no worries, Elena; I was mostly curious about what had happened. I found the issue interesting, since not many people attempt to write alternative writers/readers of HDF5. And I understand that commercial priorities govern which issues you can take on.

I can try to debug further in my spare time, but no promises from my end either :slight_smile: (the HDF5 code is a vast landscape…)


#5

Elvis and all,

We really appreciate the effort of creating alternative HDF5 implementations. The current library has become extremely “heavy”. I am sure many HDF5 users, including some of our customers, would appreciate an optimized (or “lite”) version of HDF5.

The first alternative implementation of an HDF5 reader (the pure-Java reader from Unidata) uncovered several issues in the File Format Spec that were then fixed. This is another opportunity for us to enhance our documentation and to address issues in the HDF5 library and tools.

Thanks again for bringing our attention to this problem.

Elena


#6

@epourmal I saw that the issue (https://jira.hdfgroup.org/browse/HDFFV-10300) is now marked as Done. Do you know which commit fixed it? I’m curious what the resolution was.


#7

Hello!

The ticket will not be fixed; it was concluded that the behavior was due to a user error. Please see the latest update to the ticket. Yesterday I accidentally moved the ticket without updating it first.

Thank you!

Elena


#8

Many thanks for the ticket update @epourmal