Hi,
We've been occasionally seeing HDF5 read failures in our production
environment (using HDF5 1.8.4, C++ packet table API), so we are
attempting to upgrade to 1.8.10 in the hope that it might fix things.
Unfortunately, the problem now appears to be worse ...
To give you an example of the kind of weirdness we're seeing, we have a
particular file with the following header (as per h5dump):
HDF5 "HotSpot_FX_filtered_NZDUSD-TheoreticalQuote.h5" {
GROUP "/" {
DATASET "TheoreticalQuote" {
DATATYPE H5T_COMPOUND {
H5T_STD_I64LE "TimeStamp";
H5T_IEEE_F64LE "BidPrice";
H5T_IEEE_F64LE "AskPrice";
H5T_IEEE_F64LE "Volume";
H5T_IEEE_F64LE "LastInputBidPrice";
H5T_IEEE_F64LE "LastInputAskPrice";
}
DATASPACE SIMPLE { ( 28851988 ) / ( H5S_UNLIMITED ) }
}
}
}
As you can see, this file (150 MB in size, compressed) has ~28M records.
If we try to read a few records at the end, we succeed:
$ h5dump --dataset TheoreticalQuote -s 28851970 -c 5
HotSpot_FX_filtered_NZDUSD-TheoreticalQuote.h5 | tail -15
0.83743,
0.83745
},
(28851974): {
3564222274822547,
0.83743,
0.83745,
nan,
0.83743,
0.83745
}
}
}
}
}
If we try to read a large set of records (300K) in the middle, we also
succeed, but only sometimes:
$ h5dump --dataset TheoreticalQuote -s 15000000 -c 300000
HotSpot_FX_filtered_NZDUSD-TheoreticalQuote.h5 | tail -15
0.82127,
0.82144
},
(15299999): {
3558294916506950,
0.82127,
0.82144,
nan,
0.82127,
0.82144
}
}
}
}
}
Trying a different starting point, we don't get an error per se, but where
are the results?
$ h5dump --dataset TheoreticalQuote -s 14700000 -c 300000
HotSpot_FX_filtered_NZDUSD-TheoreticalQuote.h5 | tail -15
H5T_IEEE_F64LE "Volume";
H5T_IEEE_F64LE "LastInputBidPrice";
H5T_IEEE_F64LE "LastInputAskPrice";
}
DATASPACE SIMPLE { ( 28851988 ) / ( H5S_UNLIMITED ) }
SUBSET {
START ( 14700000 );
STRIDE ( 1 );
COUNT ( 300000 );
BLOCK ( 1 );
DATA {
}
}
}
}
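In case it helps anyone reproduce this outside h5dump, the programmatic
equivalent of these subset reads looks roughly like the sketch below.
The struct layout and the native member types are my assumptions based
on the compound type in the header above, not our actual code:

#include "H5Cpp.h"
#include <vector>

// Assumed in-memory layout matching the compound type in the header.
struct Quote {
    long long TimeStamp;                  // H5T_STD_I64LE
    double    BidPrice, AskPrice, Volume; // H5T_IEEE_F64LE
    double    LastInputBidPrice, LastInputAskPrice;
};

int main() {
    H5::H5File file("HotSpot_FX_filtered_NZDUSD-TheoreticalQuote.h5",
                    H5F_ACC_RDONLY);
    H5::DataSet ds = file.openDataSet("TheoreticalQuote");

    // Describe the compound type field by field.
    H5::CompType quoteType(sizeof(Quote));
    quoteType.insertMember("TimeStamp", HOFFSET(Quote, TimeStamp),
                           H5::PredType::NATIVE_LLONG);
    quoteType.insertMember("BidPrice", HOFFSET(Quote, BidPrice),
                           H5::PredType::NATIVE_DOUBLE);
    quoteType.insertMember("AskPrice", HOFFSET(Quote, AskPrice),
                           H5::PredType::NATIVE_DOUBLE);
    quoteType.insertMember("Volume", HOFFSET(Quote, Volume),
                           H5::PredType::NATIVE_DOUBLE);
    quoteType.insertMember("LastInputBidPrice",
                           HOFFSET(Quote, LastInputBidPrice),
                           H5::PredType::NATIVE_DOUBLE);
    quoteType.insertMember("LastInputAskPrice",
                           HOFFSET(Quote, LastInputAskPrice),
                           H5::PredType::NATIVE_DOUBLE);

    // Equivalent of "-s 14700000 -c 300000": select a 300000-record
    // hyperslab starting at index 14700000 and read it into memory.
    hsize_t start = 14700000, count = 300000;
    H5::DataSpace fileSpace = ds.getSpace();
    fileSpace.selectHyperslab(H5S_SELECT_SET, &count, &start);
    H5::DataSpace memSpace(1, &count);

    std::vector<Quote> buf(count);
    ds.read(&buf[0], quoteType, memSpace, fileSpace);
    return 0;
}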
Finally, these peculiarities probably point to a subtly corrupt file,
and would explain why our application, which uses the packet table API,
fails to read this particular file at this offset, as per our log (a
minimal sketch of the failing read follows the log):
2012-Dec-12 08:18:58.656324[0x00007faae7fff700]: DEBUG:
dataStoreLib.BufferedFile(NZDUSD): reading from file
/home/ligerdemo/data/HotSpot/FX/filtered/NZDUSD/HotSpot_FX_filtered_NZDUSD-TheoreticalQuote.h5,
earliest first = true, page *start index = 14700000, page end index =
15000000*, start index = 14700000, end index = 28920777
2012-Dec-12 08:18:58.662190[0x00007faae7fff700]: ERROR: HDF5: seq: 0 file:
/home/hdftest/snapshots-bin-hdf5_1_8_10/current/src/H5Zdeflate.c function:
H5Z_filter_deflate line: 125 desc: inflate() failed
2012-Dec-12 08:18:58.662214[0x00007faae7fff700]: ERROR: HDF5: seq: 1 file:
/home/hdftest/snapshots-bin-hdf5_1_8_10/current/src/H5Z.c function:
H5Z_pipeline line: 1120 desc: filter returned failure during read
2012-Dec-12 08:18:58.662220[0x00007faae7fff700]: ERROR: HDF5: seq: 2 file:
/home/hdftest/snapshots-bin-hdf5_1_8_10/current/src/H5Dchunk.c function:
H5D__chunk_lock line: 2766 desc: data pipeline read failed
2012-Dec-12 08:18:58.662225[0x00007faae7fff700]: ERROR: HDF5: seq: 3 file:
/home/hdftest/snapshots-bin-hdf5_1_8_10/current/src/H5Dchunk.c function:
H5D__chunk_read line: 1735 desc: unable to read raw data chunk
2012-Dec-12 08:18:58.662229[0x00007faae7fff700]: ERROR: HDF5: seq: 4 file:
/home/hdftest/snapshots-bin-hdf5_1_8_10/current/src/H5Dio.c function:
H5D__read line: 449 desc: can't read data
2012-Dec-12 08:18:58.662242[0x00007faae7fff700]: ERROR: HDF5: seq: 5 file:
/home/hdftest/snapshots-bin-hdf5_1_8_10/current/src/H5Dio.c function:
H5Dread line: 174 desc: can't read data
2012-Dec-12 08:18:58.662257[0x00007faae7fff700]: CRITICAL: File::File:
Failed to get records between indexes *14700000, 14999999* from file
/home/ligerdemo/data/HotSpot/FX/filtered/NZDUSD/HotSpot_FX_filtered_NZDUSD-TheoreticalQuote.h5
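For reference, the failing read in our application boils down to
something like this (a simplified sketch only; Quote is the same
assumed struct as in the previous snippet, and our real code wraps all
of this in a buffering layer):

#include "hdf5.h"
#include "H5PacketTable.h"
#include <vector>

struct Quote { long long TimeStamp; double BidPrice, AskPrice, Volume,
               LastInputBidPrice, LastInputAskPrice; };  // as above

int main() {
    hid_t fid = H5Fopen("HotSpot_FX_filtered_NZDUSD-TheoreticalQuote.h5",
                        H5F_ACC_RDONLY, H5P_DEFAULT);
    char name[] = "/TheoreticalQuote";
    FL_PacketTable table(fid, name);
    if (!table.IsValid())
        return 1;

    // This is the read that dies in inflate(): records
    // 14700000..14999999 inclusive.
    std::vector<Quote> buf(300000);
    int err = table.GetPackets(14700000, 14999999, &buf[0]);

    H5Fclose(fid);
    return (err < 0) ? 1 : 0;
}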
Things to note:
1. The "corrupt" file in question was originally created using the HDF5
1.8.4 API, and is now being read/appended using HDF5 1.8.10.
2. Our application tries to read this file using the 1.8.10 API.
3. The h5dump utility used above is an old version (1.8.4), though I do
not think this is relevant, since our application fails to read the
file as well.
My basic question is: has anyone seen this kind of invisible file
corruption before, and if so, do you know what might cause it? Also,
I'm wondering if perhaps we're not shutting down / closing files
correctly, and that is what is causing these corruption problems.
Right now our code constructs an H5::CompType object, an H5::H5File
object, and an FL_PacketTable object, in that order, per file, then
destructs them in the reverse order (sketched below). Is that
sufficient, or should we be calling a global shutdown routine as well?
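To make that concrete, here is roughly what our per-file lifetime looks
like (simplified; makeQuoteType() is a hypothetical stand-in for the
code that builds the compound type shown earlier):

#include "H5Cpp.h"
#include "H5PacketTable.h"

H5::CompType makeQuoteType();  // hypothetical: builds the compound type

void useFile(const char* path) {
    H5::CompType quoteType = makeQuoteType();  // 1. compound type
    H5::H5File file(path, H5F_ACC_RDWR);       // 2. file
    char name[] = "/TheoreticalQuote";
    FL_PacketTable table(file.getId(), name);  // 3. packet table

    // ... AppendPackets() / GetPackets() calls here ...

}   // destructors run in reverse: table, then file, then quoteType

// The open question: whether we also need an explicit
//     H5Fflush(file.getId(), H5F_SCOPE_GLOBAL);
// before teardown, and/or a final H5close() at process exit.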
Any help on this would be very much appreciated.
Thanks