I have an application that has been processing hundreds of GB of raw data and generating HDF5 files daily for many years now. I recently attempted to upgrade it from hdf5-1.8.3 to hdf5-1.8.13, and have been encountering errors with a traceback like the following:
HDF5-DIAG: Error detected in HDF5 (1.8.13) thread 1126189376:
#000: H5D.c line 369 in H5Dopen2(): can't open dataset
major: Dataset
minor: Unable to initialize object
#001: H5Dint.c line 1147 in H5D_open(): not found
major: Dataset
minor: Object not found
#002: H5Dint.c line 1247 in H5D__open_oid(): unable to register type
major: Dataset
minor: Unable to register new atom
#003: H5I.c line 895 in H5I_register(): can't insert ID node into skip list
major: Object atom
minor: Unable to insert object
#004: H5SL.c line 995 in H5SL_insert(): can't create new skip list node
major: Skip Lists
minor: Unable to insert object
#005: H5SL.c line 687 in H5SL_insert_common(): can't insert duplicate key
major: Skip Lists
minor: Unable to insert object
Unfortunately, the error is somewhat non-deterministic - it happens in 100% of the runs of the application, but not always on the same dataset of the same file each time. The one thing that is repeatable is that it only occurs after several hours (and some 50+ GB of data, written into some thousands of HDF5 files) into the run, making it rather difficult to isolate into a simple test case!
My application is multithreaded, and the library is built with threading enabled (see full libhdf5.settings below). The application itself is written in C++, but uses only the HDF5 C library.
The general application structure is that data is written into several thousand newly created HDF5 files. Within each file there are hundreds to thousands of groups (one level deep), each of which has about 10 datasets. Every group and every dataset has anywhere from 1 to 10 attributes (almost all are scalar variable-length strings), and each dataset contains a one-dimensional array of integer data (chunked and compressed with SZIP). Each file is created in a single thread (synchronized at the application level), but multiple threads are then used to create each group, each dataset within each group, the attributes on the groups and datasets, and the data for each dataset, relying on the libhdf5 global mutex for synchronization. However, no more than one thread ever attempts to access or modify any single object within the HDF5 file, except for the file object itself.
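To make the ownership rule above concrete, here is a minimal sketch in plain C with pthreads. It is only a model, not my application code: the strided partition, the names `N_GROUPS`, `N_WORKERS`, `lib_lock`, and `run_workers`, and the counts are all hypothetical, and `lib_lock` merely stands in for the libhdf5 global mutex. The placeholder comment marks where the real group, dataset, and attribute calls would go.

```c
#include <pthread.h>
#include <stddef.h>

#define N_GROUPS  64   /* hypothetical; the real files hold hundreds to thousands */
#define N_WORKERS 4    /* hypothetical worker count */

/* Stand-in for the libhdf5 global mutex (assumption for illustration). */
static pthread_mutex_t lib_lock = PTHREAD_MUTEX_INITIALIZER;

/* Number of times each group was handled; should end up as 1 everywhere. */
static int touched[N_GROUPS];

struct worker { int id; };

static void *worker_main(void *arg)
{
    const struct worker *w = arg;
    /* Strided partition: worker k owns groups k, k + N_WORKERS, ...
     * so no two workers ever touch the same group. */
    for (int g = w->id; g < N_GROUPS; g += N_WORKERS) {
        pthread_mutex_lock(&lib_lock);
        /* Placeholder: create group g, its datasets, and their attributes. */
        touched[g]++;
        pthread_mutex_unlock(&lib_lock);
    }
    return NULL;
}

/* Runs all workers; returns 0 if every group was handled exactly once. */
static int run_workers(void)
{
    pthread_t tid[N_WORKERS];
    struct worker w[N_WORKERS];
    int started = 0, rc = 0;

    for (int g = 0; g < N_GROUPS; g++) touched[g] = 0;

    for (int i = 0; i < N_WORKERS; i++) {
        w[i].id = i;
        if (pthread_create(&tid[i], NULL, worker_main, &w[i]) != 0) { rc = -1; break; }
        started++;
    }
    for (int i = 0; i < started; i++) pthread_join(tid[i], NULL);

    for (int g = 0; g < N_GROUPS; g++)
        if (touched[g] != 1) rc = -1;
    return rc;
}
```

Calling `run_workers()` from `main` and checking for a zero return is enough to exercise the sketch; it says nothing about HDF5 itself, only about the one-thread-per-object discipline described above.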
The above skip list error has only ever occurred while opening a dataset prior to creating an attribute on that dataset. Thus, I am quite confident that only a single thread could ever be opening any particular dataset, and in any case, the global mutex in libhdf5 should make H5Dopen2() entirely thread-safe anyway.
Note that I never close an HDF5 file until it is complete, and once closed, the file is never re-opened for additional updates.
I have had to revert to hdf5-1.8.3 for the time being, but any guidance or assistance in resolving this issue with hdf5-1.8.13 would be appreciated. From searching the forum, I have found only one other report of a problem that appears to be either identical or related: http://hdf-forum.184993.n3.nabble.com/H5SL-insert-common-can-t-insert-duplicate-key-td4026817.html. There does not appear to have been any resolution to that issue, which occurred with hdf5-1.8.11, so apparently it has been around for a while.
Sincerely,
Stephen Pope
SUMMARY OF THE HDF5 CONFIGURATION
=================================
General Information:
-------------------
HDF5 Version: 1.8.13
Configured on: Wed Jun 11 16:25:23 MDT 2014
Configured by: scp@nxscp at Prediction Company, Santa Fe, NM, USA
Configure mode: production
Host system: x86_64-unknown-linux-gnu
Uname information: Linux nxscp 2.6.18-348.el5 #1 SMP Tue Jan 8 17:53:53 EST 2013 x86_64 x86_64 x86_64 GNU/Linux
Byte sex: little-endian
Libraries: shared
Installation point: /apps/prediction/thirdparty/hdf5-1.8.13/build.opt.x86_64.rhel5.gcc4
Compiling Options:
------------------
Compilation Mode: production
C Compiler: /usr/local/pkg/gcc-4.8.1/bin/gcc ( gcc (GCC) 4.8.1)
CFLAGS: -march=core2 -mtune=corei7-avx -pthread
H5_CFLAGS: -std=c99 -pedantic -Wall -Wextra -Wundef -Wshadow -Wpointer-arith -Wbad-function-cast -Wcast-qual -Wcast-align -Wwrite-strings -Wconversion -Waggregate-return -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations -Wredundant-decls -Wnested-externs -Winline -Wfloat-equal -Wmissing-format-attribute -Wmissing-noreturn -Wpacked -Wdisabled-optimization -Wformat=2 -Wunreachable-code -Wendif-labels -Wdeclaration-after-statement -Wold-style-definition -Winvalid-pch -Wvariadic-macros -Winit-self -Wmissing-include-dirs -Wswitch-default -Wswitch-enum -Wunused-macros -Wunsafe-loop-optimizations -Wc++-compat -Wstrict-overflow -Wlogical-op -Wlarger-than=2048 -Wvla -Wsync-nand -Wframe-larger-than=16384 -Wpacked-bitfield-compat -Wstrict-overflow=5 -Wjump-misses-init -Wunsuffixed-float-constants -Wdouble-promotion -Wsuggest-attribute=const -Wtrampolines -Wstack-usage=8192 -Wvector-operation-performance -Wsuggest-attribute=pure -Wsuggest-attribute=noreturn -Wsuggest-attribute=format -O3 -fomit-frame-pointer -finline-functions
AM_CFLAGS:
CPPFLAGS:
H5_CPPFLAGS: -D_POSIX_C_SOURCE=199506L -DNDEBUG -UH5_DEBUG_API
AM_CPPFLAGS: -I/home/scp/svn/szip/build.opt.x86_64.rhel5.gcc4/include -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_BSD_SOURCE
Shared C Library: yes
Static C Library: no
Statically Linked Executables: no
LDFLAGS:
H5_LDFLAGS:
AM_LDFLAGS: -L/home/scp/svn/szip/build.opt.x86_64.rhel5.gcc4/lib
Extra libraries: -lpthread -lsz -lz -lrt -ldl -lm
Archiver: ar
Ranlib: ranlib
Debugged Packages:
API Tracing: no
Languages:
----------
Fortran: no
C++: no
Features:
---------
Parallel HDF5: no
High Level library: yes
Threadsafety: yes
Default API Mapping: v18
With Deprecated Public Symbols: yes
I/O filters (external): deflate(zlib),szip(encoder)
I/O filters (internal): shuffle,fletcher32,nbit,scaleoffset
MPE: no
Direct VFD: no
dmalloc: no
Clear file buffers before write: yes
Using memory checker: no
Function Stack Tracing: no
Strict File Format Checks: no
Optimization Instrumentation: no
Large File Support (LFS): yes