HDF5 read error when reading huge file

Hi,

I’m using HDF5 1.12.0 to write and read HDF5 files on Red Hat 7.x Linux, via the standard C++ APIs.

Writer side:
m_file = std::make_shared<H5File>(filename, H5F_ACC_TRUNC);

Reader side:
m_file = std::make_shared<H5File>(filename, H5F_ACC_RDONLY);
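
For reference, a minimal self-contained version of that pattern (the filename here is a placeholder; our production code wraps this differently):

#include "H5Cpp.h"
#include <memory>
#include <string>

using namespace H5;

int main()
{
    const std::string filename = "data.wdb"; // placeholder name

    // Writer side: create or truncate the file.
    auto writer = std::make_shared<H5File>(filename, H5F_ACC_TRUNC);
    // ... create groups/datasets, write data ...
    writer->close();

    // Reader side: open the same file read-only.
    auto reader = std::make_shared<H5File>(filename, H5F_ACC_RDONLY);
    // ... read data ...
    reader->close();
    return 0;
}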

Recently, our customer created an HDF5 file that is about 283 GB, and while attempting to read it, we get the following errors:

===
Filename: …/itinglu/wdb/input.emirtap.emir0.wdb
HDF5-DIAG: Error detected in HDF5 (1.12.0) thread 0:
#000: H5O.c line 778 in H5Oget_native_info_by_name(): can’t get native file format info for object: ‘/’
major: Object header
minor: Can’t get value
#001: H5VLcallback.c line 5870 in H5VL_object_optional(): unable to execute object optional callback
major: Virtual Object Layer
minor: Can’t operate on object
#002: H5VLcallback.c line 5833 in H5VL__object_optional(): unable to execute object optional callback
major: Virtual Object Layer
minor: Can’t operate on object
#003: H5VLnative_object.c line 546 in H5VL__native_object_optional(): object not found
major: Object header
minor: Object not found
#004: H5Gloc.c line 921 in H5G_loc_native_info(): can’t find object
major: Symbol table
minor: Object not found
#005: H5Gtraverse.c line 855 in H5G_traverse(): internal path traversal failed
major: Symbol table
minor: Object not found
#006: H5Gtraverse.c line 769 in H5G__traverse_real(): traversal operator failed
major: Symbol table
minor: Can’t move to next iterator location
#007: H5Gloc.c line 877 in H5G__loc_native_info_cb(): can’t get object info
major: Symbol table
minor: Can’t get value
#008: H5Oint.c line 2378 in H5O_get_native_info(): can’t retrieve object’s btree & heap info
major: Object header
minor: Can’t get value
#009: H5Goh.c line 402 in H5O__group_bh_info(): can’t retrieve symbol table size info
major: Symbol table
minor: Can’t get value
#010: H5Gstab.c line 671 in H5G__stab_bh_size(): iteration operator failed
major: B-Tree node
minor: Unable to initialize object
#011: H5B.c line 1993 in H5B_get_info(): B-tree iteration failed
major: B-Tree node
minor: Iteration failed
#012: H5B.c line 1943 in H5B__get_info_helper(): unable to list B-tree node
major: B-Tree node
minor: Unable to list node
#013: H5B.c line 1900 in H5B__get_info_helper(): unable to load B-tree node
major: B-Tree node
minor: Unable to protect metadata
#014: H5AC.c line 1312 in H5AC_protect(): H5C_protect() failed
major: Object cache
minor: Unable to protect metadata
#015: H5C.c line 2346 in H5C_protect(): can’t load entry
major: Object cache
minor: Unable to load metadata into cache
#016: H5C.c line 6598 in H5C_load_entry(): Can’t read image
major: Object cache
minor: Read failed
#017: H5Fio.c line 161 in H5F_block_read(): read through page buffer failed
major: Low-level I/O
minor: Read failed
#018: H5PB.c line 736 in H5PB_read(): read through metadata accumulator failed
major: Page Buffering
minor: Read failed
#019: H5Faccum.c line 212 in H5F__accum_read(): driver read request failed
major: Low-level I/O
minor: Read failed
#020: H5FDint.c line 193 in H5FD_read(): addr overflow, addr = 109808716816, size = 544, eoa = 2048
major: Invalid arguments to routine
minor: Address overflowed
h5stat error: unable to traverse objects/links in file “…/itinglu/wdb/input.emirtap.emir0.wdb”
H5tools-DIAG: Error detected in HDF5:tools (1.12.0) thread 0:
#000: h5trav.c line 1080 in h5trav_visit(): traverse failed
major: Failure in tools library
minor: error in function
#001: h5trav.c line 296 in traverse(): H5Lvisit_by_name failed
major: Failure in tools library
minor: error in function
#002: h5stat.c line 749 in obj_stats(): H5Oget_native_info_by_name failed
major: Failure in tools library
minor: error in function

===

I’ve tried the h5ls and h5dump utilities to debug further but haven’t been able to root-cause the issue. Any pointers on how to debug this problem? And how can this type of issue be avoided in the future?

The tool/program can’t locate the root group. This can happen if the producing application crashes and fails to close the file properly. To get an idea of what’s in the file, can you run

strings -n 4 -t d your_file_name | grep -E 'BTHD|BTIN|BTLF|EADB|EAHD|EAIB|EASB|FADB|FAHD|FHDB|FHIB|FRHP|FSHD|FSSE|GCOL|HEAP|OCHK|OHDR|SMLI|SMTB|SNOD|TREE'

and send us the output?

G.

Hi Gerd,

Thanks for the follow up.

I’m running the command now, and the output is already more than 400K lines. Please let me know if I can send just a few lines – the top 100-200 lines, maybe?

Thanks
-Kat

Yes, let’s start with that (100-200 lines). G.

Here it is:

=====

136 TREE
680 HEAP

1008 TREE
1552 HEAP
1672 SNOD
2048 GCOL
6144 TREE
6688 HEAP
6808 SNOD
7176 TREE
7720 HEAP
9678 $dSNOD
1599608 GCOL
1665144 GCOL
1730680 GCOL
1796216 GCOL
1861752 GCOL
1927288 GCOL
1992824 GCOL
2058360 GCOL
2123896 GCOL
2189432 GCOL
19126504 GCOL
19192040 GCOL
19257576 GCOL
19323112 GCOL
19388648 GCOL
19454184 GCOL
19519720 GCOL
19585256 GCOL
19650792 GCOL
19716328 GCOL
19781864 GCOL
19847400 GCOL
19912936 GCOL
19978472 GCOL
20044008 GCOL
20109544 GCOL
20175080 GCOL
20240616 GCOL
20306152 GCOL
20371688 GCOL
20437224 GCOL
20502760 GCOL
20568296 GCOL
20633832 GCOL
20699368 GCOL
20764904 GCOL
20830440 GCOL
20895976 GCOL
20961512 GCOL
21027048 GCOL
21092584 GCOL
21158120 GCOL
21223656 GCOL
21289192 GCOL
21354728 GCOL
21420184 Xxb4_wrapper_left/bank_0_1/right_decoder_array/decoder/wldec[34]/decap1[0]/MM0#gGCOL
21485800 GCOL
21551336 GCOL
21616872 GCOL
21682408 GCOL
21747944 GCOL
21813480 GCOL
21879016 GCOL
21944552 GCOL
22010088 GCOL
22075624 GCOL
22141080 Xxb4_wrapper_left/bank_2_3/right_decoder_array/decoder/wldec[60]/decap2[5]/MM0#gGCOL
22206696 GCOL
22272232 GCOL
22337768 GCOL
22403304 GCOL
22468840 GCOL
22534376 GCOL
22599912 GCOL
22665448 GCOL
22730984 GCOL
22796520 GCOL
22862056 GCOL
22927592 GCOL
22993128 GCOL
23058664 GCOL
23124200 GCOL
23189736 GCOL
23255272 GCOL
23320808 GCOL
23386344 GCOL
23451792 Xxb4_wrapper_right/bank_2_3/array_decoder_right/array_top/array_bot[32]/Itrk_col/MPpu1#gGCOL
23517328 Xxb4_wrapper_right/bank_0_1/right_decoder_array/array_top/array_top[37]/Itrk_col/MPpu1#gGCOL
23582952 GCOL
23648488 GCOL
23714024 GCOL
23779560 GCOL
23845096 GCOL
23910632 GCOL
23976168 GCOL
24041704 GCOL
24107240 GCOL
24172776 GCOL
24238312 GCOL
24303848 GCOL
24369384 GCOL
24434920 GCOL
24500456 GCOL
24565992 GCOL
24631528 GCOL
24697064 GCOL
24762600 GCOL
24828136 GCOL
24893672 GCOL
24959208 GCOL
25024744 GCOL
25090280 GCOL
25155816 GCOL
25221352 GCOL
25286888 GCOL
25352424 GCOL
25417960 GCOL
25483496 GCOL
25549032 GCOL
25614568 GCOL
25680104 GCOL
25745640 GCOL
25811176 GCOL
25876712 GCOL
25942248 GCOL
26007784 GCOL
26073320 GCOL
26138856 GCOL
26204392 GCOL
26269928 GCOL
26335464 GCOL
26401000 GCOL
26466536 GCOL
26532072 GCOL
26597608 GCOL
26663144 GCOL
26728680 GCOL
26794216 GCOL
26859752 GCOL
26925288 GCOL
26990824 GCOL
27056360 GCOL
27121896 GCOL
27187432 GCOL
27252968 GCOL
27318504 GCOL
27384040 GCOL
27449576 GCOL
27515112 GCOL
27580648 GCOL
27646184 GCOL
27711720 GCOL
27777256 GCOL
27842792 GCOL
27908328 GCOL
27973864 GCOL
28039400 GCOL
28104936 GCOL
28170472 GCOL
28236008 GCOL
28301544 GCOL
28367080 GCOL
28432616 GCOL
28498152 GCOL
28563688 GCOL
28629224 GCOL
28694760 GCOL
28760296 GCOL
28825832 GCOL
28891368 GCOL
42975144 GCOL
43040680 GCOL
43106216 GCOL
43171656 Xxb4_wrapper_right/bank_2_3/array_decoder_left/decoder/wldec[38]/wldec3/wldrv_bot/pmos[13]/MM0#sGCOL
43237288 GCOL
43302824 GCOL
43368264 Xxb4_wrapper_right/bank_2_3/array_decoder_left/decoder/wldec[13]/wldec1/wldrv_top/pmos[2]/MM0#sGCOL
43433896 GCOL
43499432 GCOL
43564968 GCOL
43630504 GCOL
43696040 GCOL
43761576 GCOL
43827112 GCOL
43892648 GCOL
43958184 GCOL
44023720 GCOL
44089256 GCOL
44154792 GCOL
44220328 GCOL
44285864 GCOL
44351400 GCOL
44416936 GCOL
44482472 GCOL
44548008 GCOL
44613544 GCOL
44678984 Xxb4_wrapper_right/bank_2_3/lio_bot/lio_bef_repeater/lio21/pch_block_left[4]/pch_bl_top[0]/MM0#sGCOL
44744616 GCOL

=====

This doesn’t look unusual. Let’s take a step back: how did you obtain that error stack? Can you run tools such as h5dump or h5stat on the file? What do

h5dump -pBH --enable-error-stack your_file_name

or

h5stat --enable-error-stack your_file_name

return?

G.

Here’s the output from running h5dump:


HDF5-DIAG: Error detected in HDF5 (1.12.2) thread 0:
#000: H5L.c line 1516 in H5Lvisit_by_name2(): link visitation failed
major: Links
minor: Iteration failed
#001: H5VLcallback.c line 5173 in H5VL_link_specific(): unable to execute link specific callback
major: Virtual Object Layer
minor: Can’t operate on object
#002: H5VLcallback.c line 5136 in H5VL__link_specific(): unable to execute link specific callback
major: Virtual Object Layer
minor: Can’t operate on object
#003: H5VLnative_link.c line 364 in H5VL__native_link_specific(): link visitation failed
major: Links
minor: Iteration failed
#004: H5Gint.c line 1118 in H5G_visit(): can’t visit links
major: Symbol table
minor: Iteration failed
#005: H5Gobj.c line 673 in H5G__obj_iterate(): can’t iterate over symbol table
major: Symbol table
minor: Iteration failed
#006: H5Gstab.c line 521 in H5G__stab_iterate(): unable to protect symbol table heap
major: Symbol table
minor: Protected metadata error
#007: H5HL.c line 351 in H5HL_protect(): unable to load heap data block
major: Heap
minor: Unable to protect metadata
#008: H5AC.c line 1426 in H5AC_protect(): H5C_protect() failed
major: Object cache
minor: Unable to protect metadata
#009: H5C.c line 2370 in H5C_protect(): can’t load entry
major: Object cache
minor: Unable to load metadata into cache
#010: H5C.c line 7209 in H5C__load_entry(): Can’t read image
major: Object cache
minor: Read failed
#011: H5Fio.c line 148 in H5F_block_read(): read through page buffer failed
major: Low-level I/O
minor: Read failed
#012: H5PB.c line 721 in H5PB_read(): read through metadata accumulator failed
major: Page Buffering
minor: Read failed
#013: H5Faccum.c line 208 in H5F__accum_read(): driver read request failed
major: Low-level I/O
minor: Read failed
#014: H5FDint.c line 184 in H5FD_read(): addr overflow, addr = 57873339304, size = 5767168, eoa = 2048
major: Invalid arguments to routine
minor: Address overflowed
h5dump error: internal error (file h5dump.c:line 1471)
H5tools-DIAG: Error detected in HDF5:tools (1.12.2) thread 0:
#000: h5tools_utils.c line 795 in init_objs(): finding shared objects failed
major: Failure in tools library
minor: error in function
#001: h5trav.c line 1058 in h5trav_visit(): traverse failed
major: Failure in tools library
minor: error in function
#002: h5trav.c line 290 in traverse(): H5Lvisit_by_name failed
major: Failure in tools library
minor: error in function


And here is the output from running h5stat:


HDF5-DIAG: Error detected in HDF5 (1.12.2) thread 0:
#000: H5O.c line 769 in H5Oget_native_info_by_name(): can’t get native file format info for object: ‘/’
major: Object header
minor: Can’t get value
#001: H5VLcallback.c line 5824 in H5VL_object_optional(): unable to execute object optional callback
major: Virtual Object Layer
minor: Can’t operate on object
#002: H5VLcallback.c line 5788 in H5VL__object_optional(): unable to execute object optional callback
major: Virtual Object Layer
minor: Can’t operate on object
#003: H5VLnative_object.c line 535 in H5VL__native_object_optional(): object not found
major: Object header
minor: Object not found
#004: H5Gloc.c line 891 in H5G_loc_native_info(): can’t find object
major: Symbol table
minor: Object not found
#005: H5Gtraverse.c line 837 in H5G_traverse(): internal path traversal failed
major: Symbol table
minor: Object not found
#006: H5Gtraverse.c line 754 in H5G__traverse_real(): traversal operator failed
major: Symbol table
minor: Can’t move to next iterator location
#007: H5Gloc.c line 849 in H5G__loc_native_info_cb(): can’t get object info
major: Symbol table
minor: Can’t get value
#008: H5Oint.c line 2323 in H5O_get_native_info(): can’t retrieve object’s btree & heap info
major: Object header
minor: Can’t get value
#009: H5Goh.c line 389 in H5O__group_bh_info(): can’t retrieve symbol table size info
major: Symbol table
minor: Can’t get value
#010: H5Gstab.c line 649 in H5G__stab_bh_size(): iteration operator failed
major: B-Tree node
minor: Unable to initialize object
#011: H5B.c line 1970 in H5B_get_info(): B-tree iteration failed
major: B-Tree node
minor: Iteration failed
#012: H5B.c line 1921 in H5B__get_info_helper(): unable to list B-tree node
major: B-Tree node
minor: Unable to list node
#013: H5B.c line 1878 in H5B__get_info_helper(): unable to load B-tree node
major: B-Tree node
minor: Unable to protect metadata
#014: H5AC.c line 1426 in H5AC_protect(): H5C_protect() failed
major: Object cache
minor: Unable to protect metadata
#015: H5C.c line 2370 in H5C_protect(): can’t load entry
major: Object cache
minor: Unable to load metadata into cache
#016: H5C.c line 7209 in H5C__load_entry(): Can’t read image
major: Object cache
minor: Read failed
#017: H5Fio.c line 148 in H5F_block_read(): read through page buffer failed
major: Low-level I/O
minor: Read failed
#018: H5PB.c line 721 in H5PB_read(): read through metadata accumulator failed
major: Page Buffering
minor: Read failed
#019: H5Faccum.c line 202 in H5F__accum_read(): driver read request failed
major: Low-level I/O
minor: Read failed
#020: H5FDint.c line 184 in H5FD_read(): addr overflow, addr = 109808716816, size = 544, eoa = 2048
major: Invalid arguments to routine
minor: Address overflowed
HDF5-DIAG: Error detected in HDF5 (1.12.2) thread 0:
#000: H5L.c line 1516 in H5Lvisit_by_name2(): link visitation failed
major: Links
minor: Iteration failed
#001: H5VLcallback.c line 5173 in H5VL_link_specific(): unable to execute link specific callback
major: Virtual Object Layer
minor: Can’t operate on object
#002: H5VLcallback.c line 5136 in H5VL__link_specific(): unable to execute link specific callback
major: Virtual Object Layer
minor: Can’t operate on object
#003: H5VLnative_link.c line 364 in H5VL__native_link_specific(): link visitation failed
major: Links
minor: Iteration failed
#004: H5Gint.c line 1118 in H5G_visit(): can’t visit links
major: Symbol table
minor: Iteration failed
#005: H5Gobj.c line 673 in H5G__obj_iterate(): can’t iterate over symbol table
major: Symbol table
minor: Iteration failed
#006: H5Gstab.c line 521 in H5G__stab_iterate(): unable to protect symbol table heap
major: Symbol table
minor: Protected metadata error
#007: H5HL.c line 351 in H5HL_protect(): unable to load heap data block
major: Heap
minor: Unable to protect metadata
#008: H5AC.c line 1426 in H5AC_protect(): H5C_protect() failed
major: Object cache
minor: Unable to protect metadata
#009: H5C.c line 2370 in H5C_protect(): can’t load entry
major: Object cache
minor: Unable to load metadata into cache
#010: H5C.c line 7209 in H5C__load_entry(): Can’t read image
major: Object cache
minor: Read failed
#011: H5Fio.c line 148 in H5F_block_read(): read through page buffer failed
major: Low-level I/O
minor: Read failed
#012: H5PB.c line 721 in H5PB_read(): read through metadata accumulator failed
major: Page Buffering
minor: Read failed
#013: H5Faccum.c line 208 in H5F__accum_read(): driver read request failed
major: Low-level I/O
minor: Read failed
#014: H5FDint.c line 184 in H5FD_read(): addr overflow, addr = 57873339304, size = 5767168, eoa = 2048
major: Invalid arguments to routine
minor: Address overflowed
h5stat error: unable to traverse objects/links in file “…/itinglu/wdb/input.emirtap.emir0.wdb”
H5tools-DIAG: Error detected in HDF5:tools (1.12.2) thread 0:
#000: h5trav.c line 1058 in h5trav_visit(): traverse failed
major: Failure in tools library
minor: error in function
#001: h5trav.c line 290 in traverse(): H5Lvisit_by_name failed
major: Failure in tools library
minor: error in function
#002: h5stat.c line 659 in obj_stats(): H5Oget_native_info_by_name failed
major: Failure in tools library
minor: error in function


In both cases, the library attempts to read from addresses beyond the end-of-allocation address (eoa), which at 2048 is far too small for the file size you’ve quoted. Assuming the file wasn’t closed properly, it’s likely that certain elements of the superblock were never updated. You can obtain a dump of the first 128 bytes by running this:

od -x -N 128 your_file_name

What does that look like?

G.

This is the ‘od’ command output:

0000000 4889 4644 0a0d 0a1a 0000 0000 0800 0008
0000020 0004 0010 0001 0000 0000 0000 0000 0000
0000040 ffff ffff ffff ffff 0800 0000 0000 0000
0000060 ffff ffff ffff ffff 0000 0000 0000 0000
0000100 0060 0000 0000 0000 0001 0000 0000 0000
0000120 0088 0000 0000 0000 02a8 0000 0000 0000
0000140 0001 0004 0001 0000 0018 0000 0000 0000
0000160 0010 0010 0000 0000 0320 0000 0000 0000
0000200

-Kat

For reference, here’s the same dump for h5ex_t_vlstring.h5 from our examples collection:

0000000 4889 4644 0a0d 0a1a 0000 0000 0800 0008
0000020 0004 0010 0000 0000 0000 0000 0000 0000
0000040 ffff ffff ffff ffff 1840 0000 0000 0000
0000060 ffff ffff ffff ffff 0000 0000 0000 0000
0000100 0060 0000 0000 0000 0001 0000 0000 0000
0000120 0088 0000 0000 0000 02a8 0000 0000 0000
0000140 0001 0001 0001 0000 0018 0000 0000 0000
0000160 0011 0010 0000 0000 0088 0000 0000 0000
0000200

The file size is 6,208 bytes, or 0x1840. Looking at the file format specification, you can spot the End of File Address (0x1840 here) right after the Address of File Free Space Info, which is ffff ffff ffff ffff in this example.

In your example,

0000000 4889 4644 0a0d 0a1a 0000 0000 0800 0008
0000020 0004 0010 0001 0000 0000 0000 0000 0000
0000040 ffff ffff ffff ffff 0800 0000 0000 0000
0000060 ffff ffff ffff ffff 0000 0000 0000 0000
0000100 0060 0000 0000 0000 0001 0000 0000 0000
0000120 0088 0000 0000 0000 02a8 0000 0000 0000
0000140 0001 0004 0001 0000 0018 0000 0000 0000
0000160 0010 0010 0000 0000 0320 0000 0000 0000
0000200

The End of File Address is 0x0800, or 2,048 bytes – nowhere near the actual ~283 GB file size, which confirms that the superblock was never updated on close.

You can try correcting this by hand, either at runtime in a debugger or with your favorite binary editor on a copy of the file.
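
If you’d rather script the patch, here is a minimal sketch (assuming a version 0 superblock with 8-byte offsets and base address 0, which matches your dump; the End of File Address then sits at byte offset 40 and is stored little-endian). Run it on a copy only:

#include <cstdint>
#include <cstdio>
#include <fstream>

// Sketch: stamp the actual file size into the End of File Address of a
// version 0 HDF5 superblock. The field sits at byte offset 40: 8 bytes of
// signature + 16 bytes of version/size/K/flags fields + 8 bytes of base
// address + 8 bytes of free-space address.
int main(int argc, char** argv)
{
    if (argc != 2) {
        std::fprintf(stderr, "usage: %s <copy-of-file>\n", argv[0]);
        return 1;
    }
    std::fstream f(argv[1], std::ios::in | std::ios::out | std::ios::binary);
    if (!f) {
        std::fprintf(stderr, "cannot open %s\n", argv[1]);
        return 1;
    }

    // Use the actual file size as the new End of File Address.
    f.seekg(0, std::ios::end);
    const std::uint64_t eof = static_cast<std::uint64_t>(f.tellg());

    // Serialize little-endian, as the file format requires.
    unsigned char le[8];
    for (int i = 0; i < 8; ++i)
        le[i] = static_cast<unsigned char>((eof >> (8 * i)) & 0xff);

    f.seekp(40);
    f.write(reinterpret_cast<const char*>(le), 8);
    return f.good() ? 0 : 1;
}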

There could be other issues, but that’d be a start.

G.

Hi Gerd,

Thanks for the details here.

We were able to confirm that the writing application was indeed terminated with a SIGTERM signal.

Do you have any guidance on what a reasonable approach would be when the writing application is terminated in this manner?

  1. Should the HDF5 file be removed, and some sort of warning logged so the end user is aware as to what happened?
  2. I looked up the HDF5 documentation and it seems I could call H5Fflush(H5File::getId(), H5F_SCOPE_GLOBAL) to flush the in-memory buffers to disk. Is this recommended?

Thanks for your insights.

-Kat.

Both are sensible steps to take. How effective this can be depends a lot on the specifics of the disruption. If it’s not I/O-related and the HDF5 library’s structures (in user space!) weren’t compromised, there’s a good chance that flushing (and closing!) will leave things in a sane state. If it is I/O-related, e.g., a full disk, a failed device, or a (temporarily) lost connection, the chances of exiting gracefully might be slim. The assumption should be that the HDF5 library has no logic for “taking evasive action.” If a call fails, it fails, and the error stack will have a record of that, but any retry logic or assessment of state sanity is on the application.
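
As a sketch of what a clean shutdown might look like (hypothetical code; HDF5 calls are not async-signal-safe, so the handler only sets a flag and the write loop does the flushing and closing):

#include <csignal>
#include "H5Cpp.h"

using namespace H5;

namespace {
volatile std::sig_atomic_t g_stop = 0;
void on_sigterm(int) { g_stop = 1; } // async-signal-safe: only set a flag
}

int main()
{
    std::signal(SIGTERM, on_sigterm);

    H5File file("output.wdb", H5F_ACC_TRUNC); // hypothetical file name

    while (!g_stop) {
        // ... create groups/datasets, write the next batch ...
    }

    // Bring the file to a sane state before exiting: flush the library
    // buffers (the H5Fflush call you found) and close the file so the
    // superblock, including the End of File Address, gets updated.
    H5Fflush(file.getId(), H5F_SCOPE_GLOBAL);
    file.close();
    return 0;
}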

G.

Hi Gerd,

Thanks for your insights.

Also, we were able to reproduce the abnormal termination of the writing application (which resulted in the corrupt HDF5 file). It is due to an assertion in the HDF5 library:

H5C.c:6732: H5C_load_entry: Assertion `entry->size < ((size_t)(32 * 1024 * 1024))’ failed.

The (32 * 1024 * 1024) expression is defined as H5C_MAX_ENTRY_SIZE.

The assertion occurs during the H5File::createDataSet call.

It looks like a similar discussion happened regarding this assertion in this thread:

But I don’t see any conclusions on that thread.

Could you please let me know what this assertion means and how to overcome it?

Thanks
-Kat.

Can you describe the dataset you are trying to create (datatype, layout, rank, extent, etc.)? G.

Hi Gerd,

The datasets use a compound datatype – POD structs with numeric (size_t, int, float, double), string, and boolean fields. All our datasets have RANK = 1. No chunking (yet).

In this case, the writing application creates ~4M groups; each group contains 2 sub-groups, and each of those contains about 20 sub-groups. The above-mentioned datasets reside at this level.

So, the structure would be something like:

/net1/group1/layer{1…20}/dataset1
/net1/group2/layer{1…20}/dataset2
.
.
.
/net4000000/group1/layer{1…20}/dataset1
/net4000000/group2/layer{1…20}/dataset2

Each dataset could contain tens or hundreds of millions of entries.
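
For illustration, here is a stripped-down sketch of how one of these datasets gets created (the field names are made up; our real structs have more fields, including strings):

#include "H5Cpp.h"
#include <cstddef>

using namespace H5;

// Invented example record; our real PODs carry more fields.
struct Record {
    std::size_t id;
    int         layer;
    float       current;
    double      voltage;
    hbool_t     flagged; // boolean stored as the native hbool_t
};

int main()
{
    H5File file("example.wdb", H5F_ACC_TRUNC);

    // Compound datatype mirroring the struct layout.
    CompType rec_type(sizeof(Record));
    rec_type.insertMember("id",      HOFFSET(Record, id),      PredType::NATIVE_HSIZE);
    rec_type.insertMember("layer",   HOFFSET(Record, layer),   PredType::NATIVE_INT);
    rec_type.insertMember("current", HOFFSET(Record, current), PredType::NATIVE_FLOAT);
    rec_type.insertMember("voltage", HOFFSET(Record, voltage), PredType::NATIVE_DOUBLE);
    rec_type.insertMember("flagged", HOFFSET(Record, flagged), PredType::NATIVE_HBOOL);

    // Rank 1, contiguous layout (default creation properties, no chunking).
    const hsize_t dims[1] = {1000};
    DataSpace space(1, dims);

    Group net   = file.createGroup("/net1");
    Group grp   = net.createGroup("group1");
    Group layer = grp.createGroup("layer1");
    DataSet ds  = layer.createDataSet("dataset1", rec_type, space);

    Record buf[1000] = {};
    ds.write(buf, rec_type);
    return 0;
}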

Thanks
-Kat.

Can you reproduce the error at will? Can you provide us with a reproducer? Nothing you are describing sounds unusual. The comment on the definition is a little equivocal:

/* This sanity-checking constant was picked out of the air.  Increase
 * or decrease it if appropriate.  Its purpose is to detect corrupt
 * object sizes, so it probably doesn't matter if it is a bit big.
 */
#define H5C_MAX_ENTRY_SIZE ((size_t)(32 * 1024 * 1024))

It suggests that we don’t expect cache entries bigger than 32 MiB, and nothing in your description comes anywhere near that. My hunch is that it has nothing to do with the H5File::createDataSet call, but that some corruption (“detect corrupt object sizes”) is occurring in your application or somewhere in the library.

G.

For the given case, I’m able to consistently reproduce the assertion in the HDF5 library. I’m not sure I’ll have the bandwidth to create a standalone reproducer, but I will try to do so in the next week or so.

I know that the writer application does not have Valgrind issues like invalid read/write errors. Out of curiosity, I re-ran Valgrind and noticed this:

===
==400120== Syscall param pwrite64(buf) points to uninitialised byte(s)
==400120== at 0x12799FC3: ??? (in /usr/lib64/libpthread-2.17.so)
==400120== by 0x432FAB7: H5FD_sec2_write (H5FDsec2.c:816)
==400120== by 0x43273C8: H5FD_write (H5FDint.c:248)
==400120== by 0x460D996: H5F__accum_write (H5Faccum.c:826)
==400120== by 0x4465781: H5PB_write (H5PB.c:1031)
==400120== by 0x4304040: H5F_block_write (H5Fio.c:251)
==400120== by 0x426A9BA: H5C__flush_single_entry (H5C.c:6109)
==400120== by 0x4272611: H5C__make_space_in_cache (H5C.c:6961)
==400120== by 0x42735A7: H5C_insert_entry (H5C.c:1458)
==400120== by 0x423B279: H5AC_insert_entry (H5AC.c:810)
==400120== by 0x43ED434: H5O__apply_ohdr (H5Oint.c:548)
==400120== by 0x43F40DA: H5O_create (H5Oint.c:316)
==400120== by 0x42A6D53: H5D__update_oh_info (H5Dint.c:1030)
==400120== by 0x42A9C64: H5D__create (H5Dint.c:1373)
==400120== by 0x46071A5: H5O__dset_create (H5Doh.c:300)
==400120== by 0x43F1FB9: H5O_obj_create (H5Oint.c:2521)
==400120== by 0x43AB717: H5L__link_cb (H5L.c:1850)
==400120== by 0x43651E9: H5G__traverse_real (H5Gtraverse.c:629)
==400120== by 0x4365F80: H5G_traverse (H5Gtraverse.c:854)
==400120== by 0x43A37ED: H5L__create_real (H5L.c:2044)
==400120== by 0x43AD96E: H5L_link_object (H5L.c:1803)
==400120== by 0x42A8E28: H5D__create_named (H5Dint.c:410)
==400120== by 0x45A9051: H5VL__native_dataset_create (H5VLnative_dataset.c:74)
==400120== by 0x458409F: H5VL__dataset_create (H5VLcallback.c:1834)
==400120== by 0x458E19C: H5VL_dataset_create (H5VLcallback.c:1868)
==400120== by 0x42991AC: H5Dcreate2 (H5D.c:150)
==400120== by 0x41FBB9C: H5::H5Location::createDataSet(char const*, H5::DataType const&, H5::DataSpace const&, H5::DSetCreatPropList const&, H5::DSetAccPropList const&, H5::LinkCreatPropList const&) const (H5Location.cpp:932)
==400120== by 0x41FBD78: H5::H5Location::createDataSet(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, H5::DataType const&, H5::DataSpace const&, H5::DSetCreatPropList const&, H5::DSetAccPropList const&, H5::LinkCreatPropList const&) const (H5Location.cpp:958)

// Rest of writer application stack

===

Is this something that should be addressed? If so, could you suggest how? Valgrind reports only one occurrence of this issue.

Thanks
-Kat

(The experts will correct me…) I think this is nothing to lose sleep over. When a new object (e.g., a dataset) is created, it’s linked into the group structure and an object header is created. Furthermore, the metadata cache is updated to have things on hand when needed. If you dig into the code, you’ll see that various structures with array fields may get only partially initialized (in particular, the arrays). I think that’s what Valgrind is calling out here.

G.

(OK, we don’t have the state of the metadata cache in your application…)
We could try to reproduce the error by creating just the dataset you’re dealing with. What’s the type and shape of that dataset, and what are the creation properties?

G.

Hi Gerd,

The dataset for which the Valgrind error occurs is as I mentioned above:

  • Compound datatype with int, float, double, and boolean values for the various fields.
  • Single dimension, using the default dataset creation properties (no chunking and hence no compression).

I have created a writer program that shows the Valgrind issue. Please let me know how I can send the tarball to you.

Thanks
-Kat