Segmentation fault when using h5repack on NetCDF-4 file

Hello,

I’m running a locally built HDF5 1.14.1-2 on an AWS EC2 instance running Amazon Linux 2023 on x86_64. I’m trying to use h5repack to better package the metadata in a NetCDF-4 file, but I encounter a Segmentation fault (core dumped) error whenever I call h5repack. Calls to h5stat work on this same dataset, and I am able to successfully run h5repack on an ICESat-2 HDF5 file. Here is a link to the example NetCDF-4 data and to the example HDF5 data.

These commands run successfully:

h5stat ATL03_20221104141739_06851705_006_01.h5 | grep "File metadata"
h5repack -S PAGE -G 4147631 ATL03_20221104141739_06851705_006_01.h5 out.h5

But the second command here results in the segmentation fault:

h5stat S1-GUNW-A-R-064-tops-20180528_20170521-015004-36885N_35006N-PP-baf1-v2_0_2.nc | grep "File metadata"
h5repack -S PAGE -G 105632 S1-GUNW-A-R-064-tops-20180528_20170521-015004-36885N_35006N-PP-baf1-v2_0_2.nc out.nc
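
For context, my understanding is that the -S PAGE and -G options correspond roughly to the following file creation properties in the C API. This is only a sketch of what I am trying to achieve (paged aggregation with a page size equal to the "File metadata" figure from h5stat), not h5repack's actual implementation:

#include <hdf5.h>

int main(void)
{
    /* Paged-aggregation strategy, no persistent free-space tracking. */
    hid_t fcpl = H5Pcreate(H5P_FILE_CREATE);
    H5Pset_file_space_strategy(fcpl, H5F_FSPACE_STRATEGY_PAGE, 0, 1);
    /* Page size taken from the "File metadata" value reported by h5stat. */
    H5Pset_file_space_page_size(fcpl, 105632);
    hid_t file = H5Fcreate("out.nc", H5F_ACC_TRUNC, fcpl, H5P_DEFAULT);
    H5Fclose(file);
    H5Pclose(fcpl);
    return 0;
}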

Can anyone help me troubleshoot this issue?

Very interesting… I installed libhdf5-1.14.1-2 from the conda package repository and tried with your .nc file. The h5repack command on my macOS laptop does not throw a segfault; in fact, it does not report any error at all. However, something is wrong with the output file out.nc:

  • h5stat out.nc reports unable to traverse objects/links in file “out.nc”.
  • h5dump -pH out.nc produces output, but with errors such as error in getting creation property list ID for the groups and datasets in the file.

Do you know how your .nc file was created? What were the netCDF and libhdf5 versions? I do not see the _NCProperties global attribute in your .nc file.

Aleksandar

Oh, interesting. When I tried these commands on my MacBook, I got the same result as you. I obtained the data from the Alaska Satellite Facility’s website and didn’t create it myself. However, the team that did create it does have their code on GitHub. They’re using the Python NetCDF library, and in particular the Dataset class, to create the NetCDF file (see here).

So it looks like the problem is likely with the creation of the input NetCDF, and not with h5repack, correct?

No, I still consider h5repack the main suspect.

My understanding is that the _NCProperties global attribute is created by the netCDF library as soon as a file is created and records the netCDF and libhdf5 versions used. Not seeing it in the file is a bit strange.
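
If you want to check for it yourself, something like h5dump -a /_NCProperties <file> should show it. Below is a rough C sketch of the same check, assuming the attribute is stored as the usual fixed-length string and using "file.nc" as a placeholder name:

#include <hdf5.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    hid_t file = H5Fopen("file.nc", H5F_ACC_RDONLY, H5P_DEFAULT);
    if (H5Aexists(file, "_NCProperties") > 0) {
        hid_t  attr = H5Aopen(file, "_NCProperties", H5P_DEFAULT);
        hid_t  type = H5Aget_type(attr);
        size_t len  = H5Tget_size(type);   /* fixed-length string size */
        char  *buf  = calloc(len + 1, 1);
        H5Aread(attr, type, buf);          /* e.g. "version=2,netcdf=4.x.y,hdf5=1.x.y" */
        printf("%s\n", buf);
        free(buf);
        H5Tclose(type);
        H5Aclose(attr);
    } else {
        printf("_NCProperties not found\n");
    }
    H5Fclose(file);
    return 0;
}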

Aleksandar

The _NCProperties attribute was added in netCDF release 4.4.1 (June 2016). Given that the .nc file data is from May 2018, it is likely that this is an original file written by an earlier netCDF version that predates the attribute. The version 2 superblock is also consistent with this notion. All of this is probably unrelated to the h5repack problem.
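
For reference, the superblock version can be confirmed from the output of h5dump -B, or programmatically with H5Fget_info2. A small sketch of the latter:

#include <hdf5.h>
#include <stdio.h>

int main(void)
{
    hid_t file = H5Fopen(
        "S1-GUNW-A-R-064-tops-20180528_20170521-015004-36885N_35006N-PP-baf1-v2_0_2.nc",
        H5F_ACC_RDONLY, H5P_DEFAULT);
    H5F_info2_t info;
    H5Fget_info2(file, &info);
    printf("superblock version: %u\n", info.super.version);
    H5Fclose(file);
    return 0;
}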

Update: I’ve retried the same workflow with a more recent NetCDF-4 dataset from the same collection (you can download the dataset here if you have EarthData login credentials) and I got a different error this time.

Now,

h5repack -S PAGE -G 132797 S1-GUNW-A-R-085-tops-20221005_20220923-122248-00084E_00027N-PP-1159-v2_0_5.nc result.nc

completes quickly without error, but an h5stat call on the output dataset results in this error:

h5stat error: unable to traverse objects/links in file "result.nc"

Thanks @dave.allured, I assumed the files were more recent.

Below is additional diagnostic information that may help locate the problem.

Running h5stat -S on the original .nc file:

% h5stat -S S1-GUNW-A-R-064-tops-20180528_20170521-015004-36885N_35006N-PP-baf1-v2_0_2.nc 
Filename: S1-GUNW-A-R-064-tops-20180528_20170521-015004-36885N_35006N-PP-baf1-v2_0_2.nc
File space management strategy: H5F_FSPACE_STRATEGY_FSM_AGGR
File space page size: 4096 bytes
Summary of file space information:
  File metadata: 105632 bytes
  Raw data: 55628271 bytes
  Amount/Percent of tracked free space: 0 bytes/0.0%
  Unaccounted space: 8399018 bytes
Total space: 64132921 bytes

shows a pretty large unaccounted space: about 8.4 MB for a file of about 64 MB. I don’t think this means trouble, it’s just unexpected.

Comparing the ncdump output (from Panoply) with the h5dump output for the original .nc file, I found some strange differences. Below is the h5dump description of the /matchup HDF5 dataset, which is also a netCDF dimension:

   DATASET "matchup" {
      DATATYPE  H5T_IEEE_F32BE
      DATASPACE  SIMPLE { ( 0 ) / ( H5S_UNLIMITED ) }
      STORAGE_LAYOUT {
         CHUNKED ( 1 )
         SIZE 0
      }
      FILTERS {
         NONE
      }
      FILLVALUE {
         FILL_TIME H5D_FILL_TIME_IFSET
         VALUE  H5D_FILL_VALUE_DEFAULT
      }
      ALLOCATION_TIME {
         H5D_ALLOC_TIME_INCR
      }
      ATTRIBUTE "CLASS" {
         DATATYPE  H5T_STRING {
            STRSIZE 16;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SCALAR
         DATA {
         (0): "DIMENSION_SCALE"
         }
      }
      ATTRIBUTE "NAME" {
         DATATYPE  H5T_STRING {
            STRSIZE 64;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SCALAR
         DATA {
         (0): "This is a netCDF dimension but not a netCDF variable.         0"
         }
      }
      ATTRIBUTE "REFERENCE_LIST" {
         DATATYPE  H5T_COMPOUND {
            H5T_REFERENCE { H5T_STD_REF_OBJECT } "dataset";
            H5T_STD_I32LE "dimension";
         }
         DATASPACE  SIMPLE { ( 15 ) / ( 15 ) }
         DATA {
         (0): {
               DATASET 0 "/science/grids/corrections/derived/ionosphere/ionosphere",
               0
            },
         (1): {
               DATASET 0 "/science/grids/corrections/external/troposphere/troposphereHydrostatic",
               0
            },
         (2): {
               DATASET 0 "/science/grids/corrections/external/troposphere/troposphereWet",
               0
            },
         (3): {
               DATASET 0 "/science/grids/corrections/external/tides/solidEarthTide",
               0
            },
         (4): {
               DATASET 0 "/science/radarMetaData/missionID",
               0
            },
         (5): {
               DATASET 0 "/science/radarMetaData/productType",
               0
            },
         (6): {
               DATASET 0 "/science/radarMetaData/ISCEversion",
               0
            },
         (7): {
               DATASET 0 "/science/radarMetaData/unwrapMethod",
               0
            },
         (8): {
               DATASET 0 "/science/radarMetaData/DEM",
               0
            },
         (9): {
               DATASET 0 "/science/radarMetaData/azimuthZeroDopplerStartTime",
               0
            },
         (10): {
               DATASET 0 "/science/radarMetaData/azimuthZeroDopplerEndTime",
               0
            },
         (11): {
               DATASET 0 "/science/radarMetaData/inputSLC/reference/L1InputGranules",
               0
            },
         (12): {
               DATASET 0 "/science/radarMetaData/inputSLC/reference/orbitType",
               0
            },
         (13): {
               DATASET 0 "/science/radarMetaData/inputSLC/secondary/L1InputGranules",
               0
            },
         (14): {
               DATASET 0 "/science/radarMetaData/inputSLC/secondary/orbitType",
               0
            }
         }
      }
      ATTRIBUTE "_Netcdf4Dimid" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SCALAR
         DATA {
         (0): 0
         }
      }
   }

The dataset has no data, and its netCDF dimension size is 0, as stated in the NAME HDF5 attribute. The REFERENCE_LIST attribute lists all the HDF5 datasets (netCDF variables) for which /matchup serves as a netCDF dimension.

However, the ncdump output for two of these netCDF variables, /science/grids/corrections/derived/ionosphere/ionosphere and /science/radarMetaData/inputSLC/secondary/L1InputGranules, reports the size of the matchup dimension differently:

group: ionosphere {
  dimensions:
    matchup = UNLIMITED;   // (1 currently)
  variables:
    String ionosphere(matchup=1);
      :standard_name = "ionosphere";
      :long_name = "ionosphere";
      :_ChunkSizes = 524288U; // uint
}

above with size 1,

group: secondary {
  dimensions:
    matchup = UNLIMITED;   // (2 currently)
  variables:
    String L1InputGranules(matchup=2);
      :description = "Secondary input granules";
      :long_name = "L1InputGranules";
      :standard_name = "L1InputGranules";
      :_ChunkSizes = 1U; // uint
}

and now with size 2. There is only one matchup netCDF dimension in the file (the /matchup HDF5 dataset), and its size is 0. So the two matchup netCDF dimensions above must be bogus, produced by the netCDF software trying to maintain consistency with the file content.
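
The same discrepancy can be seen directly with the HDF5 C API by comparing the current extent of the /matchup dataset with the extent of one of the variables attached to it. A small sketch (dataset paths taken from the REFERENCE_LIST attribute above):

#include <hdf5.h>
#include <stdio.h>

static hsize_t current_extent(hid_t file, const char *path)
{
    hid_t   dset  = H5Dopen2(file, path, H5P_DEFAULT);
    hid_t   space = H5Dget_space(dset);
    hsize_t dims[1] = {0};
    H5Sget_simple_extent_dims(space, dims, NULL);
    H5Sclose(space);
    H5Dclose(dset);
    return dims[0];
}

int main(void)
{
    hid_t file = H5Fopen(
        "S1-GUNW-A-R-064-tops-20180528_20170521-015004-36885N_35006N-PP-baf1-v2_0_2.nc",
        H5F_ACC_RDONLY, H5P_DEFAULT);
    printf("/matchup extent:   %llu\n",
           (unsigned long long)current_extent(file, "/matchup"));
    printf("ionosphere extent: %llu\n",
           (unsigned long long)current_extent(file,
               "/science/grids/corrections/derived/ionosphere/ionosphere"));
    H5Fclose(file);
    return 0;
}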

The last piece of information for this post is the output of h5dump that shows libhdf5 errors as it reads the .nc file. Here’s the beginning:

% h5dump --enable-error-stack=2 -p -A S1-GUNW-A-R-064-tops-20180528_20170521-015004-36885N_35006N-PP-baf1-v2_0_2.nc

HDF5-DIAG: Error detected in HDF5 (1.14.1-2) thread 1:
  #000: H5O.c line 2366 in H5Otoken_to_str(): invalid location identifier
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: H5VLint.c line 1779 in H5VL_vol_object(): invalid identifier
    major: Invalid arguments to routine
    minor: Inappropriate type
  ... (the same error stack, from H5Otoken_to_str() and H5VL_vol_object(), is repeated many more times) ...
HDF5 "S1-GUNW-A-R-064-tops-20180528_20170521-015004-36885N_35006N-PP-baf1-v2_0_2.nc"

There are more of these error messages further down in the h5dump output. I don’t know yet what these errors imply, but I wanted to document them here.

Aleksandar

Hi @ffwilliams2,

I have this PR out that cleans up some of the h5dump code; it should eliminate most of the errors shown in the outputs here and allow you to at least dump the repacked file. However, h5dump will still print an error about not being able to retrieve the dataset creation property list for the “ionosphere” dataset, among others, and will be unable to correctly list certain properties of that dataset, such as the fill value and any filters.

The root cause of this issue seems to be either a bug or unimplemented functionality in HDF5’s H5Ocopy routine when it comes to objects with a variable-length datatype. When repacking datasets into a new file, h5repack tries to determine whether it can copy a dataset with H5Ocopy or whether it needs to manually re-create the dataset, read the data from the old dataset, and write it to the new one. What appears to be happening is that when the original file is repacked, the datasets with a variable-length datatype, like the “ionosphere” dataset, end up with their raw data at a different address in the new file than in the original file. This seems to occur because the superblock version changes (back to version 0 if repacked without any options, or to version 3 if -L is passed to h5repack to use the latest file format), causing either the global heap for the dataset data or the heap entries themselves to move to different addresses, while H5Ocopy copies the old address values into each new dataset’s raw data. This then causes problems when a call to H5Dget_create_plist in h5dump tries to perform a fill value datatype conversion that reads from the new file in order to get an in-memory fill value buffer that can be used/returned.
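
To illustrate the two paths (this is only a sketch of the general idea, not the actual h5repack code), copying a 1-D variable-length string dataset wholesale versus manually re-creating it looks roughly like this; the manual path re-reads and re-writes the data, so the library writes fresh global heap entries in the destination file:

#include <hdf5.h>
#include <stdlib.h>

/* Assumes the parent groups of path already exist in dst_file;
   error checking omitted for brevity. */
static void copy_vl_string_dataset(hid_t src_file, hid_t dst_file, const char *path)
{
    /* Path 1: wholesale object copy. H5Ocopy duplicates the stored raw data,
       including the global heap IDs that variable-length elements point to. */
    /* H5Ocopy(src_file, path, dst_file, path, H5P_DEFAULT, H5P_DEFAULT); */

    /* Path 2: manual re-creation with a read/write round trip. */
    hid_t src   = H5Dopen2(src_file, path, H5P_DEFAULT);
    hid_t ftype = H5Dget_type(src);
    hid_t mtype = H5Tget_native_type(ftype, H5T_DIR_ASCEND);
    hid_t space = H5Dget_space(src);
    hid_t dcpl  = H5Dget_create_plist(src);
    hid_t dst   = H5Dcreate2(dst_file, path, ftype, space,
                             H5P_DEFAULT, dcpl, H5P_DEFAULT);

    hssize_t n   = H5Sget_simple_extent_npoints(space);
    char   **buf = malloc((size_t)n * sizeof(char *));
    H5Dread(src, mtype, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);
    H5Dwrite(dst, mtype, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);
    H5Treclaim(mtype, space, H5P_DEFAULT, buf);   /* free library-allocated strings */
    free(buf);

    H5Dclose(dst); H5Pclose(dcpl); H5Sclose(space);
    H5Tclose(mtype); H5Tclose(ftype); H5Dclose(src);
}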

I have a fix/workaround for h5repack that should be up fairly soon, but I need to do some more investigation to see whether this H5Ocopy issue applies just to variable-length string datatypes or to variable-length datatypes in general. The change also makes repacking a little bit slower for those datasets, so I’d like to apply it as minimally as possible, but at least it’s functionally correct. In the meantime, you could test whether keeping the superblock version the same allows h5repack’s default behavior to work correctly. Since the original file appears to have a version 2 superblock based on the output of h5dump -B, you could try running h5repack --low=1 --high=1 on the file. I believe this should restrict the new file to the same version 2 superblock and should, in theory, keep the global heap data in the same locations within the new file and allow repacking and dumping without any issues.
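
At the library level, my assumption is that --low=1 --high=1 corresponds to pinning both version bounds to the 1.8 file format, roughly:

#include <hdf5.h>

int main(void)
{
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    /* Low and high bounds both set to the 1.8 format, which should keep
       the superblock at version 2 or lower. */
    H5Pset_libver_bounds(fapl, H5F_LIBVER_V18, H5F_LIBVER_V18);
    hid_t file = H5Fcreate("out.nc", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
    H5Fclose(file);
    H5Pclose(fapl);
    return 0;
}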


@ffwilliams2 Based on a quick round of testing, it appears that this is an issue for datasets with a variable-length datatype in general. It also appears that keeping the superblock version the same didn’t make a difference, so these global heap addresses are likely changing due to side effects of h5repack allocating space in the file differently, reclaiming free file space, and so on. I have this PR out now to fix the issue in h5repack until H5Ocopy can be updated to handle this situation.


First, @jhenderson, thank you very much for stepping in to triage the problem and implement a stop-gap fix, which is now in the latest libhdf5-1.14.2 release.

I have checked what the new h5repack does with the .nc file discussed here. I used libhdf5-1.14.2, available from the conda-forge package repository.

My repack command was:

h5repack -S PAGE -G $(expr 8 \* 1024 \* 1024) S1-GUNW-A-R-064-tops-20180528_20170521-015004-36885N_35006N-PP-baf1-v2_0_2.nc out.nc

h5stat still reports the output file as invalid:

% h5stat out.nc
Filename: out.nc
h5stat error: unable to traverse objects/links in file "out.nc"

Below is the first error reported by h5dump:

% h5dump --enable-error-stack=2 -p -H out.nc
HDF5-DIAG: Error detected in HDF5 (1.14.2) thread 1:
  #000: H5D.c line 775 in H5Dget_create_plist(): unable to get dataset creation properties
    major: Dataset
    minor: Can't get value
  #001: H5VLcallback.c line 2458 in H5VL_dataset_get(): dataset get failed
    major: Virtual Object Layer
    minor: Can't get value
  #002: H5VLcallback.c line 2427 in H5VL__dataset_get(): dataset get failed
    major: Virtual Object Layer
    minor: Can't get value
  #003: H5VLnative_dataset.c line 469 in H5VL__native_dataset_get(): can't get creation property list for dataset
    major: Dataset
    minor: Can't get value
  #004: H5Dint.c line 3665 in H5D_get_create_plist(): datatype conversion failed
    major: Dataset
    minor: Can't convert datatypes
  #005: H5T.c line 5308 in H5T_convert(): datatype conversion failed
    major: Datatype
    minor: Can't convert datatypes
  #006: H5Tconv.c line 3326 in H5T__conv_vlen(): can't read VL data
    major: Datatype
    minor: Read failed
  #007: H5Tvlen.c line 840 in H5T__vlen_disk_read(): unable to get blob
    major: Datatype
    minor: Can't get value
  #008: H5VLcallback.c line 7396 in H5VL_blob_get(): blob get failed
    major: Virtual Object Layer
    minor: Can't get value
  #009: H5VLcallback.c line 7367 in H5VL__blob_get(): blob get callback failed
    major: Virtual Object Layer
    minor: Can't get value
  #010: H5VLnative_blob.c line 119 in H5VL__native_blob_get(): unable to read VL information
    major: Virtual Object Layer
    minor: Read failed
  #011: H5HG.c line 560 in H5HG_read(): unable to protect global heap
    major: Heap
    minor: Unable to protect metadata
  #012: H5HG.c line 235 in H5HG__protect(): unable to protect global heap
    major: Heap
    minor: Unable to protect metadata
  #013: H5AC.c line 1277 in H5AC_protect(): H5C_protect() failed
    major: Object cache
    minor: Unable to protect metadata
  #014: H5Centry.c line 3126 in H5C_protect(): can't load entry
    major: Object cache
    minor: Unable to load metadata into cache
  #015: H5Centry.c line 1191 in H5C__load_entry(): incorrect metadata checksum after all read attempts
    major: Object cache
    minor: Read failed
  #016: H5HGcache.c line 195 in H5HG__cache_heap_get_final_load_size(): can't decode global heap prefix
    major: Heap
    minor: Unable to decode value
  #017: H5HGcache.c line 120 in H5HG__hdr_deserialize(): bad global heap collection signature
    major: Heap
    minor: Bad value

So one of the previous errors is gone, but the repacked file still has problems.

Aleksandar


@ajelenak Ah, apologies, the PR covered variable-length types but did not correctly detect variable-length string types, which are handled slightly differently inside the library. I now have this PR out that should fix the issue. I tried repacking and dumping the file after that fix, and everything appeared to work for me.
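
For reference, the distinction at the datatype level is roughly the following (my illustration, not the PR code): a general variable-length type reports class H5T_VLEN, while a variable-length string reports class H5T_STRING and needs a separate H5Tis_variable_str() check:

#include <hdf5.h>
#include <stdio.h>

/* A type needs the manual copy path if it is variable-length in either form. */
static int is_variable_length(hid_t dtype)
{
    if (H5Tget_class(dtype) == H5T_VLEN)
        return 1;                                   /* e.g. H5Tvlen_create(...)    */
    if (H5Tget_class(dtype) == H5T_STRING && H5Tis_variable_str(dtype) > 0)
        return 1;                                   /* variable-length string type */
    return 0;
}

int main(void)
{
    hid_t vlen_int = H5Tvlen_create(H5T_NATIVE_INT);
    hid_t vlen_str = H5Tcopy(H5T_C_S1);
    H5Tset_size(vlen_str, H5T_VARIABLE);
    printf("vlen of int: %d, vlen string: %d\n",
           is_variable_length(vlen_int), is_variable_length(vlen_str));
    H5Tclose(vlen_str);
    H5Tclose(vlen_int);
    return 0;
}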


@jhenderson Just tested with the latest develop branch and it worked.

$ h5stat -S out.nc 
Filename: out.nc
File space management strategy: H5F_FSPACE_STRATEGY_PAGE
File space page size: 8388608 bytes
Summary of file space information:
  File metadata: 106705 bytes
  Raw data: 55628385 bytes
  Amount/Percent of tracked free space: 0 bytes/0.0%
  Unaccounted space: 19762382 bytes
Total space: 75497472 bytes

h5dump also does not report any errors for this file.

Thank you very much for fixing h5repack for this use case!

Aleksandar
