VOL: Unable to Write a Dataset at File Close Time


#1

Hi there,

Brief summary of the problem

I am trying to develop a new VOL that is basically a Passthru VOL but does the following extra thing at file close time:

  1. create a new HDF5 dataset called TEST_DSET (assuming no other dataset has the same name)
  2. write some data to the dataset
  3. close the dataset and file.

However, in step 2 I cannot write the dataset with a dataset transfer property list (dxpl) that I created myself. If I do, a segmentation fault occurs inside H5CX_pop.

I am not entirely sure, but fixing this may need help from the HDF5 developers, since it does not look like a problem in the VOL implementation.

How to reproduce the problem

This GitHub repo contains the code I wrote for the new VOL. It is the same as the original Passthru VOL except for the H5VL_pass_through_ext_file_close function. Inside H5VL_pass_through_ext_file_close, I create a new dataset called TEST_DSET and write some data to it using a new dxpl called new_dxplid. A segmentation fault then occurs inside the HDF5 library (H5SL.c:1320).

A simple user application (test program) together with a makefile is available here.

More details about the problem

  1. The error does not happen inside H5VL_pass_through_ext_file_close, which finishes without problems. The segfault happens inside the H5CX_pop call made by H5Fclose.
  2. If I comment out the H5VLdataset_write call, i.e. the dataset is created but not written, the test program finishes without problems.
  3. If I pass dxpl_id instead of new_dxplid to H5VLdataset_write, the test program finishes without problems. The difference is that dxpl_id is the argument passed to H5VL_pass_through_ext_file_close, and it is a default dataset transfer property list. Using H5P_DATASET_XFER_DEFAULT also works without problems.
  4. It seems that the H5CX_pop call in H5Fclose tries to access an already-closed dxpl (new_dxplid), and this causes the segmentation fault.
  5. Because of the above point, I also tried not calling H5Pclose on new_dxplid. In this case, the test program finishes without problems.
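
To make the suspicion in points 4 and 5 concrete, here is a small toy model in plain C. It is not HDF5 code and does not reproduce how H5CX is actually implemented. All names (toy_plist_t, toy_ctx_t, toy_write, toy_ctx_pop, toy_scenario) are hypothetical. It only assumes that a write records the dxpl ID in a context, and that popping the context later writes a cached value ("actual_chunk_opt_mode" in the backtrace) back into that property list. Unlike the real crash, the toy pop checks for a stale ID and returns -1 instead of touching freed memory.

```c
#include <stdlib.h>

#define TOY_MAX_IDS 8

/* Toy stand-in for a property list; NOT HDF5's internal structure. */
typedef struct {
    int actual_chunk_opt_mode;
} toy_plist_t;

/* Toy ID table: slot i holds the object registered under ID i, or NULL. */
static toy_plist_t *toy_ids[TOY_MAX_IDS];

/* Return the object behind an ID, or NULL if the ID is invalid or closed. */
toy_plist_t *toy_lookup(int id)
{
    return (id >= 0 && id < TOY_MAX_IDS) ? toy_ids[id] : NULL;
}

/* Register a new property list and return its ID (-1 on failure). */
int toy_plist_create(void)
{
    for (int id = 0; id < TOY_MAX_IDS; id++) {
        if (toy_ids[id] == NULL) {
            toy_ids[id] = calloc(1, sizeof(toy_plist_t));
            return toy_ids[id] != NULL ? id : -1;
        }
    }
    return -1;
}

/* Free the property list behind an ID (-1 if the ID is not open). */
int toy_plist_close(int id)
{
    toy_plist_t *pl = toy_lookup(id);
    if (pl == NULL)
        return -1;
    free(pl);
    toy_ids[id] = NULL;
    return 0;
}

/* Toy context: remembers the dxpl ID of the last write and a cached value. */
typedef struct {
    int dxpl_id;
    int mode;
    int mode_set;
} toy_ctx_t;

/* A write records which dxpl it used and caches a value to flush later. */
void toy_write(toy_ctx_t *ctx, int dxpl_id)
{
    ctx->dxpl_id  = dxpl_id;
    ctx->mode     = 1;
    ctx->mode_set = 1;
}

/* Pop: flush the cached value into the recorded dxpl.
 * Returns -1 if that dxpl was already closed (the assumed failure mode). */
int toy_ctx_pop(toy_ctx_t *ctx)
{
    if (!ctx->mode_set)
        return 0;
    toy_plist_t *pl = toy_lookup(ctx->dxpl_id);
    if (pl == NULL)
        return -1; /* stale ID: the dxpl was closed before the pop */
    pl->actual_chunk_opt_mode = ctx->mode;
    ctx->mode_set = 0;
    return 0;
}

/* Run one scenario; close_before_pop != 0 mimics closing new_dxplid
 * inside the file close callback, before the outer context is popped. */
int toy_scenario(int close_before_pop)
{
    toy_ctx_t ctx = {0, 0, 0};
    int id = toy_plist_create();
    if (id < 0)
        return -2;
    toy_write(&ctx, id);
    if (close_before_pop)
        toy_plist_close(id);
    int rc = toy_ctx_pop(&ctx);
    if (!close_before_pop)
        toy_plist_close(id);
    return rc;
}
```

In this model, toy_scenario(1) corresponds to closing new_dxplid before the context is popped (the failing case), and toy_scenario(0) corresponds to leaving the dxpl open until after the pop, which resembles points 3 and 5, where the dxpl is not closed inside the callback.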

#2

Below is the backtrace from the core dump in gdb. The HDF5 version is 1.13.2.

#0  0x00007f1e7abdfc4c in H5SL_search (slist=0x41, key=0x7f1e7ad66b46)
    at H5SL.c:1320
#1  0x00007f1e7ab76aa5 in H5P__do_prop (plist=0x141b290,
    name=0x7f1e7ad66b46 "actual_chunk_opt_mode",
    plist_op=0x7f1e7ab76f06 <H5P__set_plist_cb>,
    pclass_op=0x7f1e7ab771ce <H5P__set_pclass_cb>, udata=0x7ffec23ed4b0)
    at H5Pint.c:2805
#2  0x00007f1e7ab77563 in H5P_set (plist=0x141b290,
    name=0x7f1e7ad66b46 "actual_chunk_opt_mode", value=0x13eceac)
    at H5Pint.c:3187
#3  0x00007f1e7a96d9da in H5CX__pop_common (update_dxpl_props=true)
    at H5CX.c:3603
#4  0x00007f1e7a96de58 in H5CX_pop (update_dxpl_props=true) at H5CX.c:3650
#5  0x00007f1e7a9f1c70 in H5Fclose (file_id=72057594037927936) at H5F.c:1064
#6  0x00000000004010d7 in main ()

Libraries’ Versions

HDF5 1.13.2 (and also 1.13.0)
MPICH 3.4.2

The issue occurs with both HDF5 1.13.2 and 1.13.0 (not tested with 1.13.1).