[HDFFV-10466] Another problem with copy

Hello!

Some time ago I reported a critical bug in H5Ocopy to help@hdfgroup.org. It was labeled [HDFFV-10466], but no investigation has taken place since. I am now reporting it publicly to hdf-forum@lists.hdfgroup.org, in search of collaboration from the HDF5 user community to try to craft a patch.


To reproduce:

  1. Download
    https://zendto.acdlabs.com/download.php?claimID=PAECtrZAvv3vaJcg&claimPasscode=2GTeuhMRJPbdZ82w&auth=2da721baa9d05867fe94a8fc0f19e741&fid=18754

  2. h5ls -r sulfadrug.abf
    Works fine!

  3. h5copy -i sulfadrug.abf -o new.abf -s /msms/info -d /test -p -v
    Copying file <sulfadrug.abf> and object </msms/info> to file <new.abf> and object
    h5copy: Creating parent groups
    Segmentation fault

  4. cat new.abf
    %HDF
    (file exists)

  5. h5ls new.abf
    new.abf: unable to open file

Reproducible on vanilla HDF5 binaries, Debian and Windows, 1.10.0…1.10.3.
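
Step 4 only shows that new.abf begins with what looks like the HDF5 signature; it says nothing about whether the rest of the file is usable, which is consistent with step 5 failing. A minimal Python sketch of that check, assuming the standard 8-byte HDF5 format signature at offset 0 (a file with a user block keeps the signature at a later offset, which this sketch ignores). The helper name has_hdf5_signature is made up for illustration:

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # 8-byte HDF5 format signature


def has_hdf5_signature(path):
    """Return True if the file at `path` starts with the HDF5 signature.

    Only the first 8 bytes are inspected, so a file left behind by a
    crashed writer can pass this check and still be unreadable.
    """
    with open(path, "rb") as fh:
        return fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE
```

A True result here together with an "unable to open file" error from h5ls would point to a file that was created but never completely written.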


Running under a debugger, I see that src->shared->u.vlen has
loc == H5T_LOC_BADLOC
and that all callbacks, including isnull, are NULL here:
https://bitbucket.hdfgroup.org/projects/HDFFV/repos/hdf5/browse/src/H5Tconv.c#3183

It’s not quite clear how src->shared could arrive with H5T_LOC_BADLOC, or whether that is expected (according to the comment at
https://bitbucket.hdfgroup.org/projects/HDFFV/repos/hdf5/browse/src/H5Odtype.c#512
it might be).

Best wishes,
Andrey Paramonov

I managed to create a minimal reproducible example with h5py.
This works fine:

import h5py
with h5py.File('test.h5', libver='latest') as f:
    for i in range(8):
        f.attrs[str(i + 1)] = u'test'
    f.copy(f, 'copy')

But this breaks:

import h5py
with h5py.File('test.h5', libver='latest') as f:
    for i in range(9):  # <- note
        f.attrs[str(i + 1)] = u'test'
    f.copy(f, 'copy')

Best wishes,
Andrey Paramonov
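
Since the failing case ends in a segmentation fault, it takes the whole Python interpreter down with it. A hedged sketch of one way to try it safely: run the snippet in a child process and inspect the exit code. The names run_isolated and REPRODUCER are hypothetical; the explicit 'w' mode and the temporary directory are additions here (newer h5py versions open files read-only by default), not part of the example above.

```python
import subprocess
import sys
import textwrap


def run_isolated(snippet, timeout=60):
    """Run `snippet` in a fresh Python process and return its exit code.

    On POSIX, a negative code -N means the child was killed by signal N
    (for example -11 for SIGSEGV). On Windows a crash shows up as a
    large nonzero exit code instead.
    """
    result = subprocess.run(
        [sys.executable, "-c", textwrap.dedent(snippet)],
        timeout=timeout,
    )
    return result.returncode


# The nine-attribute case from the post above, written to a temp directory.
REPRODUCER = """
import os, tempfile, h5py
path = os.path.join(tempfile.mkdtemp(), 'test.h5')
with h5py.File(path, 'w', libver='latest') as f:
    for i in range(9):
        f.attrs[str(i + 1)] = u'test'
    f.copy(f, 'copy')
"""
```

Calling run_isolated(REPRODUCER) returns 0 if the copy succeeds; a negative or otherwise nonzero value means the child failed or crashed, while the calling process keeps running.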

What about the following patch?

--- H5Aint.c
+++ CMake-hdf5-1.10.3\hdf5-1.10.3\src\H5Aint.c
@@ -2522,6 +2522,10 @@
     HDassert(udata->file);
     HDassert(udata->cpy_info);
 
+    /* Mark datatype as being on disk now (analogously to H5O__attr_copy_file). */
+    if(H5T_set_loc(attr_src->shared->dt, udata->oloc_src->file, H5T_LOC_DISK) < 0)
+        HGOTO_ERROR(H5E_ATTR, H5E_CANTINIT, H5_ITER_ERROR, "invalid datatype location")
+
     if(NULL == (attr_dst = H5A__attr_copy_file(attr_src, udata->file, udata->recompute_size, udata->cpy_info)))
         HGOTO_ERROR(H5E_ATTR, H5E_CANTCOPY, H5_ITER_ERROR, "can't copy attribute")

Best wishes,
Andrey Paramonov

Thank you for the patch! Added to the issue and will review.

Elena

Hello Elena, Barbara, all!

Messages on the forum mention the “upcoming 1.10.5 release” more and more frequently, but I have not seen any updates on the status of
https://jira.hdfgroup.org/browse/HDFFV-10466
so far.
This report has a minimal reproducible example and a simple patch, and was submitted several months ago. Please make sure the fix gets into the 1.10.5 release.

Best wishes,
Andrey Paramonov

Andrey,

We plan to release your patch. Thank you for the reminder!

Elena

Andrey,

First of all, I should have said in my previous message that we planned to review your patch.

Last Friday the HDF5 developers did review the patch and concluded that we couldn’t accept it. The patch just hides the problem. We need to debug further to find the root cause and then decide how it should be fixed.

Thank you!

Elena

Hello!

I can no longer reproduce this with HDF5 1.10.5-pre1. Good job!

For reference, a minimal C test program:

#include <stdio.h>
#include "hdf5.h"

int main(void) {
   hid_t fapl, file, datatype, dataspace, attr;
   char key[16];                /* writable buffer for the attribute name */
   const char *value = "value";
   herr_t status;

   /* Request the latest file format, as in the h5py example. */
   fapl = H5Pcreate(H5P_FILE_ACCESS);
   status = H5Pset_libver_bounds(fapl, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST);

   file = H5Fcreate("test.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

   /* Fixed-length 5-byte string type, scalar dataspace. */
   datatype = H5Tcreate(H5T_STRING, 5);
   dataspace = H5Screate(H5S_SCALAR);

   /* Nine attributes on the root group: eight copy fine, nine do not. */
   for(int i=0; i<9; i++) {
      snprintf(key, sizeof(key), "%d", i);
      attr = H5Acreate2(file, key, datatype, dataspace,
                        H5P_DEFAULT, H5P_DEFAULT);
      status = H5Awrite(attr, datatype, value);
      status = H5Aclose(attr);
   }
   status = H5Sclose(dataspace);
   status = H5Tclose(datatype);

   /* Copy the root group, with its attributes, into the same file. */
   status = H5Ocopy(file, "/", file, "copy", H5P_DEFAULT, H5P_DEFAULT);

   status = H5Fclose(file);
   status = H5Pclose(fapl);
   return 0;
}

Best wishes,
Andrey Paramonov

Hi folks!

This may have risen from the dead as I seem to have run into this issue as well, with both hdf5 1.10.5 and 1.10.6.

For example, the minimal h5py example provided above by @paramon results in a segmentation fault, and my observations match the others in this thread.

My environment (conda):

h5py                      2.10.0          nompi_py38hfb01d0b_104    conda-forge
hdf5                      1.10.6          nompi_h3c11f04_101    conda-forge
python                    3.8.5           h1103e12_5_cpython    conda-forge

Can anyone confirm?

Thanks!
Bruce Wallin

Hello,
Have there been any updates on this topic? I also stumbled upon this issue with a recent h5py version; the minimal h5py example does not work for me either. I’m using this environment:

Summary of the h5py configuration
---------------------------------

h5py    3.11.0
HDF5    1.14.2
Python  3.12.1 (tags/v3.12.1:2305ca5, Dec  7 2023, 22:03:25) [MSC v.1937 64 bit (AMD64)]
sys.platform    win32
sys.maxsize     9223372036854775807
numpy   1.26.4
cython (built with) 3.0.10
numpy (built against) 2.0.0rc1
HDF5 (built against) 1.14.2

Can anybody confirm this issue? Any help / workarounds would be highly appreciated!
Thanks, Moritz

Edit: It looks like this is tracked here: Segfault when copying dataset with attributes · Issue #2414 · HDFGroup/hdf5 · GitHub, and is scheduled for the 1.14.5 release.

The code below worked for me with the 1.14.4 release:

import h5py
with h5py.File('test.h5', mode='w', libver='latest') as f:
    for i in range(9):  # <- note
        f.attrs[str(i + 1)] = u'test'
    f.copy(f, 'copy')

I had to add mode='w'; otherwise I was getting an error about “no write intent on the file”.

Summary of the h5py configuration
---------------------------------

h5py    3.11.0
HDF5    1.14.4
Python  3.12.0 | packaged by conda-forge | (main, Oct  3 2023, 08:36:57) [Clang 15.0.7 ]
sys.platform    darwin
sys.maxsize     9223372036854775807
numpy   1.26.0
cython (built with) 0.29.36
numpy (built against) 1.26.0
HDF5 (built against) 1.14.4

-Aleksandar
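
The environment reports in this thread differ mainly in the HDF5 version, so when comparing pasted reports it can help to pull that version out programmatically. A small sketch, assuming the text has the layout of the summaries above (which is what h5py.version.info contains); the function name hdf5_version_from_summary is hypothetical:

```python
import re


def hdf5_version_from_summary(summary):
    """Extract the runtime HDF5 version from an h5py configuration summary.

    Matches a line such as 'HDF5    1.14.4' and returns (1, 14, 4).
    The 'HDF5 (built against) ...' line is deliberately not matched.
    Returns None if no such line is found.
    """
    match = re.search(r"^HDF5\s+(\d+)\.(\d+)\.(\d+)", summary, flags=re.MULTILINE)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())
```

The resulting tuples compare naturally, so two reports can be ordered by HDF5 version with ordinary tuple comparison.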

Sounds good! Slightly off-topic: is there a fast and easy way to install h5py with HDF5 version 1.14.4?
I’m bound to Windows and, according to the h5py docs, it is not very straightforward to install from source while explicitly specifying the HDF5 version.

I don’t know, since I am not a Windows user.

-Aleksandar