Crash when freeing user-provided buffer on filter callback


#1

Hello there!

I’ve noticed that, on recent development snapshots of HDF5, reads from chunked datasets go through optimizations that were not present in 1.10.0. In particular, the filter callback may no longer receive a pointer to the start of a heap-allocated buffer, but rather a pointer to an offset from the beginning of that buffer. This seems to come from H5D__chunk_read(), which allocates an internal buffer to hold the chunk data filled in by the filter callback routine.

The problem is that when we get a pointer to the middle of a malloc()d buffer, filters are no longer allowed to free it and replace it with something else – the code crashes immediately otherwise. My code does something like this:

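size_t filter_callback(unsigned flags, size_t cd_nelmts, const unsigned *cd_values,
                       size_t nbytes, size_t *buf_size, void **buf) {
    if (! (flags & H5Z_FLAG_REVERSE)) {
        // write path: transform the input data
        *buf_size = new_size;
        return new_size;
    } else {
        // read path: replace the library's buffer with the data read from disk
        free(*buf);   // this is the call that crashes
        *buf = data_read;
        *buf_size = data_read_size;
        return data_read_size;
    }
}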

What I have observed is that if new_size is small enough (say, 200 bytes, while the file is compressed with a chunk size of 1MB), then the call to free(*buf) crashes the program. I’m not sure yet whether this is relevant, as I haven’t been able to dig any further.

Is this new behavior intentional?

Thanks,
Lucas


#2

Lucas,

Thank you for bringing this issue to our attention. I entered a JIRA task (https://jira.hdfgroup.org/browse/HDFFV-10936) to investigate. We shouldn’t be crashing users’ programs :wink:

Thanks again!

Elena


#3

Elena, I have just updated the git snapshot and the filter is working with no crashes this time. Maybe the repository was in an inconsistent state when I cloned it? Anyway, please feel free to close the ticket on Jira, and thanks a lot for your prompt help!

Best regards,
Lucas


#4

Lucas,

We rarely have the repository in an inconsistent state. All changes are thoroughly tested before they are checked in. I will leave the ticket open for now, since we have had too many reports of memory issues when using filters :thinking:

Thank you!
Elena


#5

Hi Elena!

It looks like it’s still happening – but oddly enough, only one of my machines exhibits the behavior. I wonder if libc is playing a crucial role here. Here are some details I was able to gather about the problem:

  • H5D__chunk_lock is called, and the function determines that the chunk exists on disk, but isn’t the same size as the final chunk in memory. As a result, the function decides to allocate a new chunk to provide to the filter callback [https://github.com/live-clones/hdf5/blob/develop/src/H5Dchunk.c#L3956]
  • H5D__chunk_mem_alloc is called to allocate memory. Since pline->used == 1, allocation goes through H5MM_malloc()
  • The filter callback is called and attempts to free(*buf). Since that buffer is managed by HDF5, the application cannot free it, as the pointer passed back to the application is really just an offset within an internal HDF5 structure.

I’ve looked into the user buffer provided to the callback and it’s possible to see the magic id in the bytes that precede that buffer:

(gdb) xxd *buf-8 16
00000000: 4445 4144 4245 4546 7b22 6461 7461 7365  DEADBEEF{"datase

It looks like a raw buffer should be provided to the callback instead.

Here is the relevant stack trace:

#0  0x00007ffff74eb202 in free () from /lib64/libc.so.6
#1  0x00007ffff738d341 in readDataset<int> (names=std::vector of length 0, capacity 0, file_id=72057594037927937, n_elements=10000, hdf5_datatype=216172782113783872, buf_size=0x7fffffffd4c8, 
    buf=0x7fffffffd4e8, nbytes=@0x7fffffffd130: 771, payload=0x4c11a0 "\033LJ\002\nl", payload_size=675, dtype=0x7ffff73ba141 "int32_t*") at hdf5-filter.cpp:261
#2  0x00007ffff7389eda in filter_callback (flags=256, cd_nelmts=0, cd_values=0x0, nbytes=771, buf_size=0x7fffffffd4c8, buf=0x7fffffffd4e8) at hdf5-filter.cpp:307
#3  0x00007ffff7e22a00 in H5Z_pipeline (pline=pline@entry=0x4a9408, flags=flags@entry=256, filter_mask=filter_mask@entry=0x7fffffffd620, edc_read=H5Z_ENABLE_EDC, cb_struct=..., 
    nbytes=0x7fffffffd4c0, buf_size=0x7fffffffd4c8, buf=0x7fffffffd4e8) at H5Z.c:1322
#4  0x00007ffff7ab7d05 in H5D__chunk_lock (io_info=io_info@entry=0x7fffffffd9b0, udata=udata@entry=0x7fffffffd5f0, relax=relax@entry=false, prev_unfilt_chunk=prev_unfilt_chunk@entry=false)
    at H5Dchunk.c:3971
#5  0x00007ffff7ab9d71 in H5D__chunk_read (io_info=0x7fffffffd9b0, type_info=0x7fffffffd930, nelmts=<optimized out>, file_space=<optimized out>, mem_space=<optimized out>, fm=0x4bfb40)
    at H5Dchunk.c:2608
#6  0x00007ffff7ae4261 in H5D__read (dataset=dataset@entry=0x4a92b0, mem_type_id=mem_type_id@entry=216172782113784248, mem_space=0x4ac600, file_space=0x4ac600, buf=<optimized out>, 
    buf@entry=0x4b5e70) at H5Dio.c:569
#7  0x00007ffff7e10cf9 in H5VL__native_dataset_read (obj=0x4a92b0, mem_type_id=216172782113784248, mem_space_id=0, file_space_id=0, dxpl_id=<optimized out>, buf=0x4b5e70, req=0x0)
    at H5VLnative_dataset.c:166
#8  0x00007ffff7df0afe in H5VL__dataset_read (obj=0x4a92b0, cls=0x438840, mem_type_id=mem_type_id@entry=216172782113784248, mem_space_id=mem_space_id@entry=0, 
    file_space_id=file_space_id@entry=0, plist_id=plist_id@entry=792633534417207304, buf=0x4b5e70, req=0x0) at H5VLcallback.c:2028
#9  0x00007ffff7dfa7b5 in H5VL_dataset_read (vol_obj=vol_obj@entry=0x4b0420, mem_type_id=mem_type_id@entry=216172782113784248, mem_space_id=mem_space_id@entry=0, 
    file_space_id=file_space_id@entry=0, plist_id=plist_id@entry=792633534417207304, buf=buf@entry=0x4b5e70, req=0x0) at H5VLcallback.c:2062
#10 0x00007ffff7ae2564 in H5Dread (dset_id=<optimized out>, mem_type_id=216172782113784248, mem_space_id=0, file_space_id=0, dxpl_id=792633534417207304, buf=0x4b5e70) at H5Dio.c:191

#6

Hi Lucas,

Developers discussed the issue and concluded that we would need to revert the usage of H5MM_malloc(free) when working with the chunks. I entered a JIRA issue HDFFV-10948 and we will try to address it in our next maintenance releases in Spring 2020 (or earlier, but I cannot promise).

Thank you for reporting!

Elena


#7

Thank you for the update! It’s great to know that the team has come up with a plan to fix the issue.

Best regards,
Lucas