VOL-Async: Debugging Async operation failures

(Deleted my previous post as I realized my earlier failure analysis was incorrect)

I’ve been tasked with evaluating VOL-Async for DKRZ (Deutsches Klimarechenzentrum GmbH, Hamburg). Please consider me a novice HDF5 user.

I’m in the process of implementing Async by replacing all calls with their async counterparts as described here https://hdf5-vol-async.readthedocs.io/en/latest/gettingstarted.html#explicit-mode and here https://www.hdfgroup.org/wp-content/uploads/2021/10/HUG21-Async-VOL.pdf .

hdf5-vol-async was installed with Spack, loaded through modules, and the environment variables were set as instructed in the guide. The code was compiled with all necessary flags and executed with mpiexec.

The code is supposed to generate an HDF5 file of exactly 1 GB. Prints were used for quick-and-dirty debugging.

I seem to be missing something important, as this fails, and it’s quite hard for me to debug since I haven’t compiled the library with its debug lines commented in. It’s probably something very simple, but I just can’t figure it out due to my lack of experience with HDF5. From H5ESwait() I can definitely infer that I have failing operations.

I would greatly appreciate any help.

void create_hdf5_async(int argc, char **argv, bool with_chunking)
{
    hid_t plist_id, file_id, dataspace_id, dataset_id; /* file identifier */
    herr_t status;
    hsize_t dims[1];
    hsize_t cdims[1];
    hid_t es_id; /* event set identifier */

    int mpi_thread_required = MPI_THREAD_MULTIPLE;
    int mpi_thread_provided = 0;
    /* Initialize MPI with threading support */
    MPI_Init_thread(&argc, &argv, mpi_thread_required, &mpi_thread_provided);

    es_id = H5EScreate();

    /* Create a new file using default properties. */
    printf("Create file \n");
    file_id = H5Fcreate_async("data/datasets/test_dataset_hdf5-c.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT, es_id);

    // setup dimensions
    int some_size = 134217728;
    printf("size: %d \n", some_size);

    dims[0] = some_size;
    dataspace_id = H5Screate_simple(1, dims, NULL);

    plist_id = H5Pcreate(H5P_DATASET_CREATE);

    if (with_chunking)
    {
        // setup chunking
        cdims[0] = 512;
        status = H5Pset_chunk(plist_id, 1, cdims);
    }

    // create Dataset
    printf("Create dataset \n");
    dataset_id = H5Dcreate_async(file_id, "/X", H5T_IEEE_F64LE, dataspace_id, H5P_DEFAULT, plist_id, H5P_DEFAULT, es_id);

    // fill buffer
    float *dset_data = calloc(some_size, sizeof(float));

    if (!dset_data)
    {
        fprintf(stderr, "Fatal: unable to allocate array\n");
        exit(EXIT_FAILURE);
    }

    printf("Fill with values \n");
    for (int i = 0; i < some_size; i++)
    {
        dset_data[i] = (float)rand() / RAND_MAX;
    }

    printf("Write data to dataset \n");
    status = H5Dwrite_async(dataset_id, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data, es_id);
    printf("status: %d \n", status);

    free(dset_data);
    printf("Finish writing data to dataset \n");

    /* Terminate access to the file. */

    size_t num_in_progress;
    hbool_t op_failed;
    printf("Wait for async answer \n");
    H5ESwait(es_id, H5ES_WAIT_FOREVER, &num_in_progress, &op_failed);
    printf("Finish waiting for async, num in progess: %ld, failed: %d \n", num_in_progress, op_failed);


    printf("Close async dataset \n");
    status = H5Dclose_async(dataset_id, es_id);
    printf("Close sync dataspace \n");
    status = H5Sclose(dataspace_id);
    printf("Close async file \n");
    status = H5Fclose_async(file_id, es_id);

    status = H5ESclose(es_id);
}

Stacktrace

Bench hdf5 variable async 
Create file 
size: 134217728 
Create dataset 
Fill with values 
Write data to dataset 
status: 0 
Finish writing data to dataset 
Wait for async answer 
Finish waiting for async, num in progess: 139881184669379, failed: 203 
Close async dataset 
[leucht:51478] *** Process received signal ***
[leucht:51478] Signal: Segmentation fault (11)
[leucht:51478] Signal code: Address not mapped (1)
[leucht:51478] Failing at address: 0x7f386bfff010
[leucht:51478] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f38a04e0520]
[leucht:51478] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x1a67cd)[0x7f38a06447cd]
[leucht:51478] [ 2] /home/dev/spack/opt/spack/linux-ubuntu22.04-x86_64_v4/gcc-11.4.0/hdf5-1.14.5-5tlcfqadprf4tpamuxv4tvn67bcastlj/lib/libhdf5.so.310(H5D__gather_mem+0x140)[0x7f38a07d4330]
[leucht:51478] [ 3] /home/dev/spack/opt/spack/linux-ubuntu22.04-x86_64_v4/gcc-11.4.0/hdf5-1.14.5-5tlcfqadprf4tpamuxv4tvn67bcastlj/lib/libhdf5.so.310(H5D__scatgath_write+0x2ab)[0x7f38a07d502b]
[leucht:51478] [ 4] /home/dev/spack/opt/spack/linux-ubuntu22.04-x86_64_v4/gcc-11.4.0/hdf5-1.14.5-5tlcfqadprf4tpamuxv4tvn67bcastlj/lib/libhdf5.so.310(H5D__contig_write+0x2d)[0x7f38a07ae26d]
[leucht:51478] [ 5] /home/dev/spack/opt/spack/linux-ubuntu22.04-x86_64_v4/gcc-11.4.0/hdf5-1.14.5-5tlcfqadprf4tpamuxv4tvn67bcastlj/lib/libhdf5.so.310(H5D__write+0xe4d)[0x7f38a07c370d]
[leucht:51478] [ 6] /home/dev/spack/opt/spack/linux-ubuntu22.04-x86_64_v4/gcc-11.4.0/hdf5-1.14.5-5tlcfqadprf4tpamuxv4tvn67bcastlj/lib/libhdf5.so.310(H5VL__native_dataset_write+0xba)[0x7f38a0a3302a]
[leucht:51478] [ 7] /home/dev/spack/opt/spack/linux-ubuntu22.04-x86_64_v4/gcc-11.4.0/hdf5-1.14.5-5tlcfqadprf4tpamuxv4tvn67bcastlj/lib/libhdf5.so.310(H5VLdataset_write+0x142)[0x7f38a0a1f5c2]
[leucht:51478] [ 8] /home/dev/spack/opt/spack/linux-ubuntu22.04-x86_64_v4/gcc-11.4.0/hdf5-vol-async-1.7-vbevlbxizecl24eswhgtycejo4lbfvft/lib/libh5async.so(+0x1bff8)[0x7f389f479ff8]
[leucht:51478] [ 9] /home/dev/spack/opt/spack/linux-ubuntu22.04-x86_64_v4/gcc-11.4.0/argobots-1.2-jxiwbmgrzkanyu2vqjxtvcb6xjfqkvwx/lib/libabt.so.1(+0x223cc)[0x7f389f4453cc]
[leucht:51478] [10] /home/dev/spack/opt/spack/linux-ubuntu22.04-x86_64_v4/gcc-11.4.0/argobots-1.2-jxiwbmgrzkanyu2vqjxtvcb6xjfqkvwx/lib/libabt.so.1(+0x28b5f)[0x7f389f44bb5f]
[leucht:51478] *** End of error message ***
--------------------------------------------------------------------------
prterun noticed that process rank 0 with PID 51478 on node leucht exited on
signal 11 (Segmentation fault).
--------------------------------------------------------------------------

Hi @pphilippllucas,

which build system are you using to build the Async VOL, CMake or the Makefile in the source? If CMake, note that the “memcpy” option mentioned on Building with Spack — HDF5 Asynchronous I/O VOL Connector 0.1 documentation seems to have changed names since the writing of that document. The option is ENABLE_WRITE_MEMCPY rather than CMAKE_ENABLE_WRITE_MEMCPY, so you should add -DENABLE_WRITE_MEMCPY=ON to your CMake command. Otherwise, the VOL connector won’t be making a copy of your buffer during H5Dwrite_async(). So, I’m assuming the segfault that you’re getting comes from the internal H5D__gather_mem() call trying to perform memcpy()s on a buffer that you’ve already freed before calling H5ESwait().

Hey @jhenderson,

I apologize but I believe I don’t quite understand yet. I’ll add some information about my setup.

I’m using the default build system, which for Spack is CMake according to spack info hdf5-vol-async (version 1.7). “memcpy” should be disabled by default when building the package with Spack, as I installed hdf5-vol-async with just spack install hdf5-vol-async. I haven’t configured double buffering yet, as I was only interested in getting it up and running.

If I understand correctly, due to the name change the option might not be set correctly, which you believe causes the issue? Is “memcpy”, or rather double buffering, needed to be able to write files asynchronously at all, i.e. without it only reading is supported? Is that what is meant by “If the application needs Async VOL for dataset write buffer management (double buffering), e.g. SW4, Flash-X, and AMReX applications, use the following spack command”?

Sorry, I missed that you mentioned the connector was installed with spack! By default, the connector won’t copy your write buffer unless the ENABLE_WRITE_MEMCPY option is enabled. This means that, for your current example, you would need to wait to free your buffer until after the call to H5ESwait(). However, if you install the “memcpy” variant of the package from spack then I believe your example program should run as intended, since it will make a copy of your write buffer and you can then safely free it.
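For illustration, the ordering described above could look like this in the original example. This is only a sketch based on the code posted earlier, not a complete program; without ENABLE_WRITE_MEMCPY the connector keeps a reference to the caller’s buffer, so the buffer must outlive the operation:

```c
/* Sketch: corrected ordering when the connector does NOT copy the
 * write buffer. dset_data must stay alive until the async write has
 * actually completed, i.e. until after H5ESwait(). */
status = H5Dwrite_async(dataset_id, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL,
                        H5P_DEFAULT, dset_data, es_id);

size_t  num_in_progress;
hbool_t op_failed;
/* Block until every operation in the event set has finished. */
H5ESwait(es_id, H5ES_WAIT_FOREVER, &num_in_progress, &op_failed);

free(dset_data); /* safe now: the write has completed (or failed) */
```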

Ahh I see, that actually makes sense; I will try it now. I actually didn’t pay attention to when I free. Since for my use-cases I can easily free after waiting, I will move it.

Though what would be your suggestion in that case, would it be better to free() after or to simply use “memcpy”?

@jhenderson moving free() after H5ESwait() still yielded the same error. I have now gone ahead and installed hdf5-vol-async with “+memcpy”, and I’m now getting a different error.

Though this time it seems to be related to write permissions. I will try to figure that one out now.

I already appreciate the help this far. I will try to do more testing myself so as not to turn this into a pair-programming session. If I can’t make any progress with either approach, I will return to this thread.

Stacktrace

Bench hdf5 variable async 
Create file 
size: 134217728 
Create dataset 
Fill with values 
Write data to dataset 
[leucht:99042] *** Process received signal ***
[leucht:99042] Signal: Segmentation fault (11)
[leucht:99042] Signal code: Invalid permissions (2)
[leucht:99042] Failing at address: 0x7f677008a040
[leucht:99042] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f6782166520]
[leucht:99042] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x1a6e24)[0x7f67822cae24]
[leucht:99042] [ 2] /home/dev/spack/opt/spack/linux-ubuntu22.04-x86_64_v4/gcc-11.4.0/hdf5-vol-async-1.7-bo2zasnxljk2ff6wqfeydzhc7noxvto6/lib/libh5async.so(+0x319f8)[0x7f67810eb9f8]
[leucht:99042] [ 3] /home/dev/spack/opt/spack/linux-ubuntu22.04-x86_64_v4/gcc-11.4.0/hdf5-1.14.5-5tlcfqadprf4tpamuxv4tvn67bcastlj/lib/libhdf5.so.310(H5VL_dataset_write+0x9e)[0x7f67826a532e]
[leucht:99042] [ 4] /home/dev/spack/opt/spack/linux-ubuntu22.04-x86_64_v4/gcc-11.4.0/hdf5-1.14.5-5tlcfqadprf4tpamuxv4tvn67bcastlj/lib/libhdf5.so.310(+0xc2f5c)[0x7f6782415f5c]
[leucht:99042] [ 5] /home/dev/spack/opt/spack/linux-ubuntu22.04-x86_64_v4/gcc-11.4.0/hdf5-1.14.5-5tlcfqadprf4tpamuxv4tvn67bcastlj/lib/libhdf5.so.310(H5Dwrite_async+0xf1)[0x7f67824192f1]
[leucht:99042] [ 6] ./a.out(+0x31fa)[0x55bc3b38a1fa]
[leucht:99042] [ 7] ./a.out(+0x44aa)[0x55bc3b38b4aa]
[leucht:99042] [ 8] /lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f678214dd90]
[leucht:99042] [ 9] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f678214de40]
[leucht:99042] [10] ./a.out(+0x2725)[0x55bc3b389725]
[leucht:99042] *** End of error message ***
--------------------------------------------------------------------------
prterun noticed that process rank 0 with PID 99042 on node leucht exited on
signal 11 (Segmentation fault).
--------------------------------------------------------------------------

@jhenderson I can clean up the stacktrace completely by moving free() to be the last call, after the _async close calls. This might help for now to narrow down the issue, as I believe free() isn’t the problem right now. Though freeing my buffer before the waiting finished was definitely another issue, so it’s good we caught that already.

This makes sense, as free() only becomes an issue when H5ESwait() detects an operation failure; in that case it returns without waiting for the async calls to finish. If I move free() to the end, the es_id will already be closed, thus cleaning up the stacktrace.

Though the initial issue still stands: the async VOL experiences operation failures during the async wait period. Is there any way for me to debug and find the specific reasons for those failures? I’ve been trying to figure out H5ESget_err_status(), but sadly can’t work out how to use it properly due to my lack of experience. Are there any examples?

Moving free() after H5ESwait() will be sufficient once I can fix that issue, as you already assumed and as I now noticed the guide also mentions.

Wait for async answer 
Finish waiting for async, num in progess: 139972320046787, failed: 60, status: 0 
Close async dataset 
Close sync dataspace 
Close async file 

I also want to add that doing this without “memcpy” would be preferable for my use-cases. DKRZ wants to evaluate the async VOL on their cluster, and in the future I want to be able to scale this dataset arbitrarily, which will also push memory consumption. I would therefore like to avoid using “memcpy” entirely for now.

Though what would be your suggestion in that case, would it be better to free() after or to simply use “memcpy”?

This is something that often must be determined on a case by case basis. Allowing the connector to copy your buffer can allow for more asynchrony, but also usually increases the amount of memory used at any given time during program execution. It also tends to give you less control over when buffers are freed. But if the system you’re running on has lots of memory to throw at a problem, allowing the buffer copies can be helpful.

Looking back at your program, you should make sure that the calls to H5Dclose_async() and H5Fclose_async() come after the call to H5ESwait(). Otherwise, H5ESclose() should fail if there are any active operations going on in the event set being closed.
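To make the suggested ordering concrete, a sketch using the identifiers from the example program could look like the following. The second H5ESwait() is an assumption on my part, since the close calls are themselves enqueued asynchronously and should be drained before the event set is destroyed:

```c
/* Sketch of the suggested call ordering. */
size_t  num_in_progress;
hbool_t op_failed;

/* 1. Wait for the outstanding create/write operations. */
H5ESwait(es_id, H5ES_WAIT_FOREVER, &num_in_progress, &op_failed);

/* 2. Error inspection (H5ESget_err_status() etc.) would go here. */

/* 3. Only then enqueue the close operations. */
H5Dclose_async(dataset_id, es_id);
H5Sclose(dataspace_id); /* dataspaces are closed synchronously here */
H5Fclose_async(file_id, es_id);

/* 4. Drain the close operations before destroying the event set. */
H5ESwait(es_id, H5ES_WAIT_FOREVER, &num_in_progress, &op_failed);
H5ESclose(es_id);
```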

For checking for errors, the following is the boilerplate for this from the async VOL documentation:

// Check if event set has failed operations (es_err_status is set to true)
status = H5ESget_err_status(es_id, &es_err_status);
// Retrieve the number of failed operations in this event set
H5ESget_err_count(es_id, &es_err_count);
// Retrieve information about failed operations
H5ESget_err_info(es_id, 1, &err_info, &es_err_cleared);
// Inspect and handle the error if there is any
...
H5free_memory(err_info.api_name);
H5free_memory(err_info.api_args);
H5free_memory(err_info.app_file_name);
H5free_memory(err_info.app_func_name);

H5ESget_err_status() just returns a boolean in es_err_status telling you whether an error occurred. If it’s true, then you call H5ESget_err_count() to get the number of errors that occurred (returned in es_err_count) and then you should allocate an array of H5ES_err_info_t structures of size es_err_count, then call H5ESget_err_info() to populate those structures with info about each error that occurred. Each structure contains information like the API routine that was called, the source line in the program where it was called, etc. that you can inspect. Then, for each structure in the array you should call H5free_memory on the above fields to free memory that the library allocated. Note that all this error handling code should basically be the last thing in your current program, after the call to H5ESwait() and before closing the event set with H5ESclose().
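Putting those steps together, a minimal sketch of the error-inspection flow might look like this. The printed fields come from the H5ES_err_info_t structure; the variable names and the error-message format are illustrative, not from the async VOL documentation:

```c
hbool_t es_err_status = false;

/* 1. Did anything in the event set fail? */
H5ESget_err_status(es_id, &es_err_status);

if (es_err_status) {
    size_t es_err_count = 0;
    H5ESget_err_count(es_id, &es_err_count);

    /* 2. Allocate one H5ES_err_info_t per failed operation and
     *    ask the library to fill them in. */
    H5ES_err_info_t *err_info = malloc(es_err_count * sizeof(H5ES_err_info_t));
    size_t es_err_cleared = 0;
    H5ESget_err_info(es_id, es_err_count, err_info, &es_err_cleared);

    /* 3. Inspect each failure, then free the strings the library
     *    allocated inside each structure. */
    for (size_t i = 0; i < es_err_cleared; i++) {
        fprintf(stderr, "failed: %s at %s:%u in %s()\n",
                err_info[i].api_name, err_info[i].app_file_name,
                err_info[i].app_line_num, err_info[i].app_func_name);
        H5free_memory(err_info[i].api_name);
        H5free_memory(err_info[i].api_args);
        H5free_memory(err_info[i].app_file_name);
        H5free_memory(err_info[i].app_func_name);
    }
    free(err_info);
}
```

As noted above, this would go after the call to H5ESwait() and before H5ESclose().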


We don’t currently have any good examples showing how to do this, but we should create some.


Right, I believe I have the calls in the correct order now. Thanks already for that.

The error handling boilerplate is another thing I overlooked; I probably should have spent more time on that, apologies.

Also thanks for the information and clarifications. I will now actually implement the error handling as described and continue from there.