Memory allocation failed for raw data chunk

Hi Team,

I have a memory problem when trying to write large chunked datasets in a loop. The process consumes all available RAM and crashes at the end. This is the output from the HDF5 library:

#000: C:\Data\09_C\hdf5-1.8.20\src\H5Dio.c line 322 in H5Dwrite(): can’t prepare for writing data
major: Dataset
minor: Write failed
#001: C:\Data\09_C\hdf5-1.8.20\src\H5Dio.c line 403 in H5D__pre_write(): can’t write data
#002: C:\Data\09_C\hdf5-1.8.20\src\H5Dio.c line 846 in H5D__write(): can’t write data
#003: C:\Data\09_C\hdf5-1.8.20\src\H5Dchunk.c line 2224 in H5D__chunk_write(): unable to read raw data chunk
major: Low-level I/O
minor: Read failed
#004: C:\Data\09_C\hdf5-1.8.20\src\H5Dchunk.c line 3093 in H5D__chunk_lock(): memory allocation failed for raw data chunk
major: Resource unavailable
minor: No space available for allocation

I carefully checked that I close all objects except the dataset, which I keep open until the end of the program.

When I close and reopen the dataset on each iteration the situation becomes better, but I still see the memory increase each time H5Dwrite is called. I also checked H5Fget_obj_count on each loop iteration; it returns 1 for the root file instance, which is OK.

The memory is only freed when H5Fclose is called at the end of the program.

Did I miss something?

Can you tell us something about the datatype (in memory and on disk), the chunk size,
and the growth of your dataset? A sketch of your loop might help as well.
Are you writing one full chunk at a time? (You might be able to write the chunk directly
into the file.)
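
A minimal sketch of what the direct path could look like, assuming HDF5 1.10.3 or newer (H5Dwrite_chunk; on 1.8.x the high-level H5DOwrite_chunk plays the same role). The function and variable names are illustrative, not from your code:

#include "hdf5.h"

/* Write one full, chunk-aligned block straight to the file, bypassing the
 * datatype-conversion and chunk-cache path of H5Dwrite. */
herr_t write_one_chunk(hid_t dset, hsize_t chunk_index, hsize_t chunk_elems,
                       const void *chunk_buf, size_t chunk_bytes)
{
    /* The offset is given in dataset element coordinates and must lie on a
     * chunk boundary. */
    hsize_t offset[1] = { chunk_index * chunk_elems };

    /* Filter mask 0 means the buffer is already in its final on-disk form;
     * with deflate enabled, the caller must pass pre-compressed bytes. */
    return H5Dwrite_chunk(dset, H5P_DEFAULT, 0, offset, chunk_bytes, chunk_buf);
}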

G.

One way to find out is by using valgrind.

From your description it is not possible to tell what is inside the loop and how resource allocations are handled; however, I can tell you that the HDF5 C API checked out fine for all recent versions.
If C++ is an option for you, then I recommend H5CPP, an easy-to-use persistence library for modern C++. Its packet table implementation is profiled and covers most use cases.

Chris Drozdowski got H5CPP working on Windows by making adjustments to the header files.

Here are his words:

Your project is simply awesome! Just a few lines of code allows me to generate the HDF5 files I need. You’ve done a great service to the community.

If you are interested in the Windows setup, either contact him or I can post his method. As soon as my schedule allows, I will incorporate his method, as well as a cross-compiled binary of the LLVM-based reflection tool for Windows.

hope it helps: steve

Hi Roman,

We had a very similar problem recently. In our case the memory leak was associated with creating many small files with a single 2-D array each. The leak was much smaller if we appended each of the arrays into a 3-D array in a large file. We verified with H5Fget_obj_count() that the file itself was the only open object before closing it. Only completely closing the HDF5 library itself with H5close() would free the memory, but that is not a good thing to do.

We finally tracked the problem down to failures to call H5Tclose(), H5Sclose(), and H5Pclose().

I have appended part of our e-mail thread with HDF5 support folks.

Mark

Hi Roman,

One more try, I don’t know what is causing the listserver to truncate my message, so I’ll just put the important parts of the e-mail thread in my message.

It turns out that we were not properly closing datatypes and dataspaces. Adding calls like this fixed the leaks.

H5Tclose(hdfdatatype);
H5Sclose(hdfdataspace);
H5Sclose(hdfattrdataspace);
H5Pclose(dset_access_plist);
H5Sclose(this->dataspace);
H5Pclose(create_plist);
H5Pclose(access_plist);

Note that we were checking that the only open object was the file itself before closing it.
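
A rough sketch of that kind of check, with illustrative identifiers. Note that dataspaces and property lists are not tied to a file, so they never appear in these counts; the file can look clean while H5Sclose()/H5Pclose() calls are still missing, which is exactly what happened to us:

#include <stdio.h>
#include "hdf5.h"

static void report_open_objects(hid_t file_id)
{
    /* Break the open-handle count down by object type. */
    long n_file  = (long)H5Fget_obj_count(file_id, H5F_OBJ_FILE);
    long n_dset  = (long)H5Fget_obj_count(file_id, H5F_OBJ_DATASET);
    long n_group = (long)H5Fget_obj_count(file_id, H5F_OBJ_GROUP);
    long n_dtype = (long)H5Fget_obj_count(file_id, H5F_OBJ_DATATYPE);
    long n_attr  = (long)H5Fget_obj_count(file_id, H5F_OBJ_ATTR);
    printf("open objects: file=%ld dataset=%ld group=%ld datatype=%ld attr=%ld\n",
           n_file, n_dset, n_group, n_dtype, n_attr);
}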

This was the original problem description.

We have been tracking down a memory leak (https://github.com/areaDetector/ADCore/issues/385) in the HDF5 library used in the EPICS areaDetector HDF5 file writer plugin. The leak appears in the cycle of creating and closing files. The memory seems to leak in 64 MB blocks (https://github.com/areaDetector/ADCore/issues/385#issuecomment-476746667) over 1000 cycles.

Initially we thought that we were forgetting to close some objects, but printing out the open object count (https://github.com/areaDetector/ADCore/issues/385#issuecomment-477496953) doesn’t seem to reveal any leaks. We also set the “close degree” to “strong” anyway.

The leak goes away when we insert a call to H5close (https://github.com/areaDetector/ADCore/issues/385#issuecomment-477521748) after closing the file, but that seems a bit excessive.

Mark

Thank you for the response.

Basically, we created a trace mechanism for the code generated from Simulink models. The HDF5 file structure follows the model hierarchy and can contain different datatypes with scalars, arrays, and compound types in all possible combinations. In this particular case the file contains 1759 datasets of different datatypes. Each dataset has 8001 points, written as chunks of 1000 elements.

We also tested the code with smaller models and don’t see any memory problems there. It looks like the problem happens with particular datatype combinations.
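
For illustration only (the real datatypes are generated from the model), a combination of the kind involved might be built like this; the struct names and members below are made up for the sketch:

/* A nested compound with an array member, similar in spirit to the
 * combinations described above. */
typedef struct {
    double scalar;
    float  samples[4];
} Inner;

typedef struct {
    int   counter;
    Inner inner;
} Outer;

hid_t make_outer_type(void)
{
    hsize_t adims[1] = { 4 };
    hid_t arr   = H5Tarray_create2(H5T_NATIVE_FLOAT, 1, adims);

    hid_t inner = H5Tcreate(H5T_COMPOUND, sizeof(Inner));
    H5Tinsert(inner, "scalar",  HOFFSET(Inner, scalar),  H5T_NATIVE_DOUBLE);
    H5Tinsert(inner, "samples", HOFFSET(Inner, samples), arr);

    hid_t outer = H5Tcreate(H5T_COMPOUND, sizeof(Outer));
    H5Tinsert(outer, "counter", HOFFSET(Outer, counter), H5T_NATIVE_INT);
    H5Tinsert(outer, "inner",   HOFFSET(Outer, inner),   inner);

    /* Member types are copied into the compound on insert, so these transient
     * handles must be released with H5Tclose(), otherwise they accumulate
     * just like the leaks discussed earlier in this thread. */
    H5Tclose(arr);
    H5Tclose(inner);
    return outer;
}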

Below is the loop code where the datasets are written. CHUNK_SIZE is 1000. This is the second solution, where the datasets are opened and closed on each chunk write; the memory still increases on each H5Dwrite call (but not as dramatically as when we leave the datasets open until the end of the program). The endpoint argument holds the dataset name, the dataset type, the data to write, and additional loop counters.

void writeChunks(hid_t root, HD5Endpoint *endpoint) {
    if (endpoint->currentPoint >= CHUNK_SIZE) {
        herr_t status;
        hsize_t size[1];
        size[0] = endpoint->usedPoints + CHUNK_SIZE;
        hid_t dataSet = H5Dopen(root, endpoint->dataSetName, H5P_DEFAULT);
        if (dataSet > 0)
        {
            /* grow the dataset by one chunk and select the newly added region */
            status = H5Dset_extent(dataSet, size);
            hid_t fileSpace = H5Dget_space(dataSet);
            hsize_t offset[1];
            offset[0] = endpoint->usedPoints;
            hsize_t dimsext[1];
            dimsext[0] = CHUNK_SIZE;
            status = H5Sselect_hyperslab(fileSpace, H5S_SELECT_SET, offset, NULL, dimsext, NULL);
            hid_t memSpace = H5Screate_simple(1, dimsext, NULL);
            status = H5Dwrite(dataSet, endpoint->dataTypes.hids[endpoint->dataTypes.usedHids - 1], memSpace, fileSpace, H5P_DEFAULT, endpoint->data);
            /* release everything opened in this iteration */
            H5Sclose(memSpace);
            H5Sclose(fileSpace);
            H5Dclose(dataSet);
            H5garbage_collect();
            endpoint->currentPoint = 0;
            endpoint->usedPoints += CHUNK_SIZE;
        } else {
            printf(" HDF5 C API warning: can't open dataset %s. Skipping!\n", endpoint->dataSetName);
        }
    }
}

Also, I should mention that we use ZLIB with compression level 7. But if we set the compression level to 0, the memory problem remains the same.
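
For context, the datasets are created with a chunked layout and the deflate (ZLIB) filter, roughly along these lines. This is a simplified sketch: memType stands in for whatever datatype the model produces, and the real code derives names and shapes from the model hierarchy.

hsize_t dims[1]    = { 0 };                /* start empty                     */
hsize_t maxdims[1] = { H5S_UNLIMITED };    /* grown later with H5Dset_extent  */
hsize_t chunk[1]   = { CHUNK_SIZE };

hid_t space = H5Screate_simple(1, dims, maxdims);
hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
H5Pset_chunk(dcpl, 1, chunk);              /* deflate requires chunked layout */
H5Pset_deflate(dcpl, 7);                   /* ZLIB compression level 7        */

hid_t dataSet = H5Dcreate2(root, endpoint->dataSetName, memType, space,
                           H5P_DEFAULT, dcpl, H5P_DEFAULT);

H5Pclose(dcpl);                            /* release what was created here   */
H5Sclose(space);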

I hope this helps. I would be glad to provide more details if needed.

P.S. In the file I shared, CHUNK_SIZE is set to 200, but that doesn’t change the memory usage compared to CHUNK_SIZE 1000.

P.P.S. I’ve also tried the latest HDF5 release (1.10.5); the memory problem remains the same.

Thanks Steven, unfortunately I can’t use C++ in this particular case. The C code needs to be compiled as a static library with VS2013 and then linked into a Fortran executable using Intel® Parallel Studio XE 2015. I also can’t use Valgrind. What I tried instead is Dr. Memory for Windows (which is a great alternative). It found some memory leaks in my code, which I fixed, but it doesn’t show any of the other problems I mentioned in this topic.

Also, the original error messages I posted above occur many times in the loop, and the application’s memory continues to increase before it finally crashes at the end.

Hi Mark, thank you for helping!

We actually write a single file with many datasets and datatypes inside. Some of the datatypes have a hierarchical structure with many levels; they can also mix compound datatypes and arrays. I’ve also checked that only the root file instance is open, yet the memory increases on each write call. After closing the file, all memory is released.

I’ve also tried calling H5garbage_collect, but it doesn’t help.

Best regards,
Roman.

Here is our HDF5 file structure: file_structure.zip (137.6 KB)

One of the responses we received from HDF Group support before we solved our problem was the following.

One possible explanation:

The HDF5 library maintains a large number of free lists to avoid the overhead of malloc and free.

If memory serves, this memory is not normally released until the library shuts down. Thus if large numbers of files are opened simultaneously, written to and then closed, the memory needed to manage the files would not be released when the files are closed, but only when the HDF5 library is shut down.

Is this consistent with the observed behavior?

You can turn off the free lists by building with --enable-using-memchecker=yes with configure (or HDF5_ENABLE_USING_MEMCHECKER in CMake).
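
If rebuilding the library is not convenient, the free lists can also be capped at run time with H5set_free_list_limits(). A rough sketch follows; the limit values are arbitrary examples, and -1 means "no limit":

#include "hdf5.h"

/* Hedged sketch: limit how much memory the HDF5 free lists may hold on to,
 * instead of disabling them entirely at build time. The byte limits below
 * are arbitrary examples. */
herr_t cap_free_lists(void)
{
    return H5set_free_list_limits(
        1 * 1024 * 1024,   /* regular free lists: global limit   */
        64 * 1024,         /* regular free lists: per-list limit */
        1 * 1024 * 1024,   /* array free lists: global limit     */
        64 * 1024,         /* array free lists: per-list limit   */
        1 * 1024 * 1024,   /* block free lists: global limit     */
        64 * 1024);        /* block free lists: per-list limit   */
}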

Mark

Thanks! I will try the option.

Can you wrap that loop in a simple test program and see if it also has the problem? If so, the HDF Group may be able to figure out the issue.

This is very difficult to do, because the structure of the HDF5 file is created from the Simulink model on the fly. When I use the same code to write trivial data, I don’t see any memory problems.

I’ve also tried setting HDF5_ENABLE_USING_MEMCHECKER. I think the memory usage is better now. I will try leaving all datasets open and see if it still runs out of memory.

So I tested the library with HDF5_ENABLE_USING_MEMCHECKER, leaving the datasets open. See the loop code:

void writeChunks(hid_t root, HD5Endpoint *endpoint) {
    if (endpoint->currentPoint >= CHUNK_SIZE) {
        herr_t status;
        hsize_t size[1];
        size[0] = endpoint->usedPoints + CHUNK_SIZE;
        //hid_t dataSet = H5Dopen(root, endpoint->dataSetName, H5P_DEFAULT);    dataset is already open
        hid_t dataSet = endpoint->dataSet;
        if (dataSet > 0) 
        {
            status = H5Dset_extent(dataSet, size);
            hid_t fileSpace = H5Dget_space(dataSet);
            hsize_t offset[1];
            offset[0] = endpoint->usedPoints;
            hsize_t dimsext[1];
            dimsext[0] = CHUNK_SIZE;
            status = H5Sselect_hyperslab(fileSpace, H5S_SELECT_SET, offset, NULL, dimsext, NULL);
            hid_t memSpace = H5Screate_simple(1, dimsext, NULL);
            status = H5Dwrite(dataSet, endpoint->dataTypes.hids[endpoint->dataTypes.usedHids - 1], memSpace, fileSpace, H5P_DEFAULT, endpoint->data);
            H5Sclose(memSpace);
            H5Sclose(fileSpace);
            //H5Dclose(dataSet);    don't close dataset. Close it on the program end.
            H5garbage_collect();
            endpoint->currentPoint = 0;
            endpoint->usedPoints += CHUNK_SIZE;
        } else {
            printf(" HD5 CAPI warning: cant open dataset %s. Skipping!\n", endpoint->dataSetName);
        }
        
    }
}

The application still runs out of memory.

HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
  #000: C:\Data\09_C\hdf5\src\H5Dio.c line 336 in H5Dwrite(): can't write data
    major: Dataset
    minor: Write failed
  #001: C:\Data\09_C\hdf5\src\H5Dio.c line 818 in H5D__write(): can't write data
  #002: C:\Data\09_C\hdf5\src\H5Dchunk.c line 2393 in H5D__chunk_write(): unable to read raw data chunk
    major: Low-level I/O
    minor: Read failed
  #003: C:\Data\09_C\hdf5\src\H5Dchunk.c line 3588 in H5D__chunk_lock(): memory allocation failed for raw data chunk
    major: Resource unavailable
    minor: No space available for allocation
  #002: C:\Data\09_C\hdf5\src\H5Dchunk.c line 2364 in H5D__chunk_write(): error looking up chunk address
    minor: Can't get value
  #003: C:\Data\09_C\hdf5\src\H5Dchunk.c line 2951 in H5D__chunk_lookup(): can't query chunk address
  #004: C:\Data\09_C\hdf5\src\H5Dbtree.c line 1049 in H5D__btree_idx_get_addr(): can't get chunk info
  #005: C:\Data\09_C\hdf5\src\H5B.c line 335 in H5B_find(): unable to load B-tree node
    major: B-Tree node
    minor: Unable to protect metadata
  #006: C:\Data\09_C\hdf5\src\H5AC.c line 1352 in H5AC_protect(): H5C_protect() failed
    major: Object cache
  #007: C:\Data\09_C\hdf5\src\H5C.c line 2345 in H5C_protect(): can't load entry
    minor: Unable to load metadata into cache
  #008: C:\Data\09_C\hdf5\src\H5C.c line 6699 in H5C_load_entry(): Can't deserialize image

Hi Roman,

We have use cases where hyperslab selections may cause memory growth; eventually, the application runs out of memory. It would be good to check whether the recent improvements to the code address the issue you are seeing.

Would it be possible for you to try a snapshot of the HDF5 develop branch that we created for you? You can find it here ftp://gamma.hdfgroup.org/pub/outgoing/hdf5/snapshots/v111/hdf5-1.11.5.zip.

Also, could you please send the following to help@hdfgroup.org:

  • h5dump output using the -pH option
  • a description of the hyperslab you are writing

It will help us to reproduce the problem here.

Thank you!

Elena

Hi Elena, thanks! I will try tomorrow.