A lot of datasets in one HDF5 file -> rapidly increasing memory

Hello everyone,
I’m planning to use HDF5 to store a lot of image datasets with some metadata in one HDF5 file. Ideally there will be a lot of groups; each group will contain one frame consisting of several images plus some metadata.
I tried different ways to implement this, but in the end there is always the same problem: there seems to be something like a memory leak. With every iteration (create, write, close a dataset or group), the main memory usage increases rapidly.

I found only one workaround: closing the library with “H5close()”. But this isn’t usable for me because it costs too much time.

Some information about my setup:
HDF5 1.8.10 or 1.8.9
Visual Studio 2010
Windows 7

As mentioned before, I tried a lot of things:
• Using both the high-level and the low-level API
• Using the garbage collector via property lists
• Calling the close functions for datasets, dataspaces, and the file in every iteration
• Flushing the file and similar measures
• Trying to free the memory manually
• Various other things

My questions are:
How can I keep the library from allocating so much working memory?
How can I free the memory in every iteration?
Would you store this data differently? Maybe there is a better way / strategy to store a lot of groups with many images and metadata in HDF5.

I attached a simple program to demonstrate the problem. It only opens one HDF5 file and writes some datasets (simplified: small datasets and no groups). As you can see, the memory usage increases rapidly.

Thanks a lot in advance!
Nils


*********************************************

Sample code:
       #include <hdf5.h>
       #include <stdio.h>
       #include <windows.h>   /* Sleep() */

       #define RANK 2

       int main(void)
       {
             hid_t file_id, dataset, intType, dataspace;
             hsize_t dims[RANK] = {2, 3};
             int data[6] = {1, 2, 3, 4, 5, 6};
             herr_t status;
             int count = 0;

             /* buffer for the dataset name */
             char datasetName[256];

             intType   = H5Tcopy(H5T_NATIVE_INT);
             dataspace = H5Screate_simple(RANK, dims, NULL);

             /* create an HDF5 file */
             file_id = H5Fcreate("hdf_test.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

             for (int i = 0; i <= 32000; i++) {
                   sprintf(datasetName, "/dset_%05d", i);

                   /* create an integer dataset and write the sample data */
                   dataset = H5Dcreate(file_id, datasetName, intType, dataspace,
                                       H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
                   status = H5Dwrite(dataset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL,
                                     H5P_DEFAULT, data);

                   /* flush the file and release the dataset handle */
                   H5Fflush(file_id, H5F_SCOPE_GLOBAL);
                   H5Dclose(dataset);

                   Sleep(100);
                   count++;
             }

             /* release the remaining handles and close the file */
             H5Tclose(intType);
             H5Sclose(dataspace);
             status = H5Fclose(file_id);
             return 0;
       }

Hi Nils,

On Nov 21, 2012, at 3:04 AM, Ceratos@gmx.de wrote:

How can I keep the library from allocating so much working memory?
How can I free the memory in every iteration?

  HDF5 uses internal free lists for the memory it allocates. You could try calling H5garbage_collect() in your loop and see if that makes a difference. (But in the long run it shouldn't matter, since the memory that HDF5 uses while creating one dataset will be reused for creating the next.)

  Quincey
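
For reference, a minimal sketch of that suggestion, assuming the same creation loop as the sample program above (H5garbage_collect() releases unused free-list memory; whether the process working set actually shrinks depends on the OS allocator):

      /* inside the creation loop of the sample program */
      dataset = H5Dcreate(file_id, datasetName, intType, dataspace,
                          H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
      H5Dclose(dataset);

      /* ask HDF5 to free unused free-list memory after each iteration */
      H5garbage_collect();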


Maybe for clarification:
What I really want is to save the frames (one group with different images
and metadata) continuously into one HDF5 file.
At the beginning I create only one file without any frames. The frames then
arrive continuously and I want to append them to the file.

Maybe there are other structures or strategies in HDF5 to store this?

Thanks in advance!
Nils


Hi Nils,

On Nov 22, 2012, at 1:39 AM, ceratos <ceratos@gmx.de> wrote:

Maybe there are other structures or strategies in HDF5 to store this?

  You could create a 3-D chunked dataset with an unlimited dimension and store each frame as a new "slice" in the Z direction.

  Quincey
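
For illustration, a minimal sketch of that approach in the HDF5 1.8 C API, assuming fixed-size integer frames; WIDTH, HEIGHT, create_frame_dataset, and append_frame are placeholder names, not from this thread:

      #include <hdf5.h>

      #define WIDTH  640
      #define HEIGHT 480

      /* Create an empty 3-D dataset that can grow without limit
         along the first (frame) dimension. */
      hid_t create_frame_dataset(hid_t file_id)
      {
            hsize_t dims[3]    = {0, HEIGHT, WIDTH};
            hsize_t maxdims[3] = {H5S_UNLIMITED, HEIGHT, WIDTH};
            hsize_t chunk[3]   = {1, HEIGHT, WIDTH};   /* one frame per chunk */

            hid_t space = H5Screate_simple(3, dims, maxdims);
            hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
            H5Pset_chunk(dcpl, 3, chunk);   /* chunking is required for unlimited dims */

            hid_t dset = H5Dcreate(file_id, "/frames", H5T_NATIVE_INT, space,
                                   H5P_DEFAULT, dcpl, H5P_DEFAULT);
            H5Pclose(dcpl);
            H5Sclose(space);
            return dset;
      }

      /* Append one frame as a new slice along the unlimited dimension. */
      herr_t append_frame(hid_t dset, const int *frame, hsize_t index)
      {
            hsize_t newdims[3] = {index + 1, HEIGHT, WIDTH};
            hsize_t start[3]   = {index, 0, 0};
            hsize_t count[3]   = {1, HEIGHT, WIDTH};

            H5Dset_extent(dset, newdims);   /* grow the dataset by one slice */

            hid_t fspace = H5Dget_space(dset);
            H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);

            hid_t mspace  = H5Screate_simple(3, count, NULL);
            herr_t status = H5Dwrite(dset, H5T_NATIVE_INT, mspace, fspace,
                                     H5P_DEFAULT, frame);
            H5Sclose(mspace);
            H5Sclose(fspace);
            return status;
      }

Per-frame metadata could then live in attributes or in a parallel 1-D dataset, depending on how much of it there is.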

On Sat, Nov 24, 2012 at 6:33 PM, Quincey Koziol <koziol@hdfgroup.org> wrote:

        You could create a 3-D chunked dataset with an unlimited dimension and store each frame as a new "slice" in the Z direction.

Just curious, is this faster than a packet table?


On Nov 24, 2012, at 8:59 PM, dashesy <dashesy@gmail.com> wrote:

Just curious, is this faster than a packet table?

Well, packet tables are designed for 1-D data, so it's a bit of an apples-to-oranges comparison...

    Quincey

On Sat, Nov 24, 2012 at 8:22 PM, Quincey Koziol <koziol@hdfgroup.org> wrote:

        Well, packet tables are designed for 1-D data, so it's a bit of an apples-to-oranges comparison...

But I am using it for vector data of fixed length; the data type can be
anything, like a 2-D array representing an image.
Again, this is a matter of choice, but I am more curious to know which
one is better for real-time tasks (e.g., which requires less I/O).
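
For what it's worth, a sketch of that usage with the high-level packet-table API, assuming hypothetical 4x4 integer frames and an already-open file_id:

      #include <hdf5.h>
      #include <hdf5_hl.h>

      /* one packet-table record = one 4x4 image, via an array datatype */
      hsize_t frame_dims[2] = {4, 4};
      hid_t   frame_type    = H5Tarray_create2(H5T_NATIVE_INT, 2, frame_dims);

      hid_t table = H5PTcreate_fl(file_id, "/frames_pt", frame_type,
                                  1 /* chunk size, in records */,
                                  -1 /* no compression */);

      int frame[4][4] = {{0}};        /* pixel data goes here */
      H5PTappend(table, 1, frame);    /* append one frame record */

      H5PTclose(table);
      H5Tclose(frame_type);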


On Nov 24, 2012, at 9:38 PM, dashesy <dashesy@gmail.com> wrote:

I am more curious to know which one is better for real-time tasks.

In that case, I would expect the packet tables to perform similarly to what I described. (They are just a wrapper around the functionality in the library that implements what I described.)

  Quincey
