Proper way to create a group in an HDF5 file

I have some code I wrote that creates a group in an HDF5 file:

hid_t H5Utilities::createGroup(hid_t loc_id, const std::string &group)
{
  hid_t grp_id = -1;
  herr_t err = -1;
  /* H5Gget_objinfo returns 0 if an object already exists at this
   * path, so open the existing group; otherwise create it. */
  err = H5Gget_objinfo(loc_id, group.c_str(), 0, NULL);
  if (err == 0)
  {
    grp_id = H5Gopen(loc_id, group.c_str());
  }
  else
  {
    grp_id = H5Gcreate(loc_id, group.c_str(), 0);
  }
  return grp_id;
}
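
For comparison, here is a minimal sketch of an alternative that skips
the H5Gget_objinfo probe and simply attempts the open, silencing the
error stack with the H5E_BEGIN_TRY/H5E_END_TRY macros. The name
openOrCreateGroup is purely illustrative, and the sketch assumes the
same 1.6 API as the function above:

hid_t H5Utilities::openOrCreateGroup(hid_t loc_id, const std::string &group)
{
  hid_t grp_id = -1;
  /* Try the open first, suppressing the error messages that H5Gopen
   * would otherwise print when the group does not exist. */
  H5E_BEGIN_TRY
  {
    grp_id = H5Gopen(loc_id, group.c_str());
  }
  H5E_END_TRY;
  if (grp_id < 0)
  {
    /* The group does not exist yet, so create it (0 = default size hint). */
    grp_id = H5Gcreate(loc_id, group.c_str(), 0);
  }
  return grp_id;
}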

createGroup() gets called many times (~44,000) over the course of a
program, and I have noticed that memory usage keeps rising the entire
time the file is open, which suggests a possible memory leak. Here is
some code I am using to test it:

void StressTestCreateGroups()
{
  std::cout << logTime() << " Starting StressTestCreateGroups()" << std::endl;
  char path[64];
  ::memset(path, 0, 64);
  int err = 0;
  hid_t file_id;
  hid_t grpId;
  /* Create a new file using default properties. */
  file_id = H5Fcreate(MXAUnitTest::H5UtilTest::GroupTest.c_str(),
                      H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
  BOOST_REQUIRE(file_id > 0);
  for (int i = 0; i < 100; ++i) {
//  err = H5Fclose(file_id);
//  BOOST_REQUIRE(err >= 0);
//  file_id = H5Fopen(MXAUnitTest::H5UtilTest::GroupTest.c_str(),
//                    H5F_ACC_RDWR, H5P_DEFAULT);
    ::memset(path, 0, 64);
    snprintf(path, 64, "/%03d", i);
    grpId = H5Utilities::createGroup(file_id, path);
    BOOST_REQUIRE(grpId > 0);
    err = H5Gclose(grpId);
    BOOST_REQUIRE(err >= 0);
    for (int j = 0; j < 100; ++j) {

      snprintf(path, 64, "/%03d/%03d", i, j);
      grpId = H5Utilities::createGroup(file_id, path);
      BOOST_REQUIRE(grpId > 0);
      err = H5Gclose(grpId);
      BOOST_REQUIRE(err >= 0);
      for (int k = 0; k < 100; ++k) {

        snprintf(path, 64, "/%03d/%03d/%03d", i, j, k);
        grpId = H5Utilities::createGroup(file_id, path);
        BOOST_REQUIRE(grpId > 0);
        err = H5Gclose(grpId);
        BOOST_REQUIRE(err >= 0);
      }
    }
  }
  err = H5Fclose(file_id);
  BOOST_REQUIRE(err >= 0);
}

That test shows the same memory usage pattern. Profiling tools on
OS X suggest that all of the memory is coming from malloc calls inside
HDF5. I can somewhat control the usage by closing and then reopening
the file at the top of each outer loop (that code is currently
commented out). My questions, therefore: is this a known usage
scenario? Should I be using different code to create groups in HDF5
files? Is HDF5 caching the name/path of each group for later use?
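
To rule out leaked identifiers on my side, one check would be
H5Fget_obj_count(), which reports how many identifiers are still open
against a file. The sketch below assumes the 1.6 signature (which
returns an int); reportOpenIds is just an illustrative helper:

#include <hdf5.h>
#include <cstdio>

/* Illustrative helper: report how many identifiers are still open
 * against a file. If these counts stay flat while memory grows, the
 * growth is internal to the library rather than a leaked handle. */
void reportOpenIds(hid_t file_id)
{
  int total  = H5Fget_obj_count(file_id, H5F_OBJ_ALL);
  int groups = H5Fget_obj_count(file_id, H5F_OBJ_GROUP);
  std::printf("open ids: %d total, %d groups\n", total, groups);
}

A lighter-weight experiment might be calling
H5Fflush(file_id, H5F_SCOPE_LOCAL) at the top of the outer loop
instead of the full close/reopen, though I do not know whether a flush
alone releases the cache memory.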

  This is with HDF5 1.6.8 on OS X 10.5.6 Intel.

Any help would be appreciated.


-----
Mike Jackson www.bluequartz.net
BlueQuartz Software Principal Software Engineer

----------------------------------------------------------------------
This mailing list is for HDF software users discussion.
To subscribe to this list, send a message to hdf-forum-subscribe@hdfgroup.org.
To unsubscribe, send a message to hdf-forum-unsubscribe@hdfgroup.org.

Mike,

We have been unable to reproduce the problem here. Version 1.6.8 shows a maximum memory growth of ~200 KB over the course of the program (which, of course, creates 1,000,000 groups), and that growth is eliminated entirely by closing the file in the outer loop. Version 1.6.8 does show higher overall memory usage than 1.8, though the memory growth in 1.8 is higher (still relatively small, and it can also be eliminated by closing the file). I suspect the small amount of memory growth we do see is due to growth of the metadata cache, which gets flushed when the file is closed. What tool are you using to measure the memory usage of your application?
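
If the metadata cache does turn out to be the culprit and you move to
1.8, its size can also be capped through a file access property list.
A rough sketch against the 1.8 API follows; the sizes and the
createFileWithCappedMdc name are purely illustrative, not
recommendations:

#include <hdf5.h>

hid_t createFileWithCappedMdc(const char *name)
{
  H5AC_cache_config_t config;
  /* The version field must be set before querying the defaults. */
  config.version = H5AC__CURR_CACHE_CONFIG_VERSION;

  hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
  H5Pget_mdc_config(fapl, &config);        /* start from the defaults */
  config.set_initial_size = 1;             /* use an explicit initial size */
  config.initial_size = 1 * 1024 * 1024;   /* 1 MiB */
  config.min_size     = 1 * 1024 * 1024;
  config.max_size     = 4 * 1024 * 1024;   /* cap the cache at 4 MiB */
  H5Pset_mdc_config(fapl, &config);

  hid_t file_id = H5Fcreate(name, H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
  H5Pclose(fapl);
  return file_id;
}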

Thanks,
Neil Fortner
The HDF Group


Well, this is odd. The initial measurements were done on a new Mac Pro, using Activity Monitor, Big Top, and Instruments from the OS X developer tools. Compiling now on a Core 2 Duo MacBook Pro, I cannot reproduce the problem. I'll rerun the tests at work in the morning to see whether the issue shows up there again.

Thanks for the sanity check; I appreciate the time you spent verifying that things are working properly.


_________________________________________________________
Mike Jackson mike.jackson@bluequartz.net
BlueQuartz Software www.bluequartz.net
Principal Software Engineer Dayton, Ohio
