I have written a simple C++ program that creates an HDF5 file with one group and one attribute. The program then writes 100 datasets under the group. I wrote it to check whether memory usage increases while writing datasets. I am using Windows 11, Visual Studio 2019, and HDF5 library version 1.14.2.
Looking at the memory usage in Visual Studio, the memory slowly increases from 3 MB to 4 MB after writing about 46 datasets. I tried doing a global flush and explicitly calling garbage collection, but it did not seem to help. The only way I found to keep memory constant is to close and reopen the file after each write. My question: is there a way to keep the memory constant without having to close and reopen the file? Here is a screenshot of the memory increase, and I have posted the code below that.
Thank you for the reply @hyoklee
Just to leave an English answer here: the best practice is “not to use flush”, because it could cause a memory leak where the data is written to disk but not necessarily released from memory.
Instead, the best practice is to close the file after the write is complete.
I will rewrite my example once with no flush and once with an open and close, compare the memory usage, and post the results so that there is a complete comparison, which I hope will help anyone else with the same question. I will do this later this week.
Hello,
I worked on my example and changed the code so that it provides more output. I have added a flag to choose between doing multiple writes without closing the file (no flush) and closing the file between writes. The memory increase still happens, and it does not seem to matter whether the file is closed between writes or not. I am not sure if I am doing something wrong or missing something. I would appreciate any further comments.
Here are screenshots again from the memory usage monitor in Visual Studio 2019:
[Screenshot: initial memory]
[Screenshot: after 50 or more writes]
HDF LIBRARY VERSION FROM H5get_libversion() = 1.14.2
Sleep between writes =500 milli-seconds
Size of data to be written per dataset =1 MB
Here is my code. Please note that there is a boolean flag called “closeReopen”. When the flag is set to true, the file is closed and reopened after each write. If the flag is false, the file is not closed.
#include "H5Cpp.h"
#include <iostream>
#include <vector>
#include <string>
#include <chrono>
#include <thread>
int main(int argc, char** argv)
{
std::string filename = "test.h5";
size_t numberOfWrites = 100; // number of writes to perform
size_t waitInMilliSec = 500; // wait time between writes in milliseconds
bool closeReopen = false; // choose whether to close and reopen the file between writes (true) or keep it open (false)
size_t dataSize = 256 * 256 * 4; // number of floats per dataset (each item has a value of 10.0f)
H5::H5File* localFileHandel = nullptr;
/*
* OPEN FILE
*/
try
{
localFileHandel = new H5::H5File(filename, H5F_ACC_TRUNC);
}
catch (H5::FileIException& error)
{
std::cout << "ERROR: po::io::HDF5::createNew() Unable to create an new HDF5 file with the name = " << filename << std::endl;
std::cout << error.getDetailMsg() << std::endl;
return -1;
}
std::cout << "Created HDF file:" << filename << std::endl;
/*
* CREATE GROUP
*/
// First create a group and call it Group Test
H5::Group group = localFileHandel->createGroup("GroupTest");
/*
* Check if the group creation was OK
*/
if (false == group.isValid(group.getId()))
{
std::cout << "Error: unable to create group " << std::endl;
return -1;
}
else
{
std::cout << "Group created OK" << std::endl;
}
/*
* CREATE ATTRIBUTE
*/
//Create an integer scalar attribute and call the attribute Attribute test
H5::DataSpace dataSpace(H5S_SCALAR);
H5::Attribute attribute = group.createAttribute("AttributeTest", H5::PredType::NATIVE_INT, dataSpace);
/*
* Get the library version and print it
*/
unsigned int h5MajorVersion;
unsigned int h5MinorVersion;
unsigned int h5ReleaseNumber;
H5get_libversion(&h5MajorVersion, &h5MinorVersion, &h5ReleaseNumber);
/* Create the data that we will be writing multiple times
* The data size is dataSize, the data is initialized to 10.0f
* single precision real data
*/
std::vector<float> data(dataSize, 10.0f);
float vectorSizeInMem = (float)(data.capacity() * sizeof(float)) / 1024 / 1024;
std::cout << "Testing HDF5 multiple writes." << std::endl;
std::cout << "Writing = " << numberOfWrites << " Times" << std::endl;
std::cout << "HDF LIBRARY VERSION FROM H5get_libversion() = " << h5MajorVersion << "." << h5MinorVersion << "." << h5ReleaseNumber << std::endl;
std::cout << "Sleep between writes =" << waitInMilliSec << " milli-seconds" << std::endl;
std::cout << "Size of data to be written per dataset =" << vectorSizeInMem << " MB" << std::endl;
if (true == closeReopen)
{
std::cout << "File will be closed and reopened after each data set" << std::endl;
}
else
{
std::cout << "File will not be closed between writes" << std::endl;
}
std::cout << std::endl << std::endl;
/*
* Loop and write the data
*/
for (size_t ii = 0; ii < numberOfWrites; ++ii)
{
// Group close / reopen does not have an effect on the memory !
// group = localFileHandel->openGroup("GroupTest");
//
// The data set name will be the loop index number
std::string dataName = std::to_string(ii);
std::vector<hsize_t> dims = { (hsize_t)data.size() };
/*
* Create the dataspace and the data set with GroupTest as its parent
*/
H5::DataSpace dataSpace(static_cast<int>(dims.size()), dims.data(), dims.data());
H5::DataSet dataSet = group.createDataSet(dataName, H5::FloatType(H5::PredType::NATIVE_FLOAT), dataSpace);
/*
* Write the data set
*/
dataSet.write(&data[0],H5::FloatType(H5::PredType::NATIVE_FLOAT));
/*
* Close the data set
*/
dataSet.close();
// Group close / reopen does not have an effect on the memory !
//group.close();
/*
* Adding sleep to wait between writes
*/
std::cout << "Wrote data set " << ii << std::endl;
std::this_thread::sleep_for(std::chrono::milliseconds(waitInMilliSec));
if (true == closeReopen)
{
// close the file
//std::cout << "closing file with close() function " << std::endl;
localFileHandel->close();
// re-open the file with read write access to append data to it
try
{
std::cout << "open file with H5F_ACC_RDWR attribute " << std::endl;
localFileHandel->openFile(filename, H5F_ACC_RDWR);
}
catch (H5::FileIException& error)
{
std::cout << "ERROR: Unable to reopen the HDF5 file with the name = " << filename << std::endl;
std::cout << error.getDetailMsg() << std::endl;
return -1;
}
}
}
/*
* Print the run information one more time, since the per-dataset console output
* makes it difficult to scroll back to the top to see this information
*/
std::cout << "Wrote = " << numberOfWrites << " Times" << std::endl;
std::cout << "HDF LIBRARY VERSION FROM H5get_libversion() = " << h5MajorVersion << "." << h5MinorVersion << "." << h5ReleaseNumber << std::endl;
std::cout << "Sleep between writes =" << waitInMilliSec << " milli-seconds" << std::endl;
std::cout << "Size of data to be written per dataset =" << vectorSizeInMem << " MB" << std::endl;
if (true == closeReopen)
{
std::cout << "File will be closed and reopened after each data set" << std::endl;
}
else
{
std::cout << "File will not be closed between writes" << std::endl;
}
return 0;
}
Hello, I have finally figured out what is going on with the help of my colleagues. I am posting a modified version of the code with what I believe is the correct way to handle H5 file writes over time.
It is important to close all attributes / groups / data sets after each use and also close the file. Closing the file alone will still create memory leaks.
In my previous post, I did not close the attribute and did not close and reopen the group, which was causing the issue.
In my new modified program there are no memory leaks when the closeReopen flag I created is set to true (which closes and reopens the file for each data set write). If the closeReopen flag is set to false, the HDF5 file is opened only once, and this creates a memory leak.
Hope this code will help others
#include "H5Cpp.h"
#include <iostream>
#include <vector>
#include <string>
#include <chrono>
#include <thread>
int main(int argc, char** argv)
{
/*
* To avoid memory leaks :
* close all open groups / attributes / data sets after each use
* close the file after each data set write and reopen it again
*/
std::string filename = "test.h5";
size_t numberOfWrites = 100; // number of writes to perform
size_t waitInMilliSec = 500; // wait time between writes in milliseconds
bool closeReopen = true; // choose whether to close and reopen the file between writes (true) or keep it open (false)
size_t dataSize = 256 * 256 * 4; // number of floats per dataset (each item has a value of 10.0f)
H5::H5File* localFileHandel = nullptr;
/*
* OPEN FILE
*/
try
{
localFileHandel = new H5::H5File(filename, H5F_ACC_TRUNC);
}
catch (H5::FileIException& error)
{
std::cout << "ERROR: po::io::HDF5::createNew() Unable to create an new HDF5 file with the name = " << filename << std::endl;
std::cout << error.getDetailMsg() << std::endl;
return -1;
}
std::cout << "Created HDF file:" << filename << std::endl;
/*
* CREATE GROUP
*/
// First create a group and call it Group Test
H5::Group group = localFileHandel->createGroup("GroupTest");
/*
* Check if the group creation was OK
*/
if (false == group.isValid(group.getId()))
{
std::cout << "Error: unable to create group " << std::endl;
return -1;
}
else
{
std::cout << "Group created OK" << std::endl;
}
/*
* CREATE ATTRIBUTE
*/
//Create an integer scalar attribute and call the attribute Attribute test
H5::DataSpace dataSpace(H5S_SCALAR);
H5::Attribute attribute = group.createAttribute("AttributeTest", H5::PredType::NATIVE_INT, dataSpace);
/*
* NOTE: it is important that you close groups / attributes / data sets after they have been used
* If not, this creates a memory leak even if you close and reopen the file
*/
attribute.close();
group.close();
/*
* Get the library version and print it
*/
unsigned int h5MajorVersion;
unsigned int h5MinorVersion;
unsigned int h5ReleaseNumber;
H5get_libversion(&h5MajorVersion, &h5MinorVersion, &h5ReleaseNumber);
/* Create the data that we will be writing multiple times
* The data size is dataSize, the data is initialized to 10.0f
* single precision real data
*/
std::vector<float> data(dataSize, 10.0f);
float vectorSizeInMem = (float)(data.capacity() * sizeof(float)) / 1024 / 1024;
std::cout << "Testing HDF5 multiple writes." << std::endl;
std::cout << "Writing = " << numberOfWrites << " Times" << std::endl;
std::cout << "HDF LIBRARY VERSION FROM H5get_libversion() = " << h5MajorVersion << "." << h5MinorVersion << "." << h5ReleaseNumber << std::endl;
std::cout << "Sleep between writes =" << waitInMilliSec << " milli-seconds" << std::endl;
std::cout << "Size of data to be written per dataset =" << vectorSizeInMem << " MB" << std::endl;
if (true == closeReopen)
{
std::cout << "File will be closed and reopened after each data set" << std::endl;
}
else
{
std::cout << "File will not be closed between writes" << std::endl;
}
std::cout << std::endl << std::endl;
/*
* Loop and write the data
*/
for (size_t ii = 0; ii < numberOfWrites; ++ii)
{
// Group open
// Again important to open / close group after use to avoid memory leaks
group = localFileHandel->openGroup("GroupTest");
//
// The data set name will be the loop index number
std::string dataName = std::to_string(ii);
std::vector<hsize_t> dims = { (hsize_t)data.size() };
/*
* Create the dataspace and the data set with GroupTest as its parent
*/
H5::DataSpace dataSpace(static_cast<int>(dims.size()), dims.data(), dims.data());
H5::DataSet dataSet = group.createDataSet(dataName, H5::FloatType(H5::PredType::NATIVE_FLOAT), dataSpace);
/*
* Write the data set
*/
dataSet.write(&data[0],H5::FloatType(H5::PredType::NATIVE_FLOAT));
/*
* Close the data set
*/
dataSet.close();
// Group close
group.close();
/*
* Adding sleep to wait between writes
*/
std::cout << "Wrote data set " << ii << std::endl;
std::this_thread::sleep_for(std::chrono::milliseconds(waitInMilliSec));
if (true == closeReopen)
{
// close the file
//std::cout << "closing file with close() function " << std::endl;
localFileHandel->close();
// re-open the file with read write access to append data to it
try
{
std::cout << "open file with H5F_ACC_RDWR attribute " << std::endl;
localFileHandel->openFile(filename, H5F_ACC_RDWR);
}
catch (H5::FileIException& error)
{
std::cout << "ERROR: Unable to reopen the HDF5 file with the name = " << filename << std::endl;
std::cout << error.getDetailMsg() << std::endl;
return -1;
}
}
}
/*
* Print the run information one more time, since the per-dataset console output
* makes it difficult to scroll back to the top to see this information
*/
std::cout << "Wrote = " << numberOfWrites << " Times" << std::endl;
std::cout << "HDF LIBRARY VERSION FROM H5get_libversion() = " << h5MajorVersion << "." << h5MinorVersion << "." << h5ReleaseNumber << std::endl;
std::cout << "Sleep between writes =" << waitInMilliSec << " milli-seconds" << std::endl;
std::cout << "Size of data to be written per dataset =" << vectorSizeInMem << " MB" << std::endl;
if (true == closeReopen)
{
std::cout << "File will be closed and reopened after each data set" << std::endl;
}
else
{
std::cout << "File will not be closed between writes" << std::endl;
}
return 0;
}