How to free memory after H5Fclose

Hi,

I am using HDF5 storage in SWMR mode. I open files with a huge number of datasets and then close them, but after H5Fclose the amount of memory reserved by the process stays the same.

The same happens when data is written to a dataset: all the memory reserved by the process (for chunks, caches, etc.) is not freed after the file is closed (H5Fclose).

I have checked with valgrind and no memory leak is detected, so it seems the memory is freed just before the process finishes, but I need it to be freed when the file is closed.

Is it possible to free this memory after H5Fclose without terminating the process?

A simplified example of my code follows the typical sequence:


---------------------------------------------------
#include "hdf5.h"
#include "hdf5_hl.h"
#include <stdio.h>
#include <stdlib.h>

/* Number of datasets to create */
#define NUM_DATASETS 10000
#define CHUNK_SIZE 100

typedef struct
{
  double data;
  long long timestamp;
} data_t;

int main(void)
{
  hid_t fid;
  hid_t sid;
  hid_t dcpl;
  hid_t pdsets[NUM_DATASETS];
  char dname[300];
  hsize_t dims[2] = {1, 0};                 /* Dataset starting dimensions */
  hsize_t max_dims[2] = {1, H5S_UNLIMITED}; /* Dataset maximum dimensions */
  hsize_t chunk_dims[2] = {1, CHUNK_SIZE};  /* Chunk dimensions */
  int i;

  printf("Creating file\n");

  /* Create the file for SWMR writing */
  if((fid = H5Fcreate("packet.h5", H5F_ACC_TRUNC | H5F_ACC_SWMR_WRITE, H5P_DEFAULT, H5P_DEFAULT)) < 0)
    return 1;

  /* Create compound datatype */
  hid_t datatype = H5Tcreate(H5T_COMPOUND, sizeof(data_t));
  H5Tinsert(datatype, "Data", HOFFSET(data_t, data), H5T_NATIVE_DOUBLE);
  H5Tinsert(datatype, "Timestamp", HOFFSET(data_t, timestamp), H5T_NATIVE_LLONG);

  /* Create dataspace for creating datasets */
  if((sid = H5Screate_simple(2, dims, max_dims)) < 0)
    return 1;

  /* Create dataset creation property list */
  if((dcpl = H5Pcreate(H5P_DATASET_CREATE)) < 0)
    return 1;
  if(H5Pset_chunk(dcpl, 2, chunk_dims) < 0)
    return 1;

  printf("Creating %d datasets\n", NUM_DATASETS);
  /* Create (and immediately close) the datasets */
  for (i = 0; i < NUM_DATASETS; i++) {
    snprintf(dname, sizeof(dname), "dset_%d", i);
    if((pdsets[i] = H5Dcreate2(fid, dname, datatype, sid, H5P_DEFAULT, dcpl, H5P_DEFAULT)) < 0)
      return 1;
    if(H5Dclose(pdsets[i]) < 0)
      return 1;
  }

  printf("Closing everything\n");

  if(H5Pclose(dcpl) < 0)
    return 1;
  if(H5Sclose(sid) < 0)
    return 1;
  if(H5Tclose(datatype) < 0)
    return 1;
  if(H5Fclose(fid) < 0)
    return 1;

  printf("After closing...\n");

  return 0;
}
---------------------------------------------------

Thank you.

Rodrigo
----------------------------
Disclaimer:
This message and its attached files are intended exclusively for their recipients and may contain confidential information. If you received this e-mail in error, you are hereby notified that any dissemination, copying or disclosure of this communication is strictly prohibited and may be unlawful. In this case, please notify us by reply and delete this email and its contents immediately.
----------------------------

Hi Rodrigo,
  It sounds like the automatic memory manager in HDF5 is not what you want/need. You can force the memory to be freed early by calling H5garbage_collect() (https://support.hdfgroup.org/HDF5/doc/RM/RM_H5.html#Library-GarbageCollect), or you can disable the feature entirely by passing the --enable-using-memchecker flag to configure when you build the library.

    Quincey


On Apr 21, 2017, at 1:29 PM, Castro Rojo, Rodrigo <rodrigo.castro@ciemat.es> wrote:


_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5

Hi Rodrigo,

From what I understand, glibc (I'm using Linux as an example) can service memory requests either with sbrk(2), by increasing the process break (smaller allocations), or with mmap(2) (larger allocations, over ~128 KiB on most systems, though I think this threshold is dynamic now). Anything served from mmap() should, in theory, be immediately reclaimable by the kernel after a call to free(), but I believe this is done lazily. You can probably check this by writing to drop_caches (as described here: https://unix.stackexchange.com/questions/17936/setting-proc-sys-vm-drop-caches-to-clear-cache ) after the H5Fclose() call and seeing what happens to your program's memory footprint. I would imagine that most of what is causing your problem (chunk caches, etc.) is larger than the allocator's inflection point and is simply being cached. I've always been under the impression that the OS discards those freed pages easily when other processes need the memory, so you shouldn't be forced to go to disk for swap space.

Dana Robinson
Software Engineer
The HDF Group
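
On glibc there is also malloc_trim(3), which explicitly returns free heap pages to the kernel. A minimal sketch, assuming Linux/glibc (the function name trim_heap_after_free is made up for the example):

```c
#include <malloc.h>   /* malloc_trim() is glibc-specific */
#include <stdlib.h>
#include <string.h>

/* Allocate many small heap blocks (served via brk, not mmap), free them,
 * then ask glibc to return free pages at the top of the heap to the kernel.
 * malloc_trim() returns 1 if memory was released, 0 otherwise. */
int trim_heap_after_free(void)
{
    enum { N = 10000, SZ = 4096 };  /* 4 KiB is well below the mmap threshold */
    static void *blocks[10000];
    int i;

    for (i = 0; i < N; i++) {
        blocks[i] = malloc(SZ);
        if (blocks[i] != NULL)
            memset(blocks[i], 0, SZ);
    }
    for (i = 0; i < N; i++)
        free(blocks[i]);  /* freed, but pages may stay cached in the arena */

    return malloc_trim(0);
}
```

Whether any memory is actually released depends on heap fragmentation and on whether free() already trimmed the top of the heap automatically.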


-----Original Message-----
From: Hdf-forum [mailto:hdf-forum-bounces@lists.hdfgroup.org] On Behalf Of Castro Rojo, Rodrigo
Sent: Friday, April 21, 2017 1:30 PM
To: hdf-forum@lists.hdfgroup.org
Subject: [Hdf-forum] How to free memory after H5Fclose


Hi Dana,

Thank you very much for your proposal. The kind of memory consumption you describe is an effect we observed in some performance tests designed for our archiving system. In those tests we were able to reproduce this memory behaviour of the Linux OS with a simple "dd if=/dev/zero of=file bs=4k count=1000000", and, just as you describe, the OS caches as much memory as possible up to a limit, but hands it over to any process that requires it.

In the case you describe, I have run some tests, and although the Linux OS on the machine does take almost all the memory, that memory is not associated with the process performing the file operations. Even when the caches are dropped, the memory consumption of the process does not change. It is also important to note that the memory I need freed eventually causes a critical failure in the process.

Regards,
Rodrigo


On Apr 24, 2017, at 7:13 PM, Dana Robinson <derobins@hdfgroup.org> wrote:


Hi Quincey,

I have run several tests based on the solutions you proposed, but I couldn't free the memory. Anyway, let me share some interesting results. I am using 1.10.1-pre2.

1- Using H5garbage_collect() after H5Fclose has no effect on the memory usage of the process.

2- Building the library with "--enable-using-memchecker" has no effect on memory consumption in my example.

3- H5Pset_evict_on_close has no effect.

4- If every dataset is closed just after it is created, memory usage is much lower (27 MB) than if all datasets are created first and then all closed (600 MB).

5- After writing data to all datasets, H5Fflush(fid, H5F_SCOPE_GLOBAL) takes very different times depending on the open/close strategy. For 40K datasets and 300 records written per dataset:
   a) Open and close each dataset as its data is written: 7 seconds (including flush time), a latency very similar to what I got with the previous release.
   b) Keep all datasets open, write to all of them, then flush with H5Fflush(fid, H5F_SCOPE_GLOBAL): 95 seconds.

I hope this information helps.

Thank you very much.

Regards,
Rodrigo


On Apr 24, 2017, at 8:11 PM, Quincey Koziol <koziol@lbl.gov> wrote:


Hi Rodrigo,

See some comments embedded below...

"Hdf-forum on behalf of Castro Rojo, Rodrigo" wrote:

Hi Quincey,

I have run several tests based on the solutions you proposed, but I couldn't free the memory. Anyway, let me share some interesting results. I am using 1.10.1-pre2.

Are you doing chunked, compact or contiguous datasets? If chunked, a serious issue is that each chunked dataset is managed by a "chunk cache", and you might want to have a look at H5Pset_chunk_cache to adjust its behavior.

1- Using H5garbage_collect() after H5Fclose has no effect on the memory usage of the process.

This will help *only* for closed objects. Garbage collection has no effect on objects left open; hence your use case of keeping all datasets open will not benefit from a call to H5garbage_collect().

2- Building the library with "--enable-using-memchecker" has no effect on memory consumption in my example.

3- H5Pset_evict_on_close has no effect.

Again, this should/will work *only* for closed objects.

4- If every dataset is closed just after it is created, memory usage is much lower (27 MB) than if all datasets are created first and then all closed (600 MB).

This makes sense: each HDF5 object is managed through metadata, and depending on the dataset layout, even the raw data for those objects may be held in memory (e.g. the chunk cache).

5- After writing data to all datasets, H5Fflush(fid, H5F_SCOPE_GLOBAL) takes very different times depending on the open/close strategy. For 40K datasets and 300 records written per dataset:
   a) Open and close each dataset as its data is written: 7 seconds (including flush time), a latency very similar to what I got with the previous release.
   b) Keep all datasets open, write to all of them, then flush with H5Fflush(fid, H5F_SCOPE_GLOBAL): 95 seconds.

I am not surprised by these results. I think HDF5 gives its best time and space performance when you close objects as soon as practical.

As an aside, I wonder how things would behave if you adjusted the metadata cache algorithm via a call to H5Pset_mdc_config()...

https://support.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetMdcConfig

set_initial_size=1
initial_size=16384
min_size=8192
epoch_length=3000
lower_hr_threshold=1e-5

possibly adjusting initial_size and min_size upwards to something like 2-5x whatever size '300 records' is. Again, I suspect this will help *only* for the open-and-close case.

If you try it, I would be interested to know what you find.

I hope this information could help.

Thank you very much.

Regards,
Rodrigo


On Apr 24, 2017, at 8:11 PM, Quincey Koziol <koziol@lbl.gov> wrote:


Hi Mark,

Please see my comments below:


On Apr 25, 2017, at 5:02 AM, Miller, Mark C. <miller86@llnl.gov> wrote:

Hi Rodrigo,

See some comments embedded below...

"Hdf-forum on behalf of Castro Rojo, Rodrigo" wrote:

Hi Quincey,

I have run several tests based on the solutions you proposed, but I couldn't free the memory. Anyway, let me share some interesting results. I am using 1.10.1-pre2.

Are you using chunked, compact, or contiguous datasets? If chunked, one serious issue is that each chunked dataset is managed by a "chunk cache", and you might want to have a look at H5Pset_chunk_cache to affect its behavior.

We have tested many combinations of this cache; memory behavior is better at the beginning, but the performance and memory problems persist in the following case:
- open all datasets
- N times:
  * write records to all datasets
  * flush
- close all datasets

I agree that not keeping so many datasets open will help.

1- Using H5garbage_collect() after H5Fclose has no effect on memory usage.

This will help *only* for closed objects. Garbage collect will have no effect on objects left open. Hence, your use case of keeping all datasets open will not benefit from a call to H5garbage_collect().

I am testing H5garbage_collect() after everything is closed (including the file), with no luck. We have tried this call many times without success.

2- Building the library with --enable-using-memchecker has no effect on memory consumption in my example.

3- H5Pset_evict_on_close has no effect.

Again, this should/will work *only* for closed objects.

Yes. But it also does not work: memory is not freed.

4- If every dataset is closed right after creation, the memory use is much lower (27 MB) than if all datasets are created first and then all of them are closed (600 MB).

This makes sense: each HDF5 object is managed by metadata, and depending on the layout of the datasets, even raw data for those objects may be managed in memory (e.g., the chunk cache).

Perfect

5- After writing data to all datasets, H5Fflush(fid, H5F_SCOPE_GLOBAL) takes very different times depending on the dataset open/close strategy. For 40K datasets and 300 records written per dataset:
a) Open and close every dataset when data is written: 7 seconds (including flush time), a latency very similar to what I got in the previous release.
b) Keep all datasets open, write to all of them, then flush with H5Fflush(fid, H5F_SCOPE_GLOBAL): 95 seconds.

I am not surprised by these results. I think HDF5's best time and space performance likely comes when you close objects as soon as practical.

Ok, then we are going to keep that approach. Anyway, freeing memory after closing everything (including the file) is very important for us, so if you have any other idea, do not hesitate to tell us and we will try it.

Just to note that the huge latency in case b) was not present in "hdf5-10.0.0-alpha1".

As an aside, I wonder how things would be if you attempted some adjustments to the metadata cache algorithm via a call to H5Pset_mdc_config()...

https://support.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetMdcConfig

set_initial_size=1
initial_size=16384
min_size=8192
epoch_length=3000
lower_hr_threshold=1e-5

Possibly adjust initial_size and min_size upward to something that represents 2-5x whatever size '300 records' is. Again, I suspect this will help *only* for the open-and-close case.

Yes. I agree with your bet.

Thank you very much for your quick answer and support.

I have 4 test files covering this case with very simple programs. What would be the best way to share them with the forum?

Regards,
Rodrigo

On Apr 24, 2017, at 8:11 PM, Quincey Koziol <koziol@lbl.gov> wrote:

Hi Rodrigo,
Sounds like the automatic memory manager in HDF5 is not what you want/need. You can force the memory to be freed early by calling H5garbage_collect() (https://support.hdfgroup.org/HDF5/doc/RM/RM_H5.html#Library-GarbageCollect), or you can disable the feature by passing the --enable-using-memchecker flag to configure when you build the package.

Quincey

On Apr 21, 2017, at 1:29 PM, Castro Rojo, Rodrigo <rodrigo.castro@ciemat.es> wrote:



"Hdf-forum on behalf of Castro Rojo, Rodrigo" wrote:

This will help *only* for closed objects. Garbage collect will have no effect on objects left open. Hence, your use case of keeping all datasets open will not benefit from a call to H5garbage_collect().

I am testing H5garbage_collect() after everything is closed (including the file), with no luck. We have tried this call many times without success.

Ok, well, closing the file implies the file has also been "garbage collected", so I would not expect H5garbage_collect() *after* H5Fclose to have much effect. It might have some, maybe. But H5garbage_collect() is, IMHO, the *best* you can do in terms of forcing HDF5 to free up memory, short of actually closing the file. Also, be aware that H5Fclose will NOT NECESSARILY actually close your file. If any objects in the file are left open, H5Fclose will silently *ignore* your request to close the file. You need to have opened the file with the H5F_CLOSE_SEMI close degree... see https://support.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetFcloseDegree

2- Building the library with --enable-using-memchecker has no effect on memory consumption in my example.

3- H5Pset_evict_on_close has no effect.

Again, this should/will work *only* for closed objects.

Yes. But it also does not work: memory is not freed.

What tool(s) are you using to measure available memory? valgrind / massif?

5- After writing data to all datasets, H5Fflush(fid, H5F_SCOPE_GLOBAL) takes very different times depending on the dataset open/close strategy. For 40K datasets and 300 records written per dataset:
a) Open and close every dataset when data is written: 7 seconds (including flush time), a latency very similar to what I got in the previous release.
b) Keep all datasets open, write to all of them, then flush with H5Fflush(fid, H5F_SCOPE_GLOBAL): 95 seconds.

I am not surprised by these results. I think HDF5's best time and space performance likely comes when you close objects as soon as practical.

Ok, then we are going to keep that approach. Anyway, freeing memory after closing everything (including the file) is very important for us, so if you have any other idea, do not hesitate to tell us and we will try it.

Just to note that the huge latency in case b) was not present in "hdf5-10.0.0-alpha1".

I know there was a lot of work done on the metadata cache recently, and I wonder if that work could have led to this performance regression.

As an aside, I wonder how things would be if you attempted some adjustments to the metadata cache algorithm via a call to H5Pset_mdc_config()...

https://support.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetMdcConfig

set_initial_size=1
initial_size=16384
min_size=8192
epoch_length=3000
lower_hr_threshold=1e-5

Possibly adjust initial_size and min_size upward to something that represents 2-5x whatever size '300 records' is. Again, I suspect this will help *only* for the open-and-close case.

Yes. I agree with your bet.

Thank you very much for your quick answer and support.

I have 4 test files covering this case with very simple programs. What would be the best way to share them with the forum?

I think it would be great if you could contribute these test codes back to The HDF Group for them to include in their regular performance testing.

I think it would be great if you could contribute these test codes back to The HDF Group for them to include in their regular performance testing.

Ditto. We would be more than happy to accept.

Elena

