memory leak in HDF Java


Our company has been working with HDF Java (both the object and JNI
layers) for some time now. We've been quite pleased with how HDF has been
performing for us. However, recently we've been writing and reading much
higher volumes of data, and we've noticed what seems to be a memory leak in
either the JNI layer or below. We have altered (tests that come with the download) to demonstrate
the problem.

First, some details: we primarily use Windows XP Pro as our development
environment, but we have also seen the memory leak in Ubuntu Linux. We have
seen the problem in hdf-java-2.5p2 and hdf-java-2.5p3.

Let's take a look at the first. In this test we didn't
alter much of the program. However, we did notice a problem with the code.
We had an ArrayIndexOutOfBoundsException with the following code:

final Dataset[] dsets = new Dataset[10];
dsets[10] = file.createScalarDS (NAME_DATASET_STR_VLEN, null, typeStrVlen,
DIMs, null, CHUNKs, 9, DATA_STR);

We commented out the code related to the str vlen to get the test to work.
In addition, we added a simple println to show the JVM heap memory size:

long count = 0;
    while (true) {
      if (count % 100 == 0) {
        System.out.println("\tloop count: " + count);
        System.out.println(
            "usedMemory: " + ((Runtime.getRuntime().totalMemory()
                - Runtime.getRuntime().freeMemory()) / MB)
            + ", totalMemory: " + (Runtime.getRuntime().totalMemory() / MB)
            + ", freeMemory: " + (Runtime.getRuntime().freeMemory() / MB)
            + ", maxMemory: " + (Runtime.getRuntime().maxMemory() / MB));

We believe that the TestH5MemoryLeak test works as written, in that
"Possible memory leak. Some objects are still open." is never printed.
However, we noticed that the Windows process memory usage keeps growing.
After hours of running, it reaches several hundred MB, even when the
Java heap looks something like

usedMemory: 2.793609619140625, totalMemory: 31.75, freeMemory:
28.956390380859375, maxMemory: 493.0625

So, while it seems that the HDF5Constants.H5F_OBJ_LOCAL check is okay, there
seems to be a memory leak somewhere at a lower level. Eventually the process
will grow so big that the test will crash.

Here are our questions:
Can you reproduce this problem?
Is this the intended behavior?
Are we doing something incorrect when running the program?

Our test case is much more complicated. We changed the
code to better match how we are using the library (see the bottom of the
email for the source code). In our test case, our goal was to write as much
data to a single dataset as possible. When running the program we see that
(1) the JVM heap memory is relatively low and constant, (2) the Windows
process gradually grows, and (3) the program eventually crashes when the
process grows to some point over 1GB. (We didn't get a crash on Linux, but
the memory leak was definitely visible.) Note that it takes a while for this
to happen: over an hour of running, depending on your machine. Here is some
example output of our program:


time: 80000
        currentDims: 800000000, ensuring dataset size: 800010000, startDims:
        usedMemory: 5.879608154296875, totalMemory: 204.6875, freeMemory:
198.80789184570312, maxMemory: 493.0625
        2 objects still open:
                id: 16777217, /, H5I_FILE
                id: 33554433, /group, H5I_GROUP
time: 90000
        currentDims: 900000000, ensuring dataset size: 900010000, startDims:
        usedMemory: 1.3919525146484375, totalMemory: 211.375, freeMemory:
209.98304748535156, maxMemory: 493.0625
        2 objects still open:
                id: 16777217, /, H5I_FILE
                id: 33554433, /group, H5I_GROUP

The speed and JVM heap memory usage of the program look fine to us.
Everything seems okay except that (1) for some reason the group doesn't close
and (2) the Windows process keeps growing and growing.

Here is the crash:

time: 3310000
        1.0610079E7 values a second
        currentDims: 33100000000, ensuring dataset size: 33100010000,
startDims: 33100000000
        usedMemory: 6.0684051513671875, totalMemory: 199.25, freeMemory:
193.1815948486328, maxMemory: 493.0625
        2 objects still open:
                id: 16777217, null, H5I_FILE
                id: 33554433, null, H5I_GROUP
time: 3320000
        8704735.0 values a second
        currentDims: 33200000000, ensuring dataset size: 33200010000,
startDims: 33200000000
        usedMemory: 6.9367828369140625, totalMemory: 199.25, freeMemory:
192.31321716308594, maxMemory: 493.0625
        2 objects still open:
                id: 16777217, null, H5I_FILE
                id: 33554433, null, H5I_GROUP
ncsa.hdf.hdf5lib.exceptions.HDF5ResourceUnavailableException: No space
available for allocation
HDF5-DIAG: Error detected in HDF5 (1.8.2) thread 0:
  #000: ..\..\..\src\H5D.c line 386 in H5Dclose(): can't free
    major: Dataset
    minor: Unable to initialize object
  #001: ..\..\..\src\H5Dint.c line 1552 in H5D_close(): unable to flush
cached dataset info
    major: Dataset
    minor: Write failed
  #002: ..\..\..\src\H5Dint.c line 2443 in H5D_flush_real(): unable to flush
raw data cache
    major: Object cache
    minor: Unable to flush data from cache
  #003: ..\..\..\src\H5Dchunk.c line 2728 in H5D_chunk_flush(): unable to
flush one or more raw data chunks
    major: Low-level I/O
    minor: Unable to flush data from cache
  #004: ..\..\..\src\H5Dchunk.c line 2060 in H5D_chunk_flush_entry(): memory
allocation failed for pipeline
    major: Resource unavailable
    minor: No space available for allocation
ncsa.hdf.hdf5lib.exceptions.HDF5ResourceUnavailableException: No space
available for allocation
        at ncsa.hdf.hdf5lib.H5.H5Dwrite_long(Native Method)
        at ncsa.hdf.hdf5lib.H5.H5Dwrite(
        at ncsa.hdf.hdf5lib.H5.H5Dwrite(
        at ncsa.hdf.object.h5.H5ScalarDS.write(
        at h5.TestHDF5Write.main(

At some point we tried a couple of things, such as closing the H5File and
calling H5.H5garbage_collect(), but neither appeared to help.

Here are our questions:
Can you reproduce this problem?
Is this the intended behavior?
Are we using the API incorrectly? Is there some method that we should be
calling or not be calling?
We couldn't figure out why the group never closes. Anyone have any idea why?
I don't think that the group being left open contributes to the memory leak
much, because when we take out the group we still see the leak.

Any input or even duplication of this problem would be greatly appreciated.

On a side note, we've seen a big performance difference between Windows and
Linux. For example, Windows read performance seems to be very sensitive to
file size, which causes a large variation in read speeds; writes seem to be
fine. Using the same tests, Linux seems to perform well and stay consistent
during reads. Are others seeing this as well?


Aaron Kagawa
Engineering Supervisor
Referentia Systems Incorporated

package h5;

import java.io.File;
import java.util.Arrays;

import ncsa.hdf.hdf5lib.H5;
import ncsa.hdf.hdf5lib.HDF5Constants;
import ncsa.hdf.object.Dataset;
import ncsa.hdf.object.Datatype;
import ncsa.hdf.object.FileFormat;
import ncsa.hdf.object.Group;
import ncsa.hdf.object.h5.H5File;

/**
 * Implements a simple test that writes to a dataset in a loop. This test is
 * meant to test the memory used in a Windows process. It seems that the
 * Windows process continues to grow, while the Java heap space stays constant.
 */
public class TestHDF5Write {

  private static final int INSERT_SIZE = 10000;
  private static final long NUMBER_OF_LOOPS = 100000000000L;
  private static final int PRINTLN_INTERVAL = 10000;
  private static final double MB = 1024.0 * 1024.0;

  public static void main(String[] args) {
    long numberOfLoops = NUMBER_OF_LOOPS;
    int printlnInterval = PRINTLN_INTERVAL;
    if (args.length == 1) {
      numberOfLoops = Long.parseLong(args[0]);
    }
    if (args.length == 2) {
      printlnInterval = Integer.parseInt(args[0]);
    }
    System.out.println("INSERT_SIZE: " + INSERT_SIZE);
    System.out.println("TIMES: " + numberOfLoops);
    try {
      // create a new file
      File javaFile = new File("TestHDF5Write-" + System.currentTimeMillis()
          + ".h5");
      FileFormat fileFormat =
      H5File h5File = (H5File)
      int fapl = H5.H5Pcreate(HDF5Constants.H5P_FILE_ACCESS);
      H5.H5Pset_fclose_degree(fapl, HDF5Constants.H5F_CLOSE_STRONG);
      int fid =;
      // create group (there is no good reason for us to have a group here)
      Group group = h5File.createGroup("/group", null);
      int gid =;

      // create data set
      long[] initialSize = new long[] { 1 };
      long[] maxSize = new long[] { Long.MAX_VALUE };
      long[] chunkSize = new long[] { 3000 };
      int gzipCompressionLevel = 2;
      Datatype datatype = h5File.createDatatype(Datatype.CLASS_INTEGER, 8,
Datatype.NATIVE, Datatype.SIGN_NONE);
      Dataset dataset = h5File.createScalarDS("/Dataset1", group, datatype,
initialSize, maxSize, chunkSize,
          gzipCompressionLevel, null);

      for (long loopIndex = 0; loopIndex < numberOfLoops; loopIndex++) {
        long currentDims = dataset.getDims()[0];

        // extend the dataset
        int did =;
        H5.H5Dextend(did, new long[] { (loopIndex +1) * (long) INSERT_SIZE });
        H5.H5Sget_simple_extent_dims(H5.H5Dget_space(did),
            dataset.getDims(), null);
        // make the data to add
        long[] newDataArray = new long[INSERT_SIZE];
        Arrays.fill(newDataArray, System.currentTimeMillis());
        // set where to add the data
        dataset.getStartDims()[0] = loopIndex * INSERT_SIZE;
        dataset.getSelectedDims()[0] = INSERT_SIZE;
        dataset.getStride()[0] = 1;

        if (loopIndex % printlnInterval == 0) {
          System.out.println("time: " + loopIndex);
          System.out.println("\tcurrentDims: " + currentDims
              + ", ensuring dataset size: " + ((loopIndex +1) * INSERT_SIZE)
              + ", startDims: " + (loopIndex * INSERT_SIZE));
          System.out.println("\t"
              + "usedMemory: " + ((Runtime.getRuntime().totalMemory()
                  - Runtime.getRuntime().freeMemory()) / MB)
              + ", totalMemory: " + (Runtime.getRuntime().totalMemory() / MB)
              + ", freeMemory: " + (Runtime.getRuntime().freeMemory() / MB)
              + ", maxMemory: " + (Runtime.getRuntime().maxMemory() / MB));
        }

        // write the data
      }
    }
    catch (Exception e) {
    }
  }

  /** print the open hdf5 objects associated with the hdf5 file */
  public static void printOpenHDF5Objects(int fid) {
    try {
      int count;
      count = H5.H5Fget_obj_count(fid, HDF5Constants.H5F_OBJ_ALL);
      int[] objs = new int[count];
      H5.H5Fget_obj_ids(fid, HDF5Constants.H5F_OBJ_ALL, count, objs);
      String[] name = new String[1];
      System.out.println("\t" + count + " objects still open:");
      for (int i = 0; i < count; i++) {
        int type = H5.H5Iget_type(objs[i]);
        long status = H5.H5Iget_name(objs[i], name, 1024);
        System.out.print("\t\tid: " + objs[i] + ", " + name[0]);
        if (HDF5Constants.H5I_DATASET == type) {
          System.out.println(", H5I_DATASET");
        }
        else if (HDF5Constants.H5I_FILE == type) {
          System.out.println(", H5I_FILE");
        }
        else if (HDF5Constants.H5I_GROUP == type) {
          System.out.println(", H5I_GROUP");
        }
        else if (HDF5Constants.H5I_DATATYPE == type) {
          System.out.println(", H5I_DATATYPE");
        }
        else if (HDF5Constants.H5I_ATTR == type) {
          System.out.println(", H5I_ATTR");
        }
        else {
          System.out.println(", UNKNOWN " + type);
        }
      }
    }
    catch (Exception e) {
    }
  }
}


Looks like there is a problem. We will try to reproduce it.
Thank you for reporting it. -- Peter

There is a memory leak in hdf-java when you create a new group. The fix will be
in the next release (around mid-December 2009).

You will not be able to see the memory leak from the JVM heap. You have
to look at the memory from the OS level. Below is the sample code I added to

If you build hdf-java from source, the fix is simple. Make this change
in H5Group.create():


        int gid = H5.H5Gcreate(, fullPath, -1);
        try {H5.H5Gclose(gid);} catch (Exception ex) {}

We are still checking the code to make sure there isn't any other memory leak.


        int count = 0;
        long KB = 1024;
        System.out.println("\n\nNo. of loops\tUsed(KB)\tTotal(KB)\tFree(KB)\tMax(KB)\n"+
            count ++;
            if (count % 100 == 0) {
                osm = ( ManagementFactory.getOperatingSystemMXBean() ;
                  df.format(count) + " \t" +
                  df.format((osm.getCommittedVirtualMemorySize()) / KB) + " \t" +
                  df.format(osm.getTotalPhysicalMemorySize() / KB) + " \t" +
                  df.format(osm.getFreePhysicalMemorySize() / KB) + " \t" +
                  df.format(Runtime.getRuntime().maxMemory() / KB));
            }
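
For anyone who wants to watch the OS-level numbers rather than the JVM heap, here is a minimal standalone sketch of the same idea. It assumes a HotSpot-style JVM where the platform bean can be cast to com.sun.management.OperatingSystemMXBean. The class name OsMemoryProbe, the sample() method, and the DecimalFormat pattern are made up for illustration and are not part of hdf-java.

```java
import java.lang.management.ManagementFactory;
import java.text.DecimalFormat;

/** Hypothetical helper: prints process and system memory as seen by the OS. */
public class OsMemoryProbe {
    private static final long KB = 1024;

    /**
     * Builds one tab-separated row: loop count, committed virtual memory of
     * this process, total physical memory, free physical memory, and the JVM
     * max heap, all in KB. Returns "count<TAB>n/a" if the extended
     * com.sun.management bean is not available on this JVM.
     */
    public static String sample(long loopCount) {
        DecimalFormat df = new DecimalFormat("#,###"); // assumed format
        java.lang.management.OperatingSystemMXBean bean =
            ManagementFactory.getOperatingSystemMXBean();
        if (!(bean instanceof com.sun.management.OperatingSystemMXBean)) {
            return df.format(loopCount) + "\tn/a";
        }
        com.sun.management.OperatingSystemMXBean osm =
            (com.sun.management.OperatingSystemMXBean) bean;
        return df.format(loopCount) + "\t"
            + df.format(osm.getCommittedVirtualMemorySize() / KB) + "\t"
            + df.format(osm.getTotalPhysicalMemorySize() / KB) + "\t"
            + df.format(osm.getFreePhysicalMemorySize() / KB) + "\t"
            + df.format(Runtime.getRuntime().maxMemory() / KB);
    }

    public static void main(String[] args) {
        System.out.println("No. of loops\tUsed(KB)\tTotal(KB)\tFree(KB)\tMax(KB)");
        for (long count = 1; count <= 300; count++) {
            // The work under test (e.g. one write iteration) would go here.
            if (count % 100 == 0) {
                System.out.println(sample(count));
            }
        }
    }
}
```

A steadily rising "Used(KB)" column while the heap figures stay flat would point at native allocations, which is the pattern described in this thread.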

I've fixed the memory leak in the code. The group code might have been a
leak, but that was not what was contributing to the large growth in memory.
I tested this by taking out the group and still observing a large memory

Basically, it has to do with how we were extending the dataset size with
this code:

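        // extend the dataset
        int did =;
        H5.H5Dextend(did, new long[] { (loopIndex +1) * (long) INSERT_SIZE });
        H5.H5Sget_simple_extent_dims(H5.H5Dget_space(did),
            dataset.getDims(), null);

It's hard to see, but I think the leak was in this specific line:

        H5.H5Sget_simple_extent_dims(H5.H5Dget_space(did),
            dataset.getDims(), null);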

We are creating a space object (is that the right terminology?) but we
weren't closing it. I think what we did by nesting the H5Dget_space in the
H5Sget_simple_extent_dims method is bad practice. So, my fix looks like

        // extend the dataset
        int did =;
        H5.H5Dextend(did, new long[] { currentDims + INSERT_SIZE});
        dataset.getDims()[0] = currentDims + INSERT_SIZE;

This seemed to work well. When I ran the test, I saw numbers like 367
billion values with only 65MB in the Windows process. So, it definitely
seems like my fix worked. (Note: I deliberately didn't use the
H5ScalarDS.extend() method, because its size check seemed excessive.)

However, I got a process crash after writing 392 billion values.

Stack: [0x01bd0000,0x01c20000], sp=0x01c1f9e4, free space=318k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [jhdf5.dll+0x150b05]
C [jhdf5.dll+0x150bbe]

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J ncsa.hdf.hdf5lib.H5.H5Fflush(II)I
J ncsa.hdf.object.h5.H5ScalarDS.close(I)V
J ncsa.hdf.object.h5.H5ScalarDS.write(Ljava/lang/Object;)V
J h5.TestHDF5Write.main([Ljava/lang/String;)V
v ~BufferBlob::StubRoutines (1)

I'm not sure what happened there.

So, I rewrote my test to use only the ncsa.hdf.hdf5lib.H5 methods. And I'm
currently running a test that is on 617 billion values with a 57 MB windows
process. That's awesome.

I plan to rerun my original test (the one that includes the ncsa.hdf.object
classes) to see if I get the crash again. I wonder if I discovered something
wrong with the object layer. Or maybe it's our fault again...

Thanks, Aaron Kagawa


Super!!! Yes, H5.H5Sget_simple_extent_dims(H5.H5Dget_space(did),
dataset.getDims(), null) creates the memory leak.
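
To illustrate why the nested call leaks, here is a small self-contained model. It does not call hdf-java at all: FakeNativeLib and its methods are hypothetical stand-ins for a native library that hands out integer ids which stay allocated until they are explicitly closed. The point is only the shape of the fix, namely keeping the id in a variable and closing it in a finally block.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of an id-based native API; not part of hdf-java. */
public class HandleLeakSketch {

    /** Stand-in for a native library that tracks open ids outside the Java heap. */
    static class FakeNativeLib {
        private final Set<Integer> open = new HashSet<>();
        private int next = 1;

        /** Allocates a new id; it stays open until closeSpace() is called. */
        int getSpace() {
            int id = next++;
            open.add(id);
            return id;
        }

        /** Reads a (fake) extent through an open id. */
        void readDims(int spaceId, long[] dims) {
            if (!open.contains(spaceId)) {
                throw new IllegalStateException("id not open: " + spaceId);
            }
            dims[0] += 1;
        }

        void closeSpace(int spaceId) {
            open.remove(spaceId);
        }

        int openCount() {
            return open.size();
        }
    }

    /** Nested call: the id returned by getSpace() is lost, so it is never closed. */
    public static int leakyLoop(int iterations) {
        FakeNativeLib lib = new FakeNativeLib();
        long[] dims = new long[1];
        for (int i = 0; i < iterations; i++) {
            lib.readDims(lib.getSpace(), dims);
        }
        return lib.openCount();
    }

    /** Fixed version: keep the id and close it even if the read throws. */
    public static int fixedLoop(int iterations) {
        FakeNativeLib lib = new FakeNativeLib();
        long[] dims = new long[1];
        for (int i = 0; i < iterations; i++) {
            int sid = lib.getSpace();
            try {
                lib.readDims(sid, dims);
            } finally {
                lib.closeSpace(sid);
            }
        }
        return lib.openCount();
    }

    public static void main(String[] args) {
        System.out.println("leaky: " + leakyLoop(1000) + " ids still open");
        System.out.println("fixed: " + fixedLoop(1000) + " ids still open");
    }
}
```

With hdf-java itself, the equivalent would presumably be to store the result of H5.H5Dget_space(did) in a variable, pass it to H5.H5Sget_simple_extent_dims, and then release it with H5.H5Sclose in a finally block.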

