HDFView Out of Memory Error

Dear HDF-Forum,

I have a few questions. I am new to HDF5 and have been browsing the user manuals as well as the forum archive, but I am stuck with my first HDF5 file.

I submitted a query to the HDF support team, but thought this may also be a good place to get help.

First question: without writing my own scripts to grab web content, is there a way to search all the archives simultaneously? It seems one can only browse month by month, and there are no search options.

Now my HDF5 question. I have created a file with PyTables (I also created it with h5py and see the same behavior). The file is only 105 MB; however, when I try to open it with HDFView, it crashes with the following error:

java.lang.OutOfMemoryError: Java heap space

The group structure of my file looks like

root
  name
    src
      snk
        cfg
          data

Here data is a 1 x 48 array of floats, the cfg entries are tags for the different configurations (in my case 1000, 1010, ..., 20100; 1911 in all), and the rest of the tags are metadata describing the underlying data.
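
To make that concrete, reading one leaf back would look roughly like this (the path uses placeholder names, not the actual tags in my file; I use h5py here since I created the file with it as well):

import h5py

# Hypothetical path; the actual group/tag names in my file differ.
with h5py.File('nplqcd_iso_old.h5', 'r') as f:
    data = f['/name1/Src_GaussSmeared/Snk_S/1000/real'][:]   # one 1 x 48 float array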

Similarly, when I try to open this file with Mathematica, it also crashes, without giving any error message at all. Given the HDFView error, I presume Mathematica is having a similar issue.

If I use h5ls, h5check, or ptdump, none of these tools has any problem with my data file. I can also open and manipulate the data with PyTables.

If I take the exact same data but remove the "cfg" sub-group, packaging the data instead as a single 1911 x 48 array of floats, then HDFView can open the file, as can Mathematica.
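
For reference, here is a minimal sketch of that flattened layout, using the same legacy PyTables API as the script below (the group names and the zero-filled array are placeholders):

import numpy as np
import tables as pyt

c = np.zeros((1911, 48))   # stand-in for the real Ncfg x NT correlator data

f = pyt.openFile('nplqcd_iso_flat.h5', 'w')
cg = f.createGroup(f.root, 'name1')      # placeholder correlator name
src = f.createGroup(cg, 'Src_GaussSmeared')
snk = f.createGroup(src, 'Snk_S')
f.createArray(snk, 'real', c)            # one 1911 x 48 dataset, no per-cfg groups
f.close()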

However, for various reasons, it is much more desirable for me to have the structure mentioned above.

It surprises me that such a small file, with such a seemingly simple group structure, causes this crash. I have not yet been able to find any information on limits to the size of the group structure.

Am I actually running into some size limitations?

Have I done something unwise in creating the file?

To help answer the second question, I include below the relevant part of the Python script I used to create the file, in case anyone is familiar with Python and PyTables.

Thanks,

Andre

import os
import tables as pyt                   # PyTables, legacy 2.x API
import personal_calls_to_numpy as pc   # my own numpy helper module

corrs = ['name1','name2',...]          # correlator names (truncated here)

# NOTE: `tag` and `dirs` are defined earlier in the full script and omitted
# from this excerpt; dirs[cfg] maps a configuration index to its tag
# (1000, 1010, ..., 20100).

f = pyt.openFile('nplqcd_iso_old.h5','w')
root = f.root
for corr in corrs:
    cg = f.createGroup(root, corr.split('_')[-1])
    src = f.createGroup(cg, 'Src_GaussSmeared')
    for s in ['S','P']:
        fname = 'concatonated/'+corr+'_'+tag+'_'+s+'.dat'
        if os.path.exists(fname):
            print('adding '+fname)
            h, c = pc.read_corr(fname)            # header + Ncfg x NT data
            Ncfg = int(h[0]); NT = int(h[1])
            snk = f.createGroup(src, 'Snk_'+s)
            #data = f.createArray(snk,'real',c)   # flat alternative: one Ncfg x NT array
            for cfg in range(Ncfg):
                gc = f.createGroup(snk, dirs[cfg])        # one group per configuration
                data = f.createArray(gc, 'real', c[cfg])  # 1 x 48 leaf array
        else:
            print(fname+' DOES NOT EXIST')
f.close()

Hi Andre,

Thank you for your file (the one you sent to the Helpdesk). I can open
your file with HDFView (though it takes a while to come up on my
machine), and I can see that there are many, many groups in the file.

HDFView represents every object in an HDF5 file as a GUI component, and
each component takes up extra memory. A file with many, many groups can
therefore use up all of the available memory and fail. You may be able
to increase the Java heap size (for example, via the -Xmx option passed
to the Java VM that launches HDFView) to alleviate the problem, but
that will not solve the underlying issue.
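
To get a feel for how many components HDFView has to build, you can count the objects in the file; here is a rough sketch using h5py (the file name is the one you sent us):

import h5py

counts = {'groups': 0, 'datasets': 0}

def tally(name, obj):
    # visititems() calls this once for every group and dataset in the file
    key = 'groups' if isinstance(obj, h5py.Group) else 'datasets'
    counts[key] += 1

with h5py.File('nplqcd_iso_old.h5', 'r') as f:
    f.visititems(tally)

print(counts)  # with 1911 cfg groups per sink, this quickly reaches tens of thousands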

It should be possible for HDFView to handle this case properly instead
of failing. I will enter a bug report so that we can take a look at it.

Thanks!
-Barbara

