Hi,
Some time ago, a Pytables user complained about that the next simple
operation was hogging gigantics amounts of memory:
import tables, numpy
N = 600
f = tables.openFile('foo.h5', 'w')
f.createCArray(f.root, 'huge_array',
tables.Float64Atom(),
shape = (2,2,N,N,50,50))
for i in xrange(50):
for j in xrange(50):
f.root.huge_array[:,:,:,:,j,i] = \
numpy.array([[1,0],[0,1]])[:,:,None,None]
and I think that the problem could be in the HDF5 side.
The point is that, for the 6-th dimensional 'huge_array' dataset,
Pytables computed an 'optimal' chunkshape of (1, 1, 1, 6, 50, 50).
Then, the user wanted to update the array starting in the trailing
dimensions (instead of using the leading ones, which is the recommended
practice for C-ordered arrays). This results in Pytables asking HDF5
to do the update using the traditional procedure:
/* Create a simple memory data space */
if ( (mem_space_id = H5Screate_simple( rank, count, NULL )) < 0 )
return -3;
/* Get the file data space */
if ( (space_id = H5Dget_space( dataset_id )) < 0 )
return -4;
/* Define a hyperslab in the dataset */
if ( rank != 0 && H5Sselect_hyperslab( space_id, H5S_SELECT_SET, start,
step, count, NULL) < 0 )
return -5;
if ( H5Dwrite( dataset_id, type_id, mem_space_id, space_id,
H5P_DEFAULT, data ) < 0 )
return -6;
While I understand that this approach is suboptimal (2*2*600*100=240000
chunks has to 'updated' for each update operation in the loop), I don't
understand completely the reason why the user reports that the script
is consuming so much memory (the script crashes, but perhaps it is
asking for several GB). My guess is that perhaps HDF5 is trying to
load all the affected chunks in-memory before trying to update them,
but I thought it is best to report this here just in case this is a
bug, or, if not, the huge demand of memory can be somewhat alleviated.
In case you need more information, you may find it by following the
details of the discussion in the next thread:
http://www.mail-archive.com/pytables-users@lists.sourceforge.net/msg00722.html
Thanks!
···
--
0,0< Francesc Altet http://www.carabos.com/
V V Cárabos Coop. V. Enjoy Data
"-"
----------------------------------------------------------------------
This mailing list is for HDF software users discussion.
To subscribe to this list, send a message to hdf-forum-subscribe@hdfgroup.org.
To unsubscribe, send a message to hdf-forum-unsubscribe@hdfgroup.org.