Slab access across chunks uses too much memory

Hi all,

We have a general data exploration GUI that allows users to slice (or take slabs) across any dimensions of a dataset. One of our files contains a 4D dataset from a 2D scan in which each scan point is an image, and the data is chunked along the last dimension. Looking at 2D slices across the 3rd and 4th dimensions works fine.

However, slicing across the 2nd and 3rd dimensions causes memory usage to peak unusually high - in fact, high enough to crash the JVM in our Java GUI when it runs out of heap memory. We can also reproduce the effect with h5py by comparing a single 2D access to a line-by-line access to the dataset:

-----8<-----
import h5py

f = h5py.File('/dls/i22/data/2011/sw5604-1/i22-34808-Pilatus2M.h5', 'r')

print 'Version:', h5py.h5.get_libversion()
print 'Driver:', f.driver

d = f.get('/entry/instrument/detector/data')

pl = d.id.get_create_plist()

print 'Chunking:', pl.get_chunk()

from time import time
import numpy as np

s = d.shape
print 'Shape:', s

n = -time()
l = []
for i in range(s[1]):
  l.append(d[0,i,:,0])
b = np.vstack(l)
n += time()
print 'Line-by-line slice:', n

n = -time()
a = d[0,:,:,0]
n += time()
print 'Whole slice:', n

print 'All same:', np.all(a == b)

print 'Sum:', a.sum(), b.sum()
-----8<-----

This gives the following output from two consecutive runs:
-----8<-----
Version: (1L, 8L, 7L)
Driver: sec2
Chunking: (1, 1, 1, 1475)
Shape: (1, 120, 1679, 1475)
Line-by-line slice: 31.2174389362
Whole slice: 4.58652997017
All same: True
Sum: 219855 219855
[src77879@ws042 ~]$ python testh5.py
Version: (1L, 8L, 7L)
Driver: sec2
Chunking: (1, 1, 1, 1475)
Shape: (1, 120, 1679, 1475)
Line-by-line slice: 3.69943618774
Whole slice: 4.54985308647
All same: True
Sum: 219855 219855
-----8<-----

With a warm file cache (the second run), line-by-line access is slightly quicker than the whole-slab access (3.7 s versus 4.5 s). Monitoring memory usage with GNOME's System Monitor illustrates the problem, as shown in the attached image. The first bump in user memory is from the run that warms up the file cache (the file lives on an NFS mount). The second bump has a very small leading shoulder, where the line-by-line slicing occurs, and the main rise is caused by the whole-slab access.
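To put numbers on the access pattern, here is a rough sketch that counts the chunks each selection touches, using the shape and chunking printed by the script. The element size (`ITEMSIZE`) is an assumption, since the dtype is not shown, and the helper `chunks_touched` is purely illustrative; it says nothing about how the HDF5 library actually buffers chunks internally.

```python
# Shape and chunking as printed by the test script.
SHAPE = (1, 120, 1679, 1475)
CHUNKS = (1, 1, 1, 1475)
ITEMSIZE = 4  # ASSUMPTION: bytes per element; the dtype is not stated


def chunks_touched(selection, shape=SHAPE, chunks=CHUNKS):
    """Count the chunks intersected by a selection.

    `selection` has one entry per dimension: an int for a single index,
    or None for a full ':' slice along that dimension.
    """
    count = 1
    for sel, extent, chunk in zip(selection, shape, chunks):
        if sel is None:
            count *= -(-extent // chunk)  # ceiling division
        # A single index always falls inside exactly one chunk.
    return count


# Bytes held by one chunk under the assumed element size.
chunk_bytes = ITEMSIZE
for c in CHUNKS:
    chunk_bytes *= c

whole = chunks_touched((0, None, None, 0))          # d[0,:,:,0]
line = chunks_touched((0, 0, None, 0))              # d[0,i,:,0]
image = chunks_touched((0, 0, None, None))          # d[0,0,:,:]
total = chunks_touched((None, None, None, None))    # entire dataset

print('Chunks per whole slab:', whole, 'of', total)
print('Chunks per line read: ', line)
print('Chunks per image slab:', image)
print('Bytes of chunk data behind the whole slab:', whole * chunk_bytes)
```

Under these assumptions the whole-slab selection intersects every chunk in the dataset while using only one element from each, whereas a single line read intersects 1679 chunks per call. Whether that accounts for the memory peak is only a guess.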

This was with h5py 2.0.0 and HDF5 1.8.7 on a 32-bit RHEL 5 Core 2 Duo box with 4 GB of RAM. The kernel is 2.6.18-274.el5PAE.

Given that we see this with the HL Java library and also h5py, I believe this is a low-level issue rather than a binding wrapper issue. Can any developers replicate this behaviour? If so, is it fixable?
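In the meantime, one possible stopgap is to do the line-by-line read into a preallocated array rather than building a list and stacking it. This is only a sketch: `read_slab_by_lines` is a hypothetical helper name, it assumes a 4D dataset object exposing `shape`, `dtype` and NumPy-style indexing (as an h5py dataset does), and the demo uses a small NumPy array as a stand-in for the real file. It does not address the underlying behaviour.

```python
import numpy as np


def read_slab_by_lines(dset, i0, i3):
    """Return the equivalent of dset[i0, :, :, i3], one line per read.

    Each iteration issues a small 1D read, so only one line's worth of
    data is requested from the dataset at a time.
    """
    n_rows, n_cols = dset.shape[1], dset.shape[2]
    out = np.empty((n_rows, n_cols), dtype=dset.dtype)
    for i in range(n_rows):
        out[i, :] = dset[i0, i, :, i3]
    return out


# Stand-in for the real dataset (ASSUMPTION: small in-memory array).
demo = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
slab = read_slab_by_lines(demo, 0, 0)
print('All same:', np.array_equal(slab, demo[0, :, :, 0]))
```

With the real file the call would be `read_slab_by_lines(d, 0, 0)`, which should produce the same array as `b` in the script above.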

Thanks in advance,
Peter

--
Dr Peter Chang
T:+44 1235 778092; F:+44 1235 778468
Data Analyst Mathematical & Statistical Software Developer Diamond Light Source Ltd, Diamond House, Harwell Science & Innovation Campus, Didcot, Oxfordshire
OX11 0DE
