Errors accessing HDF5 over CIFS and/or NFS


#1

Hi,
I’ve been using HDF5 and h5py for many years to read files on CIFS and/or NFS filesystems, and have rarely had problems accessing files that were already opened and accessed. Just to be clear, I’m not trying to use SWMR, and am ensuring that exactly one application on one machine has any file open, though that app will usually be accessing the file over a networked filesystem using CIFS or NFS. This has worked well for us for many years, but over the past couple of months, I’ve been seeing occasional errors that effectively mean that no HDF5 files can be accessed without restarting the application. The sorts of errors I get are shown below.

One error message (from Windows 10):
out = grp[roiname][:]
File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "C:\Users\xas_user\AppData\Local\Continuum\anaconda3\lib\site-packages\h5py\_hl\dataset.py", line 496, in __getitem__
self.id.read(mspace, fspace, arr, mtype, dxpl=self._dxpl)
File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py\h5d.pyx", line 181, in h5py.h5d.DatasetID.read
File "h5py\_proxy.pyx", line 130, in h5py._proxy.dset_rw
File "h5py\_proxy.pyx", line 84, in h5py._proxy.H5PY_H5Dread
OSError: Can't read data (file read failed: time = Mon Oct 21 09:46:59 2019
, filename = 'X:/XXX/XXX/XXXX.h5', file descriptor = 79, errno = 22, error message =
'Invalid argument', buf = 0000024C832996C0, total read
size = 232, bytes this sub-read = 232, bytes actually read = 18446744073709551615, offset = 560639)
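
For what it's worth, the odd-looking numbers in that message can be decoded with a few lines of Python. This is only my reading of the message, not something confirmed by the HDF5 developers: the "bytes actually read" value is exactly 2**64 - 1, which is what a result of -1 looks like when printed as an unsigned 64-bit integer, and errno 22 is EINVAL. The helper name as_signed_64 is just something made up for this sketch.

```python
import errno
import os

# Values copied from the error message above.
bytes_actually_read = 18446744073709551615
reported_errno = 22


def as_signed_64(value):
    """Reinterpret an unsigned 64-bit integer as a signed 64-bit integer."""
    return value - 2**64 if value >= 2**63 else value


signed_result = as_signed_64(bytes_actually_read)
errno_name = errno.errorcode[reported_errno]

print("bytes actually read, as signed 64-bit:", signed_result)
print("errno name:", errno_name)
print("errno text:", os.strerror(reported_errno))
```

If that reading is right, the low-level read call itself failed and reported an error, rather than returning a short or corrupted block of data.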

Another error message (this time from Linux):
pos = self.xrmmap['positions/pos'][:, :, index]
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "/home/xas_user/anaconda3/lib/python3.7/site-packages/h5py/_hl/dataset.py", line 573, in __getitem__
self.id.read(mspace, fspace, arr, mtype, dxpl=self._dxpl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5d.pyx", line 181, in h5py.h5d.DatasetID.read
File "h5py/_proxy.pyx", line 130, in h5py._proxy.dset_rw
File "h5py/_proxy.pyx", line 84, in h5py._proxy.H5PY_H5Dread
OSError: Can't read data (inflate() failed

I see this on Windows 10 with Python 3.7 from Anaconda Python or from Python.org, where the filesystem (CentOS 7) is mounted with CIFS. I also see such errors from a different CentOS 7 box with Python 3.7 from Anaconda Python. I have seen the errors with both h5py 2.8.0 and 2.9.0. An example of the output of print(h5py.version.info) on seeing the last error message above is:

h5py    2.9.0
HDF5    1.10.4
Python  3.7.4 (default, Aug 13 2019, 20:35:49) 
[GCC 7.3.0]
sys.platform    linux
sys.maxsize     9223372036854775807
numpy   1.17.2

It seems that the common feature is HDF5 1.10.4. As is perfectly reasonable, the h5py folks claim this is not really a problem they can fix.

For the Linux machine, the mount information in /etc/mtab for the filesystem holding the HDF5 files is

XXXX.XXX.aps.anl.gov:/home /xxx/home nfs rw,nosuid,nodev,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,soft,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=164.54.xxx.xxx,mountvers=3,mountport=20048,mountproto=udp,local_lock=all,addr=164.54.xxx.xxx 0 0
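
To make that mount line easier to read, here is a small sketch that splits the option string into a dictionary and prints the options that control timeouts, retries, and locking. The function name parse_mount_options and the choice of which options to print are mine, and whether any of these options is related to the errors is only a guess at this point.

```python
def parse_mount_options(options):
    """Split a comma-separated mount option string into a dict.

    Flags without a value (e.g. "soft") map to True; "key=value"
    options map to the value as a string.
    """
    parsed = {}
    for item in options.split(","):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


# Option field copied from the /etc/mtab line above (addresses left masked).
mtab_options = (
    "rw,nosuid,nodev,relatime,vers=3,rsize=1048576,wsize=1048576,"
    "namlen=255,soft,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,"
    "mountaddr=164.54.xxx.xxx,mountvers=3,mountport=20048,"
    "mountproto=udp,local_lock=all,addr=164.54.xxx.xxx"
)

opts = parse_mount_options(mtab_options)

# Options that affect how the client handles timeouts and file locks.
for name in ("vers", "soft", "timeo", "retrans", "nolock", "local_lock"):
    print(f"{name:12s} {opts.get(name)}")
```

As I understand the nfs(5) man page, a soft mount makes the NFS client return an error to the application once its retries are exhausted instead of retrying indefinitely, and timeo is given in tenths of a second, so timeo=600 with retrans=2 means 60-second timeouts with two retransmissions.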

This happens infrequently (maybe once every few days, during which we’re reading hundreds of HDF5 files and accessing different groups and datasets many thousands of times), but I’ve now seen it enough times on each platform over the past few months to be sure that this is a new problem, one that I had not seen over the previous couple of years.

Has anyone else seen such problems? Any suggestions on how to avoid or, even better, diagnose this problem?

Thanks.