Hi everyone!
I'm trying to run simulations on a Nehalem cluster that uses a Lustre file
system for I/O. My code uses parallel HDF5 and writes timestep groups with
3D data into a single file, so a finished simulation leaves one big output
file.
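For reference, the output side of my code looks roughly like this (a simplified sketch, not the real code: the file name, group/dataset names, and the tiny per-step dataset are placeholders, and the real code selects per-rank hyperslabs of the 3D field instead of writing a whole dataset from every rank):

```c
/* Sketch: one shared file opened collectively via the MPI-IO driver,
 * one group per timestep. Names and sizes are illustrative only. */
#include <stdio.h>
#include <mpi.h>
#include <hdf5.h>

void write_step(MPI_Comm comm, int step, const double *data, hsize_t n)
{
    /* All ranks open the same file collectively through MPI-IO. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, comm, MPI_INFO_NULL);
    hid_t file = H5Fopen("run.h5", H5F_ACC_RDWR, fapl);

    /* One group per timestep, e.g. "/step_0042". */
    char name[32];
    snprintf(name, sizeof name, "step_%04d", step);
    hid_t grp = H5Gcreate2(file, name, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* Collective write of a 1D dataset; the production code uses 3D
     * dataspaces with per-rank hyperslab selections instead. */
    hsize_t dims[1] = { n };
    hid_t space = H5Screate_simple(1, dims, NULL);
    hid_t dset  = H5Dcreate2(grp, "rho", H5T_NATIVE_DOUBLE, space,
                             H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    hid_t dxpl  = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, dxpl, data);

    H5Pclose(dxpl); H5Dclose(dset); H5Sclose(space);
    H5Gclose(grp);  H5Fclose(file); H5Pclose(fapl);
}
```

The lock errors below appear inside the MPI-IO layer during these collective writes.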
The problem is that parallel HDF5 needs file locking, which Lustre must
provide via a lock daemon (or something similar). That daemon reportedly
costs Lustre up to 50% of its performance, which is why the cluster admins
refuse to enable file locking on their Lustre system.
Consequently I cannot write anything. When I try, it stops with:
File locking failed in ADIOI_Set_lock(fd 18,cmd F_SETLKW/7,type
F_WRLCK/1,whence 0) with return value FFFFFFFF and errno 26.
If the file system is NFS, you need to use NFS version 3, ensure that the
lockd daemon is running on all the machines, and mount the directory with
the 'noac' option (no attribute caching).
ADIOI_Set_lock:: Function not implemented
ADIOI_Set_lock:offset 6488, length 96
and so on.
Is there any workaround for this, or does HDF5 fundamentally rely on file
locking? Otherwise I will not be able to use this cluster.
Thanks and best regards
Sebastian
--
View this message in context: http://hdf-forum.184993.n3.nabble.com/File-locking-of-parallel-HDF5-on-lustre-without-file-locking-support-tp2553896p2553896.html
Sent from the hdf-forum mailing list archive at Nabble.com.