HDF5 Compatibility with NFS and Fault Tolerance

We have a distributed application that uses HDF5 files on an NFS host. Multiple hosts all connect to the single NFS host and read and write HDF5 files. We have limited concurrent write access so that only one writer touches a given file at a time. Even so, we have encountered several instances of low-level HDF5 file corruption. My initial determination is that network interruptions or partitions cause the corruption: when updating a dataset or adding an attribute to a dataset, the low-level sec2 driver performs multiple seek and write operations, and there is no transactional support or atomicity in any of the data mutations that occur.
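To illustrate the failure mode with a generic sketch (not HDF5-specific, just a simulation of two dependent writes, the second of which is lost to a network partition):

```python
import os
import tempfile

# Simulate a metadata update that takes two dependent writes:
# write 1 updates a "pointer" in the header, write 2 writes the
# data that the pointer refers to. If the connection drops between
# the two writes, the file is left internally inconsistent.

fd, path = tempfile.mkstemp()
os.close(fd)

with open(path, "r+b") as f:
    f.write(b"PTR:0008")          # header says data lives at offset 8

def update(path, fail_midway):
    with open(path, "r+b") as f:
        f.seek(0)
        f.write(b"PTR:0016")      # step 1: point header at new location
        if fail_midway:
            return                # network partition: second write lost
        f.seek(16)
        f.write(b"NEWDATA!")      # step 2: write the data itself

update(path, fail_midway=True)

with open(path, "rb") as f:
    header = f.read(8)
    f.seek(16)
    data = f.read(8)

print(header)  # b'PTR:0016' -- header points at offset 16
print(data)    # b''         -- but nothing was ever written there
os.remove(path)
```

HDF5's internal metadata updates have the same shape: several interdependent writes that are only consistent once all of them land.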

My question is this: is there an option in the HDF5 API that supports transactions and eliminates corruption in the event of a network interruption (basically, if write() returns -1 at any point)? Whether that is atomic write operations, or a cleverly ordered set of writes that leaves the file consistent at every step? Or maybe I should be using a different driver for access.

Has anyone else experienced these kinds of issues with HDF5 on top of NFS? I spoke with the h5py developer, and he told me that a common solution is to copy the file to a local hard disk, make the changes there, copy it back to NFS, and then move it into place (the move being atomic).
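A sketch of that workflow using only the Python standard library (a minimal illustration, assuming the actual HDF5 edits would be done with h5py on the local copy; the function and file names here are made up):

```python
import os
import shutil
import tempfile

def safe_update(nfs_path, modify):
    """Copy a file off NFS, modify the local copy, then publish it
    back with an atomic rename so readers never see a torn file."""
    # 1. Copy to local disk, away from the unreliable network mount.
    local_dir = tempfile.mkdtemp()
    local_copy = os.path.join(local_dir, os.path.basename(nfs_path))
    shutil.copy2(nfs_path, local_copy)

    # 2. Make all changes against the local copy only.
    modify(local_copy)

    # 3. Copy the result back to the NFS directory under a temp name...
    tmp_on_nfs = nfs_path + ".tmp"
    shutil.copy2(local_copy, tmp_on_nfs)

    # 4. ...and rename it into place. A rename within one directory is
    # atomic on POSIX filesystems, so other hosts see either the old
    # file or the new one, never a half-written mix.
    os.replace(tmp_on_nfs, nfs_path)
    shutil.rmtree(local_dir)

# Demo with a plain file standing in for an HDF5 file:
demo = tempfile.NamedTemporaryFile(delete=False, suffix=".h5")
demo.write(b"old contents")
demo.close()

def modify(path):
    with open(path, "ab") as f:
        f.write(b" + new dataset")

safe_update(demo.name, modify)
with open(demo.name, "rb") as f:
    result = f.read()
print(result)  # b'old contents + new dataset'
os.remove(demo.name)
```

Note that this protects readers on other hosts from seeing partial writes, but the copy back over NFS can still fail; since it targets a temporary name, a failure there leaves the original file untouched.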

Thanks!

Luke Campbell
Software Engineer
RPS ASA
55 Village Square Drive
South Kingstown RI 02879-8248 USA
Tel: +1 (401) 789-6224 ext 359
Cell: (860) 381-0387

Hi Luke

Do you lock the files? What NFS version are you using?

Regards

Dimitris

···

2013/8/26, Luke Campbell <LCampbell@asascience.com>:
