HDF5 Compatibility with NFS and Fault Tolerance

We introduced locking in a recent version to help deal with the concurrent access issues, but my larger concern is file consistency in the event of a network interruption or failure.

I was debugging HDF5 at a low level and worked on only one use case: changing an existing variable-length string attribute on a dataset in an HDF5 file. I added some print statements around the write system calls and identified that at least four writes occur for this use case:

1. A write to the superblock that updates the file's end-of-file address to account for the new global heap structure at the bottom of the file; the new value is just a 4k offset from the previous one.
2. A write whose purpose is ambiguous to me, since the values were identical to the previous file contents; I think it was either in the symbol table or the b-tree, but I can't recall off the top of my head.
3. A 4096-byte write at the end of the file containing the new global heap.
4. A 120-byte write to the local heap, where an attribute is either created or updated to point to the new global heap at the end of the file.
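
For concreteness, this is roughly the operation I was tracing (a minimal sketch; the file, dataset, and attribute names are placeholders, not the ones from my actual tests):

```c
/*
 * Minimal sketch of the traced use case: overwrite an existing
 * variable-length string attribute on a dataset.  The file, dataset,
 * and attribute names are placeholders.
 */
#include <stdio.h>
#include "hdf5.h"

int main(void)
{
    const char *new_value = "updated value";

    hid_t fid = H5Fopen("example.h5", H5F_ACC_RDWR, H5P_DEFAULT);
    hid_t did = H5Dopen2(fid, "/dataset", H5P_DEFAULT);
    hid_t aid = H5Aopen(did, "title", H5P_DEFAULT);

    /* Variable-length C string datatype for the in-memory buffer. */
    hid_t tid = H5Tcopy(H5T_C_S1);
    H5Tset_size(tid, H5T_VARIABLE);

    /* This update (together with the flush at file close) is what
     * produces the handful of low-level write() calls listed above. */
    if (H5Awrite(aid, tid, &new_value) < 0)
        fprintf(stderr, "attribute update failed\n");

    H5Tclose(tid);
    H5Aclose(aid);
    H5Dclose(did);
    H5Fclose(fid);
    return 0;
}
```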

The issue I see with NFS as the underlying (almost POSIX-compliant) filesystem is that no matter which mutating HDF5 operation you perform, a network interruption or application crash can result in file corruption.

The NFS settings we're using are:

resvport,rw,noexec,sync,wsize=32768,rsize=32768,nfsvers=3,soft,nolocallocks,intr

I don't see a solution unless we can set up an all-or-nothing-style transaction within HDF5, and since The HDF Group has already posted that HDF5 is not transactional, I don't know how to proceed. Even if you could somehow tell NFS to write the entire file in one operation, POSIX only guarantees write atomicity for very small writes (on the order of 512 bytes). For a while I assumed that if a write failed, the client would simply rewrite the data, and that the client would at least be told how many bytes had been written successfully. Unfortunately, with NFS this is not necessarily the case; from experimentation I have determined that it depends heavily on the client implementation and the mount options used.
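
For illustration, this is the sort of retry-on-short-write loop I had assumed the NFS client would effectively perform for us (my own sketch; write_all is not an HDF5 or NFS function, and even this only helps when write() reliably reports how many bytes reached the server, which a soft, interruptible NFS mount does not guarantee):

```c
/*
 * Retry-on-short-write loop (my own helper, not part of HDF5 or NFS).
 */
#include <errno.h>
#include <unistd.h>

int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted: retry the same range        */
            return -1;         /* e.g. EIO after an NFS soft-mount timeout */
        }
        buf += n;              /* short write: advance and keep writing    */
        len -= (size_t)n;
    }
    return 0;                  /* everything was handed to the client      */
}
```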

Any suggestions or ideas on combinations of NFS and HDF5 options that would eliminate the possibility of file corruption would be greatly appreciated!

Hi Luke

Do you lock the files? What NFS version are you using?

Regards

Dimitris


Hi all,

My answers are interspersed below. This is the first I've seen of this thread, so I apologize if I'm missing any context here. Other HDF Group personnel can weigh in if there's anything I'm missing.

Dana

···

From: Hdf-forum [mailto:hdf-forum-bounces@lists.hdfgroup.org] On Behalf Of Luke Campbell
Sent: Tuesday, August 27, 2013 1:18 PM
To: Dimitris Servis
Cc: HDF Users Discussion List
Subject: Re: [Hdf-forum] HDF5 Compatibility with NFS and Fault Tolerance

We introduced locking in a recent version to help deal with the concurrent access issues, but my larger concern is file consistency in the event of a network interruption or failure.

[Dana Robinson] At this time, the HDF5 library does not support concurrent access when a writer is involved. In the future HDF5 1.10.0 release (release date: TBA), we plan to include a feature that will allow concurrent access by a single writer and any number of readers (the Single-Writer/Multiple-Reader, or SWMR, pattern). This feature is under active development but is not ready for production use. Let me know if you'd like to know more about the feature and its status.
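
As a rough illustration of how the pattern is expected to look from application code, see the sketch below. The SWMR flag and function names are provisional and could change before the release:

```c
/*
 * Rough illustration only: the SWMR flags and calls shown here are
 * provisional names and are not available in released versions yet.
 */
#include "hdf5.h"

/* Writer process: open the file, then switch it into SWMR-write mode. */
void writer(void)
{
    /* SWMR requires the newest file format. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_libver_bounds(fapl, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST);

    hid_t fid = H5Fopen("data.h5", H5F_ACC_RDWR, fapl);
    H5Fstart_swmr_write(fid);   /* begin ordered, SWMR-safe metadata writes */

    /* ... append to datasets, calling H5Fflush(fid, H5F_SCOPE_GLOBAL)
     *     periodically so readers see consistent snapshots ... */

    H5Fclose(fid);
    H5Pclose(fapl);
}

/* Reader processes: open read-only in SWMR mode and poll for new data. */
void reader(void)
{
    hid_t fid = H5Fopen("data.h5", H5F_ACC_RDONLY | H5F_ACC_SWMR_READ,
                        H5P_DEFAULT);
    /* ... re-check dataset extents to pick up newly appended records ... */
    H5Fclose(fid);
}
```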

I was debugging HDF5 at a low level and worked on only one use case: changing an existing variable-length string attribute on a dataset in an HDF5 file. I added some print statements around the write system calls and identified that at least four writes occur for this use case:

1. A write to the superblock that updates the file's end-of-file address to account for the new global heap structure at the bottom of the file; the new value is just a 4k offset from the previous one.
2. A write whose purpose is ambiguous to me, since the values were identical to the previous file contents; I think it was either in the symbol table or the b-tree, but I can't recall off the top of my head.
3. A 4096-byte write at the end of the file containing the new global heap.
4. A 120-byte write to the local heap, where an attribute is either created or updated to point to the new global heap at the end of the file.

The issue I see with NFS as the underlying (almost POSIX-compliant) filesystem is that no matter which mutating HDF5 operation you perform, a network interruption or application crash can result in file corruption.

The NFS settings we're using are:

resvport,rw,noexec,sync,wsize=32768,rsize=32768,nfsvers=3,soft,nolocallocks,intr

I don't see a solution unless we can set up an all-or-nothing-style transaction within HDF5, and since The HDF Group has already posted that HDF5 is not transactional, I don't know how to proceed. Even if you could somehow tell NFS to write the entire file in one operation, POSIX only guarantees write atomicity for very small writes (on the order of 512 bytes). For a while I assumed that if a write failed, the client would simply rewrite the data, and that the client would at least be told how many bytes had been written successfully. Unfortunately, with NFS this is not necessarily the case; from experimentation I have determined that it depends heavily on the client implementation and the mount options used.

[Dana Robinson] The SWMR feature will introduce write ordering that will always leave the file in a consistent, though not necessarily up-to-date, state. This will not require transactions. As for atomicity, the library will use checksums to detect and retry when torn (non-atomic) writes are encountered under SWMR read conditions. This will compensate for the lack of write-call-level atomicity in a file system. Write ordering at the file system level will still be required, though. We have also been discussing a transaction feature for HDF5, but this is at the preliminary design stage and is currently unfunded. Contact us if you are interested in supporting it :)
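
To give a feel for the reader-side checksum/retry idea, here is an illustrative sketch (not the library's actual metadata cache code; the block layout and compute_checksum() are stand-ins for whatever the writer actually uses):

```c
/*
 * Illustrative sketch of the reader-side checksum/retry idea.  Assumes
 * each metadata block stores a checksum of its contents in its last
 * 4 bytes; compute_checksum() stands in for the writer's algorithm.
 */
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE  4096
#define MAX_RETRIES 100

extern uint32_t compute_checksum(const unsigned char *buf, size_t len);

int read_metadata_block(int fd, off_t offset, unsigned char *block)
{
    for (int i = 0; i < MAX_RETRIES; i++) {
        if (pread(fd, block, BLOCK_SIZE, offset) != (ssize_t)BLOCK_SIZE)
            continue;                          /* short read: try again    */

        uint32_t stored;
        memcpy(&stored, block + BLOCK_SIZE - 4, sizeof stored);

        if (stored == compute_checksum(block, BLOCK_SIZE - 4))
            return 0;                          /* block is self-consistent */
        /* Mismatch: we probably raced a torn write by the writer; retry. */
    }
    return -1;                                 /* give up and report error */
}
```
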
Any suggestions or ideas on combinations of NFS and HDF5 options that would eliminate the possibility of file corruption would be greatly appreciated!


Hi Luke

I am afraid the corruption cannot be avoided, especially with older NFS versions where even locks fail... I also have problems with network interruptions...

Dana, I have the impression that SWMR will not be supported on NFS.

I hope some of the upcoming APIs in 1.10.0 will allow for safer metadata management and reduce these risks.

Best

-- dimitris
