Once again: HDF5 concurrency and file corruption avoidance

Hi,

I'm using HDF5 in my company to store hierarchically organized log data
from a control system. Up to now we have had the following setup:

There is one control PC that operates the machine and continuously
stores the data to a local HDF5 file. As several departments in our
company need the data for analysis and visualization, we also have to
provide read access to this file for a number of client PCs.
Requirements are:
.) The data provided to the clients should be quite up-to-date. It
doesn't have to be live or real-time data, but it should be at most ~1h
old.
.) Since an HDF5 file that is opened by a writer process cannot be read
consistently, we cannot give clients direct access to this file.
Currently we have a cron job that periodically (once every 15 min)
copies the HDF5 file to the file server. Clients use this copy for
read-only analysis and visualization.

Although this setup more or less satisfies our needs, there are several
issues that we suffer from and want to solve better in the next project.

1) HDF5 files cannot be read (and therefore not copied) consistently
while another process writes to them. As our control PC continuously
writes new data, this is quite painful when the cron job tries to copy
the file to the server, since a simple "cp" is not sufficient. We have
implemented a primitive non-blocking-write protocol between the control
process and our custom file copy tool, so that the file is copied only
after the write process has flushed the file and before it starts a new
write operation. If the copy finishes before the writer starts writing
again, everything is fine; otherwise the copy is retried (see the sketch
after this list).

2) The copy on the server is in fact read-only, since any changes would
be overwritten by the next sync copy (and multiple non-synchronized
clients must also be able to open the file). Sometimes it would be nice
if additional data (mostly comments) could be saved to the file during
analysis.
3) Whenever the write process on the control PC crashes (and that
happens from time to time, as it is under ongoing development), the HDF5
file may be corrupted, losing all previously saved data. We have
therefore invested some shell-script effort to detect such corruption
and restore the last good file from the server to keep the data loss
small, but it's far from perfect and sometimes means losing a whole day
of data (since the last file-server backup).
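
For illustration, here is a minimal writer-side sketch of such a
flush-then-copy handshake. This is not our actual protocol; the counter
file and its path are made up. The copy tool would record the counter,
copy the HDF5 file, re-read the counter and retry if it changed in the
meantime.

```c
/* Writer-side sketch of a flush-then-copy handshake (illustrative only). */
#include <hdf5.h>
#include <stdio.h>

static void publish_consistent_point(hid_t file_id, unsigned long generation)
{
    /* Push all cached raw data and metadata to disk so that the on-disk
     * image is self-consistent at this moment. */
    H5Fflush(file_id, H5F_SCOPE_GLOBAL);

    /* Publish a generation number for the external copy tool, which copies
     * the file and retries if the number changes during the copy. */
    FILE *f = fopen("/var/run/logwriter.generation", "w");
    if (f != NULL) {
        fprintf(f, "%lu\n", generation);
        fclose(f);
    }
}
```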

To solve issues 1 and 2, we plan a different setup for the next project:

We want to have a single HDF5 server that provides read and write
access to the file via some kind of RPC protocol. (We already have such
a protocol for other tasks in our control system, and it performs well,
so we will use it, as there seems to be no existing HDF5 server
implementation around that also supports remote write access.)
Concurrency should then be no problem, as multiple clients are handled
by a single server process that synchronizes its threads via read-write
locks; the HDF5 library is also built with thread-safety enabled. In
effect this would be a kind of database server, except that we want to
keep HDF5 as the file format for its performance in manipulating large
data sets and for its hierarchical structure.
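
To make the intended design concrete, below is a minimal sketch of how
such a server could serialize access with a pthreads read-write lock.
The handler names, dataset layout and error handling are made up; note
that the thread-safe HDF5 build additionally serializes all library
calls with its own global lock.

```c
/* Server-side sketch: one process owns the HDF5 file handle and serializes
 * access with a read-write lock.  Handler names and the dataset layout are
 * illustrative only. */
#include <hdf5.h>
#include <pthread.h>

static hid_t            g_file;  /* opened once at server startup */
static pthread_rwlock_t g_lock = PTHREAD_RWLOCK_INITIALIZER;

/* RPC handler: read a whole 1-D double dataset into a caller-provided buffer. */
int rpc_read_dataset(const char *path, double *buf)
{
    herr_t status = -1;

    pthread_rwlock_rdlock(&g_lock);              /* shared lock for readers */
    hid_t dset = H5Dopen2(g_file, path, H5P_DEFAULT);
    if (dset >= 0) {
        status = H5Dread(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL,
                         H5P_DEFAULT, buf);
        H5Dclose(dset);
    }
    pthread_rwlock_unlock(&g_lock);
    return status < 0 ? -1 : 0;
}

/* RPC handler: overwrite a 1-D double dataset under the exclusive lock. */
int rpc_write_dataset(const char *path, const double *buf)
{
    herr_t status = -1;

    pthread_rwlock_wrlock(&g_lock);              /* exclusive lock for writers */
    hid_t dset = H5Dopen2(g_file, path, H5P_DEFAULT);
    if (dset >= 0) {
        status = H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL,
                          H5P_DEFAULT, buf);
        H5Dclose(dset);
        H5Fflush(g_file, H5F_SCOPE_GLOBAL);      /* narrow the corruption window */
    }
    pthread_rwlock_unlock(&g_lock);
    return status < 0 ? -1 : 0;
}
```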

A first implementation seems to work quite well, except that we still
suffer from the problem that a crash of the server (or even of a single
server thread) corrupts the whole file. Of course, one could argue that
a server should be stable enough to prevent such situations, but it is
unrealistic to assume that there will never be a bug in the software or
some other failure (power loss, ...) that causes such corruption.
As the next project has stricter requirements concerning the reliability
and dependability of the log data than the current one (a loss of a few
hours up to one day is acceptable now, but won't be then), this might
turn out to be a showstopper (or at least mean a lot of additional
effort to implement a more sophisticated custom detection and recovery
framework).

Now that I have told you (or spammed you :wink:) enough about what we
do with HDF5 and what we plan, I have some concrete questions:

.) You announced support for a single-writer/multiple-reader approach
in version 1.10. Do you already have any detailed information about how
you plan to implement this and what the limitations will be? This
feature might address most of our issues, so it has a great impact on
how much effort we should invest in our server solution (it doesn't make
sense to implement features that we can then get natively from HDF5).

.) Is there an existing project that offers remote access to HDF5 files
in both read and write mode that I did not find? (I only found read-only
or write-local, read-remote implementations.)

.) In a post in August you mentioned that you are already finishing the
implementation of metadata journaling. This is pretty much a must-have
for our project, so I'm really interested in when it will be available.
Is there any chance that it will be backported to 1.8, or do we have to
wait for 1.10? Looking at the current 1.9 snapshot, I cannot find any
hints of this journaling feature, so how much is still missing?

Thanks
chris

Hi Christian,

To solve issues 1 and 2, we plan a different setup for the next project:

We want to have a single HDF5 server that provides read and write
access to the file via some kind of RPC protocol. (We already have such
a protocol for other tasks in our control system, and it performs well,
so we will use it, as there seems to be no existing HDF5 server
implementation around that also supports remote write access.)
Concurrency should then be no problem, as multiple clients are handled
by a single server process that synchronizes its threads via read-write
locks; the HDF5 library is also built with thread-safety enabled. In
effect this would be a kind of database server, except that we want to
keep HDF5 as the file format for its performance in manipulating large
data sets and for its hierarchical structure.

A first implementation seems to work quite well, except that we still
suffer from the problem that a crash of the server (or even of a single
server thread) corrupts the whole file. Of course, one could argue that
a server should be stable enough to prevent such situations, but it is
unrealistic to assume that there will never be a bug in the software or
some other failure (power loss, ...) that causes such corruption.
As the next project has stricter requirements concerning the reliability
and dependability of the log data than the current one (a loss of a few
hours up to one day is acceptable now, but won't be then), this might
turn out to be a showstopper (or at least mean a lot of additional
effort to implement a more sophisticated custom detection and recovery
framework).

  This is reasonable, although it must have been a lot of work. What server are you using? (Or is it custom in-house?)

.) You announced support for a single-writer/multiple-reader approach
in version 1.10. Do you already have any detailed information about how
you plan to implement this and what the limitations will be? This
feature might address most of our issues, so it has a great impact on
how much effort we should invest in our server solution (it doesn't make
sense to implement features that we can then get natively from HDF5).

  Work on this project continues and our goal is to have all metadata modifications supported for SWMR access in the 1.10.0 release. The primary limitation is that the file system must support POSIX-compliant I/O in order to allow lock-free concurrent access, which currently rules out NFS & CIFS access.

  Note that this will still not allow multiple writers to access the same HDF5 file concurrently.
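
  For reference, a rough sketch of the intended writer/reader open
pattern, assuming the planned H5Fstart_swmr_write() call and
H5F_ACC_SWMR_READ open flag (treat the exact names as illustrative until
the API is final):

```c
/* Sketch of the planned SWMR open pattern; exact API names are assumptions.
 * Both sides must request the latest file format. */
#include <hdf5.h>

/* Writer: open for read-write, then switch to SWMR writer mode. */
hid_t open_swmr_writer(const char *path)
{
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_libver_bounds(fapl, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST);
    hid_t file = H5Fopen(path, H5F_ACC_RDWR, fapl);
    H5Fstart_swmr_write(file);   /* from this point on, readers may attach */
    H5Pclose(fapl);
    return file;
}

/* Reader: open read-only with the SWMR flag; no coordination with the
 * writer process is needed. */
hid_t open_swmr_reader(const char *path)
{
    return H5Fopen(path, H5F_ACC_RDONLY | H5F_ACC_SWMR_READ, H5P_DEFAULT);
}
```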

.) Is there an existing project that offers remote access to HDF5 files
in both read and write mode that I did not find? (I only found read-only
or write-local, read-remote implementations.)

  I don't think so.

.) In a post in August you mentioned that you are already finishing the
implementation of metadata journaling. This is pretty much a must-have
for our project, so I'm really interested in when it will be available.
Is there any chance that it will be backported to 1.8, or do we have to
wait for 1.10? Looking at the current 1.9 snapshot, I cannot find any
hints of this journaling feature, so how much is still missing?

  I'm on the last two phases of merging the metadata journaling support into the trunk. Barring any surprises, this should be ready in 6-8 weeks.

  Quincey

···

On Oct 13, 2010, at 9:25 AM, Christian Eder wrote:

We're using a custom in-house server. But we're thinking about changing
to some free RPC library (maybe XML-RPC, depending on performance
tests), because our server is optimized for performance but has some
limitations concerning the maximum size of the RPC call argument types,
which might get us into trouble if we read and save long arrays to the
HDF5 file in a single call.
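
One way to live with such an argument-size limit is to let the client
split long arrays into bounded chunks and have the server write each
chunk at a given offset via a hyperslab selection. A rough sketch
(function and parameter names are made up, and the locking from the
server sketch above is omitted):

```c
/* Sketch: write one bounded chunk of a long 1-D double array at the given
 * offset of an existing, sufficiently large dataset. */
#include <hdf5.h>

int rpc_write_chunk(hid_t file, const char *path,
                    hsize_t offset, hsize_t count, const double *chunk)
{
    hid_t dset = H5Dopen2(file, path, H5P_DEFAULT);
    if (dset < 0)
        return -1;

    /* Select the target region in the file... */
    hid_t fspace = H5Dget_space(dset);
    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, &offset, NULL, &count, NULL);

    /* ...and describe the chunk-sized buffer in memory. */
    hid_t mspace = H5Screate_simple(1, &count, NULL);
    herr_t status = H5Dwrite(dset, H5T_NATIVE_DOUBLE, mspace, fspace,
                             H5P_DEFAULT, chunk);

    H5Sclose(mspace);
    H5Sclose(fspace);
    H5Dclose(dset);
    return status < 0 ? -1 : 0;
}
```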

···

On 10/13/2010 05:05 PM, Quincey Koziol wrote:

  This is reasonable, although it must have been a lot of work. What server are you using? (Or is it custom in-house?)