Can I copy an HDF5 file while it is being written to?


#1

Hi,

I ended up with a corrupted HDF5 file after opening it with HDFView while it was being written to. To do this without corrupting the file I need SWMR, right?

But is it possible to copy the file and open the copy with HDFView (or other programs) without compromising the original file? Just to look at the results so far and delete the copy afterwards.


#2

Can you give us some background on your OS, file system, and versions of the software?

HDFView does not currently support this use case (reading from an HDF5 file that’s actively being written by another process), SWMR or not.

When opened in SWMR write mode (mode H5F_ACC_SWMR_WRITE), readers accessing the file in SWMR read mode (mode H5F_ACC_RDONLY | H5F_ACC_SWMR_READ) are guaranteed a valid view of the file, i.e., they will not encounter invalid or partially written data. You can use a tool such as h5watch to get a tail -f -like progressive view of the content of a dataset to which elements are being appended. Maybe that’s all you need?
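If the writer or reader happens to be written in Python with h5py, the two SWMR modes could look roughly like the minimal sketch below. The dataset name `samples` and the helpers `write_samples` and `read_samples` are made up for illustration, and the sketch assumes an h5py build against HDF5 1.10 or later and a file on a SWMR-supported file system.

```python
import h5py


def write_samples(path, batches):
    """SWMR writer: append each batch to a 1-D dataset, flushing after each."""
    # libver="latest" selects a file format that supports SWMR.
    with h5py.File(path, "w", libver="latest") as f:
        dset = f.create_dataset(
            "samples", shape=(0,), maxshape=(None,), chunks=(1024,), dtype="f8"
        )
        # Create all objects first, then switch the file to SWMR write mode.
        f.swmr_mode = True
        for batch in batches:
            start = dset.shape[0]
            dset.resize((start + len(batch),))
            dset[start:] = batch
            dset.flush()  # make the appended elements visible to SWMR readers


def read_samples(path):
    """SWMR reader: open read-only in SWMR mode and return the current data."""
    with h5py.File(path, "r", libver="latest", swmr=True) as f:
        dset = f["samples"]
        dset.refresh()  # pick up elements appended since the dataset was opened
        return dset[...].tolist()
```

In a real setup the reader would run in a separate process while the writer is still appending, calling `refresh()` whenever it wants to see new elements.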

Copying the file while it is being written is certainly “possible,” but may be more work than you’d expect. Unlike SWMR, you’d need some kind of interprocess communication (IPC) between the writing and reading processes. In the simplest case, this might be done as follows: The writer periodically signals potential reader(s) whenever a complete file state has been flushed to disk and then pauses. The readers could then make a copy and would notify the writer once they are done, so that it can resume writing. This is of course very inefficient, but it highlights the issue that the state of an “active” HDF5 file is more than bytes on disk, and that’s where a lot of the complexity stems from.
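The flush, signal, pause, copy, resume handshake can be sketched with nothing but the Python standard library. This is only an illustration under simplifying assumptions: threads stand in for the two processes, a plain text file stands in for the HDF5 file (with HDF5 the writer would flush the file through the library instead), exactly one reader is always listening, and the names `writer`, `reader`, `ready`, and `done` are hypothetical.

```python
import os
import shutil
import threading


def writer(path, records, every, ready, done):
    """Append records; after every `every` records flush, signal, and pause."""
    with open(path, "w") as f:
        for i, rec in enumerate(records, start=1):
            f.write(rec + "\n")
            if i % every == 0:
                f.flush()
                os.fsync(f.fileno())  # a complete state is now on disk
                ready.set()           # tell the reader it may copy
                done.wait()           # pause until the reader has finished
                done.clear()


def reader(path, snapshot_prefix, count, ready, done):
    """Take `count` snapshots, each one only after the writer signals."""
    for n in range(count):
        ready.wait()                  # wait for a flushed, paused writer
        ready.clear()
        shutil.copyfile(path, f"{snapshot_prefix}{n}")
        done.set()                    # let the writer resume
```

Each snapshot is copied only while the writer is paused, which is also why the scheme is slow: the writer stalls for the whole duration of every copy.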

If you don’t have any control over the writer, I’m afraid there’s not much you can do.
Otherwise, maybe you can tell us a little more about your use case?

Best, G.


#3

Thanks for your response.
I’m working with Windows 10 (NTFS) and Debian (ext4). The file is written on Debian with HDF5 1.10.4 and read on Windows with HDFView 3.1.0 for Windows (HDF5 1.10.5).

This is good to know. Will HDFView support this in the near future?

I have control over the writer and I already have my own reader on Windows in Python (with h5py) which simply plots certain data. Since h5py supports SWMR (http://docs.h5py.org/en/stable/swmr.html), I could open the file in SWMR mode without any risk with my own viewer, right?

After I noticed the problem with HDFView I always copied the file and opened the copy. There was never a problem with that. Was that just luck?


#4

OK, it appears that there’s some kind of networking involved (NFS, SMB, …)? That’s more likely the culprit. Generally, there is no simple causal relationship between the corruption of an HDF5 file that’s being written and attempting to open/read from it. For example, on a local file system, when opening a file w/ HDFView in read-only mode, it is much more likely that HDFView will crash, because it’s reading what temporarily appears as an inconsistent HDF5 file (e.g., metadata pointing to data that hasn’t been flushed, etc.). In a networked scenario, things are much trickier when it comes to locking, caching, timeouts, etc. This setup is not supported even w/ SWMR, which depends on POSIX write() semantics.

It is conceivable that HDFView might by default open a file in H5F_ACC_RDONLY | H5F_ACC_SWMR_READ, but that would have an effect only with a “cooperating SWMR writer.” There might already be a corresponding Jira issue/improvement request.

Yes, you can open the file in SWMR mode with your own h5py reader, as long as the file is on a SWMR-supported file system.

Yes, that was luck, because there is no guarantee that the bytes you’ve copied represent a consistent HDF5 file. The writing process may still hold unflushed (meta-)data not reflected in the bytes on disk.

Best, G.


#5

Yes, writing and reading was via a network. Sorry, I should have mentioned that earlier…

I’m trying to summarize:

  • Don’t open/read a file via network while it’s being written.
  • Locally I could open/read a file that’s being written, but the reading process may fail. The file itself is not in danger. Or is it a grey area?
  • Locally (on a SWMR-supported file system) I can open/read a file that’s being written if both sides use SWMR (write mode and read mode).

For those who want to read about it in the documentation:

In my case I cannot read that file via network. I have to wait till the end or stop and restart the writing process.


#6

Yes to all, with the current implementation. (Opening a file read-only that’s being written by another process is fine, but unpredictable and, hence, useless in production.)

As a footnote, SWMR works with parallel file systems, such as Lustre and GPFS, where clients clearly write and read over a network. The networked aspect per se is not the issue.

This article is always a good read: https://www.nextplatform.com/2017/09/11/whats-bad-posix-io/

There are a few new developments though. As we announced in the October ’19 webinar (https://www.hdfgroup.org/2019/10/webinar-announcement-new-hdf5-features-coming-in-2020-2021/), there are two related extensions (virtual file drivers, VFDs) in the pipeline: Mirror VFD and SWMR VFD. Mirror VFD will let you write an HDF5 file locally and “simulcast” it (via TCP) to a remote machine. SWMR VFD is a VFD-based implementation of full SWMR (i.e., including object creation, etc.) and removes some restrictions of the first implementation.

To be clear, neither will solve your current problem with the existing tools (HDFView, etc.). This is an important issue and we’ll keep trying, but it goes beyond HDF5.

G.


#7

Depending on your requirements, it might be worthwhile to explore another implementation of HDF5 (the data model) in a RESTful service interface: https://github.com/HDFGroup/hsds . You can deploy it in your environment and in the cloud. You can check it out at https://hdflab.hdfgroup.org/. Not only does it do SWMR, it does MWMR and a few other things. It comes with an h5py-compatible client, but if you are fluent in “REST-speak”, you can also talk to it from Emacs, mobile devices, etc.

G.