HDF5 files - backup solutions (rsync, etc.)


I am wondering whether anything has changed since the last thread about HDF5 and rsync (about 7 years ago), or whether anyone has come up with a clever solution for backing up HDF5 files. I have a large collection of HDF5 files (in the thousands) that are updated daily, and I would like to back them up daily without copying each entire file every time it changes, which is what rsync seems to do. I do use the gzip compression filter, but from what I gather, chunking should mean that compression is not a disqualifier for a smarter backup solution.

I did trip over some talk about HDF5 journaling - but that does not seem to have materialized (yet).

Thanks for any thoughts or ideas here.

I do some similar file transfers (not really backups) with multi-TB data sets split into ~1 GB files. rsync -avhP is what I use. It only copies new files, but I haven't checked what happens if you modify a large file just a little. AFAIK, rsync with these options doesn't do any checksumming; it relies on file size and timestamps. That works for me since I don't edit my .h5 files (well, not the huge ones). Checking thousands of files takes just a few seconds with these options.

I haven’t seen the old thread, so I can’t comment on any changes in rsync/HDF5.