How to disable file IO caching on POSIX with SEC2 driver


#1

Guaranteed cache disabling is helpful for estimating throughput for a given internal setting while treating the HDF5 library as a black box.
In my interpretation, the sec2 driver does unbuffered IO through a regular POSIX file handle. However, when I tried to set o_flags |= (O_SYNC | O_DIRECT); in H5FDsec2.c near line #343 to disable OS caching, I ran into the errors posted below, with the library screaming at me that this is neither the right nor the easy way.
What is the best strategy to disable ALL read and write caching using the HDF5 C API?

HDF5-DIAG: Error detected in HDF5 (1.10.4) thread 0:
  #000: H5F.c line 444 in H5Fcreate(): unable to create file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1364 in H5F__create(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #002: H5Fint.c line 1817 in H5F_open(): problems closing file
    major: File accessibilty
    minor: Unable to close file
  #003: H5Fint.c line 1279 in H5F__dest(): problems closing file
    major: File accessibilty
    minor: Unable to release object
  #004: H5Faccum.c line 1070 in H5F__accum_reset(): can't flush metadata accumulator
    major: File accessibilty
    minor: Unable to flush data from cache
  #005: H5Faccum.c line 1033 in H5F__accum_flush(): file write failed
    major: Low-level I/O
    minor: Write failed
  #006: H5FDint.c line 258 in H5FD_write(): driver write request failed
    major: Virtual File Layer
    minor: Write failed
  #007: H5FDsec2.c line 812 in H5FD_sec2_write(): file write failed: time = Fri Nov  2 14:18:10 2018
, filename = 'tick.h5', file descriptor = 3, errno = 22, error message = 'Invalid argument', buf = 0xff4458, total write size = 800, bytes this sub-write = 800, bytes actually written = 18446744073709551615, offset = 0
    major: Low-level I/O
    minor: Write failed
  #008: H5Fint.c line 1783 in H5F_open(): unable to flush superblock
    major: File accessibilty
    minor: Unable to flush data from cache
  #009: H5Fio.c line 198 in H5F_flush_tagged_metadata(): can't reset accumulator
    major: Low-level I/O
    minor: Can't reset object
  #010: H5Faccum.c line 1070 in H5F__accum_reset(): can't flush metadata accumulator
    major: File accessibilty
    minor: Unable to flush data from cache
  #011: H5Faccum.c line 1033 in H5F__accum_flush(): file write failed
    major: Low-level I/O
    minor: Write failed
  #012: H5FDint.c line 258 in H5FD_write(): driver write request failed
    major: Virtual File Layer
    minor: Write failed
  #013: H5FDsec2.c line 812 in H5FD_sec2_write(): file write failed: time = Fri Nov  2 14:18:10 2018
, filename = 'tick.h5', file descriptor = 3, errno = 22, error message = 'Invalid argument', buf = 0xff3a38, total write size = 96, bytes this sub-write = 96, bytes actually written = 18446744073709551615, offset = 0
    major: Low-level I/O
    minor: Write failed
terminate called after throwing an instance of 'h5::error::io::file::create'
  what():  ../../h5cpp/H5Fcreate.hpp line#  57 : couldn't create file...
Aborted
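The errno = 22 ("Invalid argument") entries fit the usual Linux rules for O_DIRECT. The buffer address, the transfer size and the file offset generally have to be multiples of the device's logical block size. The failing calls here are 800- and 96-byte writes issued by the metadata accumulator flush (H5F__accum_flush) from ordinary heap buffers such as 0xff4458. Below is a minimal standalone sketch, outside HDF5, of what a single O_DIRECT write needs. It assumes Linux/glibc and a 4096-byte alignment, which is an assumption because the real requirement depends on the device and filesystem. DIRECT_ALIGN, round_up_to_align and direct_write are hypothetical names for illustration.

```c
#define _GNU_SOURCE /* glibc: exposes O_DIRECT in <fcntl.h> */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed alignment: 4096 covers devices with 512-byte or 4 KiB logical
 * blocks; the real requirement depends on the device and filesystem. */
#define DIRECT_ALIGN ((size_t)4096)

/* Round n up to the next multiple of DIRECT_ALIGN. */
static size_t round_up_to_align(size_t n)
{
    return (n + DIRECT_ALIGN - 1) / DIRECT_ALIGN * DIRECT_ALIGN;
}

/* Write len bytes to path with O_DIRECT. The payload is copied into an
 * aligned, zero-padded buffer whose size is a multiple of DIRECT_ALIGN and
 * written at offset 0; the file is then truncated back to len bytes.
 * A zero-length request is a no-op. Returns len on success, or -1 with
 * errno set on failure. */
static ssize_t direct_write(const char *path, const void *data, size_t len)
{
    size_t padded = round_up_to_align(len);
    void *buf = NULL;
    ssize_t n;
    int fd, rc, saved;

    if (len == 0)
        return 0;
    rc = posix_memalign(&buf, DIRECT_ALIGN, padded); /* aligned address */
    if (rc != 0) {
        errno = rc;
        return -1;
    }
    memset(buf, 0, padded);
    memcpy(buf, data, len);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
    if (fd < 0) {
        saved = errno;
        free(buf);
        errno = saved;
        return -1;
    }
    n = pwrite(fd, buf, padded, 0); /* aligned buffer, size and offset */
    saved = errno;
    if (n == (ssize_t)padded && ftruncate(fd, (off_t)len) != 0) {
        saved = errno; /* could not drop the padding */
        n = -1;
    } else if (n >= 0 && n != (ssize_t)padded) {
        saved = EIO; /* short write: treat as failure */
        n = -1;
    }
    close(fd);
    free(buf);
    errno = saved;
    return n < 0 ? -1 : (ssize_t)len;
}
```

Some filesystems (for example tmpfs on older kernels) refuse O_DIRECT at open() with EINVAL, so even an aligned write can fail there. The copy-and-pad step also suggests why O_DIRECT is hard to bolt onto the sec2 driver: every small metadata write coming out of the library would need the same treatment.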


#2

Steven, how are you? Your patch looks good except for O_DIRECT, which “requires aligned buffers
and probably won’t work.” Here’s a patch provided by Quincey that uses an environment variable
to make it a little more “user-friendly.” WARNING: We do not recommend using “hacked” versions of the
library in a production setup.

--- a/src/H5FDsec2.c
+++ b/src/H5FDsec2.c
@@ -340,6 +340,14 @@ H5FD_sec2_open(const char *name, unsigned flags, hid_t fapl_id, haddr_t maxaddr)
     if(H5F_ACC_EXCL & flags)
         o_flags |= O_EXCL;
 
+{
+    const char  *val;
+
+    val = HDgetenv("HDF5_NO_SYNC");
+    if(val)
+        o_flags |= O_SYNC;
+}
+
     /* Open the file */
     if((fd = HDopen(name, o_flags, H5_POSIX_CREATE_MODE_RW)) < 0) {
         int myerrno = errno;
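The patch follows a simple pattern: at open time, check an environment variable and OR an extra flag into o_flags. Here is the same idea as a standalone sketch outside HDF5. It uses plain getenv() in place of the library's internal HDgetenv wrapper, and flags_from_env is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L /* request POSIX.1-2008 declarations */
#include <fcntl.h>
#include <stdlib.h>

/* Mirror of the patch's logic: start from base_flags and OR in extra_flags
 * only when env_name is present in the environment. As in the patch, the
 * variable's value is ignored; merely being set enables the extra flags. */
static int flags_from_env(int base_flags, const char *env_name, int extra_flags)
{
    int o_flags = base_flags;

    if (getenv(env_name) != NULL)
        o_flags |= extra_flags;
    return o_flags;
}
```

A caller would open the file with something like open(name, flags_from_env(O_RDWR | O_CREAT, "HDF5_NO_SYNC", O_SYNC), 0644). Note that, despite its name, setting HDF5_NO_SYNC turns O_SYNC on in the patch. Also, O_SYNC only makes each write() wait until the data has reached stable storage. Writes still pass through the page cache and reads can still be served from it, so this removes write-back caching rather than ALL caching.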

#3

Hey Gerd, thank you for looking into it. The patch arrived a bit late to the party: in the meantime I had done some black-box analysis by writing varying amounts of data to disk in four different ways:

  • HDF5™ C API plain write with the built-in pipeline/filter chain + space selection
  • H5CPP with custom pipeline and HDF5™ chunk write
  • H5CPP append/iterative write
  • Armadillo linalg system direct IO/raw write

Data blocks of 35MB, 350MB and 3500MB were written to disk and the throughput was measured. The C API write call seemed to be capped at 35MB/sec regardless of size, indicating that some code path in the regular C API write limits throughput before the data ever reaches disk IO. The other three alternatives were in the same ballpark: ~2GB/sec for small (buffered) writes and ~280MB/sec for large data sizes.
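One way to take this kind of black-box measurement with plain POSIX calls is sketched below. It is not the harness used for the numbers above. write_throughput_mbs is a hypothetical name, and 1 MB is taken as 10^6 bytes. The sketch times a loop of write() calls with CLOCK_MONOTONIC and can optionally run fsync() inside the timed region.

```c
#define _POSIX_C_SOURCE 200809L /* clock_gettime, fsync */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Write `total` bytes of zeros to `path` in `chunk`-sized write() calls and
 * return the throughput in MB/s (1 MB = 1e6 bytes). Returns -1.0 on error
 * or if the elapsed time is too short to measure. When sync_at_end is
 * non-zero, fsync() runs inside the timed region, so data merely parked in
 * the OS page cache does not inflate the result. */
static double write_throughput_mbs(const char *path, size_t total,
                                   size_t chunk, int sync_at_end)
{
    struct timespec t0, t1;
    char *buf;
    size_t done = 0;
    double secs;
    int fd;

    if (chunk == 0)
        return -1.0;
    buf = calloc(1, chunk);
    if (buf == NULL)
        return -1.0;
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        free(buf);
        return -1.0;
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    while (done < total) {
        size_t want = total - done < chunk ? total - done : chunk;
        ssize_t n = write(fd, buf, want);
        if (n <= 0) { /* error (or no progress): give up */
            close(fd);
            free(buf);
            return -1.0;
        }
        done += (size_t)n;
    }
    if (sync_at_end && fsync(fd) != 0) {
        close(fd);
        free(buf);
        return -1.0;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    close(fd);
    free(buf);
    secs = (double)(t1.tv_sec - t0.tv_sec) + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return secs > 0.0 ? (double)total / 1e6 / secs : -1.0;
}
```

Without the fsync(), small writes can finish at memory speed because the kernel only copies them into the page cache. That is consistent with the "small (buffered)" figures above. With the fsync() inside the timer, the result is closer to what the device actually sustains. The post does not state whether the numbers above were taken with or without such a sync.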

cheers,
steven