Write cache?

Hello!

HDF5 implements data caching which improves read performance substantially:
https://support.hdfgroup.org/HDF5/doc/H5.user/Caching.html

However, it turns out that there seems to be no write cache. Consider the following Pascal procedure, which fills a 4Kx4K matrix row by row:

procedure Test;
var
   Dll: THDF5Dll;
   dims, start, count: array of hsize_t;
   n: hsize_t;
   f: hid_t;
   d: hid_t;
   mems, s: hid_t;
   cpl: hid_t;
   v: array of Double;
   i: Integer;
begin
   Dll := THDF5Dll.Create('hdf5.dll');
   f := Dll.H5Fopen('test.hdf5', H5F_ACC_RDWR or H5F_ACC_CREAT, H5P_DEFAULT);

   // File dataspace: a 4096x4096 matrix
   SetLength(dims, 2);
   dims[0] := 4096;
   dims[1] := 4096;
   s := Dll.H5Screate_simple(2, Phsize_t(dims), nil);

   // Chunked layout: one chunk per matrix row (1x4096)
   cpl := Dll.H5Pcreate(Dll.H5P_DATASET_CREATE);
   dims[0] := 1;
   dims[1] := 4096;
   Dll.H5Pset_chunk(cpl, 2, Phsize_t(dims));
   d := Dll.H5Dcreate2(f, 'matrix', Dll.H5T_INTEL_F64, s,
     H5P_DEFAULT, cpl, H5P_DEFAULT);

   // One row of random values, reused for every write
   RandSeed := 6031986; // fixed seed; Random(6031986) would only draw a number
   SetLength(v, 4096);
   for i := 0 to 4095 do
     v[i] := Random;
   n := 4096;
   mems := Dll.H5Screate_simple(1, @n, nil); // memory dataspace: 4096 values

   // Hyperslab shape: one full row starting at column 0
   SetLength(start, 2);
   start[1] := 0;
   SetLength(count, 2);
   count[0] := 1;
   count[1] := 4096;

   // Write the matrix one row at a time
   for i := 0 to 4095 do
   begin
     s := Dll.H5Dget_space(d);
     start[0] := i;
     Dll.H5Sselect_hyperslab(s, H5S_SELECT_SET,
       Phsize_t(start), nil, Phsize_t(count), nil);
     Dll.H5Dwrite(d, Dll.H5T_NATIVE_DOUBLE, mems, s, H5P_DEFAULT, PDouble(v));
     Dll.H5Sclose(s);
   end;
   Dll.H5Fflush(d, H5F_SCOPE_LOCAL);
end;

It's a minimal example, so memory leaks are possible; in any case, the program finishes very quickly.
However, when I change

1) cpl := Dll.H5Pcreate(Dll.H5P_DATASET_CREATE);
   dims[0] := 1;
   dims[1] := 4096;
   Dll.H5Pset_chunk(cpl, 2, Phsize_t(dims));

to

2) cpl := Dll.H5Pcreate(Dll.H5P_DATASET_CREATE);
   dims[0] := 64;
   dims[1] := 64;
   Dll.H5Pset_chunk(cpl, 2, Phsize_t(dims));

I see the program slowing down by a factor of >30!

The detailed stats for the former and latter cases reveal the dramatic difference:

1) 1x4K chunks
Clock time (sec): 0.171
CPU time (sec): 0.141
I/O read (MB): 0.000
I/O read ops: 0
I/O write (MB): 134.416
I/O write ops: 4171

2) 64x64 chunks
Clock time (sec): 6.046
CPU time (sec): 5.969
I/O read (MB): 8455.717
I/O read ops: 258048
I/O write (MB): 8590.132
I/O write ops: 262222

It appears that in the 64x64 case the data is read and written multiple times, yielding a whopping 17 GB of total I/O.
Is it possible to have data caching for writes as well?

Best wishes,
Andrey Paramonov
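
The I/O totals reported for the 64x64 case can be checked with a little arithmetic. The sketch below is in Python and rests on assumptions not stated in the post: that each row write touches all 64 chunks of its chunk row, that no chunk survives in the cache between two row writes, and that a chunk is read back only if it was written before. The function name is illustrative.

```python
def expected_chunk_io(n=4096, chunk=64, elem_bytes=8):
    """Estimate chunk reads/writes for row-wise writes with no effective cache."""
    chunk_bytes = chunk * chunk * elem_bytes      # 64 * 64 * 8 = 32768 bytes
    chunks_per_row = n // chunk                   # chunks crossed by one matrix row
    total_chunks = chunks_per_row * (n // chunk)  # chunks in the whole dataset
    writes = n * chunks_per_row                   # every row write rewrites each chunk it crosses
    reads = writes - total_chunks                 # first touch of a chunk needs no read
    return reads, writes, chunk_bytes

reads, writes, chunk_bytes = expected_chunk_io()
print(reads, round(reads * chunk_bytes / 1e6, 3))    # chunk reads and MB read
print(writes, round(writes * chunk_bytes / 1e6, 3))  # chunk writes and MB written
```

Under these assumptions the model gives 258048 chunk reads (8455.717 MB), which equals the reported read figures, and 262144 chunk writes. The reported 262222 write ops would then include a few dozen additional writes, presumably for metadata.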

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

Hi Andrey,
  I believe that the chunk cache size is set too small (by default, it’s 1MB) for your I/O pattern when you make the dataset chunk size less aligned with the I/O pattern (by making the chunks square, but still writing by rows). Can you try either increasing the chunk cache size (with H5Pset_chunk_cache - https://support.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetChunkCache) or changing your access pattern to be more aligned with the chunk dimensions?

  Quincey
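
To see why the default cache falls short here, one can count the chunks that a single row write crosses and multiply by the chunk size. The Python sketch below does that for both chunk shapes. The function name is made up for illustration, 1 MiB is used for the default mentioned above, and the rule that all chunks touched by one write should fit in the cache is a simplifying assumption, not a statement about HDF5 internals.

```python
from math import ceil

DEFAULT_CACHE_BYTES = 1024 * 1024  # the 1MB default mentioned in the reply, taken as 1 MiB

def cache_bytes_needed(selection, chunk, elem_bytes=8):
    """Bytes needed to hold every chunk crossed by a selection that starts on a chunk boundary."""
    chunks_touched = 1
    chunk_elems = 1
    for sel_len, chunk_len in zip(selection, chunk):
        chunks_touched *= ceil(sel_len / chunk_len)
        chunk_elems *= chunk_len
    return chunks_touched * chunk_elems * elem_bytes

row = (1, 4096)  # one matrix row, as written by each H5Dwrite call
for chunk in [(1, 4096), (64, 64)]:
    need = cache_bytes_needed(row, chunk)
    print(chunk, need, "fits" if need <= DEFAULT_CACHE_BYTES else "does not fit")
```

For 1x4096 chunks a row needs a single 32768-byte chunk, while for 64x64 chunks it crosses 64 chunks totalling 2097152 bytes. That figure would be the value to pass as rdcc_nbytes to H5Pset_chunk_cache on a dataset access property list.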

On May 2, 2017, at 6:09 AM, Андрей Парамонов <paramon@acdlabs.ru> wrote:

[...]

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5

On 06.05.2017 8:43, Quincey Koziol wrote:

Hi Andrey,
  I believe that the chunk cache size is set too small (by default, it’s 1MB) for your I/O pattern when you make the dataset chunk size less aligned with the I/O pattern (by making the chunks square, but still writing by rows). Can you try either increasing the chunk cache size (with H5Pset_chunk_cache - https://support.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetChunkCache) or changing your access pattern to be more aligned with the chunk dimensions?

Hello Quincey!

Of course, I could implement an outer layer of caching to help HDF5 behave more efficiently, but I prefer it when HDF5 just does the right thing by itself ;)

Fortunately, increasing the cache size to 2MB (the total size of the "volatile" chunks, i.e. the 64 chunks of 32768 bytes touched by each row write) helps! Thank you for the pointer.

On further investigation, I was shocked to find that the behavior changes dramatically when the cache size is even 1 byte short.

For cache size of 2097152 bytes, I get:
Clock time (sec): 0.533
CPU time (sec): 0.531
I/O read (MB): 0.000
I/O read ops: 0
I/O write (MB): 134.416
I/O write ops: 4171

While for cache size of 2097151 bytes:
Clock time (sec): 4.580
CPU time (sec): 4.578
I/O read (MB): 8455.717
I/O read ops: 258048
I/O write (MB): 8590.132
I/O write ops: 262219

I would expect a more gradual change in speed as the cache size goes from 0 bytes to 2MB. Doesn't this look like a performance problem?

Best wishes,
Andrey Paramonov
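
The sharp threshold is what a least-recently-used cache would do when successive rows cycle through the same 64 chunks: with room for only 63, each chunk is evicted just before it is needed again. The Python sketch below simulates a plain LRU cache of whole chunks. It is a simplified model, not HDF5's actual chunk cache, which also involves hash slots and a preemption policy, and all names in it are illustrative.

```python
from collections import OrderedDict

def simulate_row_writes(cache_bytes, n=4096, chunk=64, elem_bytes=8):
    """Count chunk reads and writes for row-wise writes through an LRU chunk cache."""
    chunk_bytes = chunk * chunk * elem_bytes
    capacity = cache_bytes // chunk_bytes   # whole chunks that fit in the cache
    cache = OrderedDict()                   # chunk key -> True, oldest entry first
    on_disk = set()                         # chunks that have been written out at least once
    reads = writes = 0
    for row in range(n):
        for col_chunk in range(n // chunk):
            key = (row // chunk, col_chunk)
            if key in cache:                # hit: mark as most recently used
                cache.move_to_end(key)
                continue
            if key in on_disk:              # miss on an existing chunk: read it back
                reads += 1
            if capacity == 0:               # no cache at all: write through immediately
                on_disk.add(key)
                writes += 1
                continue
            if len(cache) >= capacity:      # full: evict the least recently used chunk
                old, _ = cache.popitem(last=False)
                on_disk.add(old)
                writes += 1
            cache[key] = True
    writes += len(cache)                    # final flush of chunks still cached
    return reads, writes

for size in (2097152, 2097151, 1048576):
    print(size, simulate_row_writes(size))
```

In this model a 2097152-byte cache gives 0 chunk reads and 4096 chunk writes, while one byte less gives 258048 reads and 262144 writes, the same as with the 1MB default. The read counts equal the ones measured above, which suggests that any cache smaller than one full chunk row thrashes completely under this access pattern.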
