Bug in handling short reads for external files

Hello,

The code for handling external file reads and writes currently does not handle the case where the read() or write() operation returns fewer bytes than requested.

Linux currently transfers at most 0x7ffff000 (2,147,479,552) bytes per read() or write() call, meaning that if more than that is requested from an external file, only the first ~2GB are read and the rest of the buffer is filled with zeros (because HDF5 thinks it has reached EOF).
http://man7.org/linux/man-pages/man2/read.2.html#NOTES
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=758170#10
(Note that the problem does not occur if the file's datatype and the memory datatype do not match, because in that case HDF5 uses smaller reads into a buffer for converting the values.)

I've attached a patch which restarts the read() or write() operation when it returns fewer bytes than requested (similar to the code in H5FDcore.c). I've also added logic to restart the syscall after EINTR.

Best regards,
Steffen Kieß

efl-short-read.patch (1.95 KB)

Hi Steffen,

We reviewed the patch today. It looks good. Could you please send us a test? We would love to have it in HDF5 1.10.0. For your reference the issue number is HDFFV-9634.

Thanks a lot!

Elena

···

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Elena Pourmal The HDF Group http://hdfgroup.org
1800 So. Oak St., Suite 203, Champaign IL 61820
217.531.6112
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On Dec 18, 2015, at 6:04 AM, Steffen Kieß <Steffen.Kiess@ipvs.uni-stuttgart.de> wrote:

Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5

Hello,

I've attached a simple test case, which fails on my system (Linux 3.13) with:

Value: 0
Error: Value should be 42

strace shows:
open("external-file", O_RDONLY) = 5
lseek(5, 0, SEEK_SET) = 0
read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2165309440) = 2147479552
close(5) = 0

However, the test will need >2GB RAM, so I'm not sure whether it is a good idea to add it to the test suite.

The test case also verifies that bytes after EOF are read as '0'.

Best regards,
Steffen Kieß

test.c (1.47 KB)

···

On 2015-12-19 00:24, Elena Pourmal wrote:


Hi Steffen,

Thank you for the test. We will make it conditional. The test will not run when there is not enough memory.

Elena

···


On Dec 21, 2015, at 4:26 AM, Steffen Kieß <Steffen.Kiess@ipvs.uni-stuttgart.de> wrote:


This issue is still open and caught us by surprise while using externally stored datasets to keep large tensors in ML models. Tensors larger than 2 GiB are silently truncated at 0x7ffff000 bytes (2 GiB minus 4 KiB), and strace shows it's exactly this failure mode:

In [1]: import numpy as np, h5py
   ...: big = np.ones(2**29, dtype=np.int32)
   ...: with open("external", "w") as e:
   ...:     big.tofile(e)
   ...: f = h5py.File("test.hdf5", "w")
   ...: internal = f.create_dataset("internal", data=big)
   ...: external_combo = f.create_dataset("external_combo", (2**29), dtype='i4', external=[("external", 0, 2**31)])
   ...: external_split = f.create_dataset("external_split", (2**29), dtype='i4', external=[("external", 0, 2**30),("external", 2**30, 2**30)])
   ...: print(internal[...])
   ...: print(external_combo[...])
   ...: print(external_split[...])
[1 1 1 ... 1 1 1]
[1 1 1 ... 0 0 0]
[1 1 1 ... 1 1 1]

(The external_split variant works because each 1 GiB segment is fetched with its own read() call, staying below the 0x7ffff000 limit; only the single 2 GiB mapping triggers the short read.) With

#include "hdf5.h"
#include <stdio.h>  /* fprintf, printf */
int buffer[1<<29];

int main(int argc, char** argv) {
   hid_t file, dset;
   if (argc < 3) {
       fprintf(stderr, "Usage: test filename dset_name\n");
       return 1;
   }

   unsigned mode = H5F_ACC_RDONLY;

   if ((file = H5Fopen(argv[1], mode, H5P_DEFAULT)) == H5I_INVALID_HID) {
       fprintf(stderr, "Couldn't open file %s\n", argv[1]);
       return 1;
   }
   if ((dset = H5Dopen2(file, argv[2], H5P_DEFAULT)) == H5I_INVALID_HID) {
       fprintf(stderr, "Couldn't open dset %s\n", argv[2]);
       return 1;
   }
   // read all dataset elements
   if (H5Dread(dset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, buffer) < 0) {
       fprintf(stderr, "Couldn't read from dset %s\n", argv[2]);
       return 1;
   }
   printf("%d\n", buffer[(1<<29)-1]);

   // do something w/ the dataset elements

   H5Dclose(dset);
   H5Fclose(file);
   return 0;
}

we see:

$ strace -k -e read ./test test.hdf5 external_combo
read(4, "\1\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0"..., 2147483648) = 2147479552
 > libc-2.30.so(__read+0xe) [0xe898e]
 > HDF_Group/HDF5/1.14.1/lib/libhdf5.so.310.1.0(H5D__efl_readvv_cb+0x240) [0xd57c0]
 > HDF_Group/HDF5/1.14.1/lib/libhdf5.so.310.1.0(H5VM_opvv+0x1d2) [0x363c82]
...
0
+++ exited with 0 +++