File size and ulimit


#1

I have a piece of code that writes very large files (>4TB) in parallel. I am worried about the behaviour of the library when the file size grows beyond what is set at the file-system level by ulimit -f. This is not specified in the documentation as far as I can tell.

Will hdf5 produce some form of error message during a write? Or will it silently fail and produce truncated files? Or is that limit somehow bypassed?

Thanks in advance for any clarification of the behaviour.


#2

Good question! I set the file size limit to a low value with ulimit -f 10000, then ran a profile example that produces a 12GB HDF5 file, with the HDF5 error stack left at its default (printing enabled). The run ended with:
File size limit exceeded (core dumped)
Trying to dump the created file leads to:
h5dump error: unable to open file "tick.h5"
The file got this far:
-rw-r--r-- 1 steven steven 102400000 Nov 18 09:07 tick.h5

Distributor ID: LinuxMint
Description: Linux Mint 19.1 Tessa
Release: 19.1
Codename: tessa
HDF5: v1.10.5

best: steve


#3

Thanks for coming back with the answer.

So I should expect an error message when this happens.


#4

As it appears, not from the HDF5 C API. Notice that the posted message comes from the shell, not from HDF5. The HDF5 library appears to crash, and no HDF5 error stack is printed. On my system, one has to be certain there is enough room left on the device before starting I/O operations.
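A minimal sketch of such a pre-flight check, assuming a POSIX system. The helper names fits_under_fsize_limit and fits_on_device are hypothetical and not part of HDF5; they only query the per-process soft limit with getrlimit(2) and the free space with statvfs(3) before any I/O starts.

```c
#include <sys/resource.h>
#include <sys/statvfs.h>

/* Hypothetical helper (not an HDF5 call): returns 1 if a file of
 * planned_bytes stays within this process's soft RLIMIT_FSIZE,
 * 0 if it would exceed it, -1 if the limit cannot be queried. */
int fits_under_fsize_limit(unsigned long long planned_bytes)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_FSIZE, &lim) != 0)
        return -1;
    if (lim.rlim_cur == RLIM_INFINITY)   /* "ulimit -f unlimited" */
        return 1;
    return planned_bytes <= (unsigned long long)lim.rlim_cur ? 1 : 0;
}

/* Hypothetical helper: returns 1 if the file system holding dir has at
 * least planned_bytes available to unprivileged users, 0 if not,
 * -1 if statvfs fails (for example, dir does not exist). */
int fits_on_device(const char *dir, unsigned long long planned_bytes)
{
    struct statvfs vfs;
    unsigned long long avail;

    if (statvfs(dir, &vfs) != 0)
        return -1;
    /* f_bavail counts free blocks in units of f_frsize bytes. */
    avail = (unsigned long long)vfs.f_bavail * (unsigned long long)vfs.f_frsize;
    return planned_bytes <= avail ? 1 : 0;
}
```

Calling both helpers with the expected final file size before creating the file gives an early warning. It is only a snapshot: other processes can still fill the device while the write is in progress.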


#5

Ah thanks. I was confused by your earlier answer and thought the h5dump was only used to look at the file after creation.

When you say enough room, do you mean space on the device, or that the file needs to be smaller than the ulimit? Or both?


#6

From the application’s perspective there must be enough room. Here is what the setrlimit(2) man page says about it:

RLIMIT_FSIZE
              This is the maximum size in bytes of files that the process
              may create.  Attempts to extend a file beyond this limit
              result in delivery of a SIGXFSZ signal.  By default, this
              signal terminates a process, but a process can catch this signal
              instead, in which case the relevant system call (e.g.,
              write(2), truncate(2)) fails with the error EFBIG.

and the signal man page:

Standard signals:
...
SIGXFSZ      P2001      Core    File size limit exceeded (4.2BSD);
                                       see setrlimit(2)
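
The behaviour described in the man page can be reproduced without HDF5. This is a minimal sketch, assuming Linux or another POSIX system. The function write_past_fsize_limit is a hypothetical name used for illustration only. It ignores SIGXFSZ, lowers the soft limit, and attempts a one-byte write at the limit, so that write(2) reports EFBIG instead of the process being killed.

```c
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo helper (not an HDF5 call).
 * Returns the errno of the failing write(2), 0 if the write unexpectedly
 * succeeded, or -1 if the setup itself failed. */
int write_past_fsize_limit(const char *path, size_t limit_bytes)
{
    struct rlimit old_lim, new_lim;
    void (*old_handler)(int);
    int result = -1;
    int fd;

    if (getrlimit(RLIMIT_FSIZE, &old_lim) != 0)
        return -1;

    /* With SIGXFSZ ignored, the kernel no longer terminates the process. */
    old_handler = signal(SIGXFSZ, SIG_IGN);
    if (old_handler == SIG_ERR)
        return -1;

    /* Lower only the soft limit so it can be restored afterwards. */
    new_lim = old_lim;
    new_lim.rlim_cur = (rlim_t)limit_bytes;
    if (setrlimit(RLIMIT_FSIZE, &new_lim) != 0) {
        signal(SIGXFSZ, old_handler);
        return -1;
    }

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd >= 0) {
        char byte = 'x';

        /* Position the offset exactly at the limit, then try to extend
         * the file by one byte beyond it. */
        if (lseek(fd, (off_t)limit_bytes, SEEK_SET) != (off_t)-1) {
            if (write(fd, &byte, 1) < 0)
                result = errno;   /* EFBIG is expected here */
            else
                result = 0;
        }
        close(fd);
        unlink(path);
    }

    /* Restore the original limit and signal disposition. */
    setrlimit(RLIMIT_FSIZE, &old_lim);
    signal(SIGXFSZ, old_handler);
    return result;
}
```

Whether a library layered on top of write(2) turns that EFBIG into a useful error message is a separate question. This sketch only shows the system-call behaviour and makes no claim about what HDF5 does with it.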

hope it helps.
steve