How to properly close HDF files when disk is full?

If a disk becomes full while writing datasets, you get an OSError with errno 28 (ENOSPC), which is expected. However, h5py.File.close() also raises an OSError. If you then terminate the application, you get further OSErrors with the same errno, probably from the still-open datasets.

If you remove the file at the OS level with ‘rm’, it is no longer listed; however, the disk is still full (as shown by ‘df’). The disk space is only freed once the application is terminated.

What is the proper way to close an HDF file, if the disk is full?

I have put together a small example script to demonstrate the problem:
diskfull.py (1.6 KB)

I used a fixed path inside the script that points to the /tmp dir, which is rather small. When running it on my PC, I get this output:

xspadmin@xspdemo03:~/tmp$ ./diskfull.py
error at index 559: No space left on device
close failed: No space left on device
Segmentation fault

The segfault probably happens during destruction of the Writer object and the h5py.File object it contains.

The other problem is that as long as the script is running (i.e. you add some code that waits for user input before deleting ‘w’), the HDF file is not released and you still see it in ‘lsof’, meaning the disk is still full, even though you already called File.close().

Surround the critical parts of your code, especially file operations, with try-except blocks to catch and handle exceptions. When you encounter a disk-full condition, log the exception and consider closing the file properly, for example:

import errno

import h5py

file = None
try:
    # Your HDF file operations here
    file = h5py.File('your_file.hdf5', 'w')
    # Your dataset writing operations
    # ...
    file.close()
except OSError as e:
    if e.errno == errno.ENOSPC:  # disk full error (errno 28)
        print("Disk full error. Closing the file.")
        # Close the file if it's still open; note that close() itself may raise again
        if file:
            try:
                file.close()
            except OSError:
                pass
    else:
        # Handle other OSError cases
        print(f"Error: {e}")

There is currently no “proper” way to close an HDF5 file when no storage is left on the device(s) you write to. Leaving aside the question of what “proper” could mean, the problem is that the state of an HDF5 file that was opened for writing consists of modified or new bytes in memory and bytes in storage. In a “disk full”-situation, you ask the library to decide what part of the state to scrap to “save your skin.” In the current implementation, the library has no concept of transactions or logic to make such decisions. Could this be implemented? Of course. Who wants to contribute and support the development?

Best, G.

Unfortunately, file.close() also raises an OSError, so a single try-except does not work in all situations. In particular, if you have many datasets open, close() will raise again for each one while trying to close them.
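
For illustration, here is a minimal best-effort cleanup sketch (the helper name best_effort_close is made up for this example, not an h5py API): every flush/close step gets its own exception handler, so a second “No space left on device” error does not abort the remaining cleanup. As discussed later in this thread, this still does not guarantee that the underlying file descriptor is actually released.

import contextlib

import h5py

def best_effort_close(h5file):
    """Try to flush and close an h5py.File, suppressing the OSError that each
    step may raise again on a full disk, so that cleanup can continue."""
    with contextlib.suppress(OSError):
        h5file.flush()   # may fail again with "No space left on device"
    with contextlib.suppress(OSError):
        h5file.close()   # close() itself may raise the same error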

Thanks a lot for your explanation. I will see whether I can find some workaround, or even better a solution, which I could then possibly contribute.

Best regards,
Andreas

I think it’s pretty likely that this is a bug, probably in the HDF5 library, but it might just be limited to h5py. I’m assuming that in @andreas.beckmann’s application the failure occurred on the H5Fclose() of the last open HDF5 file ID, and that there were no other open objects associated with the file which would prevent closing even without the disk being full.

The documentation of the H5Fclose function is silent about its behavior in a disk-full situation.

That said, in Unix OSes, when a file is deleted (unlinked) the disk space isn’t recovered until all the file descriptors associated with it are closed. This is a feature: it allows an application to open a file, delete it (without closing), and then use the disk as a temporary data storage area via the open file descriptor. The disk space is automatically recovered when the application exits, even if it crashes. That last bit is nice because it prevents the disk from filling up if a buggy application creates a lot of temporary files and then crashes before closing them (as long as each file was unlinked after open).
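
As a small illustration of that Unix behaviour (the path here is made up): the name disappears immediately when the file is unlinked, but the disk space is only reclaimed once the descriptor is closed or the process exits.

import os

# Create a scratch file and unlink it right away: the directory entry is gone,
# but the open descriptor keeps the data (and the disk space) alive.
fd = os.open("/tmp/scratch.bin", os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o600)
os.unlink("/tmp/scratch.bin")

os.write(fd, b"temporary data")   # the fd still works as scratch storage
os.lseek(fd, 0, os.SEEK_SET)
print(os.read(fd, 14))            # b'temporary data'

os.close(fd)                      # only now (or at process exit) is the space freed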

This functionality isn’t the goal here, however. It should be possible for an HDF5 application to encounter a disk-full situation and handle it, including recovering the disk space, without exiting. In other words, call unlink to delete the file if H5Fclose() returns an error. The implication from this report is that this can’t be done, and that’s the bug.
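
Expressed as a Python sketch (the helper name and the explicit path argument are hypothetical), the recovery pattern described above would look roughly like this; according to this report, the unlinked space is in practice not freed until the process exits, because the library keeps the descriptor open, and that is the bug.

import os

import h5py

def close_or_discard(h5file, path):
    """Close the file; if that fails (e.g. disk full), unlink it so the space
    can be reclaimed. Per this thread, the space only actually comes back once
    every descriptor to the file is closed."""
    try:
        h5file.close()
    except OSError:
        os.unlink(path)   # removes the name; space follows once the fd is released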

If there were a mechanism to extract the Unix file descriptor corresponding to the HDF5 file, the user could close the file by bypassing the HDF5 library, but I think the best solution would be for H5Fclose to always close the underlying Unix file descriptor on disk full, returning an error if necessary.

The first step is probably to reproduce the problem in C to rule out h5py as the cause.


There is H5Fget_vfd_handle() to get the underlying file descriptor, but just closing this is a very bad idea, because HDF5 will still try to use the fd number. If your process opens another file afterwards, the fd number will be reused, so HDF5 might write to & close a completely different file!
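
For completeness, a sketch of how the descriptor can be retrieved from h5py, assuming the build exposes get_vfd_handle() on the low-level FileID (it wraps H5Fget_vfd_handle and works for the default sec2 driver); the file name is just for illustration, and, as said above, you should not close the descriptor yourself.

import h5py

f = h5py.File("example.h5", "w")

# Wraps H5Fget_vfd_handle and returns the OS-level file descriptor
# for the default (sec2) driver.
fd = f.id.get_vfd_handle()
print("underlying file descriptor:", fd)

# Do NOT call os.close(fd): HDF5 still owns the descriptor, and the number
# could be reused for an unrelated file that HDF5 would then write to and close.
f.close()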

A full disk isn’t the only scenario where you can have this problem: e.g. if your file is on a network filesystem and the connection goes down, writing can fail. We’ve also seen it in h5py where people wrap a file-like object and then close that object before closing the HDF5 file.

To cope with this properly, I think HDF5 needs some way to ‘abandon’ an open file - i.e. discarding any unwritten data, accepting that the file on disk is likely corrupt or incomplete.

I agree with your last statement. The main goal should be to release all file handles, so that users are able to remove the file outside the context of the application to free up disk space.