File closure hangs



I was wondering whether to post this on the h5py side or here, but since I’m using the low-level API, I guess this would happen even with only the C library.

I am referencing a very similar issue that we were unable to solve for quite some time and that had very similar symptoms.

I will try the short version first. I have a parallel program that writes into datasets created in serial mode by a master process. I have verified that these are created with the ALLOC_TIME_EARLY and FILL_TIME_NEVER properties. The program finishes the writing phase and then hangs on file closure; stack trace below:

#0  0x00002b8757394591 in opal_progress ()
   from /apps/all/OpenMPI/4.1.4-GCC-11.3.0/lib/
#1  0x00002b875739a7c5 in ompi_sync_wait_mt ()
   from /apps/all/OpenMPI/4.1.4-GCC-11.3.0/lib/
#2  0x00002b8755615edb in ompi_request_default_wait ()
   from /apps/all/OpenMPI/4.1.4-GCC-11.3.0/lib/
#3  0x00002b87556683e4 in ompi_coll_base_bcast_intra_generic ()
   from /apps/all/OpenMPI/4.1.4-GCC-11.3.0/lib/
#4  0x00002b87bea1f3ec in ompi_coll_tuned_bcast_intra_dec_fixed ()
   from /apps/all/OpenMPI/4.1.4-GCC-11.3.0/lib/openmpi/
#5  0x00002b87556295f8 in PMPI_Bcast ()
   from /apps/all/OpenMPI/4.1.4-GCC-11.3.0/lib/
#6  0x00002b8755a863c5 in H5FD__mpio_truncate ()
   from /home/caucau/SDSSCube/ext_lib/hdf5-1.13.1/hdf5/lib/
#7  0x00002b8755870c3a in H5FD_truncate ()
   from /home/caucau/SDSSCube/ext_lib/hdf5-1.13.1/hdf5/lib/
#8  0x00002b875585959c in H5F__flush_phase2 ()
   from /home/caucau/SDSSCube/ext_lib/hdf5-1.13.1/hdf5/lib/
#9  0x00002b8755859bbf in H5F__dest ()
   from /home/caucau/SDSSCube/ext_lib/hdf5-1.13.1/hdf5/lib/
#10 0x00002b875585c627 in H5F_try_close.localalias ()
   from /home/caucau/SDSSCube/ext_lib/hdf5-1.13.1/hdf5/lib/
#11 0x00002b875585c91c in H5F__close ()
   from /home/caucau/SDSSCube/ext_lib/hdf5-1.13.1/hdf5/lib/
#12 0x00002b8755a5007f in H5VL__native_file_close ()
   from /home/caucau/SDSSCube/ext_lib/hdf5-1.13.1/hdf5/lib/
#13 0x00002b8755a3ea6e in H5VL_file_close ()
   from /home/caucau/SDSSCube/ext_lib/hdf5-1.13.1/hdf5/lib/
#14 0x00002b8755859886 in H5F__close_cb ()
   from /home/caucau/SDSSCube/ext_lib/hdf5-1.13.1/hdf5/lib/
#15 0x00002b87558cf6da in H5I_dec_app_ref ()
   from /home/caucau/SDSSCube/ext_lib/hdf5-1.13.1/hdf5/lib/
#16 0x00002b87558cbfd2 in H5Idec_ref ()
   from /home/caucau/SDSSCube/ext_lib/hdf5-1.13.1/hdf5/lib/
#17 0x00002b8782daccd0 in __pyx_f_4h5py_4defs_H5Idec_ref (
    at /home/caucau/SDSSCube/ext_lib/h5py/h5py/defs.c:17945
#18 0x00002b8783040d85 in __pyx_pf_4h5py_3h5f_6FileID_2_close_open_objects (
    __pyx_v_types=<optimized out>, __pyx_v_self=<optimized out>)
    at /home/caucau/SDSSCube/ext_lib/h5py/h5py/h5f.c:5291
#19 __pyx_pw_4h5py_3h5f_6FileID_3_close_open_objects (
    __pyx_v_self=<optimized out>, __pyx_arg_types=<optimized out>)
    at /home/caucau/SDSSCube/ext_lib/h5py/h5py/h5f.c:5151
#20 0x00002b8755cf2c9f in method_vectorcall_O () at Objects/descrobject.c:416
#21 0x00002b8755cf0fcd in PyVectorcall_Call () at Objects/call.c:199
#22 0x00002b8782df7415 in __Pyx_PyObject_Call (kw=0x2b88b25eea80, 
    arg=0x2b88b25eb600, func=<optimized out>)
    at /home/caucau/SDSSCube/ext_lib/h5py/h5py/_objects.c:11697
#23 __pyx_pf_4h5py_8_objects_9with_phil_wrapper (__pyx_v_kwds=0x2b88662fd040, 
    __pyx_v_args=0x2b88b25eb600, __pyx_self=<optimized out>)
    at /home/caucau/SDSSCube/ext_lib/h5py/h5py/_objects.c:4253
#24 __pyx_pw_4h5py_8_objects_9with_phil_1wrapper (__pyx_self=<optimized out>, 
    __pyx_args=0x2b88b25eb600, __pyx_kwds=<optimized out>)
    at /home/caucau/SDSSCube/ext_lib/h5py/h5py/_objects.c:4172

This happened earlier when I tried to write in parallel to datasets that were not allocated early. Interestingly, this runs fine with ~32 processes on 8 nodes, but hangs when run with 128 processes per node on 2 nodes. However, before diving further into the code, I have a generic question: is there a way to determine why the processes hang on file closure?

I am a little confused: the HDF5 library is obviously letting me do something inconsistent while writing the dataset, which then prevents the file from closing. Shouldn’t it throw an error when I perform the action, rather than just hanging on file close? Maybe there is a debug mode that would tell me this and I missed it?

Thanks for your help.
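
One generic way to see where each Python process is stuck, without attaching a debugger to every rank, is the standard-library `faulthandler` module. The sketch below is not an HDF5 or h5py feature; the helper names `arm_hang_watchdog` and `disarm_hang_watchdog`, the log file naming, and the `rank` argument (which could come from something like mpi4py's `MPI.COMM_WORLD.Get_rank()`) are assumptions made for illustration.

```python
import faulthandler
import os


def arm_hang_watchdog(log_dir, rank, timeout_s):
    """Schedule a Python stack dump into a per-rank file.

    If the watchdog is not disarmed within timeout_s seconds, faulthandler
    writes the traceback of every thread of this process to the file.
    """
    path = os.path.join(log_dir, f"hang_rank{rank}.txt")
    log = open(path, "w")
    # dump_traceback_later needs a real file descriptor, so keep `log` open.
    faulthandler.dump_traceback_later(timeout_s, file=log)
    return log, path


def disarm_hang_watchdog(log):
    """Cancel the pending dump and close the per-rank log file."""
    faulthandler.cancel_dump_traceback_later()
    log.close()
```

A possible use is to arm the watchdog immediately before the file close and disarm it right after. Ranks that hang in the close leave a traceback in their file, ranks that close normally leave an empty file, and ranks that never reached the close leave no file at all, which is itself a useful hint.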



And there is a funnier part to it: at least some of the data (a lot of it) is written properly, though I can’t fully verify that since I don’t have a reference file for this huge dataset. The important part is that the dataset is consistent and I can read it even after I kill the processes that were stuck on the file closure.

This is one of the datasets being written.

DATASET "image_cutouts_data" {
              H5T_STRING {
                 STRSIZE 64;
                 STRPAD H5T_STR_NULLPAD;
                 CSET H5T_CSET_UTF8;
                 CTYPE H5T_C_S1;
              } "ds_path";
              H5T_STD_I64LE "ds_slice_idx";
              H5T_STD_I32LE "x_min";
              H5T_STD_I32LE "x_max";
              H5T_STD_I32LE "y_min";
              H5T_STD_I32LE "y_max";
           DATASPACE  SIMPLE { ( 4851200, 200 ) / ( 4851200, 200 ) }
           STORAGE_LAYOUT {
              CHUNKED ( 100, 200 )
              SIZE 85381120000
           FILTERS {
           FILLVALUE {
           ALLOCATION_TIME {


Hi everybody, I solved it and will share it here for others who might find it helpful.

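In the end, the issue was trivial: not all the workers made it to the file closure barrier. Closing the file is a collective operation (the MPI_Bcast in the stack trace above waits for every rank), which is why the closure hung. So if you see similar symptoms, it can actually be as simple as that :).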
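
The failure mode can be reproduced in miniature without MPI or HDF5. The sketch below is only a single-process simulation: threads play the role of ranks, a `threading.Barrier` stands in for the collective file close, and a timeout replaces the indefinite hang so the script terminates. The function `run_workers` and its parameters are hypothetical names, not part of the original program.

```python
import threading


def run_workers(n_workers, failing_rank, guard_close, timeout=1.0):
    """Simulate ranks that must all reach a collective close.

    Returns a dict mapping rank -> "closed", "stuck", or "exited early".
    """
    close_barrier = threading.Barrier(n_workers)  # stand-in for the collective close
    outcome = {}

    def collective_close(rank):
        try:
            close_barrier.wait(timeout=timeout)
            outcome[rank] = "closed"
        except threading.BrokenBarrierError:
            # A real MPI rank has no timeout here and would wait forever.
            outcome[rank] = "stuck"

    def worker(rank):
        try:
            if rank == failing_rank:
                raise RuntimeError("simulated write error")
        except RuntimeError:
            if not guard_close:
                outcome[rank] = "exited early"
                return  # skips the collective close: the bug
        collective_close(rank)  # every surviving code path ends up here

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome


if __name__ == "__main__":
    print(run_workers(4, failing_rank=2, guard_close=False))
    print(run_workers(4, failing_rank=2, guard_close=True))
```

With `guard_close=False` the failing worker returns early and the others are left waiting at the close. With `guard_close=True` the error is handled but the worker still proceeds to the close, so all of them finish. In a real program the same idea amounts to making sure every rank reaches the file close on every code path, for example with `try`/`finally`.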