Open/Close is sometimes very slow for serial 1.10.2, but not 1.8.17

After migrating from HDF5 1.8.17 to 1.10.2, one of my routines became extremely slow. I have narrowed the issue down to the following Fortran test case.

Using serial HDF5, I open and then close a single file repeatedly. Most cycles are fast (~0.01 s), but roughly one in every 50 is long (~10 seconds, sometimes up to 60 seconds). The issue does not manifest with 1.8.17. The sample code is:

! ----------------------------------------------------------------------
! -- Measures the time to repeatedly open and close an hdf5 file
! -- Some cycles are long with hdf5 1.10.2, none are long with 1.8.17
! ----------------------------------------------------------------------
program main
   use hdf5
   implicit none
   integer, parameter :: wp = selected_real_kind(15,307)

   ! -- Some hardcoded parameters
   character(*), parameter :: fname = 'tmp.h5'
   integer, parameter :: N_cycles = 400
   real(wp), parameter :: warn_time = 1._wp     ! -- Warn if cycle is longer than this

   integer :: i, iher, my_cr, my_cm, ct_cur, ct_prv, ct_start, ct_end
   real(wp) :: my_time
   integer(HID_T) :: id_file

   call h5open_f(iher)
   if (iher/=0) stop 2

   ! -- Create file, overwrite if it already exists
   call h5fcreate_f(fname, H5F_ACC_TRUNC_F, id_file, iher)
   if (iher/=0) stop 2

   call h5fclose_f(id_file, iher)
   if (iher/=0) stop 2

   ! -- Initialize timing
   call system_clock(count_rate = my_cr, count_max = my_cm)
   call system_clock(ct_start)
   ct_cur = ct_start

   ! -- Main loop to open/close N times
   do i=1,N_cycles
      ! -- Open with read/write then close
      call h5fopen_f(fname,H5F_ACC_RDWR_F,id_file,iher)
      if (iher/=0) stop 2

      call h5fclose_f(id_file, iher)
      if (iher/=0) stop 2

      ct_prv = ct_cur
      call system_clock(ct_cur)

      ! -- Compute time of this cycle
      my_time = real(ct_cur-ct_prv,wp) / real(my_cr,wp)

      ! -- Output warning if we have a long cycle time
      if (my_time > warn_time) then
         write(*,'(a,i5,a,f10.5)') &
            ' -- Iteration ', i, ' had long cycle time (seconds): ', my_time
      endif
   enddo

   ! -- We have completed all cycles, measure total time
   call system_clock(ct_end)
   my_time = real(ct_end-ct_start,wp) / real(my_cr,wp)

   write(*,'(a,f10.5)') ' -- Complete, total time (seconds): ', my_time
end program main

I compile the code with ifort -I$HDF5_HOME/include -L$HDF5_HOME/lib -lhdf5_fortran main.f90 -o main, where $HDF5_HOME is the installation directory.
Using 1.10.2, a typical output is:

chaud106@ln0003 [~/reaqct/Test_h5open/post] % h5ls --version
h5ls: Version 1.10.2
chaud106@ln0003 [~/reaqct/Test_h5open/post] % ifort -I$HDF5_HOME/include -L$HDF5_HOME/lib -lhdf5_fortran main.f90 -o main
chaud106@ln0003 [~/reaqct/Test_h5open/post] % ./main 
 -- Iteration    98 had long cycle time (seconds):   18.01450
 -- Iteration    99 had long cycle time (seconds):   13.86730
 -- Iteration   100 had long cycle time (seconds):   13.60210
 -- Iteration   101 had long cycle time (seconds):   10.00920
 -- Iteration   140 had long cycle time (seconds):   12.47370
 -- Iteration   198 had long cycle time (seconds):   22.01530
 -- Iteration   199 had long cycle time (seconds):    8.00830
 -- Iteration   200 had long cycle time (seconds):    7.67160
 -- Iteration   201 had long cycle time (seconds):    9.06810
 -- Iteration   202 had long cycle time (seconds):    6.68700
 -- Iteration   240 had long cycle time (seconds):   15.99550
 -- Iteration   296 had long cycle time (seconds):   30.01490
 -- Iteration   297 had long cycle time (seconds):    5.99770
 -- Iteration   298 had long cycle time (seconds):    6.00180
 -- Iteration   300 had long cycle time (seconds):    8.00110
 -- Iteration   339 had long cycle time (seconds):   18.82060
 -- Iteration   394 had long cycle time (seconds):   23.76350
 -- Iteration   395 had long cycle time (seconds):    3.38680
 -- Iteration   397 had long cycle time (seconds):    8.00930
 -- Iteration   398 had long cycle time (seconds):    5.66050
 -- Iteration   400 had long cycle time (seconds):    8.00710
 -- Complete, total time (seconds):  256.84700

Using 1.8.17, I have never observed a long cycle. The typical output is:

chaud106@ln0003 [~/reaqct/Test_h5open/post] % h5ls --version
h5ls: Version 1.8.17
chaud106@ln0003 [~/reaqct/Test_h5open/post] % ifort -I$HDF5_HOME/include -L$HDF5_HOME/lib -lhdf5_fortran main.f90 -o main
chaud106@ln0003 [~/reaqct/Test_h5open/post] % ./main 
 -- Complete, total time (seconds):    0.08220

I am compiling both my test code and HDF5 with Intel 18.0.0. I have only observed this behavior on one high-performance file system; I can provide details if necessary. I was unable to replicate the behavior on a different Lustre-based system or on an NFS-based system.

I’ve noticed that the long cycles occur near multiples of 100 (iterations 98 through 101). While this toy code is running, the entire machine is extremely sluggish. The result seems independent of the size/contents of the HDF5 file: this test case uses an empty file, but I got similar results on a 2 GB production database. Finally, I read an HDF5 whitepaper on rapid open/close slowdowns, but it seemed to apply only to parallel operations.

I’m aware that rapidly opening and closing a file in this way is not optimal, and I have reworked my code so that this issue has less impact. However, given that the behavior of 1.8.17 is so different, I think it is worth investigating.
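
Roughly, the rework amounts to doing fewer open/close cycles, i.e., keeping the file open across operations where possible. A minimal sketch of that pattern (shown in C for brevity, with a hypothetical do_work() placeholder, not my actual production code) would be:

#include <hdf5.h>

// -- Hypothetical placeholder for whatever actually reads/writes the file
static void do_work( hid_t id_file ) { (void) id_file; }

int main(void)
{
   const int N_cycles = 400;
   int n;

   // -- Open once, reuse the identifier for every cycle, close once
   hid_t id_file = H5Fopen( "tmp.h5", H5F_ACC_RDWR, H5P_DEFAULT );
   if( id_file < 0 ) return 1;

   for( n = 0; n < N_cycles; ++n ) {
      do_work( id_file );
   }

   H5Fclose( id_file );
   return(0);
}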

Thank you for the report! We will need to investigate.

Could you please provide details about the file system where you observe such behavior? Would it be possible for you to try a similar C program?

Thank you,

Elena

One thing you could try is setting the HDF5_USE_FILE_LOCKING environment variable to ‘FALSE’ and then running your application. Does that make a difference?
That may help us narrow down the issue.

Thanks!
-Barbara

Thanks for the quick response.

The storage is a Panasas high-performance PanFS filesystem. A press release about its installation is here, which includes some details. I have also opened a support ticket with my cluster provider (MSI) pointing them to this post. My understanding of storage details is limited, so perhaps they can provide more/better information. I will update the support ticket with details from the forum and also update the forum with details from the ticket, if necessary. Are there any specific questions you have about the filesystem?

The issue seems to be the same in C; the code is:

#include <stdio.h>
#include <stdlib.h>

#include <sys/timeb.h>
#include <hdf5.h>

int main()
{
   hid_t id_file;
   const int N_cycles = 400;
   char *filename = "tmp.h5";
   int n;
   long start,end,current,previous;
   struct timeb tb;
   herr_t iher;

   H5open();

   // -- Create file, overwrite if it already exists
   id_file = H5Fcreate( filename, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT );
   if( id_file < 0 ) {
      printf("Error creating file \"%s\"\n", filename );
   }

   iher = H5Fclose( id_file );
   if( iher < 0 ) {
      printf("Error closing file with handle: \"%d\"\n", id_file );
   }

   // -- Start time
   (void) ftime( &tb );
   start = ((long) tb.time )*1000 + ((long) tb.millitm );
   current = start;

   // -- Main loop to open/close N times
   for(n=0;n<N_cycles;++n) {

      // -- Open with read/write then close
      id_file = H5Fopen( filename, H5F_ACC_RDWR, H5P_DEFAULT );
      if( id_file < 0 ) {
         printf("Error opening file \"%s\"\n", filename );
      }

      iher = H5Fclose( id_file );
      if( iher < 0 ) {
         printf("Error closing file with handle: \"%d\"\n", id_file );
      }

      // -- Compute time of this iteration
      previous = current;
      (void) ftime( &tb );
      current = ((long) tb.time )*1000 + ((long) tb.millitm );

      // -- Output warning if we have a long cycle time
      if( current - previous > 1000 ) {
         printf(" -- Iteration %5d had long cycle time (seconds): %10.5f\n",n,
            (float) (current - previous)/1000.0);
      }
   }

   (void) ftime( &tb );
   end = ((long) tb.time )*1000 + ((long) tb.millitm );
   printf(" -- Complete, total time (seconds): %10.5f\n",
      (float) (end - start)/1000.0);

   H5close();

   return(0);
}

On 1.8.17, as before, there is no problem:

chaud106@ln0003 [~/reaqct/Test_h5open/post] % h5ls --version
h5ls: Version 1.8.17
chaud106@ln0003 [~/reaqct/Test_h5open/post] % icc -I$HDF5_HOME/include -L$HDF5_HOME/lib -lhdf5 main.c -o main_c
chaud106@ln0003 [~/reaqct/Test_h5open/post] % ./main_c
 -- Complete, total time (seconds):    0.07900

With 1.10.2, the timings are similar to the Fortran case, for example:

chaud106@ln0003 [~/reaqct/Test_h5open/post] % h5ls --version
h5ls: Version 1.10.2
chaud106@ln0003 [~/reaqct/Test_h5open/post] % icc -I$HDF5_HOME/include -L$HDF5_HOME/lib -lhdf5 main.c -o main_c
chaud106@ln0003 [~/reaqct/Test_h5open/post] % ./main_c
 -- Iteration    97 had long cycle time (seconds):   13.09900
 -- Iteration    98 had long cycle time (seconds):   44.02200
 -- Iteration   194 had long cycle time (seconds):   35.61100
 -- Iteration   196 had long cycle time (seconds):    1.59700
 -- Iteration   197 had long cycle time (seconds):   10.40300
 -- Iteration   198 had long cycle time (seconds):   14.19400
 -- Iteration   289 had long cycle time (seconds):   12.95100
 -- Iteration   290 had long cycle time (seconds):   16.01000
 -- Iteration   291 had long cycle time (seconds):    8.00800
 -- Iteration   293 had long cycle time (seconds):    6.02100
 -- Iteration   294 had long cycle time (seconds):    8.01000
 -- Iteration   296 had long cycle time (seconds):    8.00800
 -- Iteration   388 had long cycle time (seconds):   28.85200
 -- Iteration   389 had long cycle time (seconds):    7.62500
 -- Iteration   390 had long cycle time (seconds):    7.00300
 -- Iteration   392 had long cycle time (seconds):    7.99700
 -- Iteration   394 had long cycle time (seconds):    7.99800
 -- Complete, total time (seconds):  239.16900

Once again, the long cycles cluster near multiples of 100 iterations (97, 98, 194-198, ...).

It looks like disabling file locking eliminates the long cycles for both C and Fortran:

chaud106@ln0003 [~/reaqct/Test_h5open/post] % h5ls --version
h5ls: Version 1.10.2
chaud106@ln0003 [~/reaqct/Test_h5open/post] % ifort -I$HDF5_HOME/include -L$HDF5_HOME/lib -lhdf5_fortran main.f90 -o main
chaud106@ln0003 [~/reaqct/Test_h5open/post] % HDF5_USE_FILE_LOCKING="FALSE" ./main
 -- Complete, total time (seconds):    0.56700
chaud106@ln0003 [~/reaqct/Test_h5open/post] % icc -I$HDF5_HOME/include -L$HDF5_HOME/lib -lhdf5 main.c -o main_c
chaud106@ln0003 [~/reaqct/Test_h5open/post] % HDF5_USE_FILE_LOCKING="FALSE" ./main_c
 -- Complete, total time (seconds):    0.58900

I haven’t yet looked at what HDF5_USE_FILE_LOCKING does or whether it would be appropriate for my application.
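
To separate HDF5 from the filesystem's own lock behavior, one possible diagnostic would skip HDF5 entirely and just time an OS-level advisory lock on the same file. A sketch (assuming 1.10 takes an flock()-style lock on open when locking is enabled, which I have not verified) is below:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/file.h>
#include <sys/timeb.h>

int main(void)
{
   const int N_cycles = 400;
   const char *filename = "tmp.h5";
   struct timeb tb;
   long start, end, current, previous;
   int n, fd;

   (void) ftime( &tb );
   start = ((long) tb.time )*1000 + ((long) tb.millitm );
   current = start;

   for( n = 0; n < N_cycles; ++n ) {
      // -- Open, take and release an exclusive advisory lock, then close
      fd = open( filename, O_RDWR );
      if( fd < 0 ) { perror("open"); return 1; }
      if( flock( fd, LOCK_EX ) < 0 ) perror("flock(LOCK_EX)");
      if( flock( fd, LOCK_UN ) < 0 ) perror("flock(LOCK_UN)");
      close( fd );

      // -- Output warning if we have a long cycle time
      previous = current;
      (void) ftime( &tb );
      current = ((long) tb.time )*1000 + ((long) tb.millitm );
      if( current - previous > 1000 ) {
         printf(" -- Iteration %5d had long cycle time (seconds): %10.5f\n", n,
            (float) (current - previous)/1000.0);
      }
   }

   (void) ftime( &tb );
   end = ((long) tb.time )*1000 + ((long) tb.millitm );
   printf(" -- Complete, total time (seconds): %10.5f\n",
      (float) (end - start)/1000.0);

   return(0);
}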

I have two updates on this issue.

First, I have run the test on the same machine but on a different file system (a local RAM disk), and the delay is not present. Thus, the very slow cycles have only been observed on this machine's main file system, which is PanFS.

Second, I have the following information regarding the file system to relay from my cluster provider's support team:

The Panasas ActiveStor 14 hardware consists of 20 shelves, each with 1 Director
blade and 10 Object Storage blades. Each Storage blade has two 4TB SATA drives
and one 120GB solid state drive, for a total usable system capacity of about
1.4PB.

Based on what we’ve seen (i.e., lags of ~1 minute for simple open/close on
PanFS, and no such lag on /dev/shm, which is the local RAM disk), I suspect
the important difference is that PanFS is a network file system; technically
it is a pre-release of NFS 4.1 which Panasas (the company, see panasas.com)
custom-wrote for its storage systems. From my reading of the blogs, it sounds
like the newer version of HDF5 tries to access the “super block”, which is
only really accessible on a local file system. It’s possible that the newer
HDF5 stalls when attempting to access a super block on PanFS (which does not
exist), and eventually times out after 50 sec or so. Since it does not hang
forever, but eventually completes the open/close, it is likely that it falls
back to some other method which then does work. There may be a way to tell
HDF5 to expect an NFS file system, possibly as a build-time flag. You might
relay this info to the HDF Group.

Their support team is also interested in resolving this issue, so please let me know here what you think the next steps should be. Is this enough information about the file system?

Hi,

“From my reading of the blogs, it sounds like the newer version of HDF5 tries to access the ‘super block’, which is only really accessible on a local file system.” - Sorry if we somehow gave that impression; this shouldn’t be the case.

When HDF5 creates a file and then closes it, the library flushes all metadata, including the superblock, to storage and releases all resources. We need to understand what is going on with PanFS. It may be an issue in HDF5, but there were no major changes to open/close operations between 1.8 and 1.10 other than file locking.
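
If it would help to confirm on your side that nothing is left open when the file is closed, you could query H5Fget_obj_count right before H5Fclose; a minimal sketch:

#include <stdio.h>
#include <hdf5.h>

int main(void)
{
   // -- Create a file and check how many identifiers are attached to it
   hid_t id_file = H5Fcreate( "tmp.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT );
   if( id_file < 0 ) return 1;

   // -- H5F_OBJ_ALL counts the file identifier itself plus any open
   //    datasets, groups, attributes, and datatypes; here it should be 1
   ssize_t n_open = H5Fget_obj_count( id_file, H5F_OBJ_ALL );
   printf("Open identifiers before close: %lld\n", (long long) n_open);

   H5Fclose( id_file );
   return(0);
}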

Is there any way for The HDF Group to get access to the file system?

Thank you!

Elena

I think that giving an individual from the HDF Group access to the file system is possible, but it is something the support team wants to avoid unless necessary.

They are interested in disabling file locking:

Operations tells me the PanFS already does file locking, and the extra
level of file locking is not needed. They are on this ticket and can
confirm or deny this.

Once confirmed that an extra level of file locking is NOT needed, we can
include the environment variable as part of the hdf5 module setup.

MSI operations also tells me that we run PanFS as an appliance, and do not
have much control over how it behaves. I would recommend that the HDF team
contact Panasas directly to resolve this issue.

Are there any issues we might experience from wholesale disabling of file locking for most users? And does it make sense to you that PanFS already does its own file locking?

It should be okay to disable file locking. File locking is used with SWMR, and SWMR does not work with NFS anyway.
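
If exporting the variable from the shell is inconvenient, it can also be set from inside the application before any files are opened (1.10.2 reads the variable at file-open time, as far as we know); a minimal sketch using POSIX setenv:

#include <stdlib.h>
#include <hdf5.h>

int main(void)
{
   // -- Disable HDF5's file locking before any files are opened;
   //    the final argument (1) overwrites any existing value
   setenv( "HDF5_USE_FILE_LOCKING", "FALSE", 1 );

   hid_t id_file = H5Fopen( "tmp.h5", H5F_ACC_RDWR, H5P_DEFAULT );
   if( id_file >= 0 ) H5Fclose( id_file );

   return(0);
}

(A property-list call for controlling file locking was only added in later HDF5 releases, we believe, so for 1.10.2 the environment variable is the available control.)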