h5fcreate 1.10 unable to lock

The metadata-related changes in hdf5 1.10 have made it possible for my (massively parallel) simulation code to restart from a checkpoint.

However, h5fcreate fails to create a file in serial (on BG/Q), whereas with hdf5 1.8.10 the file is created successfully. In a minimal serial test program running on 1 processor, the call

CALL h5fcreate_f(fileName, H5F_ACC_EXCL_F, fileId, h5err)

fails to create the file (it has zero size), and yields the errors:

HDF5-DIAG: Error detected in HDF5 (1.10.0) thread 0:
   #000: H5F.c line 491 in H5Fcreate(): unable to create file
     major: File accessibilty
     minor: Unable to open file
   #001: H5Fint.c line 1168 in H5F_open(): unable to lock the file or initialize file structure
     major: File accessibilty
     minor: Unable to open file
   #002: H5FD.c line 1821 in H5FD_lock(): driver lock request failed
     major: Virtual File Layer
     minor: Can't update object
   #003: H5FDsec2.c line 939 in H5FD_sec2_lock(): unable to flock file, errno = 38, error message = 'Function not implemented'
     major: File accessibilty
     minor: Bad file ID accessed
  Tried to create file, err = -1
HDF5-DIAG: Error detected in HDF5 (1.10.0) thread 0:
   #000: H5F.c line 749 in H5Fclose(): not a file ID
     major: Invalid arguments to routine
     minor: Inappropriate type

Creating files in parallel does not seem to be a problem.

Is there a work-around? Even in the full simulation code, this is a straightforward serial open/read/write by a single process; there is no danger of multiple readers, etc.

Might this issue be fixed by the recent patch to 1.10?

Thanks,
Greg.

Hi Greg,

It looks like flock(2) is available as an API call, but is not implemented for that file system, so it returns a failure code. The HDF5 library only inspects the flock() return value and not errno, so we just note the failure and our API call fails in turn.
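To illustrate the distinction, here is a minimal sketch of what telling "locking not implemented" apart from a real lock failure could look like. It is not the HDF5 source: the enum and both function names are hypothetical, and it assumes a POSIX system where errno 38 ('Function not implemented') corresponds to ENOSYS, as it does on Linux.

```c
#include <errno.h>
#include <sys/file.h>

/* Hypothetical result codes for this sketch; not part of HDF5. */
typedef enum {
    FLOCK_OK          = 0,  /* flock() succeeded */
    FLOCK_UNSUPPORTED = 1,  /* call exists, but the file system does not implement it */
    FLOCK_ERROR       = -1  /* any other failure, e.g. bad descriptor or lock held elsewhere */
} flock_status;

/* Classify a flock() outcome from its return value and the errno it left behind. */
flock_status classify_flock(int ret, int err)
{
    if (ret == 0)
        return FLOCK_OK;
    if (err == ENOSYS)
        return FLOCK_UNSUPPORTED;
    return FLOCK_ERROR;
}

/* Try a non-blocking exclusive lock on an already-open descriptor. */
flock_status try_exclusive_lock(int fd)
{
    int ret = flock(fd, LOCK_EX | LOCK_NB);
    int err = (ret == 0) ? 0 : errno;  /* capture errno right after the call */
    return classify_flock(ret, err);
}
```

A caller that only tests the return value of flock() sees every non-zero result as the same failure, which matches the behaviour described above. A caller that also looks at errno could choose to continue without a lock when it gets FLOCK_UNSUPPORTED. Whether that is safe depends on the use case.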

Just out of curiosity, is this a Lustre file system? I've heard that the overhead for locking is high, so admins often disable it.

Unfortunately, there is no work-around for the file-locking calls in either HDF5 1.10.0 or 1.10.0-patch1 aside from modifying the source. Also unfortunately, you are not the only person who is tripping over the file locking issue when it is unnecessary or unwanted.

For the very short term, I'm considering putting a source patch on our website that will disable the file locking. You'll have to apply the patch and build the library yourself, but this would fix your problem. Let me check into how to best accomplish this and I'll shoot for getting this out next week sometime.

Our current plan to really fix the issue is to start by generating an RFC describing the issue and our proposed solutions. After a brief period for comments, we'll implement the changes for HDF5 1.10.1, which should be released in the very near future (mid-summer, I believe). Before the release date you'll be able to use a snapshot to get the functionality. Since this is a problem that affects several users, I'm going to be keen on getting this into a snapshot ASAP so hopefully you won't have to wait long for official functionality that addresses your problem.

Dana Robinson
Software Engineer
The HDF Group

···

-----Original Message-----
From: Hdf-forum [mailto:hdf-forum-bounces@lists.hdfgroup.org] On Behalf Of Greg Werner
Sent: Thursday, June 2, 2016 11:29 AM
To: hdf-forum@lists.hdfgroup.org
Subject: [Hdf-forum] h5fcreate 1.10 unable to lock

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5

For the record, this problem (flock not being available on BG/Q with GPFS, the IBM General Parallel File System) is fixed by the "Patch to Disable File Locking", a patch to 1.10.0-patch1, namely file-lock-removal.diff at

https://www.hdfgroup.org/HDF5/release/obtainsrc5110.html#patch

Greg.
