HDF5-1.10.0 and flock()


#1

Hi all,

I was wondering whether the HDF5 1.8.x branch is going to be kept going, or whether it is recommended to move to 1.10.x?

I'm asking because, as we all know, SWMR needs flock(), and SWMR cannot be disabled at compile time (I don't need it in my day-to-day use).

On one of the clusters I run on we've got a Lustre file system. However, the admins have deemed file locking too expensive and have disabled it. Here's the mount information:

mds01ib@o2ib1:mds02ib@o2ib1:/scratch on /lustre/janus_scratch type lustre (rw,noauto,_netdev)

So when I run a very simple test that creates an HDF5 file with version 1.10.0 on this file system, it fails:

janus-compile1 ~$ ./test /lustre/janus_scratch/tibr1099/foo.h5
HDF5-DIAG: Error detected in HDF5 (1.10.0) thread 0:
  #000: H5F.c line 491 in H5Fcreate(): unable to create file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1168 in H5F_open(): unable to lock the file or initialize file structure
    major: File accessibilty
    minor: Unable to open file
  #002: H5FD.c line 1821 in H5FD_lock(): driver lock request failed
    major: Virtual File Layer
    minor: Can't update object
  #003: H5FDsec2.c line 939 in H5FD_sec2_lock(): unable to flock file, errno = 38, error message = 'Function not implemented'
    major: File accessibilty
    minor: Bad file ID accessed
Unable to open: /lustre/janus_scratch/tibr1099/foo.h5: -1
1

When I strace the program I see it's because flock() failed:

open("/lustre/janus_scratch/tibr1099/foo.h5", O_RDWR) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
close(3) = 0
open("/lustre/janus_scratch/tibr1099/foo.h5", O_RDWR|O_CREAT|O_TRUNC, 0666) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
flock(3, LOCK_EX|LOCK_NB) = -1 ENOSYS (Function not implemented)
close(3) = 0

Versus if I trace the program with version 1.8.15:

open("/lustre/janus_scratch/tibr1099/foo.h5", O_RDWR) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
close(3) = 0
open("/lustre/janus_scratch/tibr1099/foo.h5", O_RDWR|O_CREAT|O_TRUNC, 0666) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
brk(0x235a000) = 0x235a000
mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f17252b8000
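
The failing flock() call in the 1.10.0 trace can be reproduced outside HDF5. The following is a minimal sketch in Python, not part of HDF5 or of the attached test.f90: the helper name flock_supported is made up for illustration, and the set of errno values treated as "locking not available" is an assumption (only ENOSYS appears in the trace above).

```python
import errno
import fcntl
import os
import tempfile


def flock_supported(directory):
    """Return True if flock() works on a scratch file created in `directory`.

    Hypothetical helper for illustration; it issues the same request that
    shows up in the strace output (LOCK_EX | LOCK_NB).
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.flock(fd, fcntl.LOCK_UN)
        return True
    except OSError as exc:
        # ENOSYS is the error seen in the trace; the other two values are
        # an assumption about how other file systems might report it.
        if exc.errno in (errno.ENOSYS, errno.EOPNOTSUPP, errno.ENOLCK):
            return False
        raise
    finally:
        # Always remove the scratch file again.
        os.close(fd)
        os.unlink(path)
```

Calling flock_supported("/lustre/janus_scratch/tibr1099") on the mount shown above would be expected to return False, while a local directory would normally return True.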

So my long-winded example leads to three questions.
1) Do other HPC sites enable flock() on Lustre? If so, is it only localflock, so as not to have the burden of a cluster-wide flock?
2) Is there a path forward for sites that don't enable flock()?
3) Is there an opposite of H5Fstart_swmr_write?

Thanks!
Tim

test.f90 (1.12 KB)


#2

Hi Tim,

Hi all,

I was wondering whether the HDF5 1.8.x branch is going to be kept going, or whether it is recommended to move to 1.10.x?

Yes, we will keep 1.8 going until we are satisfied with the quality of 1.10.x. The transition from 1.8 to 1.10 should be seamless for our users :-)

I'm asking because, as we all know, SWMR needs flock(), and SWMR cannot be disabled at compile time (I don't need it in my day-to-day use).

Hmm… HDF5 implements file locking in 1.10.x to prevent unauthorized access to an HDF5 file (for example, when a file is opened for writing (non-SWMR) and another process tries to write to it). File locking is enabled if flock (or similar) is available on the system. Configure checks whether file locking is available, but I think we failed to check whether it is disabled. We will look into this situation.

Thank you for reporting!

Elena

···

On May 13, 2016, at 10:55 AM, Timothy Brown <Timothy.Brown-1@Colorado.EDU> wrote:

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5


#3

If a suitable way to lock files cannot be determined at configure time, a no-op function is substituted. This is currently the case on Windows. File locking is just advisory, so this isn't a big deal.
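
To illustrate what "advisory" means for flock()-style locks, here is a small Python sketch. It is not HDF5 code; the function name demo_advisory is made up for illustration, and it assumes a file system where flock() is available. A second opener that asks for the lock is refused, but a write that never asks for the lock is not stopped.

```python
import fcntl
import os
import tempfile


def demo_advisory(path):
    """Show that an flock() lock only affects openers that ask for it.

    Returns (second_lock_refused, bytes_written). Hypothetical helper,
    assuming flock() is supported on the file system holding `path`.
    """
    holder = os.open(path, os.O_RDWR | os.O_CREAT, 0o666)
    other = os.open(path, os.O_RDWR)  # independent open file description
    try:
        fcntl.flock(holder, fcntl.LOCK_EX | fcntl.LOCK_NB)
        try:
            # A cooperating opener that requests the lock is turned away.
            fcntl.flock(other, fcntl.LOCK_EX | fcntl.LOCK_NB)
            second_lock_refused = False
        except BlockingIOError:
            second_lock_refused = True
        # A writer that skips the lock request is not prevented from writing.
        bytes_written = os.write(other, b"x")
        return second_lock_refused, bytes_written
    finally:
        os.close(other)
        os.close(holder)
```

This is why substituting a no-op only removes a safety check between cooperating processes; it does not change what the file system itself allows.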

As for disabling file locking, we talked about this and will try to get a configure-time mechanism for disabling file locking implemented for HDF5 1.10.1.

Dana Robinson
Software Engineer
The HDF Group

···

-----Original Message-----
From: Hdf-forum [mailto:hdf-forum-bounces@lists.hdfgroup.org] On Behalf Of Elena Pourmal
Sent: Sunday, May 15, 2016 9:01 PM
To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>
Subject: Re: [Hdf-forum] HDF5-1.10.0 and flock()



#4

Hi Elena & Dana,

Shouldn't disabling file locking rather be a runtime mechanism? What if you want to use the same binary on the same hardware with different file system configurations, or on the same hardware writing to different file systems, or if the sysadmin changes their mind on a daily basis about enabling or disabling file locking?
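
One way such a runtime switch could look is sketched below in Python. This is only an illustration of the idea, not HDF5's API: the environment variable name EXAMPLE_USE_FILE_LOCKING and both function names are hypothetical, and treating ENOSYS as "carry on without a lock" is an assumption about the desired behavior.

```python
import errno
import fcntl
import os
import tempfile

# Hypothetical variable name, chosen for this sketch only.
ENV_NAME = "EXAMPLE_USE_FILE_LOCKING"


def locking_enabled(environ=os.environ):
    """Locking is on unless the variable is set to a 'false' value."""
    value = environ.get(ENV_NAME, "TRUE").strip().upper()
    return value not in ("FALSE", "0", "NO", "OFF")


def open_with_optional_lock(path, environ=os.environ):
    """Open `path` read/write and lock it if asked to and able to.

    Returns (fd, locked). The decision is made at open time, so the same
    binary behaves differently per run and per file system.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o666)
    if not locking_enabled(environ):
        return fd, False
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fd, True
    except OSError as exc:
        if exc.errno == errno.ENOSYS:
            # Locking is disabled on this file system: proceed unlocked.
            return fd, False
        os.close(fd)
        raise
```

With the variable set to FALSE the lock request is skipped entirely; with it unset, the lock is attempted and only an ENOSYS result is tolerated, so a lock held by another process still surfaces as an error.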

            Werner

···

On 16.05.2016 18:03, Dana Robinson wrote:


--
___________________________________________________________________________
Dr. Werner Benger Visualization Research
Center for Computation & Technology at Louisiana State University (CCT/LSU)
2019 Digital Media Center, Baton Rouge, Louisiana 70803
Tel.: +1 225 578 4809 Fax.: +1 225 578-5362


#5

Yeah. I've thought about this and I probably should create an API call. I was just hoping to avoid that.

Dana

···

-----Original Message-----
From: Hdf-forum [mailto:hdf-forum-bounces@lists.hdfgroup.org] On Behalf Of Werner Benger
Sent: Tuesday, May 17, 2016 5:30 AM
To: hdf-forum@lists.hdfgroup.org
Subject: Re: [Hdf-forum] HDF5-1.10.0 and flock()



#6

Ok, so just curious, where is 1.10.x at with this?