Extra file_close callback called after end of user App by using the VOL plugin

Hello,

I have implemented a custom VOL plugin for HDF5 and I noticed that every time the application exits, the file_close callback is called one more time at the end. The contents of the object passed as a parameter are unspecified and sometimes the same as the previous objects, so it is hard for me to identify this redundant function call.

For example, having the following user application:

file_id = H5Fcreate("myfile.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
status = H5Fclose(file_id);

The first API call invokes the file_open callback through the VOL plugin.
The second API call invokes the file_close callback through the VOL plugin, passing the object that identifies the file we want to close. This object contains the file name.
However, I see one more call to the file_close callback at the very end, with an invalid (although non-NULL) object whose file name is garbage.
This crashes my program, because I try to close an underlying file that is invalid. I cannot think of a way to detect that this call is redundant and ignore it.

Do you have any idea how to deal with this redundant callback?
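
One possible way to tell a stale call apart, sketched here under assumptions: the plugin keeps a small registry of the file objects it has handed out, and the close callback refuses any pointer that is not in it. The names demo_file_t, demo_file_create and demo_file_close are hypothetical stand-ins, not part of HDF5 or of the plugin described above. This only masks the symptom rather than explaining the extra call, and because a freed address can be reused by a later allocation, the check is a heuristic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical plugin-side file object; the real H5RBDB_file_t is not shown in the thread. */
typedef struct demo_file {
    char name[64];
} demo_file_t;

#define DEMO_MAX_OPEN 16

/* Pointers that were handed out and have not been closed yet. */
static demo_file_t *live_files[DEMO_MAX_OPEN];

/* Allocate a file object and remember it as live. Returns NULL on failure. */
demo_file_t *demo_file_create(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < DEMO_MAX_OPEN; i++) {
        if (live_files[i] == NULL) {
            demo_file_t *f = calloc(1, sizeof *f);
            if (f == NULL)
                return NULL;
            /* calloc zeroed the buffer, so the copy stays NUL-terminated. */
            strncpy(f->name, name, sizeof f->name - 1);
            live_files[i] = f;
            return f;
        }
    }
    return NULL; /* registry full */
}

/* Close callback stand-in: 0 on success, -1 if obj is not a live object. */
int demo_file_close(void *obj)
{
    if (obj == NULL)
        return -1;
    for (size_t i = 0; i < DEMO_MAX_OPEN; i++) {
        if (live_files[i] == obj) {
            live_files[i] = NULL;
            free(obj);
            return 0;
        }
    }
    /* Unknown or already-closed pointer: report it without dereferencing it. */
    return -1;
}
```

In a real plugin the same lookup would sit at the top of the file_close callback, before the object is dereferenced.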

Thanks,

Dimos

---
Dimokritos Stamatakis
PhD student,
Brandeis University

Thanks Mohamad!

How can I see if a handle is left open? In my file_close callback I free the custom file object I created and return SUCCEED. This is what my H5VL_file_close looks like:

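herr_t ret_value = SUCCEED;
H5RBDB_file_t* f;

FUNC_ENTER_NOAPI_NOINIT

/* The VOL layer hands the plugin's file object back as an opaque pointer. */
f = (H5RBDB_file_t*)file;
/* This function returns herr_t, so the error value must be FAIL, not NULL. */
if ( (H5RBDB_file_close(f, dxpl_id)) != 0)
    HGOTO_ERROR(H5E_FILE, H5E_CANTCLOSEFILE, FAIL, "unable to close file")

done:
    FUNC_LEAVE_NOAPI(ret_value)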

and within H5RBDB_file_close() I free the file object (H5RBDB_file_t*). Is there anything else I have to do to close a file handle?

The user application simply creates and closes an HDF5 file, as I showed in the previous email.
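
A minimal sketch for spotting a handle that is left open, assuming the plugin can add bookkeeping to its own callbacks: count every object handed out by the create/open callbacks and every object released by the close callback, then inspect the difference. All demo_* names are hypothetical. HDF5's H5Fget_obj_count may offer a similar view from the library side, though whether it behaves the same under a VOL plugin is not confirmed in this thread.

```c
#include <assert.h>
#include <stdio.h>

/* Number of objects handed out by open/create and not yet closed. */
static long demo_outstanding = 0;

/* Call from the plugin's create/open callback after the object is built. */
void demo_note_open(const void *obj)
{
    assert(obj != NULL);
    demo_outstanding++;
    fprintf(stderr, "open  %p (outstanding=%ld)\n", (void *)obj, demo_outstanding);
}

/* Call at the top of the close callback. Returns -1 when a close arrives
 * although nothing is outstanding, which is the situation described above. */
int demo_note_close(const void *obj)
{
    if (demo_outstanding == 0) {
        fprintf(stderr, "close %p with nothing outstanding\n", (void *)obj);
        return -1;
    }
    demo_outstanding--;
    fprintf(stderr, "close %p (outstanding=%ld)\n", (void *)obj, demo_outstanding);
    return 0;
}

/* Current number of open-but-not-closed objects. */
long demo_outstanding_count(void)
{
    return demo_outstanding;
}
```

If the count is already zero when the extra close arrives, the call is not matched by any open that the plugin saw, which would point away from a forgotten handle in the application.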

Thanks,

Dimos

---
Dimokritos Stamatakis
PhD student,
Brandeis University

On May 16, 2016, at 12:04 PM, hdf-forum-request@lists.hdfgroup.org wrote:

Today's Topics:

  1. Re: HDF5-1.10.0 and flock() (Elena Pourmal)
  2. Re: extra file_close callback called after the end of user
     App by using the VOL plugin (Mohamad Chaarawi)
  3. Re: HDF5-1.10.0 and flock() (Dana Robinson)

----------------------------------------------------------------------

Message: 1
Date: Mon, 16 May 2016 01:00:31 +0000
From: Elena Pourmal <epourmal@hdfgroup.org>
To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>
Subject: Re: [Hdf-forum] HDF5-1.10.0 and flock()
Message-ID: <FBAA9981-4783-4AAE-90D4-164330F75CB3@hdfgroup.org>
Content-Type: text/plain; charset="Windows-1252"

Hi Tim,

On May 13, 2016, at 10:55 AM, Timothy Brown <Timothy.Brown-1@Colorado.EDU> wrote:

Hi all,

I was wondering if HDF5 is going to keep the 1.8.x branch going? Or is it recommended to move to 1.10.x?

Yes, we will keep 1.8 going until we are satisfied with the quality of 1.10.x. Transition from 1.8 to 1.10 should be seamless for our users :-)

I'm asking because, as we all know, SWMR needs flock() and you cannot disable SWMR at compile time (I don't need it in my day-to-day use).

Hmm... HDF5 implements file locking in 1.10.x to prevent unauthorized access to an HDF5 file (for example, when a file is opened for writing (non-SWMR) and another process tries to write to it). File locking is enabled if flock (or similar) is available on the system. Configure checks whether file locking is available, but I think we failed to check whether it is disabled. We will look into this.

Thank you for reporting!

Elena

On one of the clusters I run on we've got a Lustre file system. However, the admins have deemed file locking too expensive and have disabled it. Here's the mount information:

mds01ib@o2ib1:mds02ib@o2ib1:/scratch on /lustre/janus_scratch type lustre (rw,noauto,_netdev)

So when I run a very simple test to create an HDF5 file with version 1.10.0 on this file system, it fails:

janus-compile1 ~$ ./test /lustre/janus_scratch/tibr1099/foo.h5
HDF5-DIAG: Error detected in HDF5 (1.10.0) thread 0:
#000: H5F.c line 491 in H5Fcreate(): unable to create file
  major: File accessibilty
  minor: Unable to open file
#001: H5Fint.c line 1168 in H5F_open(): unable to lock the file or initialize file structure
  major: File accessibilty
  minor: Unable to open file
#002: H5FD.c line 1821 in H5FD_lock(): driver lock request failed
  major: Virtual File Layer
  minor: Can't update object
#003: H5FDsec2.c line 939 in H5FD_sec2_lock(): unable to flock file, errno = 38, error message = 'Function not implemented'
  major: File accessibilty
  minor: Bad file ID accessed
Unable to open: /lustre/janus_scratch/tibr1099/foo.h5: -1
1

When I strace the program I see it's because flock() failed:

open("/lustre/janus_scratch/tibr1099/foo.h5", O_RDWR) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
close(3) = 0
open("/lustre/janus_scratch/tibr1099/foo.h5", O_RDWR|O_CREAT|O_TRUNC, 0666) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
flock(3, LOCK_EX|LOCK_NB) = -1 ENOSYS (Function not implemented)
close(3) = 0

Versus if I trace the program with version 1.8.15:

open("/lustre/janus_scratch/tibr1099/foo.h5", O_RDWR) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
close(3) = 0
open("/lustre/janus_scratch/tibr1099/foo.h5", O_RDWR|O_CREAT|O_TRUNC, 0666) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
brk(0x235a000) = 0x235a000
mmap(NULL, 528384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f17252b8000

So my long-winded example leads to three questions.
1) Do other HPC sites enable flock() on Lustre? If so, is it only localflock, so as to avoid the burden of a cluster-wide flock?
2) Is there a path forward for sites that don't enable flock?
3) Is there the opposite of H5Fstart_swmr_write?

Thanks!
Tim
<test.f90>
_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5

------------------------------

Message: 2
Date: Mon, 16 May 2016 14:42:25 +0000
From: Mohamad Chaarawi <chaarawi@hdfgroup.org>
To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>
Subject: Re: [Hdf-forum] extra file_close callback called after the
  end of user App by using the VOL plugin
Message-ID: <4AEC97BC-092F-450E-987C-8CEC897DF38B@hdfgroup.org>
Content-Type: text/plain; charset="utf-8"

Hi Dimos,

I have not seen this behavior before. I'm guessing you have an open file handle that you missed closing, and the file close is getting triggered at exit.
To be able to look into this, I will need a simple program with your plugin that replicates this.

Thanks,
Mohamad

------------------------------

Message: 3
Date: Mon, 16 May 2016 16:03:16 +0000
From: Dana Robinson <derobins@hdfgroup.org>
To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>
Subject: Re: [Hdf-forum] HDF5-1.10.0 and flock()
Message-ID:
  <CY1PR17MB06132DDBAFD039AD226D0BCDB1770@CY1PR17MB0613.namprd17.prod.outlook.com>
  
Content-Type: text/plain; charset="us-ascii"

If a suitable way to lock files cannot be determined at configure time, a no-op function is substituted. This is currently the case on Windows. File locking is just advisory, so this isn't a big deal.

As for disabling file locking, we talked about this and will try to get a configure-time mechanism for disabling file locking implemented for HDF5 1.10.1.
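
As a rough runtime analogue of the no-op substitution described above (which HDF5 decides at configure time), the sketch below attempts the same flock() call that appears in the strace output and treats ENOSYS as "locking not available" rather than as a hard error. This is not how HDF5 handles it; demo_try_lock and the demo_lock_result_t values are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/file.h>

/* Outcome of a lock attempt in this sketch. */
typedef enum {
    DEMO_LOCK_FAILED = -1,     /* e.g. EWOULDBLOCK: another process holds the lock */
    DEMO_LOCKED = 0,           /* exclusive advisory lock acquired */
    DEMO_LOCK_UNSUPPORTED = 1  /* file system does not implement flock() */
} demo_lock_result_t;

/* Try to take a non-blocking exclusive advisory lock on an open descriptor. */
demo_lock_result_t demo_try_lock(int fd)
{
    assert(fd >= 0);
    if (flock(fd, LOCK_EX | LOCK_NB) == 0)
        return DEMO_LOCKED;
    if (errno == ENOSYS) {
        /* Same errno as in the trace above; carry on without a lock. */
        fprintf(stderr, "flock() not implemented here; continuing without a lock\n");
        return DEMO_LOCK_UNSUPPORTED;
    }
    return DEMO_LOCK_FAILED;
}
```

A caller would treat DEMO_LOCKED and DEMO_LOCK_UNSUPPORTED as "proceed" and only DEMO_LOCK_FAILED as an error, which is acceptable only because the lock is advisory.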

Dana Robinson
Software Engineer
The HDF Group

------------------------------

End of Hdf-forum Digest, Vol 83, Issue 18
*****************************************
