Is locking a file possible?

Hi,

I'm trying to support locking an HDF5 file in a multi-process
environment, with no success at the moment. What I have tried so far
is to call the flock system call right after opening the HDF5 file.
Something like this:

"""
        self.file_id = H5Fopen(name, H5F_ACC_RDWR, H5P_DEFAULT)
        # Get the low-level file descriptor
        H5Fget_vfd_handle(self.file_id, H5P_DEFAULT, &file_handle)
        fd = (<int *>file_handle)[0]
        # Lock the file
        flock(fd, LOCK_EX)
"""

Then, I launch several processes that try to access the same file,
write some dataset, and then close the file (hence releasing the
lock). When I run a single instance of the hosting program, it runs
well. However, whenever I try to run more than one instance
simultaneously, a lot of errors happen (see attachment).

If I use a separate lock file, everything works fine. In that case, I
lock my lock file, then open the HDF5 file, write, close the HDF5
file, and unlock the lock file.
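
In outline, the working pattern is something like the C sketch below
(the lock-file name, function name and error handling are purely
illustrative):

"""
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>
#include "hdf5.h"

/* Sketch: serialize writers with a separate lock file, opening and
 * closing the HDF5 file inside the locked region so that each
 * process re-reads fresh metadata from disk. */
int write_locked(const char *h5name, const char *lockname)
{
    int lockfd = open(lockname, O_CREAT | O_RDWR, 0666);
    if (lockfd < 0)
        return -1;
    flock(lockfd, LOCK_EX);          /* blocks until we own the lock */

    hid_t file = H5Fopen(h5name, H5F_ACC_RDWR, H5P_DEFAULT);
    /* ... create/write datasets here ... */
    H5Fclose(file);                  /* flushes everything to disk   */

    flock(lockfd, LOCK_UN);
    close(lockfd);
    return 0;
}
"""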

I have tried both HDF5 1.6.7 and 1.8.1 on a Linux box, with the same result.

Any hints on why the above code is not working properly?

Thanks,

errors.out (5.99 KB)

···

--
Francesc Alted
Freelance developer
Tel +34-964-282-249

It sounds like you should be using MPI and the parallel HDF5
interface.

==rob

···

On Mon, Sep 08, 2008 at 08:44:09PM +0200, Francesc Alted wrote:

I'm trying to support the locking of a HDF5 file in a multi-process
environment, with no success at the moment. What I have tried so far
is to call the flock system call right after the opening of an HDF5
file. Something like this:

--
Rob Latham
Mathematics and Computer Science Division A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA B29D F333 664A 4280 315B


Hi Francesc,

Hi,

I'm trying to support the locking of a HDF5 file in a multi-process
environment, with no success at the moment. What I have tried so far
is to call the flock system call right after the opening of an HDF5
file. Something like this:

"""
       self.file_id = H5Fopen(name, H5F_ACC_RDWR, H5P_DEFAULT)
       # Get the low-level file descriptor
       H5Fget_vfd_handle(self.file_id, H5P_DEFAULT, &file_handle)
       fd = (<int *>file_handle)[0]
       # Lock the file
       flock(fd, LOCK_EX)
"""

Then, I launch several processes that tries to access the same file,
write some dataset, and then close the file (hence, releasing the
lock). When I run a single instance of the hosting program, it runs
well. However, whenever I try to run more than one instance
simultaneously, a lot of errors happens (see attachment).

If I use a separate lock file, everything works fine. In that case, I
lock my lockfile, then open the HDF file, write, close the HDF file,
and unlock the lock file.

I have tried both HDF5 1.6.7 and 1.8.1 on a Linux box, with same result.

Any hints on why the above code is not working properly?

  You are fighting the metadata cache in HDF5. Unfortunately there's currently no way to evict all the entries from the cache, even if you call H5Fflush(), so it's very likely that one or more of the processes will be dealing with stale metadata. I've added a new feature request to our bugzilla database and maybe we'll be able to act on it at some point.

  Quincey

···

On Sep 8, 2008, at 1:44 PM, Francesc Alted wrote:


On Monday 08 September 2008, Quincey Koziol wrote:

Hi Francesc,

> Hi,
>
> I'm trying to support the locking of a HDF5 file in a multi-process
> environment, with no success at the moment. What I have tried so
> far is to call the flock system call right after the opening of an
> HDF5 file. Something like this:
>
> """
> self.file_id = H5Fopen(name, H5F_ACC_RDWR, H5P_DEFAULT)
> # Get the low-level file descriptor
> H5Fget_vfd_handle(self.file_id, H5P_DEFAULT, &file_handle)
> fd = (<int *>file_handle)[0]
> # Lock the file
> flock(fd, LOCK_EX)
> """
>
> Then, I launch several processes that tries to access the same
> file, write some dataset, and then close the file (hence, releasing
> the lock). When I run a single instance of the hosting program, it
> runs well. However, whenever I try to run more than one instance
> simultaneously, a lot of errors happens (see attachment).
>
> If I use a separate lock file, everything works fine. In that case,
> I lock my lockfile, then open the HDF file, write, close the HDF
> file, and unlock the lock file.
>
> I have tried both HDF5 1.6.7 and 1.8.1 on a Linux box, with same
> result.
>
> Any hints on why the above code is not working properly?

  You are fighting the metadata cache in HDF5. Unfortunately there's
currently no way to evict all the entries from the cache, even if you
call H5Fflush(), so it's very likely that one or more of the
processes will be dealing with stale metadata. I've added a new
feature request to our bugzilla database and maybe we'll be able to
act on it at some point.

I see. At any rate, I find it curious that locking using a regular file
works flawlessly in the same scenario.

Thanks,

···

On Sep 8, 2008, at 1:44 PM, Francesc Alted wrote:

--
Francesc Alted
Freelance developer
Tel +34-964-282-249


Hi Francesc,

···

On Sep 9, 2008, at 5:36 AM, Francesc Alted wrote:

On Monday 08 September 2008, Quincey Koziol wrote:

Hi Francesc,

On Sep 8, 2008, at 1:44 PM, Francesc Alted wrote:

Hi,

I'm trying to support the locking of a HDF5 file in a multi-process
environment, with no success at the moment. What I have tried so
far is to call the flock system call right after the opening of an
HDF5 file. Something like this:

"""
      self.file_id = H5Fopen(name, H5F_ACC_RDWR, H5P_DEFAULT)
      # Get the low-level file descriptor
      H5Fget_vfd_handle(self.file_id, H5P_DEFAULT, &file_handle)
      fd = (<int *>file_handle)[0]
      # Lock the file
      flock(fd, LOCK_EX)
"""

Then, I launch several processes that tries to access the same
file, write some dataset, and then close the file (hence, releasing
the lock). When I run a single instance of the hosting program, it
runs well. However, whenever I try to run more than one instance
simultaneously, a lot of errors happens (see attachment).

If I use a separate lock file, everything works fine. In that case,
I lock my lockfile, then open the HDF file, write, close the HDF
file, and unlock the lock file.

I have tried both HDF5 1.6.7 and 1.8.1 on a Linux box, with same
result.

Any hints on why the above code is not working properly?

  You are fighting the metadata cache in HDF5. Unfortunately there's
currently no way to evict all the entries from the cache, even if you
call H5Fflush(), so it's very likely that one or more of the
processes will be dealing with stale metadata. I've added a new
feature request to our bugzilla database and maybe we'll be able to
act on it at some point.

I see. At any rate, I find it curious that locking using a regular file
works flawlessly in the same scenario.

  Locking using a regular file works because you are closing & re-opening the HDF5 file for each process (which flushes all the metadata changes to the file on closing and re-reads them on re-opening the file).

  Quincey


On Tuesday 09 September 2008, Quincey Koziol wrote:
[clip]

>> You are fighting the metadata cache in HDF5. Unfortunately
>> there's currently no way to evict all the entries from the cache,
>> even if you call H5Fflush(), so it's very likely that one or more
>> of the processes will be dealing with stale metadata. I've added
>> a new feature request to our bugzilla database and maybe we'll be
>> able to act on it at some point.
>
> I see. At any rate, I find it curious that locking using a regular
> file
> works flawlessly in the same scenario.

  Locking using a regular file works because you are closing & re-
opening the HDF5 file for each process (which flushes all the
metadata changes to the file on closing and re-reads them on
re-opening the file).

So, when using the HDF5 file itself for locking, the lock is acquired
after the library has already opened the file, so it has already read
bits of stale metadata from its cache. Now I definitely see it.

Thanks for the explanation!

···

--
Francesc Alted
Freelance developer
Tel +34-964-282-249


On Tuesday 09 September 2008, Francesc Alted wrote:

On Tuesday 09 September 2008, Quincey Koziol wrote:
[clip]

> >> You are fighting the metadata cache in HDF5. Unfortunately
> >> there's currently no way to evict all the entries from the
> >> cache, even if you call H5Fflush(), so it's very likely that one
> >> or more of the processes will be dealing with stale metadata.
> >> I've added a new feature request to our bugzilla database and
> >> maybe we'll be able to act on it at some point.
> >
> > I see. At any rate, I find it curious that locking using a
> > regular file
> > works flawlessly in the same scenario.
>
> Locking using a regular file works because you are closing & re-
> opening the HDF5 file for each process (which flushes all the
> metadata changes to the file on closing and re-reads them on
> re-opening the file).

So, when using the HDF5 file itself for locking, as the lock process
happens after the library has already opened the file then it already
has read bits from stalled metadata cache. Now I definitely see it.

Hmm, not quite. After thinking a bit more about this issue, I now
think that the problem is not in the metadata cache but is a more
fundamental one: I'm effectively opening the file (and hence reading
metadata, either from cache or from disk) *before* locking it, and that
will always lead to wrong results, regardless of whether a cache exists
or not.

I can devise a couple of solutions for this. The first one is to add a
new parameter to H5Fopen telling it that we want to lock the file as
soon as the file descriptor is allocated and before reading any
metadata (either from disk or cache), but that implies an API change.

The other solution is to make the reading of metadata lazier, deferring
it until it is absolutely needed by other functions. In essence,
H5Fopen() would then only have to open the underlying file descriptor
and nothing more; this descriptor could be manually locked, and the
file metadata would be read later on, when it is really needed.

All in all, both approaches seem to need too many changes in HDF5.
Perhaps a better avenue is to find alternatives for doing the locking
on the application side instead of including the functionality in HDF5
itself.

Cheers,

···

--
Francesc Alted
Freelance developer
Tel +34-964-282-249


Hi Francesc,

···

On Sep 10, 2008, at 6:11 AM, Francesc Alted wrote:

On Tuesday 09 September 2008, Francesc Alted wrote:

On Tuesday 09 September 2008, Quincey Koziol wrote:
[clip]

  You are fighting the metadata cache in HDF5. Unfortunately
there's currently no way to evict all the entries from the
cache, even if you call H5Fflush(), so it's very likely that one
or more of the processes will be dealing with stale metadata.
I've added a new feature request to our bugzilla database and
maybe we'll be able to act on it at some point.

I see. At any rate, I find it curious that locking using a
regular file
works flawlessly in the same scenario.

  Locking using a regular file works because you are closing & re-
opening the HDF5 file for each process (which flushes all the
metadata changes to the file on closing and re-reads them on
re-opening the file).

So, when using the HDF5 file itself for locking, as the lock process
happens after the library has already opened the file then it already
has read bits from stalled metadata cache. Now I definitely see it.

Hmm, not quite. After thinking a bit more on this issue, I think now
that the problem is not in the metadata cache, but it is a more
fundamental one: I'm effectively opening a file (and hence, reading
metadata, either from cache or from disk) *before* locking it, and that
will always lead to wrong results, irregardless of an existing cache or
not.

I can devise a couple of solutions for this. The first one is to add a
new parameter to the H5Fopen to inform it that we want to lock the file
as soon as the file descriptor is allocated and before reading any
meta-information (either from disk or cache), but that implies an API
change.

The other solution is to increase the lazyness of the process of reading
the metadata until it is absolutely needed by other functions. So, in
essence, the H5Fopen() should only basically have to open the
underlying file descriptor and that's all; then this descriptor can be
manually locked and the file metadata should be read later on, when it
is really needed.

All in all, both approaches seems to need too much changes in HDF5.
Perhaps a better venue is to find alternatives to do the locking in the
application side instead of including the functionality in HDF5 itself.

  Those are both interesting ideas that I hadn't thought of. What I was thinking was to evict all the metadata from the cache and then re-read it from the file. This could be done at any point after the file was opened, although it would require that all objects in the file be closed when the cache entries were evicted.

  Quincey


Quincey,

On Wednesday 10 September 2008, you wrote:

Hi Francesc,

> A Tuesday 09 September 2008, Francesc Alted escrigué:
>> A Tuesday 09 September 2008, Quincey Koziol escrigué:
>> [clip]
>>
>>>>> You are fighting the metadata cache in HDF5. Unfortunately
>>>>> there's currently no way to evict all the entries from the
>>>>> cache, even if you call H5Fflush(), so it's very likely that
>>>>> one or more of the processes will be dealing with stale
>>>>> metadata. I've added a new feature request to our bugzilla
>>>>> database and maybe we'll be able to act on it at some point.
>>>>
>>>> I see. At any rate, I find it curious that locking using a
>>>> regular file
>>>> works flawlessly in the same scenario.
>>>
>>> Locking using a regular file works because you are closing & re-
>>> opening the HDF5 file for each process (which flushes all the
>>> metadata changes to the file on closing and re-reads them on
>>> re-opening the file).
>>
>> So, when using the HDF5 file itself for locking, as the lock
>> process happens after the library has already opened the file then
>> it already has read bits from stalled metadata cache. Now I
>> definitely see it.
>
> Hmm, not quite. After thinking a bit more on this issue, I think
> now that the problem is not in the metadata cache, but it is a more
> fundamental one: I'm effectively opening a file (and hence, reading
> metadata, either from cache or from disk) *before* locking it, and
> that
> will always lead to wrong results, irregardless of an existing
> cache or
> not.
>
> I can devise a couple of solutions for this. The first one is to
> add a
> new parameter to the H5Fopen to inform it that we want to lock the
> file
> as soon as the file descriptor is allocated and before reading any
> meta-information (either from disk or cache), but that implies an
> API change.
>
> The other solution is to increase the lazyness of the process of
> reading
> the metadata until it is absolutely needed by other functions. So,
> in essence, the H5Fopen() should only basically have to open the
> underlying file descriptor and that's all; then this descriptor can
> be manually locked and the file metadata should be read later on,
> when it is really needed.
>
> All in all, both approaches seems to need too much changes in HDF5.
> Perhaps a better venue is to find alternatives to do the locking in
> the
> application side instead of including the functionality in HDF5
> itself.

  Those are both interesting ideas that I hadn't thought of. What I
was thinking was to evict all the metadata from the cache and then
re- read it from the file. This could be done at any point after the
file was opened, although it would require that all objects in the
file be closed when the cache entries were evicted.

Well, I suppose that my ignorance of the internals of HDF5 is
preventing me from understanding your solution. Let's suppose that we
have two processes on a multi-processor machine; call them process 'a'
and process 'b'. Both processes do the same thing: from time to time
they open an HDF5 file, lock it, write something to it, and close it
(unlocking it).

If process 'a' gets the lock first, then process 'b' will *open* the
file and block until the file becomes unlocked. While process 'b' is
waiting, process 'a' writes a bunch of data to the file. When 'a'
finishes writing and unlocks the file, process 'b' unblocks and gets
the lock. But by then (and this is the main point), process 'b' has
already read internal information about the opened file that is now
outdated.

The only way that I see to avoid the problem is for the information
about the opened file in process 'b' to reside exclusively in the
metadata cache, so that by refreshing it (or evicting it) the new
process can get the correct information. However, that solution seems
to imply that the HDF5 metadata cache has to be *shared* between both
processes, and I don't think this is the case.

Cheers,

···

On Sep 10, 2008, at 6:11 AM, Francesc Alted wrote:

--
Francesc Alted
Freelance developer
Tel +34-964-282-249

Hi Francesc,

Quincey,

On Wednesday 10 September 2008, you wrote:

Hi Francesc,

On Tuesday 09 September 2008, Francesc Alted wrote:

On Tuesday 09 September 2008, Quincey Koziol wrote:
[clip]

  You are fighting the metadata cache in HDF5. Unfortunately
there's currently no way to evict all the entries from the
cache, even if you call H5Fflush(), so it's very likely that
one or more of the processes will be dealing with stale
metadata. I've added a new feature request to our bugzilla
database and maybe we'll be able to act on it at some point.

I see. At any rate, I find it curious that locking using a
regular file
works flawlessly in the same scenario.

  Locking using a regular file works because you are closing & re-
opening the HDF5 file for each process (which flushes all the
metadata changes to the file on closing and re-reads them on
re-opening the file).

So, when using the HDF5 file itself for locking, as the lock
process happens after the library has already opened the file then
it already has read bits from stalled metadata cache. Now I
definitely see it.

Hmm, not quite. After thinking a bit more on this issue, I think
now that the problem is not in the metadata cache, but it is a more
fundamental one: I'm effectively opening a file (and hence, reading
metadata, either from cache or from disk) *before* locking it, and
that
will always lead to wrong results, irregardless of an existing
cache or
not.

I can devise a couple of solutions for this. The first one is to
add a
new parameter to the H5Fopen to inform it that we want to lock the
file
as soon as the file descriptor is allocated and before reading any
meta-information (either from disk or cache), but that implies an
API change.

The other solution is to increase the lazyness of the process of
reading
the metadata until it is absolutely needed by other functions. So,
in essence, the H5Fopen() should only basically have to open the
underlying file descriptor and that's all; then this descriptor can
be manually locked and the file metadata should be read later on,
when it is really needed.

All in all, both approaches seems to need too much changes in HDF5.
Perhaps a better venue is to find alternatives to do the locking in
the
application side instead of including the functionality in HDF5
itself.

  Those are both interesting ideas that I hadn't thought of. What I
was thinking was to evict all the metadata from the cache and then
re- read it from the file. This could be done at any point after the
file was opened, although it would require that all objects in the
file be closed when the cache entries were evicted.

Well, I suppose that my ignorance on the internals of HDF5 is preventing
me understanding your solution. Let's suppose that we have 2 processes
on a multi-processor machine. Let's call them process 'a' and
process 'b'. Both processes do the same thing: from time to time they
open a HDF5 file, lock it, write something on it, and close it
(unlocking it).

If process 'a' gets the lock first, then process 'b' will *open* the
file and will block until the file becomes unlocked. While process 'b'
is waiting, process 'a' writes a bunch of data in the file. When 'a'
finishes the writing and unlock the file then process 'b' unblocks and
gets the lock. But, by then (and this is main point), process 'b'
already has got internal information about the opened file that is
outdated.

The only way that I see to avoid the problem is that the information
about the opened file in process 'b' would exclusively reside in the
metadata cache; so by refreshing it (or evicting it) the new processes
can get the correct information. However, that solution does imply
that the HDF5 metadata cache is to be *shared* between both processes,
and I don't think this would be the case.

  No, the metadata cache doesn't need to be shared between both processes. As long as each process evicts all its metadata from the cache after it acquires the lock and flushes its cache after it's done modifying the file but before releasing the lock, everything will work fine. Since each process has no knowledge of the contents of the file after evicting everything in its cache, it will always get the most recent information from the file and therefore see all the changes from previous lock owners.
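
  In C-style pseudo-code, that per-process protocol would look roughly
like the sketch below. Note that H5Fevict_metadata_cache() is a made-up
name standing in for the cache-eviction call that doesn't exist yet
(the feature request mentioned earlier); everything else is the normal
HDF5 C API.

"""
#include <sys/file.h>
#include "hdf5.h"

/* Sketch of the evict-after-lock / flush-before-unlock protocol. */
int update_locked(hid_t file)
{
    void *handle;
    H5Fget_vfd_handle(file, H5P_DEFAULT, &handle);
    int fd = *(int *)handle;

    flock(fd, LOCK_EX);               /* 1. acquire the lock            */
    H5Fevict_metadata_cache(file);    /* 2. hypothetical call: drop all
                                            cached metadata so it is
                                            re-read from the file       */
    /* 3. ... modify the file ... */
    H5Fflush(file, H5F_SCOPE_GLOBAL); /* 4. flush changes to disk       */
    flock(fd, LOCK_UN);               /* 5. release the lock            */
    return 0;
}
"""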

  Quincey

···

On Sep 10, 2008, at 11:28 AM, Francesc Alted wrote:

On Sep 10, 2008, at 6:11 AM, Francesc Alted wrote:


Hi Francesc,

I have similar issues, and I think you're right when you say that this
should be solved at the application layer. It is pretty difficult to
have efficient locking when the library cannot manage its own space.
What if, for example, the user deletes the file? Or another process
wants to move the file? For the moment I think it is difficult to deal
with this effectively, so I will try to solve it with the old flag
hack: when a process enters the root (in my case, also other major
nodes in the tree) it sets an attribute, and the last thing the process
does is unset the attribute. This way I also know if there was an issue
and writing failed.
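
Roughly, the flag hack looks like the C sketch below (the attribute
name "busy" is just illustrative, `file` is an already-open file id,
and error checking is omitted):

"""
/* Set a "busy" flag attribute on the root group on entry and remove
 * it on clean exit; a leftover flag signals an interrupted writer. */
hid_t root  = H5Gopen2(file, "/", H5P_DEFAULT);
hid_t space = H5Screate(H5S_SCALAR);
hid_t attr  = H5Acreate2(root, "busy", H5T_NATIVE_INT, space,
                         H5P_DEFAULT, H5P_DEFAULT);
int one = 1;
H5Awrite(attr, H5T_NATIVE_INT, &one);
H5Aclose(attr);

/* ... do the writes ... */

H5Adelete(root, "busy");   /* clean exit: clear the flag */
H5Sclose(space);
H5Gclose(root);
"""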

Just a thought...

Regards,

-- dimitris

···

2008/9/10 Francesc Alted <faltet@pytables.com>

Quincey,

On Wednesday 10 September 2008, you wrote:
> Hi Francesc,
>
> On Sep 10, 2008, at 6:11 AM, Francesc Alted wrote:
> > A Tuesday 09 September 2008, Francesc Alted escrigué:
> >> A Tuesday 09 September 2008, Quincey Koziol escrigué:
> >> [clip]
> >>
> >>>>> You are fighting the metadata cache in HDF5. Unfortunately
> >>>>> there's currently no way to evict all the entries from the
> >>>>> cache, even if you call H5Fflush(), so it's very likely that
> >>>>> one or more of the processes will be dealing with stale
> >>>>> metadata. I've added a new feature request to our bugzilla
> >>>>> database and maybe we'll be able to act on it at some point.
> >>>>
> >>>> I see. At any rate, I find it curious that locking using a
> >>>> regular file
> >>>> works flawlessly in the same scenario.
> >>>
> >>> Locking using a regular file works because you are closing & re-
> >>> opening the HDF5 file for each process (which flushes all the
> >>> metadata changes to the file on closing and re-reads them on
> >>> re-opening the file).
> >>
> >> So, when using the HDF5 file itself for locking, as the lock
> >> process happens after the library has already opened the file then
> >> it already has read bits from stalled metadata cache. Now I
> >> definitely see it.
> >
> > Hmm, not quite. After thinking a bit more on this issue, I think
> > now that the problem is not in the metadata cache, but it is a more
> > fundamental one: I'm effectively opening a file (and hence, reading
> > metadata, either from cache or from disk) *before* locking it, and
> > that
> > will always lead to wrong results, irregardless of an existing
> > cache or
> > not.
> >
> > I can devise a couple of solutions for this. The first one is to
> > add a
> > new parameter to the H5Fopen to inform it that we want to lock the
> > file
> > as soon as the file descriptor is allocated and before reading any
> > meta-information (either from disk or cache), but that implies an
> > API change.
> >
> > The other solution is to increase the lazyness of the process of
> > reading
> > the metadata until it is absolutely needed by other functions. So,
> > in essence, the H5Fopen() should only basically have to open the
> > underlying file descriptor and that's all; then this descriptor can
> > be manually locked and the file metadata should be read later on,
> > when it is really needed.
> >
> > All in all, both approaches seems to need too much changes in HDF5.
> > Perhaps a better venue is to find alternatives to do the locking in
> > the
> > application side instead of including the functionality in HDF5
> > itself.
>
> Those are both interesting ideas that I hadn't thought of. What I
> was thinking was to evict all the metadata from the cache and then
> re- read it from the file. This could be done at any point after the
> file was opened, although it would require that all objects in the
> file be closed when the cache entries were evicted.

Well, I suppose that my ignorance on the internals of HDF5 is preventing
me understanding your solution. Let's suppose that we have 2 processes
on a multi-processor machine. Let's call them process 'a' and
process 'b'. Both processes do the same thing: from time to time they
open a HDF5 file, lock it, write something on it, and close it
(unlocking it).

If process 'a' gets the lock first, then process 'b' will *open* the
file and will block until the file becomes unlocked. While process 'b'
is waiting, process 'a' writes a bunch of data in the file. When 'a'
finishes the writing and unlock the file then process 'b' unblocks and
gets the lock. But, by then (and this is main point), process 'b'
already has got internal information about the opened file that is
outdated.

The only way that I see to avoid the problem is that the information
about the opened file in process 'b' would exclusively reside in the
metadata cache; so by refreshing it (or evicting it) the new processes
can get the correct information. However, that solution does imply
that the HDF5 metadata cache is to be *shared* between both processes,
and I don't think this would be the case.

On Wednesday 10 September 2008, Quincey Koziol wrote:

Hi Francesc,

> Quincey,
>
> A Wednesday 10 September 2008, escriguéreu:
>> Hi Francesc,
>>
>>> A Tuesday 09 September 2008, Francesc Alted escrigué:
>>>> A Tuesday 09 September 2008, Quincey Koziol escrigué:
>>>> [clip]
>>>>
>>>>>>> You are fighting the metadata cache in HDF5. Unfortunately
>>>>>>> there's currently no way to evict all the entries from the
>>>>>>> cache, even if you call H5Fflush(), so it's very likely that
>>>>>>> one or more of the processes will be dealing with stale
>>>>>>> metadata. I've added a new feature request to our bugzilla
>>>>>>> database and maybe we'll be able to act on it at some point.
>>>>>>
>>>>>> I see. At any rate, I find it curious that locking using a
>>>>>> regular file
>>>>>> works flawlessly in the same scenario.
>>>>>
>>>>> Locking using a regular file works because you are closing &
>>>>> re- opening the HDF5 file for each process (which flushes all
>>>>> the metadata changes to the file on closing and re-reads them
>>>>> on re-opening the file).
>>>>
>>>> So, when using the HDF5 file itself for locking, as the lock
>>>> process happens after the library has already opened the file
>>>> then it already has read bits from stalled metadata cache. Now
>>>> I definitely see it.
>>>
>>> Hmm, not quite. After thinking a bit more on this issue, I think
>>> now that the problem is not in the metadata cache, but it is a
>>> more fundamental one: I'm effectively opening a file (and hence,
>>> reading metadata, either from cache or from disk) *before*
>>> locking it, and that
>>> will always lead to wrong results, irregardless of an existing
>>> cache or
>>> not.
>>>
>>> I can devise a couple of solutions for this. The first one is to
>>> add a
>>> new parameter to the H5Fopen to inform it that we want to lock
>>> the file
>>> as soon as the file descriptor is allocated and before reading
>>> any meta-information (either from disk or cache), but that
>>> implies an API change.
>>>
>>> The other solution is to increase the lazyness of the process of
>>> reading
>>> the metadata until it is absolutely needed by other functions.
>>> So, in essence, the H5Fopen() should only basically have to open
>>> the underlying file descriptor and that's all; then this
>>> descriptor can be manually locked and the file metadata should be
>>> read later on, when it is really needed.
>>>
>>> All in all, both approaches seems to need too much changes in
>>> HDF5. Perhaps a better venue is to find alternatives to do the
>>> locking in the
>>> application side instead of including the functionality in HDF5
>>> itself.
>>
>> Those are both interesting ideas that I hadn't thought of. What
>> I was thinking was to evict all the metadata from the cache and
>> then re- read it from the file. This could be done at any point
>> after the file was opened, although it would require that all
>> objects in the file be closed when the cache entries were evicted.
>
> Well, I suppose that my ignorance on the internals of HDF5 is
> preventing
> me understanding your solution. Let's suppose that we have 2
> processes
> on a multi-processor machine. Let's call them process 'a' and
> process 'b'. Both processes do the same thing: from time to time
> they open a HDF5 file, lock it, write something on it, and close it
> (unlocking it).
>
> If process 'a' gets the lock first, then process 'b' will *open*
> the file and will block until the file becomes unlocked. While
> process 'b'
> is waiting, process 'a' writes a bunch of data in the file. When
> 'a' finishes the writing and unlock the file then process 'b'
> unblocks and gets the lock. But, by then (and this is main point),
> process 'b' already has got internal information about the opened
> file that is outdated.
>
> The only way that I see to avoid the problem is that the
> information about the opened file in process 'b' would exclusively
> reside in the metadata cache; so by refreshing it (or evicting it)
> the new processes can get the correct information. However, that
> solution does imply that the HDF5 metadata cache is to be *shared*
> between both processes, and I don't think this would be the case.

  No, the metadata cache doesn't need to be shared between both
processes. As long as each process evicts all it's metadata from the
cache after it acquires the lock and flushes it's cache after it's
done modifying the file but before releasing the lock, everything
will work fine. Since each process has no knowledge of the contents
of the file after evicting everything in it's cache, it will always
get the most recent information from the file and therefore see all
the changes from previous lock owners.

Ah, I think I finally got what you meant. So, as I understand it, here
is a small workflow of actions that reproduces your scheme:

1. <open_file>
2. <acquire_lock>
3. <evict_metadata>
4. <write_things>
5. <close_file & release_lock & flush_metadata>

However, actions 2 and 3 have to be added manually by the developer.
Hence, I presume that you were thinking of adding some function for
doing those actions at the same time, right? In that case, maybe it
would be worthwhile to consider adding a sort of 'lock' parameter to
the H5Fopen call instead. That way, actions 1, 2 and 3 could be done in
one single step, much the same as action 5, which would close the file,
release the lock and flush the metadata. The action diagram would then
look like:

<open_file & acquire_lock & evict_metadata>
<write_things>
<close_file & release_lock & flush_metadata>

which seems quite handy to my eyes. Well, just a thought.
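
In code, the idea would boil down to something like the sketch below
(H5F_ACC_LOCK is of course hypothetical; no such flag exists in HDF5
today):

"""
/* Hypothetical: open the file, lock it and start with an empty
 * metadata cache, all in one step. */
hid_t file = H5Fopen(name, H5F_ACC_RDWR | H5F_ACC_LOCK, H5P_DEFAULT);
/* ... write things ... */
/* Closing would flush the metadata and release the lock. */
H5Fclose(file);
"""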

Thanks,

···

On Sep 10, 2008, at 11:28 AM, Francesc Alted wrote:
>> On Sep 10, 2008, at 6:11 AM, Francesc Alted wrote:

--
Francesc Alted
Freelance developer
Tel +34-964-282-249

Hi Dimitris,

Quincey,

On Wednesday 10 September 2008, you wrote:
> Hi Francesc,
>
> > A Tuesday 09 September 2008, Francesc Alted escrigué:
> >> A Tuesday 09 September 2008, Quincey Koziol escrigué:
> >> [clip]
> >>
> >>>>> You are fighting the metadata cache in HDF5. Unfortunately
> >>>>> there's currently no way to evict all the entries from the
> >>>>> cache, even if you call H5Fflush(), so it's very likely that
> >>>>> one or more of the processes will be dealing with stale
> >>>>> metadata. I've added a new feature request to our bugzilla
> >>>>> database and maybe we'll be able to act on it at some point.
> >>>>
> >>>> I see. At any rate, I find it curious that locking using a
> >>>> regular file
> >>>> works flawlessly in the same scenario.
> >>>
> >>> Locking using a regular file works because you are closing & re-
> >>> opening the HDF5 file for each process (which flushes all the
> >>> metadata changes to the file on closing and re-reads them on
> >>> re-opening the file).
> >>
> >> So, when using the HDF5 file itself for locking, as the lock
> >> process happens after the library has already opened the file then
> >> it already has read bits from stalled metadata cache. Now I
> >> definitely see it.
> >
> > Hmm, not quite. After thinking a bit more on this issue, I think
> > now that the problem is not in the metadata cache, but it is a more
> > fundamental one: I'm effectively opening a file (and hence, reading
> > metadata, either from cache or from disk) *before* locking it, and
> > that
> > will always lead to wrong results, irregardless of an existing
> > cache or
> > not.
> >
> > I can devise a couple of solutions for this. The first one is to
> > add a
> > new parameter to the H5Fopen to inform it that we want to lock the
> > file
> > as soon as the file descriptor is allocated and before reading any
> > meta-information (either from disk or cache), but that implies an
> > API change.
> >
> > The other solution is to increase the lazyness of the process of
> > reading
> > the metadata until it is absolutely needed by other functions. So,
> > in essence, the H5Fopen() should only basically have to open the
> > underlying file descriptor and that's all; then this descriptor can
> > be manually locked and the file metadata should be read later on,
> > when it is really needed.
> >
> > All in all, both approaches seems to need too much changes in HDF5.
> > Perhaps a better venue is to find alternatives to do the locking in
> > the
> > application side instead of including the functionality in HDF5
> > itself.
>
> Those are both interesting ideas that I hadn't thought of. What I
> was thinking was to evict all the metadata from the cache and then
> re- read it from the file. This could be done at any point after the
> file was opened, although it would require that all objects in the
> file be closed when the cache entries were evicted.

Well, I suppose that my ignorance on the internals of HDF5 is preventing
me understanding your solution. Let's suppose that we have 2 processes
on a multi-processor machine. Let's call them process 'a' and
process 'b'. Both processes do the same thing: from time to time they
open a HDF5 file, lock it, write something on it, and close it
(unlocking it).

If process 'a' gets the lock first, then process 'b' will *open* the
file and will block until the file becomes unlocked. While process 'b'
is waiting, process 'a' writes a bunch of data in the file. When 'a'
finishes the writing and unlock the file then process 'b' unblocks and
gets the lock. But, by then (and this is main point), process 'b'
already has got internal information about the opened file that is
outdated.

The only way that I see to avoid the problem is that the information
about the opened file in process 'b' would exclusively reside in the
metadata cache; so by refreshing it (or evicting it) the new processes
can get the correct information. However, that solution does imply
that the HDF5 metadata cache is to be *shared* between both processes,
and I don't think this would be the case.

Hi Francesc,

I have similar issues and I think you're right when you say that this should be solved at the application layer. It is pretty difficult when the library cannot manage its own space to have efficient locking. What if for example the user deletes the file? Or another process wants to move the file? For the moment I think it is difficult to deal with this effectively so I will try to solve it with the old hack of the flag: when a process enters the root (in my case also other major nodes in the tree) I set an attribute and the last thing the process does is to unset the attribute. This way I also know if there was an issue and writing failed.

  Setting a flag in the file is not sufficient. It's easy to imagine race conditions where two processes simultaneously check for the presence of the flag, determine it doesn't exist and set it, then proceed to modify the file. Some other mechanism which guarantees exclusive access must be used. (And even then, you'll have to use the cache management strategies I mentioned in an earlier mail).

  Note that we've given this a fair bit of thought at the HDF Group and have some good solutions, but would need to get funding/patches for this to get into the HDF5 library.

  Quincey

···

On Sep 10, 2008, at 12:35 PM, Dimitris Servis wrote:

2008/9/10 Francesc Alted <faltet@pytables.com>
> On Sep 10, 2008, at 6:11 AM, Francesc Alted wrote:


Hi Quincey,

Hi Dimitris,

Quincey,

On Wednesday 10 September 2008, you wrote:
> Hi Francesc,
>
> > A Tuesday 09 September 2008, Francesc Alted escrigué:
> >> A Tuesday 09 September 2008, Quincey Koziol escrigué:
> >> [clip]
> >>
> >>>>> You are fighting the metadata cache in HDF5. Unfortunately
> >>>>> there's currently no way to evict all the entries from the
> >>>>> cache, even if you call H5Fflush(), so it's very likely that
> >>>>> one or more of the processes will be dealing with stale
> >>>>> metadata. I've added a new feature request to our bugzilla
> >>>>> database and maybe we'll be able to act on it at some point.
> >>>>
> >>>> I see. At any rate, I find it curious that locking using a
> >>>> regular file
> >>>> works flawlessly in the same scenario.
> >>>
> >>> Locking using a regular file works because you are closing & re-
> >>> opening the HDF5 file for each process (which flushes all the
> >>> metadata changes to the file on closing and re-reads them on
> >>> re-opening the file).
> >>
> >> So, when using the HDF5 file itself for locking, as the lock
> >> process happens after the library has already opened the file then
> >> it already has read bits from stalled metadata cache. Now I
> >> definitely see it.
> >
> > Hmm, not quite. After thinking a bit more on this issue, I think
> > now that the problem is not in the metadata cache, but it is a more
> > fundamental one: I'm effectively opening a file (and hence, reading
> > metadata, either from cache or from disk) *before* locking it, and
> > that
> > will always lead to wrong results, irregardless of an existing
> > cache or
> > not.
> >
> > I can devise a couple of solutions for this. The first one is to
> > add a
> > new parameter to the H5Fopen to inform it that we want to lock the
> > file
> > as soon as the file descriptor is allocated and before reading any
> > meta-information (either from disk or cache), but that implies an
> > API change.
> >
> > The other solution is to increase the lazyness of the process of
> > reading
> > the metadata until it is absolutely needed by other functions. So,
> > in essence, the H5Fopen() should only basically have to open the
> > underlying file descriptor and that's all; then this descriptor can
> > be manually locked and the file metadata should be read later on,
> > when it is really needed.
> >
> > All in all, both approaches seems to need too much changes in HDF5.
> > Perhaps a better venue is to find alternatives to do the locking in
> > the
> > application side instead of including the functionality in HDF5
> > itself.
>
> Those are both interesting ideas that I hadn't thought of. What I
> was thinking was to evict all the metadata from the cache and then
> re- read it from the file. This could be done at any point after the
> file was opened, although it would require that all objects in the
> file be closed when the cache entries were evicted.

Well, I suppose that my ignorance on the internals of HDF5 is preventing
me understanding your solution. Let's suppose that we have 2 processes
on a multi-processor machine. Let's call them process 'a' and
process 'b'. Both processes do the same thing: from time to time they
open a HDF5 file, lock it, write something on it, and close it
(unlocking it).

If process 'a' gets the lock first, then process 'b' will *open* the
file and will block until the file becomes unlocked. While process 'b'
is waiting, process 'a' writes a bunch of data in the file. When 'a'
finishes the writing and unlock the file then process 'b' unblocks and
gets the lock. But, by then (and this is main point), process 'b'
already has got internal information about the opened file that is
outdated.

The only way that I see to avoid the problem is that the information
about the opened file in process 'b' would exclusively reside in the
metadata cache; so by refreshing it (or evicting it) the new processes
can get the correct information. However, that solution does imply
that the HDF5 metadata cache is to be *shared* between both processes,
and I don't think this would be the case.

Hi Francesc,

I have similar issues and I think you're right when you say that this
should be solved at the application layer. It is pretty difficult when the
library cannot manage its own space to have efficient locking. What if for
example the user deletes the file? Or another process wants to move the
file? For the moment I think it is difficult to deal with this effectively
so I will try to solve it with the old hack of the flag: when a process
enters the root (in my case also other major nodes in the tree) I set an
attribute and the last thing the process does is to unset the attribute.
This way I also know if there was an issue and writing failed.

       Setting a flag in the file is not sufficient. It's easy to imagine
race conditions where two processes simultaneously check for the presence of
the flag, determine it doesn't exist and set it, then proceed to modify the
file. Some other mechanism which guarantees exclusive access must be used.
(And even then, you'll have to use the cache management strategies I
mentioned in an earlier mail).

       Note that we've given this a fair bit of thought at the HDF Group
and have some good solutions, but would need to get funding/patches for this
to get into the HDF5 library.

       Quincey

I know it is not sufficient, but locking will only work at the
application level, and AFAIK a portable solution will try to use
whole-file locks or separate lock files; that is done at a different
level than the HDF5 library. For a process to decide what to do, it has
to check both the locks and the special attributes. Note also that
linking HDF5 statically means each process's cache is separate anyway.

I am sure you've given it a good deal of thought, but for the benefit
of having single files and no services/daemons, efficient file locking
is sacrificed and becomes cumbersome.

Best Regards,

-- dimitris

···

2008/9/10 Quincey Koziol <koziol@hdfgroup.org>

On Sep 10, 2008, at 12:35 PM, Dimitris Servis wrote:

2008/9/10 Francesc Alted <faltet@pytables.com>
> On Sep 10, 2008, at 6:11 AM, Francesc Alted wrote:

Hi Francesc,

···

2008/9/10 Francesc Alted <faltet@pytables.com>

On Wednesday 10 September 2008, Quincey Koziol wrote:
> Hi Francesc,
>
> On Sep 10, 2008, at 11:28 AM, Francesc Alted wrote:
> > Quincey,
> >
> > A Wednesday 10 September 2008, escriguéreu:
> >> Hi Francesc,
> >>
> >> On Sep 10, 2008, at 6:11 AM, Francesc Alted wrote:
> >>> A Tuesday 09 September 2008, Francesc Alted escrigué:
> >>>> A Tuesday 09 September 2008, Quincey Koziol escrigué:
> >>>> [clip]
> >>>>
> >>>>>>> You are fighting the metadata cache in HDF5. Unfortunately
> >>>>>>> there's currently no way to evict all the entries from the
> >>>>>>> cache, even if you call H5Fflush(), so it's very likely that
> >>>>>>> one or more of the processes will be dealing with stale
> >>>>>>> metadata. I've added a new feature request to our bugzilla
> >>>>>>> database and maybe we'll be able to act on it at some point.
> >>>>>>
> >>>>>> I see. At any rate, I find it curious that locking using a
> >>>>>> regular file
> >>>>>> works flawlessly in the same scenario.
> >>>>>
> >>>>> Locking using a regular file works because you are closing &
> >>>>> re- opening the HDF5 file for each process (which flushes all
> >>>>> the metadata changes to the file on closing and re-reads them
> >>>>> on re-opening the file).
> >>>>
> >>>> So, when using the HDF5 file itself for locking, as the lock
> >>>> process happens after the library has already opened the file
> >>>> then it already has read bits from stalled metadata cache. Now
> >>>> I definitely see it.
> >>>
> >>> Hmm, not quite. After thinking a bit more on this issue, I think
> >>> now that the problem is not in the metadata cache, but it is a
> >>> more fundamental one: I'm effectively opening a file (and hence,
> >>> reading metadata, either from cache or from disk) *before*
> >>> locking it, and that
> >>> will always lead to wrong results, irregardless of an existing
> >>> cache or
> >>> not.
> >>>
> >>> I can devise a couple of solutions for this. The first one is to
> >>> add a
> >>> new parameter to the H5Fopen to inform it that we want to lock
> >>> the file
> >>> as soon as the file descriptor is allocated and before reading
> >>> any meta-information (either from disk or cache), but that
> >>> implies an API change.
> >>>
> >>> The other solution is to increase the lazyness of the process of
> >>> reading
> >>> the metadata until it is absolutely needed by other functions.
> >>> So, in essence, the H5Fopen() should only basically have to open
> >>> the underlying file descriptor and that's all; then this
> >>> descriptor can be manually locked and the file metadata should be
> >>> read later on, when it is really needed.
> >>>
> >>> All in all, both approaches seems to need too much changes in
> >>> HDF5. Perhaps a better venue is to find alternatives to do the
> >>> locking in the
> >>> application side instead of including the functionality in HDF5
> >>> itself.
> >>
> >> Those are both interesting ideas that I hadn't thought of. What
> >> I was thinking was to evict all the metadata from the cache and
> >> then re- read it from the file. This could be done at any point
> >> after the file was opened, although it would require that all
> >> objects in the file be closed when the cache entries were evicted.
> >
> > Well, I suppose that my ignorance on the internals of HDF5 is
> > preventing
> > me understanding your solution. Let's suppose that we have 2
> > processes
> > on a multi-processor machine. Let's call them process 'a' and
> > process 'b'. Both processes do the same thing: from time to time
> > they open a HDF5 file, lock it, write something on it, and close it
> > (unlocking it).
> >
> > If process 'a' gets the lock first, then process 'b' will *open*
> > the file and will block until the file becomes unlocked. While
> > process 'b'
> > is waiting, process 'a' writes a bunch of data in the file. When
> > 'a' finishes the writing and unlock the file then process 'b'
> > unblocks and gets the lock. But, by then (and this is main point),
> > process 'b' already has got internal information about the opened
> > file that is outdated.
> >
> > The only way that I see to avoid the problem is that the
> > information about the opened file in process 'b' would exclusively
> > reside in the metadata cache; so by refreshing it (or evicting it)
> > the new processes can get the correct information. However, that
> > solution does imply that the HDF5 metadata cache is to be *shared*
> > between both processes, and I don't think this would be the case.
>
> No, the metadata cache doesn't need to be shared between both
> processes. As long as each process evicts all it's metadata from the
> cache after it acquires the lock and flushes it's cache after it's
> done modifying the file but before releasing the lock, everything
> will work fine. Since each process has no knowledge of the contents
> of the file after evicting everything in it's cache, it will always
> get the most recent information from the file and therefore see all
> the changes from previous lock owners.

Ah, I think I finally got what you meant. So, as I understand it, here
it is a small workflow of actions that reproduces your schema:

1. <open_file>
2. <acquire_lock>
3. <evict_metadata>
4. <write_things>
5. <close_file & release_lock & flush_metadata>

However, actions 2 and 3 are required to be manually added by the
developer. Hence, I presume that you were thinking in adding some
function for doing those actions at the same time, right? In that
case, maybe it could be worthwhile to ponder about adding a sort
of 'lock' parameter to the H5Fopen call instead. That way, actions 1,
2 and 3 can be done in just one single step, much the same than action
5, that would close, release the lock and flush metadata. The diagram
action would then looks like:

<open_file & acquire_lock & evict_metadata>
<write_things>
<close_file & release_lock & flush_metadata>

which seems quite handy to my eyes. Well, just a thought.

Do you support NFS?

BR

-- dimitris

Hi Francesc, Dimitris, Quincey,

The type of transactions determine the type of locking you want to do.
Databases have typically small transactions, so they need fine-grained
locking; hold the lock for a short period of time and hold the lock for
only that part of the database that needs to be changed.

I think HDF5 does not fall into this category. Usually a lot of data is
read or written, so a lock on the entire file is fine. Furthermore a
lock is held for a longer time period, so the overhead of having to
close and reopen the file can be acceptable.
It is, however, somewhat cumbersome that you also have to close and
reopen all the groups, datasets, etc, so it would be nice if you could
use lock/unlock instead of having to open and close the file. But I fear
there is not much you can do about that. You just cannot be sure that
another process did not change the data structures in the file unless
HDF5 uses some clever (but probably very hard to implement) schemes.

Maybe Francesc and Dimitris can explain what kind of lock granularity
they would like to have and what scenarios they are thinking of. I can
imagine that Francesc would like some finer-grained locking for
PyTables.
One must also consider the overhead of unnecessary unlocking: if a
process only does lock/unlock because there might be another process
accessing the file, it may do a lot of unnecessary flushing.

Note that file locking is supported over NFS, but AFAIK NFS does not
fully guarantee that the remote cache is updated when a file gets
changed.
Also note that Unix/Linux does not remove a file until all file handles
accessing it are closed. So if one process deletes the file, the other
one can still access it. I don't know about Windows.

Cheers,
Ger

"Dimitris Servis" <servisster@gmail.com> 09/10/08 8:16 PM >>>

Hi Quincey,

2008/9/10 Quincey Koziol <koziol@hdfgroup.org>

[clip]

       Setting a flag in the file is not sufficient. It's easy to
imagine race conditions where two processes simultaneously check for the
presence of the flag, determine it doesn't exist and set it, then
proceed to modify the file. Some other mechanism which guarantees
exclusive access must be used. (And even then, you'll have to use the
cache management strategies I mentioned in an earlier mail).

       Note that we've given this a fair bit of thought at the HDF Group
and have some good solutions, but would need to get funding/patches for
this to get into the HDF5 library.

       Quincey

I know it is not sufficient, but locking will work only at the
application level, and AFAIK a portable solution will try to use whole
file locks or separate lock files, but that is done at a different
level than the HDF lib. For a process to decide what to do, it has to
check the locks and the special attributes. Note also that linking
HDF5 statically means each process' cache is different anyway.

I am sure you've given it a good deal of thought, but for the benefit
of having single files and no services/daemons, efficient file locking
is sacrificed and becomes cumbersome.

Best Regards,

-- dimitris


Hi Dimitris,

A Wednesday 10 September 2008, Dimitris Servis escrigué:
[clip]

> > No, the metadata cache doesn't need to be shared between
> > both processes. As long as each process evicts all it's metadata
> > from the cache after it acquires the lock and flushes it's cache
> > after it's done modifying the file but before releasing the lock,
> > everything will work fine. Since each process has no knowledge
> > of the contents of the file after evicting everything in it's
> > cache, it will always get the most recent information from the
> > file and therefore see all the changes from previous lock owners.
>
> Ah, I think I finally got what you meant. So, as I understand it,
> here it is a small workflow of actions that reproduces your schema:
>
> 1. <open_file>
> 2. <acquire_lock>
> 3. <evict_metadata>
> 4. <write_things>
> 5. <close_file & release_lock & flush_metadata>
>
> However, actions 2 and 3 are required to be manually added by the
> developer. Hence, I presume that you were thinking in adding some
> function for doing those actions at the same time, right? In that
> case, maybe it could be worthwhile to ponder about adding a sort
> of 'lock' parameter to the H5Fopen call instead. That way, actions
> 1, 2 and 3 can be done in just one single step, much the same than
> action 5, that would close, release the lock and flush metadata.
> The diagram action would then looks like:
>
> <open_file & acquire_lock & evict_metadata>
> <write_things>
> <close_file & release_lock & flush_metadata>
>
> which seems quite handy to my eyes. Well, just a thought.

do you support NFS?

Sorry, but I don't know what you are referring to exactly. Could you be
a bit more explicit, please?

···

--
Francesc Alted
Freelance developer
Tel +34-964-282-249


Hi Ger,

Hi Francesc, Dimitris, Quincey,

The type of transactions determine the type of locking you want to do.
Databases have typically small transactions, so they need fine-grained
locking; hold the lock for a short period of time and hold the lock for
only that part of the database that needs to be changed.

I think HDF5 does not fall into this category. Usually a lot of data is
read or written, so a lock on the entire file is fine. Furthermore a
lock is held for a longer time period, so the overhead of having to
close and reopen the file can be acceptable.
It is, however, somewhat cumbersome that you also have to close and
reopen all the groups, datasets, etc, so it would be nice if you could
use lock/unlock instead of having to open and close the file. But I fear
there is not much you can do about that. You just cannot be sure that
another process did not change the data structures in the file unless
HDF5 uses some clever (but probably very hard to implement) schemes.

  I'm guessing that the "sweet spot" for locking in HDF5 files will be at the "object" (i.e. dataset, group, etc) level. I think the "file" level will perform too poorly from the extra flushing/reloading (as you mention below) and that having locking at the "element" level [for datasets] will have too much locking overhead. It's possible that some sort of "range" locking for a group of elements in a dataset would be reasonable also...

  Quincey

···

On Sep 11, 2008, at 2:11 AM, Ger van Diepen wrote:

Maybe Francesc and Dimitris can explain what kind of lock granularity
they would like to have and what scenarios they are thinking of. I can
imagine that Francesc would like some finer grained locking for the
PyTables.
One must also consider the overhead in doing unnecessary unlocking.
I.e. if a process only does lock/unlock because there might be another
process accessing the file, you may do a lot of unnecessary flushing.

Note that file locking is supported over NFS, but AFAIK NFS does not
fully guarantee that the remote cache is updated when a file gets
changed.
Also note that Unix/Linux does not remove a file until all file handles
accessing it are closed. So if one process deletes the file, the other
one can still access it. I don't know about Windows.

Cheers,
Ger

"Dimitris Servis" <servisster@gmail.com> 09/10/08 8:16 PM >>>

Hi Quincey,

2008/9/10 Quincey Koziol <koziol@hdfgroup.org>

Hi Dimitris,

On Sep 10, 2008, at 12:35 PM, Dimitris Servis wrote:

2008/9/10 Francesc Alted <faltet@pytables.com>
Quincey,

A Wednesday 10 September 2008, escriguéreu:

Hi Francesc,

On Sep 10, 2008, at 6:11 AM, Francesc Alted wrote:

A Tuesday 09 September 2008, Francesc Alted escrigué:

A Tuesday 09 September 2008, Quincey Koziol escrigué:
[clip]

       You are fighting the metadata cache in HDF5.

Unfortunately

there's currently no way to evict all the entries from the
cache, even if you call H5Fflush(), so it's very likely

that

one or more of the processes will be dealing with stale
metadata. I've added a new feature request to our bugzilla
database and maybe we'll be able to act on it at some

point.

I see. At any rate, I find it curious that locking using a
regular file
works flawlessly in the same scenario.

Locking using a regular file works because you are closing &

re-

opening the HDF5 file for each process (which flushes all the
metadata changes to the file on closing and re-reads them on
re-opening the file).

So, when using the HDF5 file itself for locking, as the lock
process happens after the library has already opened the file

then

it already has read bits from stalled metadata cache. Now I
definitely see it.

Hmm, not quite. After thinking a bit more on this issue, I

think

now that the problem is not in the metadata cache, but it is a

more

fundamental one: I'm effectively opening a file (and hence,

reading

metadata, either from cache or from disk) *before* locking it,

and

that
will always lead to wrong results, irregardless of an existing
cache or
not.

I can devise a couple of solutions for this. The first one is

to

add a
new parameter to the H5Fopen to inform it that we want to lock

the

file
as soon as the file descriptor is allocated and before reading

any

meta-information (either from disk or cache), but that implies

an

API change.

The other solution is to increase the lazyness of the process

of

reading
the metadata until it is absolutely needed by other functions.

So,

in essence, the H5Fopen() should only basically have to open

the

underlying file descriptor and that's all; then this descriptor

can

be manually locked and the file metadata should be read later

on,

when it is really needed.

All in all, both approaches seems to need too much changes in

HDF5.

Perhaps a better venue is to find alternatives to do the locking

in

the
application side instead of including the functionality in HDF5
itself.

     Those are both interesting ideas that I hadn't thought of.

What I

was thinking was to evict all the metadata from the cache and

then

re- read it from the file. This could be done at any point after

the

file was opened, although it would require that all objects in

the

file be closed when the cache entries were evicted.

Well, I suppose that my ignorance on the internals of HDF5 is

preventing

me understanding your solution. Let's suppose that we have 2

processes

on a multi-processor machine. Let's call them process 'a' and
process 'b'. Both processes do the same thing: from time to time

they

open a HDF5 file, lock it, write something on it, and close it
(unlocking it).

If process 'a' gets the lock first, then process 'b' will *open*

the

file and will block until the file becomes unlocked. While process

'b'

is waiting, process 'a' writes a bunch of data in the file. When

'a'

finishes the writing and unlock the file then process 'b' unblocks

and

gets the lock. But, by then (and this is main point), process 'b'
already has got internal information about the opened file that is
outdated.

The only way that I see to avoid the problem is that the

information

about the opened file in process 'b' would exclusively reside in

the

metadata cache; so by refreshing it (or evicting it) the new

processes

can get the correct information. However, that solution does imply
that the HDF5 metadata cache is to be *shared* between both

processes,

and I don't think this would be the case.

Hi Francesc,

I have similar issues and I think you're right when you say that

this

should be solved at the application layer. It is pretty difficult

when the

library cannot manage its own space to have efficient locking. What

if for

example the user deletes the file? Or another process wants to move

the

file? For the moment I think it is difficult to deal with this

effectively

so I will try to solve it with the old hack of the flag: when a

process

enters the root (in my case also other major nodes in the tree) I

set an

attribute and the last thing the process does is to unset the

attribute.

This way I also know if there was an issue and writing failed.

      Setting a flag in the file is not sufficient. It's easy to

imagine

race conditions where two processes simultaneously check for the

presence of

the flag, determine it doesn't exist and set it, then proceed to

modify the

file. Some other mechanism which guarantees exclusive access must be

used.

(And even then, you'll have to use the cache management strategies

I

mentioned in an earlier mail).

      Note that we've given this a fair bit of thought at the HDF

Group

and have some good solutions, but would need to get funding/patches

for this

to get into the HDF5 library.

      Quincey

I know it is not sufficient but locking will work only at the
application
level and AFAIK a portable solution will try to use whole file locks
or
separate lock files but that is done at a different level than HDF lib.
For
a process to decide what to do, it has to check the locks and the
special
attributes. Note also, that linking HDF5 statically, means each
process'
cache is different anyway.

I am sure you've given it a good deal of thought but for the benefit
of
having single files and no services/daemons, efficient file locking is
sacrificed and becomes cumbersome.

Best Regards,

-- dimitris

----------------------------------------------------------------------
This mailing list is for HDF software users discussion.
To subscribe to this list, send a message to hdf-forum-subscribe@hdfgroup.org.
To unsubscribe, send a message to hdf-forum-unsubscribe@hdfgroup.org.

----------------------------------------------------------------------
This mailing list is for HDF software users discussion.
To subscribe to this list, send a message to hdf-forum-subscribe@hdfgroup.org.
To unsubscribe, send a message to hdf-forum-unsubscribe@hdfgroup.org.

Hi Francesc,

···

2008/9/11 Francesc Alted <faltet@pytables.com>

Hi Dimitris,

A Wednesday 10 September 2008, Dimitris Servis escrigué:
[clip]
> do you support NFS?

Sorry, but I don't know what you are referring to exactly. Could you be
a bit more explicit, please?

My apologies, sorry :-). My point was that if you want your app to work over NFS,
you simply cannot rely on processes sharing metadata. Processes may reside
on different machines.

Regards,

-- dimitris

Hi Ger,

A Thursday 11 September 2008, Ger van Diepen escrigué:

Hi Francesc, Dimitris, Quincey,

The type of transactions determine the type of locking you want to
do. Databases have typically small transactions, so they need
fine-grained locking; hold the lock for a short period of time and
hold the lock for only that part of the database that needs to be
changed.

I think HDF5 does not fall into this category. Usually a lot of data
is read or written, so a lock on the entire file is fine. Furthermore
a lock is held for a longer time period, so the overhead of having to
close and reopen the file can be acceptable.

Yeah, this is my impression too.

It is, however, somewhat cumbersome that you also have to close and
reopen all the groups, datasets, etc, so it would be nice if you
could use lock/unlock instead of having to open and close the file.
But I fear there is not much you can do about that. You just cannot
be sure that another process did not change the data structures in
the file unless HDF5 uses some clever (but probably very hard to
implement) schemes.

Cumbersome? In what sense? For example, PyTables keeps track of all its
opened nodes (they live in its own internal metadata LRU cache), and
when the user asks to close the file, all the opened nodes (groups,
datasets) are closed automatically (at both the PyTables and HDF5 levels).
I don't know whether HDF5 itself does the same, but if it doesn't, that
would be a handy thing to implement (barring side effects that I don't see
right now).
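
As a small illustration (a minimal sketch with made-up names, using the
PyTables 2.x API spelling):

"""
import tables

fileh = tables.openFile("demo.h5", mode="w")
group = fileh.createGroup("/", "results")
arr = fileh.createArray(group, "data", [1, 2, 3])

# Closing the file also closes every node handle hanging from it, so
# there is no need to close 'group' or 'arr' explicitly.
fileh.close()
"""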

Maybe Francesc and Dimitris can explain what kind of lock granularity
they would like to have and what scenarios they are thinking of. I
can imagine that Francesc would like some finer grained locking for
the PyTables.
One must also consider the overhead in doing unnecessary unlocking.
I.e. if a process only does lock/unlock because there might be
another process accessing the file, you may do a lot of unnecessary
flushing.

I'm mainly looking into the locking functionality because a user
requested it:

http://www.pytables.org/trac/ticket/185

And well, locking at the file level would be enough for the time being, yes.
Finer-grained locking would require direct HDF5 support, and I am afraid
that would imply too many changes to the library.
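
Just to make the whole-file idea concrete, here is a minimal sketch of
the pattern I have in mind, using a separate lock file (Python's
fcntl.flock plus h5py here purely for illustration; the file names and
the locked_write helper are made up, and PyTables or the C API would
follow the same lock/open/write/close order):

"""
import fcntl
import h5py  # any HDF5 binding would do; h5py is just for the sketch

def locked_write(lockname, h5name, dset, data):
    # Serialize writers on a separate lock file.
    with open(lockname, "w") as lf:
        fcntl.flock(lf, fcntl.LOCK_EX)      # blocks until the lock is ours
        try:
            # Open the HDF5 file only *after* acquiring the lock, so no
            # stale metadata is read, and close it before releasing it.
            with h5py.File(h5name, "a") as f:
                f.create_dataset(dset, data=data)
        finally:
            fcntl.flock(lf, fcntl.LOCK_UN)

locked_write("demo.lock", "demo.h5", "run_001", [1.0, 2.0, 3.0])
"""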

Note that file locking is supported over NFS, but AFAIK NFS does not
fully guarantee that the remote cache is updated when a file gets
changed.

Yeah, I haven't checked lately, but fighting with locking and NFS has always
been a difficult subject, to say the least.

Cheers,
Francesc

···


--
Francesc Alted
Freelance developer
Tel +34-964-282-249

----------------------------------------------------------------------
This mailing list is for HDF software users discussion.
To subscribe to this list, send a message to hdf-forum-subscribe@hdfgroup.org.
To unsubscribe, send a message to hdf-forum-unsubscribe@hdfgroup.org.