Also note that Unix/Linux does not remove a file until all file
handles accessing it are closed. So if one process deletes the file,
the other one can still access it. I don't know about Windows.
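A quick POSIX sketch of that behaviour (file name hypothetical):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("data.h5", O_RDONLY);     /* this process holds a handle  */
        unlink("data.h5");                      /* "delete": only the name goes */

        char buf[16];
        ssize_t n = read(fd, buf, sizeof buf);  /* still succeeds: the inode    */
        printf("read %zd bytes\n", n);          /* lives until the last close   */
        close(fd);                              /* now the data is reclaimed    */
        return 0;
    }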
Cheers,
Ger
>>> "Dimitris Servis" <servisster@gmail.com> 09/10/08 8:16 PM >>>
Hi Quincey,
2008/9/10 Quincey Koziol <koziol@hdfgroup.org>
> Hi Dimitris,
>
> On Sep 10, 2008, at 12:35 PM, Dimitris Servis wrote:
>> 2008/9/10 Francesc Alted <faltet@pytables.com>
>> Quincey,
>>
>> On Wednesday 10 September 2008, you wrote:
>> > Hi Francesc,
>> >
>> > On Sep 10, 2008, at 6:11 AM, Francesc Alted wrote:
>> > > On Tuesday 09 September 2008, Francesc Alted wrote:
>> > >> On Tuesday 09 September 2008, Quincey Koziol wrote:
>> > >> [clip]
>> > >>
>> > >>>>> You are fighting the metadata cache in HDF5. Unfortunately
>> > >>>>> there's currently no way to evict all the entries from the
>> > >>>>> cache, even if you call H5Fflush(), so it's very likely that
>> > >>>>> one or more of the processes will be dealing with stale
>> > >>>>> metadata. I've added a new feature request to our bugzilla
>> > >>>>> database and maybe we'll be able to act on it at some point.
>> > >>>> I see. At any rate, I find it curious that locking using a
>> > >>>> regular file
>> > >>>> works flawlessly in the same scenario.
>> > >>>
>> > >>> Locking using a regular file works because you are closing &
>> > >>> re-opening the HDF5 file for each process (which flushes all
>> > >>> the metadata changes to the file on closing and re-reads them
>> > >>> on re-opening the file).
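A sketch of that working pattern (file names hypothetical): take the
lock on a sidecar file first, and only open the HDF5 file once the
lock is held, closing it again before the lock is released, so every
cycle starts from fresh metadata:

    #include <fcntl.h>
    #include <sys/file.h>
    #include <unistd.h>
    #include "hdf5.h"

    void locked_update(void)
    {
        int lockfd = open("data.h5.lock", O_CREAT | O_RDWR, 0644);
        flock(lockfd, LOCK_EX);        /* blocks until we own the lock */

        hid_t f = H5Fopen("data.h5", H5F_ACC_RDWR, H5P_DEFAULT);
        /* ... write ... */
        H5Fclose(f);                   /* flushes all metadata to disk */

        flock(lockfd, LOCK_UN);
        close(lockfd);
    }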
>> > >>
>> > >> So, when using the HDF5 file itself for locking, since the
>> > >> locking happens after the library has already opened the file,
>> > >> it has already read bits from a stale metadata cache. Now I
>> > >> definitely see it.
>> > >
>> > > Hmm, not quite. After thinking a bit more on this issue, I now
>> > > think that the problem is not in the metadata cache, but is a
>> > > more fundamental one: I'm effectively opening a file (and hence
>> > > reading metadata, either from cache or from disk) *before*
>> > > locking it, and that will always lead to wrong results,
>> > > regardless of any existing cache.
>> > >
>> > > I can devise a couple of solutions for this. The first one is to
>> > > add a new parameter to H5Fopen() to inform it that we want to
>> > > lock the file as soon as the file descriptor is allocated and
>> > > before reading any meta-information (either from disk or cache),
>> > > but that implies an API change.
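If such a parameter existed, the call might look like this (purely
hypothetical; no such flag exists in HDF5):

    /* HYPOTHETICAL: H5F_ACC_LOCKED is not a real HDF5 flag, it only
     * illustrates the proposed semantics. */
    hid_t f = H5Fopen("data.h5", H5F_ACC_RDWR | H5F_ACC_LOCKED,
                      H5P_DEFAULT);
    /* the file descriptor would be locked as soon as it is allocated,
     * before any metadata is read from disk or cache */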
>> > >
>> > > The other solution is to increase the laziness of the metadata
>> > > reading, deferring it until it is absolutely needed by other
>> > > functions. So, in essence, H5Fopen() should basically only have
>> > > to open the underlying file descriptor and that's all; then this
>> > > descriptor can be manually locked, and the file metadata would
>> > > be read later on, when it is really needed.
>> > >
>> > > All in all, both approaches seem to need too many changes in
>> > > HDF5. Perhaps a better avenue is to find alternatives for doing
>> > > the locking on the application side instead of including the
>> > > functionality in HDF5 itself.
>> >
>> > Those are both interesting ideas that I hadn't thought of. What I
>> > was thinking was to evict all the metadata from the cache and then
>> > re-read it from the file. This could be done at any point after
>> > the file was opened, although it would require that all objects in
>> > the file be closed when the cache entries were evicted.
>>
>> Well, I suppose that my ignorance of the internals of HDF5 is
>> preventing me from understanding your solution. Let's suppose that
>> we have 2 processes on a multi-processor machine. Let's call them
>> process 'a' and process 'b'. Both processes do the same thing: from
>> time to time they open an HDF5 file, lock it, write something to it,
>> and close it (unlocking it).
>>
>> If process 'a' gets the lock first, then process 'b' will *open* the
>> file and will block until the file becomes unlocked. While process
>> 'b' is waiting, process 'a' writes a bunch of data to the file. When
>> 'a' finishes the writing and unlocks the file, process 'b' unblocks
>> and gets the lock. But by then (and this is the main point), process
>> 'b' has already got internal information about the opened file that
>> is outdated.
>>
>> The only way that I see to avoid the problem is for the information
>> about the opened file in process 'b' to reside exclusively in the
>> metadata cache; then, by refreshing (or evicting) it, the new
>> processes could get the correct information. However, that solution
>> does imply that the HDF5 metadata cache would have to be *shared*
>> between both processes, and I don't think this is the case.
>>
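In code, the failing ordering described above looks something like
this (a sketch; file name hypothetical). H5Fopen() has already read
the superblock and other metadata before the lock call ever runs:

    #include <fcntl.h>
    #include <unistd.h>
    #include "hdf5.h"

    void open_then_lock(void)               /* the wrong order */
    {
        hid_t f  = H5Fopen("data.h5", H5F_ACC_RDWR, H5P_DEFAULT);
                                            /* metadata is read HERE       */
        int   fd = open("data.h5", O_RDWR); /* second fd on the same inode */

        struct flock lk = {0};
        lk.l_type   = F_WRLCK;
        lk.l_whence = SEEK_SET;             /* start 0, length 0:          */
        lk.l_start  = 0;                    /* lock the whole file         */
        lk.l_len    = 0;
        fcntl(fd, F_SETLKW, &lk);           /* blocks while 'a' writes; on */
                                            /* return, the metadata read   */
                                            /* above is already stale      */
    }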
>> Hi Francesc,
>>
>> I have similar issues, and I think you're right when you say that
>> this should be solved at the application layer. It is pretty
>> difficult to have efficient locking when the library cannot manage
>> its own space. What if, for example, the user deletes the file? Or
>> another process wants to move the file? For the moment I think it is
>> difficult to deal with this effectively, so I will try to solve it
>> with the old hack of the flag: when a process enters the root (in my
>> case, also other major nodes in the tree) it sets an attribute, and
>> the last thing the process does is to unset the attribute. This way
>> I also know if there was an issue and writing failed.
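For concreteness, a minimal sketch of that flag hack (attribute name
and error handling are hypothetical; note Quincey's caveat below about
the check-then-set race):

    #include "hdf5.h"

    static int mark_busy(hid_t file)        /* on entering the root */
    {
        if (H5Aexists(file, "busy") > 0)
            return -1;                  /* another writer is active,
                                           or an earlier write failed */
        hid_t space = H5Screate(H5S_SCALAR);
        hid_t attr  = H5Acreate2(file, "busy", H5T_NATIVE_INT, space,
                                 H5P_DEFAULT, H5P_DEFAULT);
        H5Sclose(space);
        H5Aclose(attr);
        return 0;
    }

    static void mark_done(hid_t file)       /* last thing the process does */
    {
        H5Adelete(file, "busy");
    }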
>
> Setting a flag in the file is not sufficient. It's easy to imagine
> race conditions where two processes simultaneously check for the
> presence of the flag, determine it doesn't exist and set it, then
> proceed to modify the file. Some other mechanism which guarantees
> exclusive access must be used. (And even then, you'll have to use
> the cache management strategies I mentioned in an earlier mail.)
>
> Note that we've given this a fair bit of thought at the HDF Group
> and have some good solutions, but would need to get funding/patches
> for this to get into the HDF5 library.
>
> Quincey
I know it is not sufficient, but locking will only work at the
application level, and AFAIK a portable solution will try to use
whole-file locks or separate lock files; that is done at a different
level than the HDF library. For a process to decide what to do, it has
to check both the locks and the special attributes. Note also that
linking HDF5 statically means each process's cache is different anyway.
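For example, a whole-file lock can be taken portably on POSIX systems
with fcntl() byte-range locks (a sketch; the helper name is mine):

    #include <fcntl.h>
    #include <unistd.h>

    /* Lock or unlock an entire file: pass F_WRLCK, F_RDLCK or F_UNLCK. */
    int lock_whole_file(int fd, short type)
    {
        struct flock lk = {0};
        lk.l_type   = type;
        lk.l_whence = SEEK_SET;
        lk.l_start  = 0;        /* offset 0 ...                      */
        lk.l_len    = 0;        /* ... length 0 means the whole file */
        return fcntl(fd, F_SETLKW, &lk);    /* blocking variant      */
    }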
I am sure you've given it a good deal of thought, but for the benefit
of having single files and no services/daemons, efficient file locking
is sacrificed and becomes cumbersome.
Best Regards,
-- dimitris
----------------------------------------------------------------------
This mailing list is for HDF software users discussion.
To subscribe to this list, send a message to
hdf-forum-subscribe@hdfgroup.org.
To unsubscribe, send a message to
hdf-forum-unsubscribe@hdfgroup.org.