concurrent access

Jason_Sachs · May 29, 2008, 5:13pm

I was wondering where I could find some more technical details about
concurrent reading/writing.

The FAQ discusses it briefly
(http://www.hdfgroup.org/hdf5-quest.html#grdwt):

<excerpt>
It is possible for multiple processes to read an HDF5 file when it is
being written to, and still read correct data. (The following steps
should be followed, EVEN IF the dataset that is being written to is
different than the datasets that are read.)

Here's what needs to be done:

* Call H5Fflush() from the writing process.

* The writing process _must_ wait until either a copy of the file is
made for the reading process, or the reading process is done accessing
the file (so that more data isn't written to the file, giving the reader
an inconsistent view of the file's state).

* The reading process _must_ open the file (it cannot have the file
open before the writing process flushes its information, or it runs the
risk of having its data cached in memory being incorrect with respect to
the state of the file) and read whatever information it wants.

* The reading process must close the file.

* The writing process may now proceed to write more data to the
file.

There must also be some mechanism for the writing process to signal the
reading process that the file is ready for reading and some way for the
reading process to signal the writing process that the file may be
written to again.
</excerpt>
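
The handshake in the excerpt can be sketched roughly as follows. This is only a toy illustration, not HDF5 code: two Python threads stand in for the writer and reader processes, a plain text file stands in for the HDF5 file, and `f.flush()` plus `os.fsync()` stand in for H5Fflush(). The names `run_handshake`, `ready_to_read`, and `done_reading` are made up for this example. Real processes would need an inter-process mechanism (a named semaphore, a message queue, or similar) in place of `threading.Event`.

```python
import os
import tempfile
import threading


def run_handshake(rounds=3):
    """Alternate one writer and one reader over a shared file."""
    path = os.path.join(tempfile.mkdtemp(), "data.txt")
    ready_to_read = threading.Event()   # writer -> reader: file is flushed
    done_reading = threading.Event()    # reader -> writer: file is closed again
    seen = []                           # line counts observed by the reader

    def writer():
        with open(path, "w") as f:
            for i in range(rounds):
                f.write(f"record {i}\n")
                f.flush()               # stand-in for H5Fflush()
                os.fsync(f.fileno())
                ready_to_read.set()     # tell the reader the file is consistent
                done_reading.wait()     # write nothing more until the reader is done
                done_reading.clear()

    def reader():
        for _ in range(rounds):
            ready_to_read.wait()
            ready_to_read.clear()
            with open(path) as f:       # open only after the writer has flushed
                seen.append(len(f.readlines()))
            done_reading.set()          # file is closed; the writer may continue

    threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

The reader opens the file fresh in every round and closes it before signalling, which mirrors the requirement that it must not keep the file open across a write.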

Could someone elaborate in a more technical manner? e.g. SWMR
(single-writer multiple-reader) can occur if the following is true (not
sure if I have this correct; I use "process" rather than "threads" here
& am not sure if HDF5 in-memory caches have thread affinity):

1. At all times the file is in one of the following states:
(a) unmodified
(b) modified (written to, but not flushed)

2. In the unmodified state, zero or more processes may have the file
open. No process may write to the data.

3. In the modified state, exactly one process may have the file open.
This is the process that can write to it.

4. A successful transition from the unmodified state -> modified state
takes place when exactly one process has the file open and begins
writing to it.

5. A successful transition from the modified state -> unmodified state
takes place when the process that has written to the file completes a
successful call to H5Fflush().

The facilities to ensure that only one process has the file open for (4)
above are not provided by the HDF5 library and must be provided by
OS-specific facilities e.g. mutexes/semaphores/messaging/etc.
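
To make the five rules above concrete, here is a small model of them. It is a sketch of the rules as I have written them, not of anything the HDF5 library does internally, and it makes no HDF5 calls. The class name `FileAccessModel` and its methods are hypothetical. The rules say nothing about closing an unflushed file, so this model simply rejects that case.

```python
class FileAccessModel:
    """Toy model of the two-state access rules; it makes no HDF5 calls."""

    def __init__(self):
        self.state = "unmodified"
        self.open_by = set()    # ids of the processes that hold the file open

    def open(self, pid):
        # Rule 3: while modified, only the writer may have the file open.
        if self.state == "modified":
            raise RuntimeError("file is modified; no other process may open it")
        self.open_by.add(pid)   # Rule 2: any number of holders while unmodified

    def close(self, pid):
        # Assumption: closing an unflushed file is not covered by the rules.
        if self.state == "modified":
            raise RuntimeError("flush before closing")
        self.open_by.discard(pid)

    def write(self, pid):
        # Rule 4: the writer must be the only process with the file open.
        if self.open_by != {pid}:
            raise RuntimeError("writer must be the sole process with the file open")
        self.state = "modified"

    def flush(self, pid):
        # Rule 5: a successful flush by the writer returns to "unmodified".
        if self.state == "modified" and self.open_by == {pid}:
            self.state = "unmodified"
```

For example, after `m.open("w"); m.write("w")` a call to `m.open("r")` raises until `m.flush("w")` has been called.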

----------------------------------------------------------------------
This mailing list is for HDF software users discussion.
To subscribe to this list, send a message to hdf-forum-subscribe@hdfgroup.org.
To unsubscribe, send a message to hdf-forum-unsubscribe@hdfgroup.org.

Sheshadri_Mantha · May 30, 2008, 8:36pm

Hello,
I've got an implementation that uses the HL API, and I run multiple writers and possibly one reader. The writers go to the same OS file but different HDF files.

In one use case, the reader and a writer operate on the same HDF asset at the same time. The reader is written so that if it reaches EOF, it waits for some time and then resumes reading.

All this is on win32/VC++, and it works fine; I'm not sure whether the same applies to *nix.

The only thing I needed to do was enable the multi-threaded build of HDF5 and HL. I think there is a link on how to do that. I believe one need only define the symbol "H5_HAVE_THREADSAFE" and uncomment some commented-out lines in H5pubconf.h.

I'm not sure that answers your question, but I hope it helps.

regards,
Sheshadri

Quincey_Koziol · May 31, 2008, 8:22pm

Hi Jason,

I'm not certain whether you are talking about multi-process access or multi-thread access. For multi-thread access, the HDF5 library is already threadsafe, although there isn't any concurrency currently (there's a single semaphore that all threads must acquire). Just pass the --enable-threadsafe flag to configure when building the HDF5 package.

The excerpt you have included above is about multi-process access to the same HDF5 file. This "protocol" needs to occur in order for HDF5's cache of information for the file to be managed properly and safely.

Quincey

Quincey_Koziol · May 31, 2008, 8:24pm

Hi Sheshadri,

This is fine for multi-threaded access to the same HDF5 file, although the "--enable-threadsafe" configure flag should be used instead of "hacking it" this way.

Quincey

Dimitris_Servis · May 31, 2008, 9:44pm

Hi Quincey and Jason,

this is a very interesting subject for me too. Unfortunately, even if the
"protocol" is followed strictly, though I agree it is pretty vague, it
cannot be guaranteed to work: the accessing processes would have to
use the *same* HDF5 DLL. If they link statically or use different DLLs, then I
guess the whole thing blows up. Then one can only rely on journaling or
on an HDF5 service that provides the data.

regards

-- dimitris

Quincey_Koziol · May 31, 2008, 9:59pm

Hi Dimitris,

No, they shouldn't need to use the same DLL. The HDF5 library will release all the cached information for a file when the file is closed.

Quincey

Jason_Sachs · June 2, 2008, 6:10pm

Quincey Koziol wrote:

I'm not certain whether you are talking about multi-process access
or multi-thread access. For multi-thread access, the HDF5 library is
already threadsafe, although there isn't any concurrency currently (there's
a single semaphore for all threads to acquire). Just enable the
--enable-threadsafe flag at configure time when installing the HDF5 package.

Multi-process access. The solution I am working on for now is that, in
addition to writing the HDF5 file, I dump a second copy of my data into shared
memory that another process can use. In some cases I can't afford to have
the writing process wait and reserve time for its HDF5 files to stay in a
flushed, unmodified state so that other processes can read the
data.
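
A minimal sketch of the shared-memory side, assuming Python 3.8+ and its standard `multiprocessing.shared_memory` module rather than whatever OS facility is actually used. The function names `publish` and `snapshot` and the layout (a 4-byte count followed by doubles) are invented for this example. It publishes one block once; a real design would still need some way to keep a reader from seeing a half-updated block.

```python
import struct
from multiprocessing import shared_memory


def publish(values):
    """Writer side: copy a list of floats into a new shared-memory block."""
    # Layout (an assumption for this sketch): uint32 count, then `count` doubles.
    payload = struct.pack(f"<I{len(values)}d", len(values), *values)
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    shm.buf[:len(payload)] = payload
    return shm                      # caller keeps it alive and unlinks it later


def snapshot(name):
    """Reader side: attach to the block by name and copy the values out."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        (count,) = struct.unpack_from("<I", shm.buf, 0)
        return list(struct.unpack_from(f"<{count}d", shm.buf, 4))
    finally:
        shm.close()                 # detach; the writer owns the block's lifetime
```

The writer would call `publish(...)` alongside its HDF5 write and pass `shm.name` to the other process, which calls `snapshot(name)`; the writer eventually calls `shm.close()` and `shm.unlink()`.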
