Meaning of H5_HAVE_THREADSAFE

Because the C++ bindings are not thread-safe, I was hoping to turn off thread safety in the C core and handle synchronization in my own code. This turned out to be impossible because the HDF5 core uses global, non-thread-specific data structures when H5_HAVE_THREADSAFE is undefined. There does not appear to be a way to turn off internal synchronization while keeping thread-specific or instance-specific data structures. That would be useful. SQLite has good semantics for thread safety:

http://www.sqlite.org/compile.html#threadsafe

This option controls whether or not code is included in SQLite to enable it to operate safely in
a multithreaded environment. The default is SQLITE_THREADSAFE=1 which is safe for use in a
multithreaded environment. When compiled with SQLITE_THREADSAFE=0 all mutexing code is omitted
and it is unsafe to use SQLite in a multithreaded program. When compiled with
SQLITE_THREADSAFE=2, SQLite can be used in a multithreaded program so long as no two threads
attempt to use the same database connection at the same time.

At the very least, it would be useful if the documentation specified that H5_HAVE_THREADSAFE controls both the use of global data structures and synchronization. I had to dig into the code to understand it.

Thanks,
-Matt

Hi Matt,

On Jun 27, 2011, at 11:49 AM, Matthew Chambers wrote:

  Hmm, what are you thinking of when you say the HDF5 library uses global, non-thread-specific data structures? (We try to eliminate those, and if you can point us to something we've missed, we can probably fix it)

  Quincey

This seems a reasonable goal, but does anything prevent you from
serializing access to the C++ objects while leaving H5_HAVE_THREADSAFE
turned on? The point you make about global variables is valid as well,
but there are two problems, and both need solving:

1. Keeping the HDF5 C library from stepping on its own feet.
2. Keeping your program from garbling things up via the non-thread-safe
interface.

If only one layer or the other manages the locking, it is more efficient,
but the performance hit from doing both can't be that huge.
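
For what it's worth, serializing every call from the application side could look roughly like the sketch below. Nothing here is an HDF5 function: `unsafe_library_call` is a hypothetical stand-in for any non-thread-safe entry point, and `g_api_lock`, `locked_library_call`, and `locked_call_count` are names made up for the illustration.

```c
#include <pthread.h>

/* One process-wide lock guarding every entry into the wrapped library. */
static pthread_mutex_t g_api_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for non-thread-safe library state. */
static int g_call_count = 0;

/* Hypothetical stand-in for a non-thread-safe library entry point. */
static int unsafe_library_call(int value)
{
    g_call_count++;              /* unsynchronized shared state */
    return value * 2;
}

/* Wrapper: hold the lock for the full duration of the call. */
int locked_library_call(int value)
{
    int result;

    pthread_mutex_lock(&g_api_lock);
    result = unsafe_library_call(value);
    pthread_mutex_unlock(&g_api_lock);
    return result;
}

/* Read the shared counter under the same lock. */
int locked_call_count(void)
{
    int count;

    pthread_mutex_lock(&g_api_lock);
    count = g_call_count;
    pthread_mutex_unlock(&g_api_lock);
    return count;
}
```

If the library underneath also takes its own lock, the inner lock should be uncontended whenever the outer one is held, which is why the extra cost is probably small. That is an expectation, not a measurement.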

The other alternative is to write your own C++ classes to manage access to
the C API. The HDF5 C++ bindings are a pretty thin layer over the C
bindings. Depending on the complexity of your application this might not
be a significant additional coding effort.

I used the C++ bindings in the ITK toolkit, and I still had to understand
the underlying library interface to do things; if I had to do it all over
again I'd go directly with the C API, because it gives one finer-grained
control, and also avoids dealing with the slight quirks of the C++ API.
Using the C++ API did seem to add some second order head-scratching to the
project.

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org

Hi Quincey,

  Hmm, what are you thinking of when you say the HDF5 library uses global, non-thread-specific data structures? (We try to eliminate those, and if you can point us to something we've missed, we can probably fix it)

I can only speak to the first global I found to cause problems. There may be others. That was the codestack:

In H5CS.c:

#ifdef H5_HAVE_THREADSAFE
/*
 * The per-thread function stack. pthread_once() initializes a special
 * key that will be used by all threads to create a stack specific to
 * each thread individually. The association of stacks to threads will
 * be handled by the pthread library.
 *
 * In order for this macro to work, H5CS_get_my_stack() must be preceeded
 * by "H5CS_t *fstack =".
 */
static H5CS_t *H5CS_get_stack(void);
#define H5CS_get_my_stack() H5CS_get_stack()
#else /* H5_HAVE_THREADSAFE */
/*
 * The function stack. Eventually we'll have some sort of global table so each
 * thread has it's own stack. The stacks will be created on demand when the
 * thread first calls H5CS_push(). */
H5CS_t H5CS_stack_g[1];
#define H5CS_get_my_stack() (H5CS_stack_g+0)
#endif /* H5_HAVE_THREADSAFE */

I see two solutions:
1. Fix codestack so it works for serialized, multithreaded programs even when H5_HAVE_THREADSAFE is undefined. This could be tricky since it will need to deal with platforms that don't have threading at all; that is, it needs to test on HAVE_PTHREAD_H/HAVE_WIN_THREADS instead of H5_HAVE_THREADSAFE. With no platform threading at all, the global variable is a reasonable solution.

2. Document or show a warning/error that codestack isn't safe to use in a multithreaded program even if H5 calls are serialized.
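
A rough sketch of what option 1 might look like, under some loud assumptions: `MYLIB_HAVE_THREADS` is a hypothetical macro meaning "the platform has threads at all" (not an HDF5 macro), the stack type and function names are invented, and C11 `_Thread_local` is swapped in for the pthread_once()/key approach that the quoted H5CS.c comment describes. The point is only that per-thread storage is chosen independently of whether the library does its own locking.

```c
#include <stddef.h>

#define STACK_MAX 32

/* Hypothetical function-stack record, loosely analogous to a code stack. */
typedef struct {
    size_t      depth;
    const char *names[STACK_MAX];
} func_stack_t;

/* Assumption: hypothetical macro, independent of any locking switch.
 * Defined here so the per-thread branch is the one that gets compiled. */
#ifndef MYLIB_HAVE_THREADS
#define MYLIB_HAVE_THREADS 1
#endif

#if MYLIB_HAVE_THREADS
/* Each thread gets its own zero-initialized stack. */
static _Thread_local func_stack_t my_stack;
#else
/* No threading on this platform: a single global stack is enough. */
static func_stack_t my_stack;
#endif

static func_stack_t *get_my_stack(void)
{
    return &my_stack;
}

/* Push a function name; returns -1 if the stack is full. */
int stack_push(const char *name)
{
    func_stack_t *s = get_my_stack();

    if (s->depth >= STACK_MAX)
        return -1;
    s->names[s->depth++] = name;
    return 0;
}

/* Pop the most recent name; returns NULL if the stack is empty. */
const char *stack_pop(void)
{
    func_stack_t *s = get_my_stack();

    if (s->depth == 0)
        return NULL;
    return s->names[--s->depth];
}

/* Current depth of the calling thread's stack. */
size_t stack_depth(void)
{
    return get_my_stack()->depth;
}
```

With this shape, no locking code is needed for the stack itself, so it would behave the same whether the caller or the library is responsible for serialization.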

I didn't think about it before but I could have just disabled codestack instead of giving up my goal of serialized calls to the H5 API.

Thanks,
-Matt

Hi Matt,

I see two solutions:
1. Fix codestack so it works for serialized, multithreaded programs even when H5_HAVE_THREADSAFE is undefined. This could be tricky since it will need to deal with platforms that don't have threading at all; that is, it needs to test on HAVE_PTHREAD_H/HAVE_WIN_THREADS instead of H5_HAVE_THREADSAFE. With no platform threading at all, the global variable is a reasonable solution.

  Yes, I think this is a good idea. I'll file an issue for it.

2. Document or show a warning/error that codestack isn't safe to use in a multithreaded program even if H5 calls are serialized.

  I'll file an issue for this also.

I didn't think about it before but I could have just disabled codestack instead of giving up my goal of serialized calls to the H5 API.

  That would have worked for this part, but the error stack code uses a similar construct and doesn't have a way to be disabled.

  Quincey
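
To illustrate the general hazard with shared state that outlives a single call, here is a minimal sketch. It does not reproduce HDF5's error stack: `lib_call`, `lib_last_error`, and the clear-on-entry behavior are all assumptions invented for the example. The two "callers" are replayed on one thread so the interleaving is deterministic.

```c
#include <stddef.h>

/* Hypothetical library with one process-wide "last error" slot. */
static const char *g_last_error = NULL;

/* Hypothetical API call: clears the slot on entry, sets it on failure. */
int lib_call(int should_fail)
{
    g_last_error = NULL;
    if (should_fail) {
        g_last_error = "operation failed";
        return -1;
    }
    return 0;
}

/* Returns the error recorded by the most recent call, or NULL. */
const char *lib_last_error(void)
{
    return g_last_error;
}

/* Replays an interleaving that two fully serialized callers, A and B,
 * could produce. Returns 1 if caller A's error record was lost. */
int error_lost_by_interleaving(void)
{
    int a_status = lib_call(1);   /* caller A: fails, error recorded   */
    int b_status = lib_call(0);   /* caller B: succeeds, slot cleared  */

    /* Caller A now looks for the error it expects to find. */
    return a_status < 0 && b_status == 0 && lib_last_error() == NULL;
}
```

Each call here is individually serialized, yet caller A still loses its error, because the record lives in one shared slot rather than per thread.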

On Jul 8, 2011, at 11:35 AM, Matthew Chambers wrote:

On 6/28/2011 6:32 AM, Quincey Koziol wrote:

On Jun 27, 2011, at 11:49 AM, Matthew Chambers wrote: