It seems like the current status is that I can't do concurrent reads within the same file. Is that the case?
But, concurrent reads from separate files by the same process are OK, right?
My use case:
* GIS data that has been clipped to geospatial boundaries (say a 1x1 arcminute grid), and written to HDF5 such that each grid element has its own dataset.
* We'll likely split N grid elements into their own files.
* I'll have many (8+) reader threads but no writers. Each thread would be responsible for checking for and pulling data from a particular dataset, i.e. no two readers would pull separate chunks of data from the same dataset.
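For concreteness, here is a minimal sketch of how a coordinate could be mapped to a file and dataset under the layout described above. The naming scheme (`tile_R_C.h5`, `cell_R_C`) and the choice of square blocks of cells per file are assumptions made for illustration only, and edge handling (e.g. a latitude of exactly 90) is left out.

```python
import math

CELLS_PER_DEGREE = 60  # 1x1 arcminute cells: 60 per degree in each direction


def cell_index(lat, lon):
    """Return the (row, col) of the 1x1 arcminute cell containing (lat, lon)."""
    row = math.floor((lat + 90.0) * CELLS_PER_DEGREE)
    col = math.floor((lon + 180.0) * CELLS_PER_DEGREE)
    return row, col


def locate(lat, lon, cells_per_side=60):
    """Return (file_name, dataset_name) for a coordinate.

    Hypothetical layout: each file holds a square block of
    cells_per_side x cells_per_side grid elements (so N = cells_per_side ** 2),
    and each grid element is one dataset inside that file.
    """
    row, col = cell_index(lat, lon)
    file_name = f"tile_{row // cells_per_side}_{col // cells_per_side}.h5"
    dataset_name = f"cell_{row}_{col}"
    return file_name, dataset_name
```

With this scheme, each reader thread would call something like `locate(lat, lon)` to find the one dataset it is responsible for.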
We've encountered errors when reading from the same file from multiple threads in the same process, even when the threads read different datasets, running under 64-bit Windows. We've never tried reading from multiple files using the same process.
Concurrent reads from separate files are okay from the same process with a threadsafe build of the HDF5 library (which effectively serializes the calls, as noted in one of your references).
Concurrent reads from separate files are *not* okay from the same process with a non-threadsafe build of the HDF5 library.
Here is an explanation, copied from an earlier email outside the forum. I will request an FAQ entry, since this behavior is somewhat unexpected without the explanation:
There are places where the HDF5 library modifies global data structures that are independent of a particular HDF5 file, and we rely on the semaphore around the library API calls to protect the data structure from being corrupted by simultaneous manipulation from different threads. An example of this would be the HDF5 library's freespace manager; another is the open file list.
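The locking described above can be sketched at the application level. This is only an illustration of the idea (one process-wide lock around every call into the library, regardless of which file is involved), not the HDF5 library's actual implementation. `fake_read` is a hypothetical stand-in for a real read call, and the helper names are made up for this example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One lock for the whole process. It is deliberately NOT per-file: the shared
# library state being protected is independent of any particular file.
_library_lock = threading.Lock()


def locked_read(read_fn, path, dataset):
    """Call read_fn(path, dataset) while holding the process-wide lock."""
    with _library_lock:
        return read_fn(path, dataset)


def read_all(read_fn, jobs, max_workers=8):
    """Read every (path, dataset) pair from worker threads.

    The threads run concurrently, but calls into read_fn are serialized by
    the lock. Results are returned in the same order as jobs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(locked_read, read_fn, p, d) for p, d in jobs]
        return [f.result() for f in futures]


def fake_read(path, dataset):
    # Hypothetical placeholder for a real HDF5 read (C API or a binding).
    return f"{path}:{dataset}"
```

A call such as `read_all(fake_read, [("a.h5", "cell_0_0"), ("b.h5", "cell_0_1")])` still goes through the single lock even though the two reads target different files, which mirrors why separate files do not help with a non-threadsafe build.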
On Sep 23, 2010, at 5:35 PM, Sebastian Good wrote:
We've encountered errors attempting reads from the same file from multiple threads in the same process, even to different datasets, running under Windows 64. We've never tried reading from multiple files using the same process.