Concurrent read/write question

Hi all,

I have searched for an answer, but none of the messages I found were
clear enough, so here it is again.
I have an application (long-term AE monitoring) like this:
- 1 writer, which constantly writes (appends data only, never
rewrites) to a few tables; the data rate ranges from very low to
quite high
- 1 reader (seldom more, but that shouldn't matter), which needs to
process the data at the same time; people call this "real time", but
all I need is a reasonably small delay before "new" data is readable
- the data structure is created once at startup and is fixed
afterwards (roughly as in the sketch after this list)
- copying the existing data to another file is not an option
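
For concreteness, here is roughly how I create one such "table" at
startup (just a sketch; "/table1", the chunk size and the plain double
element type are placeholders for my real structure):

#include "hdf5.h"

/* Create one append-only "table": a 1-D chunked dataset that can grow
 * without limit.  Called once at startup by the writer. */
hid_t create_table(hid_t file)
{
    hsize_t dims[1]    = {0};               /* start empty                */
    hsize_t maxdims[1] = {H5S_UNLIMITED};   /* allow unlimited growth     */
    hsize_t chunk[1]   = {4096};            /* chunking is required for an
                                               extendible dataset         */

    hid_t space = H5Screate_simple(1, dims, maxdims);
    hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 1, chunk);

    hid_t dset = H5Dcreate2(file, "/table1", H5T_NATIVE_DOUBLE, space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);
    H5Pclose(dcpl);
    H5Sclose(space);
    return dset;
}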

Is it possible for the reader to see the "new" data? A
"flush" in the writer is acceptable, but no other time-consuming
blocking...
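
By "flush" I mean something like the last call in this sketch of the
writer's append step (again just a sketch; names and types are
placeholders, error checking omitted):

/* Append 'n' new values to the end of the dataset created above and
 * push them out to the file. */
void append_and_flush(hid_t dset, const double *buf, hsize_t n)
{
    /* current number of rows */
    hid_t   fspace = H5Dget_space(dset);
    hsize_t old_size[1];
    H5Sget_simple_extent_dims(fspace, old_size, NULL);
    H5Sclose(fspace);

    /* grow the dataset by n rows */
    hsize_t new_size[1] = { old_size[0] + n };
    H5Dset_extent(dset, new_size);

    /* write the new values into the newly added region */
    fspace = H5Dget_space(dset);
    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, old_size, NULL, &n, NULL);
    hid_t mspace = H5Screate_simple(1, &n, NULL);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, mspace, fspace, H5P_DEFAULT, buf);
    H5Sclose(mspace);
    H5Sclose(fspace);

    /* flush raw data and metadata so that a reader opening the file
       afterwards sees a consistent state */
    H5Fflush(dset, H5F_SCOPE_GLOBAL);
}
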
I tried playing with caching and other settings, but nothing was conclusive.
Simply closing and reopening the file in the reader seems to work (I
see the data that was present at the time of the open), but I need to
be sure it is safe.
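
What I mean by closing/reopening is basically this polling step in a
separate reader process (sketch only; same placeholder names as above,
no error checking):

#include <stdlib.h>
#include "hdf5.h"

/* Reopen the file, read whatever rows are new since the last call,
 * close it again.  '*rows_seen' carries the position between calls. */
void poll_new_rows(const char *fname, hsize_t *rows_seen)
{
    hid_t file = H5Fopen(fname, H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t dset = H5Dopen2(file, "/table1", H5P_DEFAULT);

    hid_t   fspace = H5Dget_space(dset);
    hsize_t size[1];
    H5Sget_simple_extent_dims(fspace, size, NULL);

    if (size[0] > *rows_seen) {
        hsize_t count  = size[0] - *rows_seen;
        double *buf    = malloc(count * sizeof *buf);
        hid_t   mspace = H5Screate_simple(1, &count, NULL);
        H5Sselect_hyperslab(fspace, H5S_SELECT_SET, rows_seen, NULL,
                            &count, NULL);
        H5Dread(dset, H5T_NATIVE_DOUBLE, mspace, fspace, H5P_DEFAULT, buf);
        /* ... process the 'count' new values here ... */
        free(buf);
        H5Sclose(mspace);
        *rows_seen = size[0];
    }

    H5Sclose(fspace);
    H5Dclose(dset);
    H5Fclose(file);   /* close, so the next call gets a fresh view */
}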

In a very optimistic view, appending data would NOT create
inconsistencies in the file, as long as the metadata were rewritten to
the file in a "clever" order...

Another relatively simple solution would be to control/synchronize
metadata access (write/read) so that the data stays consistent; is
that possible?

I'm new to hdf5, so any help/hints are appreciated.

BTW, I found this statement about netCDF, which describes exactly what
I need... Since netCDF is based on HDF5, I hope there is a simple solution.

http://www.unidata.ucar.edu/software/netcdf/docs/netcdf/Limitations.html
"Finally, for classic and 64-bit offset files, concurrent access to a
netCDF dataset is limited. One writer and multiple readers may access
data in a single dataset simultaneously, but there is no support for
multiple concurrent writers."

Regards,
Gabriel

Hi Gabriel,


  True concurrent access to a file while it is being written to is not currently supported. We are working on adding a single-writer/multiple-reader (SWMR) access mode for the next release (1.10.0), but it is not available yet. The current way to get pseudo-concurrent access to a file is described here: http://www.hdfgroup.org/hdf5-quest.html#grdwt

  Quincey


Thanks for the reply... I was afraid of that answer.
As I said, I do NOT want concurrent write; that's definitely much more complex.

I remember reading something about cache settings, but found no clear
statements. How does netCDF do it?
Making a COPY is definitely not an option... Isn't it possible to
solve it by synchronizing the metadata access?
Is there any time estimate for the "single-writer/multiple-reader" feature?

Or, is using a shared cache (H5Freopen) safe? I would build a "data
reader" thread in the writer process; is this a viable solution?

Thanks,
Gabriel

Hi Gabriel,

On Jun 9, 2010, at 5:41 AM, Gabriel Corneanu wrote:

Thanks for the reply... I was afraid of that answer.
As I said, I do NOT want concurrent write; that's definitely much more complex.

  Yes, that's definitely a lot more complex than SWMR access.

I remember reading something about cache settings, but found no clear
statements. How does netCDF do it?

  I think the netCDF information you sent was about cases where the netCDF-3 file format was being used, not HDF5.

Making a COPY is definitely not an option... Isn't it possible to
solve it by synchronizing the metadata access?

  Yes, here's the link in our FAQ for how to do this:

http://www.hdfgroup.org/hdf5-quest.html#grdwt

Is there any time estimate for the "single-writer/multiple-reader" feature?

  It will probably be available for beta testing toward the end of the year.

Or, is using a shared cache (H5Freopen) safe? I would build a "data
reader" thread in the writer process; is this a viable solution?

  As long as you are accessing the file from within the same process (i.e., multiple threads within one process), you are fine and you don't need to take any special actions. It's only when writing with one process and reading with another that there are issues.
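
  For example, something along these lines in the reader thread (just a sketch; "/table1" is a placeholder, and it assumes the HDF5 calls from the two threads don't overlap, e.g. a thread-safe build of the library or a lock of your own):

#include "hdf5.h"

/* Reader thread, same process as the writer: get a second, independent
 * handle to the file the writer already has open.  Both handles share
 * the same underlying file and caches, so rows appended by the writer
 * thread are visible here without closing and reopening the file. */
hid_t open_reader_handle(hid_t writer_file)
{
    hid_t rfile = H5Freopen(writer_file);
    /* From here the reader thread can H5Dopen2(rfile, "/table1", ...),
     * check the current extent and H5Dread() the new rows, much like
     * the per-poll sketch earlier in the thread, but without the
     * H5Fopen/H5Fclose on every poll. */
    return rfile;
}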

  Quincey
