HDF5 multiple servers writing at once

Hello,

Let me describe a scenario.

I am using Python and PyTables, which knows how to write HDF5 files.

Imagine that I have a NAS or a SAN, and that there are three servers attached to
this SAN/NAS. Say I have an HDF5 file residing on the SAN/NAS. The structure of
the HDF5 file is such that it contains information for, say, three stock symbols,
MSFT, ORCL and AAPL, and each symbol is its own separate group within the HDF5
file (sorry if that is not the right terminology; I am totally new to HDF5).

Say that each server is responsible for persisting tick data for one of the
symbols to the HDF5 file on the NAS/SAN. Can I have the three servers writing
MSFT, ORCL and AAPL __simultaneously__ to the HDF5 file without creating havoc?

Thank you.

Ivan

Hello Ivan,

On Thursday 27 November 2008, idf@synctrading.com wrote:

Hello,

Let me describe a scenario.

I am using Python and PyTables, which knows how to write HDF5 files.

Imagine that I have a NAS or a SAN, and that there are three servers
attached to this SAN/NAS. Say I have an HDF5 file residing on the
SAN/NAS. The structure of the HDF5 file is such that it contains
information for, say, three stock symbols, MSFT, ORCL and AAPL, and
each symbol is its own separate group within the HDF5 file (sorry
if that is not the right terminology; I am totally new to HDF5).

Say that each server is responsible for persisting tick data for one of
the symbols to the HDF5 file on the NAS/SAN. Can I have the three
servers writing MSFT, ORCL and AAPL __simultaneously__ to the HDF5
file without creating havoc?

No, neither PyTables nor HDF5 is designed to do this safely. What
you can do is synchronize your programs so that they write to the file in
turns (using some kind of file locking among the three processes).
If you follow that path, you must ensure that the file is actually
opened before and *closed* after each writing operation.
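
Just to illustrate, here is a minimal sketch of that turn-taking approach.
The file paths, the /<symbol>/ticks table layout and the use of fcntl are my
own inventions, and I am using the current PyTables open_file/get_node
spelling; also note that flock() over NFS mounts is not always reliable, so
take this only as a starting point:

import fcntl
import tables

H5_PATH = "/mnt/san/ticks.h5"      # hypothetical shared file on the SAN/NAS
LOCK_PATH = H5_PATH + ".lock"      # hypothetical sidecar lock file

def append_ticks(symbol, rows):
    """Append a list of row tuples to /<symbol>/ticks, one writer at a time."""
    with open(LOCK_PATH, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)       # wait for our turn
        try:
            h5 = tables.open_file(H5_PATH, mode="a")
            try:
                table = h5.get_node("/" + symbol, "ticks")
                table.append(rows)
                table.flush()
            finally:
                h5.close()                     # close *before* giving up the turn
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)

# e.g. append_ticks("AAPL", [(1227795182.5, 92.67, 300)])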

If what you need is high write throughput, another possibility is
to write to three different files and then merge them on a regular
basis using another process. You should be careful in this case
because, while the fourth process is reading a file, another process may
be appending more data at that very moment, so you should keep track of
the data that has actually been consolidated in order to prevent data loss.
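
A very rough sketch of that consolidation process (file and node names are
invented): it remembers how many rows of each source file it has already
merged, so rows appended while it is reading are simply picked up on the
next pass:

import tables

SOURCES = {"MSFT": "msft.h5", "ORCL": "orcl.h5", "AAPL": "aapl.h5"}
consolidated = {sym: 0 for sym in SOURCES}     # rows already merged, per symbol

def merge_once(master_path="master.h5"):
    master = tables.open_file(master_path, mode="a")
    try:
        for sym, path in SOURCES.items():
            src = tables.open_file(path, mode="r")
            try:
                src_table = src.get_node("/" + sym, "ticks")
                start = consolidated[sym]
                stop = src_table.nrows         # snapshot; later rows wait for next pass
                if stop > start:
                    master.get_node("/" + sym, "ticks").append(
                        src_table.read(start, stop))
                    consolidated[sym] = stop
            finally:
                src.close()
    finally:
        master.flush()
        master.close()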

Hope that helps,

--
Francesc Alted

Normally I'd say use MPI to coordinate multiple writers, but the
amount of data is so small that it would be overkill to move to
parallel I/O.

Why not have a single designated writer thread or process which
consumes a work queue? Depending on how clever you want your work
queue to be and what your consistency requirements are for the
underlying file, you could either batch updates to the HDF5 file or
ignore updates that are subsequently rendered out-of-date by newer
information.

It's a little more work than passing a writer token around, but it gives
you some optimization opportunities.
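
Something along these lines, say (the queue transport, the batching policy
and the table layout here are only illustrative; in practice the queue would
be fed over the network from your three servers):

import queue
import tables

def writer_loop(work_queue, h5_path="master.h5"):
    """Drain (symbol, row) items from the queue and append them in batches."""
    h5 = tables.open_file(h5_path, mode="a")
    try:
        while True:
            symbol, row = work_queue.get()      # block until something arrives
            batch = {symbol: [row]}
            while True:                         # then grab whatever else is queued
                try:
                    symbol, row = work_queue.get_nowait()
                    batch.setdefault(symbol, []).append(row)
                except queue.Empty:
                    break
            for sym, rows in batch.items():     # one append per symbol per batch
                h5.get_node("/" + sym, "ticks").append(rows)
            h5.flush()
    finally:
        h5.close()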

==rob

--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
A215 0178 EA2D B059 8CDF  B29D F333 664A 4280 315B

Rob,

Thanks for the response. Actually, I gave three stock symbols as an example. The
actual implementation will be dealing with all optionable symbols and many
strikes and months [the entire option surface]. The data requirements are on the
order of 10 GB a day of real-time tick data. Perhaps MPI is the right solution
after all?

Also, would certain hardware, such as a SAN with a parallel file system like
GPFS (http://en.wikipedia.org/wiki/GPFS), help?

Ivan

I don't know too much about application I/O access patterns in the
financial space (you guys don't exactly publish a lot about what
you're up to! :> ). Still, I think MPI might be overkill here.

10 GB over an 8-hour trading day means we're talking roughly 350 KB/sec of I/O,
though that ignores bursty periods and assumes you either write or
read this data a single time. Still, gigabit Ethernet gives you 100x headroom
here, so I still think you can marshal all the I/O onto one process.
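
Back of the envelope, in case anyone wants to check that figure:

bytes_per_day = 10 * 1024**3        # ~10 GB of ticks
seconds_per_day = 8 * 3600          # 8 hour session
print(bytes_per_day / seconds_per_day / 1024)   # -> roughly 364 KB/s sustained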

Now, if you already have MPI processes doing some sort of calculation,
then parallel I/O almost comes for free, and you should do that. I just
want you to use the I/O approach that best matches your application.
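
For what it's worth, PyTables itself has no MPI mode, but here is a rough
sketch of what that "almost for free" parallel I/O could look like with h5py
built against a parallel HDF5 plus mpi4py (the file path and dataset shape
are invented, and every rank must make the file and dataset calls
collectively):

from mpi4py import MPI
import h5py

comm = MPI.COMM_WORLD

# all ranks open the same file collectively through MPI-IO
f = h5py.File("/gpfs/ticks.h5", "w", driver="mpio", comm=comm)
dset = f.create_dataset("ticks", (comm.Get_size(), 1000), dtype="f8")
dset[comm.Get_rank(), :] = comm.Get_rank()   # each rank writes its own slice
f.close()                                    # collective close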

GPFS can be a good solution for some workloads. If you go the SAN +
GPFS route, you will be well advised to write to a single HDF5 file.
Simultaneous creation of many files in a single directory is often an
expensive operation for any parallel file system.

I'd start with your processes sending data to a single writer
operating on a single HDF5 file. If I/O is less than 10-15% of your
runtime, I wouldn't worry about more sophisticated approaches.

==rob

--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
A215 0178 EA2D B059 8CDF  B29D F333 664A 4280 315B

Thanks again for the advice.

Ivan
