hdf5 import benchmark

Does anyone have HDF5 data import benchmarks? I want to know what a
reasonable load time is for a particular dataset. For example, if you
have a 5-million-row tab-separated file and you import the data into HDF5,
how long would/should it take? What factors affect the load time?
Assume there is no I/O bottleneck.

I want to see how fast my approach is: I am able to load 1000 rows in
7 seconds. How does that compare?

TIA

···

----------------------------------------------------------------------
This mailing list is for HDF software users discussion.
To subscribe to this list, send a message to hdf-forum-subscribe@hdfgroup.org.
To unsubscribe, send a message to hdf-forum-unsubscribe@hdfgroup.org.

Hi Mag,

A Tuesday 30 June 2009 01:38:42 Mag Gam escrigué:

Does anyone have HDF5 data import benchmarks? I want to know what a
reasonable load time is for a particular dataset. For example, if you
have a 5-million-row tab-separated file and you import the data into HDF5,
how long would/should it take? What factors affect the load time?
Assume there is no I/O bottleneck.

I want to see how fast my approach is: I am able to load 1000 rows in
7 seconds. How does that compare?

That's pretty slow. With the attached Python script, I'm able to import a
5-million-row CSV file into an HDF5 file in around 16 s, which is pretty good.
The script is written for PyTables, but can be migrated to h5py if desired.

HTH,

import-csv.py (1.19 KB)
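The attached script itself isn't reproduced in this thread. As a rough sketch of the same idea (a chunked CSV-to-HDF5 import that avoids per-row appends), assuming h5py and NumPy rather than the PyTables API the attachment actually uses:

```python
# Sketch only: the real import-csv.py attachment uses PyTables; this
# stands in with h5py + NumPy to illustrate chunked CSV -> HDF5 import.
import itertools
import numpy as np
import h5py

def import_csv(csv_path, h5_path, dataset="data", chunk_rows=100_000):
    """Stream a numeric delimited file into an extendible HDF5 dataset,
    chunk_rows rows at a time, instead of appending row by row."""
    with open(csv_path) as f, h5py.File(h5_path, "w") as h5:
        dset = None
        while True:
            # Pull the next batch of lines; stop when the file is exhausted
            lines = list(itertools.islice(f, chunk_rows))
            if not lines:
                break
            # One vectorized parse per chunk, not one per row
            chunk = np.loadtxt(lines, delimiter=",", ndmin=2)
            if dset is None:
                # Unlimited first dimension so the dataset can grow
                dset = h5.create_dataset(
                    dataset, shape=(0, chunk.shape[1]),
                    maxshape=(None, chunk.shape[1]),
                    dtype=chunk.dtype, chunks=True)
            dset.resize(dset.shape[0] + chunk.shape[0], axis=0)
            dset[-chunk.shape[0]:] = chunk
        return dset.shape if dset is not None else (0, 0)
```

The chunk size is a tuning knob: large enough that NumPy parsing and the HDF5 writes dominate, small enough to keep memory bounded.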

···

--
Francesc Alted

Thanks for the response and sample code! This should be very helpful
to me! We managed to get much better speeds now with NumPy.
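The poster's code isn't shown, so this is only a guess at what "much better with NumPy" means in practice: parsing the whole file in one vectorized call and writing the resulting array to HDF5 in a single operation, rather than looping over rows in Python. A minimal sketch, assuming h5py for the HDF5 side:

```python
# Hypothetical sketch of a bulk NumPy load; the original poster's
# actual approach is not shown in the thread.
import numpy as np
import h5py

def bulk_import(tsv_path, h5_path, dataset="data"):
    # One vectorized parse of the entire file...
    arr = np.loadtxt(tsv_path, delimiter="\t", ndmin=2)
    # ...and one HDF5 write, so no per-row Python overhead.
    with h5py.File(h5_path, "w") as h5:
        h5.create_dataset(dataset, data=arr)
    return arr.shape
```

Compared with appending 1000 rows at a time from a Python loop, this kind of bulk path is typically what turns seconds-per-thousand-rows into seconds-per-million.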

···

On Tue, Jun 30, 2009 at 7:41 AM, Francesc Alted <faltet@pytables.org> wrote:

