Webinar Announcement: Tablite: 9BN rows/sec + HDF5 Support for all

October 26, 2022, 9:00 a.m. Central time US/Canada
Register now

Tablite is an open-source project for incremental data processing. It uses HDF5 as a backend with strong abstraction, so that copying, appending, and repeating data is handled in pages; this allows us to slice 9,000,000,000 rows in less than a second on localhost. Tablite also implements multiprocessing, respects and addresses the limits of free memory, and uses datatype mapping to native HDF5 types, which in combination makes it an elegant solution.

You may have first heard about Tablite from this forum post: 9BN rows/sec + HDF5 support for all python datatypes

Come learn what Tablite is, and how and when to use it from its developer Dr. Bjorn Madsen, Head of System Design Tools, Dematic.

Please register to join us!


Hi Everyone,

Just a reminder that we will be hosting Dr. Bjorn Madsen, Head of System Design Tools, Dematic to talk about Tablite tomorrow, Wednesday 10/26 at 9:00 a.m. Central time US/Canada. From the Tablite website:

Tablite uses HDF5 as a backend with strong abstraction, so that copy, append & repetition of data is handled in pages. This is imperative for incremental data processing.
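
To make the idea of page-level handling concrete, here is a minimal pure-Python sketch. It is not Tablite's implementation: the `PagedColumn` class and its methods are hypothetical names invented for illustration, and in-memory tuples or `range` objects stand in for pages that a real backend would keep in HDF5. The point is only that copy, append, and repetition can shuffle page references instead of moving row data, and that a slice needs to touch only the pages it overlaps.

```python
from bisect import bisect_right
from itertools import accumulate


class PagedColumn:
    """Hypothetical column built from shared, immutable pages (illustration only)."""

    def __init__(self, pages=()):
        self.pages = list(pages)  # references to pages; page data is never copied
        # Running totals of page lengths, used to locate the page holding a row.
        self.ends = list(accumulate(len(p) for p in self.pages))

    def __len__(self):
        return self.ends[-1] if self.ends else 0

    def copy(self):
        return PagedColumn(self.pages)  # new column, same page objects

    def __add__(self, other):
        return PagedColumn(self.pages + other.pages)  # append = join page lists

    def __mul__(self, n):
        return PagedColumn(self.pages * n)  # repetition = repeat page references

    def slice(self, start, stop):
        """Return rows [start, stop) for non-negative indices."""
        out = []
        i = bisect_right(self.ends, start)  # first page that contains `start`
        while i < len(self.pages) and start < stop:
            page = self.pages[i]
            page_start = self.ends[i] - len(page)
            lo = start - page_start
            hi = min(stop - page_start, len(page))
            out.extend(page[lo:hi])
            start = page_start + hi
            i += 1
        return out


# A single 1,000,000-row page repeated 9,000 times gives 9,000,000,000 rows,
# yet only 9,000 references are stored and a slice reads just one page here.
big = PagedColumn([range(1_000_000)]) * 9_000
last_two = big.slice(len(big) - 2, len(big))
```

Because the pages are treated as immutable, sharing them between copies is safe in this sketch; a real implementation would also need to manage page lifetime on disk, which is left out here.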

Tablite tests for memory footprint. One test compares the memory footprint of 10,000,000 integers, where Tablite will use < 1 MB of RAM in contrast to Python, which will require around 133.7 MB of RAM (1M lists with 10 integers). Tablite also tests to assure that working with 1 TB of data is tolerable.
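
For anyone who wants to check the pure-Python side of that comparison on their own machine, here is a rough sketch using the standard library's `tracemalloc`. It is not Tablite's test: the function name `python_footprint_bytes` is made up for this example, and the number you get will depend on your Python version and on the integer values used, so it may not match the figure quoted above.

```python
import tracemalloc


def python_footprint_bytes(n_rows, n_cols=10):
    """Measure the bytes Python allocates for n_rows lists of n_cols integers."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    rows = [list(range(i, i + n_cols)) for i in range(n_rows)]
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    assert len(rows) == n_rows  # keeps `rows` alive until after the measurement
    return after - before


if __name__ == "__main__":
    n_rows = 100_000  # raise to 1_000_000 to match the scale quoted above
    mb = python_footprint_bytes(n_rows) / 1e6
    print(f"{n_rows:,} lists of {10} integers: about {mb:.1f} MB")
```

Note that `tracemalloc` slows allocation down noticeably, so the full 1,000,000-row run takes longer than the same code would without tracing.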

Tablite achieves this by using HDF5 as storage, which is faster than mmap’ed files for the average case [1, 2], and stores all data in /tmp/tablite.hdf5, so if your OS (Windows/Linux/Mac) sits on an SSD it will benefit from high IOPS and permit slices of 9,000,000,000 rows in less than a second.

Registration is required: https://us06web.zoom.us/meeting/register/tZcqfu2vqjMuHtfnPpgYKS-H0SNg_YkEYWxF

Please feel free to join us and share this with anyone you think might be interested.

On October 26, 2022, The HDF Group hosted Dr. Bjorn Madsen to discuss his project, Tablite.

Tablite is an open-source project for incremental data processing. It uses HDF5 as a backend with strong abstraction, so that copying, appending, and repeating data is handled in pages; this allows us to slice 9,000,000,000 rows in less than a second on localhost. Tablite also implements multiprocessing, respects and addresses the limits of free memory, and uses datatype mapping to native HDF5 types, which in combination makes it an elegant solution.