We are happy to announce PyTables 3.9.0.
What’s new
After nearly one year since the previous release, PyTables 3.9.0
includes an assortment of improvements and fixes contributed by many
people. A big thank you to everyone involved in keeping PyTables alive
& kicking!
This release extends the optimized read operation added in PyTables
3.8.0 (based on HDF5 direct chunking) to chunked and extensible arrays
(CArray/EArray) compressed with Blosc2. Slicing such
multidimensional arrays (even along inner dimensions) has been greatly
sped up thanks to Blosc2 NDim 2-level partitioning of chunks into
smaller multidimensional blocks, which avoids the decompression of whole
chunks. This is enabled by providing the Blosc2 cframe with a b2nd
metalayer when writing each array chunk. Please check out
“Multidimensional slicing and chunk/block sizes” in the optimization
tips section of the User’s Guide for more information. This development
was funded by a NumFOCUS grant, with the support of the Blosc project.
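To give a feeling for why the second partitioning level helps, here is a rough back-of-the-envelope sketch in plain Python (it does not use PyTables or Blosc2 at all). It counts how many items would need to be decompressed to serve a slice when the smallest decompressible unit is a whole chunk versus a smaller block. The array, chunk and block shapes, as well as the helper names, are made up for illustration, and the model assumes that blocks tile chunks exactly; the real code paths are more involved.

```python
from math import prod


def touched_partitions(start, stop, part_shape):
    """Count partitions of shape `part_shape` that intersect the
    half-open N-dimensional box [start, stop)."""
    counts = []
    for lo, hi, size in zip(start, stop, part_shape):
        first = lo // size          # index of first partition hit on this axis
        last = (hi - 1) // size     # index of last partition hit on this axis
        counts.append(last - first + 1)
    return prod(counts)


def items_decompressed(start, stop, part_shape):
    """Items decompressed if every touched partition is decompressed whole."""
    return touched_partitions(start, stop, part_shape) * prod(part_shape)


# Hypothetical 128x128x128 array, sliced as arr[:, :, 10]
# (a slice along the innermost dimension).
start, stop = (0, 0, 10), (128, 128, 11)
chunk_shape = (64, 64, 64)   # assumed chunk shape
block_shape = (16, 16, 16)   # assumed block shape inside each chunk

requested = prod(hi - lo for lo, hi in zip(start, stop))
by_chunk = items_decompressed(start, stop, chunk_shape)
by_block = items_decompressed(start, stop, block_shape)

print(f"items requested:               {requested}")
print(f"decompressing whole chunks:    {by_chunk}")
print(f"decompressing only the blocks: {by_block}")
```

With these made-up shapes, working at block granularity touches a quarter of the items that whole-chunk decompression would. Actual gains depend on the chunk and block shapes chosen and on the slicing pattern.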
More information about the optimized read operation on hyper-slices can
be found in our recent presentation at the 2023 European HDF User Group
(HUG) plugins and data compression summit.
PyTables 3.9.0 also adds support for column-level attributes (e.g. for
units) via their attrs field. This has been in the queue for more
than two years, but it’s finally here!
PyTables now supports the forthcoming Python 3.12, with binary wheels
available and extensive automatic testing. Please note that wheels and
testing for Python 3.8 have been dropped due to issues with
dependencies. Users or distribution packagers may still use Python 3.8,
but they are encouraged to perform tests on their own.
Regarding Blosc2 compression support, PyTables may now use either the
python-blosc2 package (which is handy for packaged Python wheels), or
just the c-blosc2 library (which may be preferable for wider software
distributions). Newer versions of Blosc2 and other dependencies, which
fix various issues, are now required.
For more details on what has changed in this version, please refer to
the PyTables 3.9.0 release notes in the documentation.
You can install it via pip or download a source package with generated
PDF and HTML docs from:
For an online version of the manual, visit:
http://www.pytables.org/usersguide/index.html
What is it?
PyTables is a library for managing hierarchical datasets, designed to
cope efficiently with extremely large amounts of data, with support for
full 64-bit file addressing. PyTables runs on top of the HDF5 library
and the NumPy package to achieve maximum throughput and convenient use.
PyTables includes OPSI, a new indexing technology that allows data
lookups in tables exceeding 10 gigarows (10**10 rows) to be performed in
less than a tenth of a second.
Resources
About PyTables: http://www.pytables.org
About the HDF5 library: http://hdfgroup.org/HDF5/
About NumPy: http://numpy.scipy.org/
Acknowledgments
Thanks to the many users who provided feature improvements, patches, bug
reports, support and suggestions. See the THANKS file in the
distribution package for an (incomplete) list of contributors. Most
especially, a lot of kudos go to the HDF5 and NumPy makers.
Without them, PyTables simply would not exist.
Share your experience
Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.
Enjoy data!
– The PyTables Developers