ANN: HDF5 for Python (h5py) 1.2

andrew.collette · June 22, 2009, 10:03pm

Announcing HDF5 for Python (h5py) 1.2
=====================================

I'm pleased to announce the availability of HDF5 for Python 1.2 final!
This release represents a significant update to the h5py feature set.
Some of the new features are:

- Support for variable-length strings!
- Use of built-in Python exceptions (KeyError, etc), alongside H5Error
- Top-level support for HDF5 CORE, SEC2, STDIO, WINDOWS and FAMILY drivers
- Support for ENUM and ARRAY types
- Support for Unicode file names
- Big speedup (~3x) when using single-index slicing on a chunked dataset

Main site: http://h5py.alfven.org
Google code: http://h5py.googlecode.com

What is h5py?
-------------

HDF5 for Python (h5py) is a general-purpose Python interface to the
Hierarchical Data Format library, version 5. HDF5 is a versatile,
mature scientific software library designed for the fast, flexible
storage of enormous amounts of data.

From a Python programmer's perspective, HDF5 provides a robust way to
store data, organized by name in a tree-like fashion. You can create
datasets (arrays on disk) hundreds of gigabytes in size, and perform
random-access I/O on desired sections. Datasets are organized in a
filesystem-like hierarchy using containers called "groups", and
accessed using the traditional POSIX /path/to/resource syntax.

In addition to providing interoperability with existing HDF5 datasets
and platforms, h5py is a convenient way to store and retrieve
arbitrary NumPy data and metadata.
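
As a rough illustration of the group/dataset model described above, here is a minimal sketch. It assumes a current h5py release (the calls available in 1.2 may differ), and the file, group and dataset names are made up for the example.

```python
import numpy as np
import h5py

# In-memory file via the "core" driver, so nothing is written to disk.
# "demo.h5" is only a placeholder name.
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    # Groups nest like directories; intermediate groups are created as needed.
    grp = f.create_group("experiment/run1")
    grp.create_dataset("temperature", data=np.arange(1000, dtype="f8"))

    # Objects are addressed with POSIX-style paths.
    dset = f["/experiment/run1/temperature"]
    dset_path = dset.name

    # Slicing reads only the requested section of the dataset.
    window = dset[100:110]

    # Groups list their members like dictionaries.
    names = list(f["experiment"].keys())
```

Only the ten requested elements are read for `window`; the rest of the dataset is never loaded.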

Full list of new features in 1.2
--------------------------------

  - Variable-length strings are now supported! They are mapped to native
    Python strings via the NumPy "object" type. VL strings may be read,
    written and created from h5py, and are allowed in all HDF5 contexts,
    even as members of compound or array types.

  - HDF5 exceptions now inherit from common Python built-ins like TypeError
    and ValueError (in addition to current HDF5 error hierarchy), freeing
    the user from knowledge of the HDF5 error system. Existing code which
    uses H5Error will continue to work.

  - Many different low-level HDF5 drivers can now be used when creating
    a file, which allows purely in-memory ("core") files, multi-volume
    ("family") files, and files which use low-level buffered I/O.

  - Groups and attributes now support the standard Python dictionary
    interface methods, including keys(), values() and friends. The existing
    methods (listnames(), listobjects(), etc.) remain and will not be
    removed until at least h5py 1.4 or equivalent.

  - A workaround for an HDF5 bug has sped up reading/writing of chunked
    datasets. When using a slice with fewer dimensions than the dataset,
    there can be as much as a 3x improvement in write times over h5py 1.1.

  - Enumerated types are now fully supported; they can be used in NumPy
    anywhere integer types are allowed, and are stored as native HDF5
    enums. Conversion between integers and enums is supported.

  - The NumPy "array" dtype is now allowed as a top-level type when
    creating a dataset, not just as a member of a compound type.

  - Unicode file names are now supported

  - It's now possible to explicitly set the type of an attribute, and to
    preserve the type of an attribute while modifying it.

  - High-level objects now have .parent and .file attributes, to make the
    navigation of HDF5 files more convenient.
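
Several of the features listed above can be tried out in a few lines. This is a sketch, not the 1.2 API: it assumes a recent h5py 3.x release, where the helpers are spelled `h5py.string_dtype()`, `h5py.enum_dtype()` and `Dataset.asstr()`. All file, group, dataset and attribute names are invented for the example.

```python
import numpy as np
import h5py

# In-memory file via the "core" driver; nothing is written to disk.
with h5py.File("features_demo.h5", "w", driver="core", backing_store=False) as f:
    meta = f.create_group("meta")

    # Variable-length strings: each element may have a different length.
    names = meta.create_dataset("names", shape=(2,), dtype=h5py.string_dtype())
    names[0] = "short"
    names[1] = "a considerably longer string"
    first_name = names.asstr()[0]

    # Enumerated type, stored as a native HDF5 enum over an 8-bit integer.
    color_dt = h5py.enum_dtype({"RED": 0, "GREEN": 1, "BLUE": 2}, basetype="i1")
    colors = f.create_dataset("colors", shape=(3,), dtype=color_dt)
    colors[...] = [0, 2, 1]
    enum_map = h5py.check_enum_dtype(colors.dtype)

    # Built-in exceptions: a missing name raises KeyError, as with a dict.
    try:
        f["does_not_exist"]
        missing_raised = False
    except KeyError:
        missing_raised = True

    # Dictionary-style listing of group members.
    top_level = sorted(f.keys())

    # Attribute created with an explicit type; modify() keeps that type.
    colors.attrs.create("gain", 2, dtype="i8")
    colors.attrs.modify("gain", 5)
    gain = colors.attrs["gain"]

    # Navigation helpers on high-level objects.
    parent_name = names.parent.name
    file_name = names.file.filename
```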

Design revisions since 1.1
--------------------------

  - The role of the "name" attribute on File objects has changed. "name"
    now returns the HDF5 path of the File object ('/'); the file name on
    disk is available at File.filename.

  - Dictionary-interface methods for Group and AttributeManager objects have
    been renamed to follow the standard Python convention (keys(), values(),
    etc). The old method names are still available but deprecated.

  - The HDF5 shuffle filter is no longer automatically activated when
    GZIP or LZF compression is used; many datasets "in the wild" do not
    benefit from shuffling.
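
The first and last points can be checked directly. A small sketch, assuming a current h5py release; the file and dataset names are placeholders.

```python
import numpy as np
import h5py

# In-memory file via the "core" driver; "design_demo.h5" is a placeholder name.
with h5py.File("design_demo.h5", "w", driver="core", backing_store=False) as f:
    hdf5_path = f.name       # HDF5 path of the File object (the root group)
    disk_name = f.filename   # name the file was opened with

    data = np.arange(10000, dtype="i4").reshape(100, 100)

    # Compression alone no longer implies the shuffle filter...
    plain = f.create_dataset("gzip_only", data=data, compression="gzip")
    # ...so it has to be requested explicitly when it is wanted.
    shuffled = f.create_dataset("gzip_shuffle", data=data,
                                compression="gzip", shuffle=True)

    plain_uses_shuffle = plain.shuffle
    shuffled_uses_shuffle = shuffled.shuffle
```

Whether shuffling helps depends on the data, so it is worth comparing file sizes with and without it on a representative sample.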

Standard features
-----------------

  - Supports storage of NumPy data of the following types:

    * Integer/Unsigned Integer
    * Float/Double
    * Complex/Double Complex
    * Compound ("recarray")
    * Strings
    * Boolean
    * Array
    * Enumeration (integers)
    * Void

  - Random access to datasets using the standard NumPy slicing syntax,
    including a subset of fancy indexing and point-based selection

  - Transparent compression of datasets using GZIP, LZF or SZIP,
    and error-detection using Fletcher32

  - "Pythonic" interface supporting dictionary and NumPy-array metaphors
    for the high-level HDF5 abstractions like groups and datasets

  - A comprehensive, object-oriented wrapping of the HDF5 low-level C API
    via Cython, in addition to the NumPy-like high-level interface.

  - Supports many new features of HDF5 1.8, including recursive iteration
    over entire files and in-library copy operations on the file tree

  - Thread-safe
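
A short sketch of a few of these features together: compression with checksums, slicing and fancy indexing, in-library copy, and recursive iteration. It assumes a current h5py release, and every name in it is invented for the example.

```python
import numpy as np
import h5py

# In-memory file via the "core" driver; nothing is written to disk.
with h5py.File("standard_demo.h5", "w", driver="core", backing_store=False) as f:
    data = np.arange(200, dtype="f4").reshape(20, 10)

    # Chunked dataset with transparent GZIP compression and Fletcher32 checksums.
    dset = f.create_dataset("raw/signal", data=data, chunks=(5, 10),
                            compression="gzip", fletcher32=True)

    block = dset[2:4, ::5]        # ordinary NumPy-style slicing with a step
    picked = dset[[1, 3, 7], :]   # fancy indexing: increasing index list on one axis
    above = dset[data > 197.5]    # boolean mask, i.e. a point-based selection

    # In-library copy of a whole subtree.
    f.copy("raw", "backup")

    # Recursive iteration: visit() calls the function with every object name.
    visited = []
    f.visit(visited.append)
```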

Where to get it
---------------

* Main website, documentation: http://h5py.alfven.org

* Downloads, bug tracker: http://h5py.googlecode.com

Requires
--------

* Linux, Mac OS-X or Windows

* Python 2.5 (Windows), Python 2.5 or 2.6 (Linux/Mac OS-X)

* NumPy 1.0.3 or later

* HDF5 1.6.5 or later (including 1.8); HDF5 is included with
  the Windows version.

Thanks
------

Thanks to D. Dale, E. Lawrence and others for their continued support
and comments. Thanks also to Francesc Alted and the PyTables project
for inspiration and for generously providing their code to the community. Thanks
to everyone at the HDF Group for creating such a useful piece of software.

----------------------------------------------------------------------
This mailing list is for HDF software users discussion.
To subscribe to this list, send a message to hdf-forum-subscribe@hdfgroup.org.
To unsubscribe, send a message to hdf-forum-unsubscribe@hdfgroup.org.

Dominik_Szczerba · June 23, 2009, 7:09am

May I ask for a brief comment in context of pytables?

-- Dominik


andrew.collette · June 23, 2009, 8:26am

Hi Dominik,

> May I ask for a brief comment in context of pytables?

H5py and PyTables have different goals; h5py tries to map the HDF5
feature set as directly to Python as possible, while PyTables presents
a more database-style interface. This leads to different feature
sets; for example, h5py has Python bindings for almost the whole HDF5
C API (in addition to the high-level array interface), while PyTables
has more facilities for indexing, fast querying (see also "numexpr")
and other database-like operations. They also use different
Python-side type systems.
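
To make the "direct mapping" point concrete, here is a small sketch of the two layers side by side. It assumes a current h5py release; the file and dataset names are placeholders.

```python
import h5py

# In-memory file via the "core" driver; nothing is written to disk.
with h5py.File("lowlevel_demo.h5", "w", driver="core", backing_store=False) as f:
    # High-level interface: NumPy-like shape and dtype.
    dset = f.create_dataset("grid", shape=(4, 6), dtype="f8")
    high_level_shape = dset.shape

    # Low-level interface: the .id attribute is a thin wrapper around the
    # HDF5 dataset identifier, with methods that mirror the C API
    # (here H5Dget_space, H5Sget_simple_extent_dims, H5Dget_type, H5Tget_size).
    dsid = dset.id
    is_low_level = isinstance(dsid, h5py.h5d.DatasetID)
    dims = dsid.get_space().get_simple_extent_dims()
    item_size = dsid.get_type().get_size()
```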

There are a couple of brief comparisons here:

http://www.pytables.org/moin/FAQ#HowdoesPyTablescomparewiththeh5pyproject.3F
Google Code Archive - Long-term storage for Google Code Project Hosting.?

Andrew


ananyagupta1214 · March 31, 2020, 6:41am

Yes, you are right, this link is informative. Here is what I know, in case it helps.

An HDF5 file is a container for two kinds of objects: datasets, which are array-like collections of data, and groups, which are folder-like containers that hold datasets and other groups. The most fundamental thing to remember when using h5py is:

    Groups work like dictionaries, and datasets work like NumPy arrays.

Suppose someone has sent you an HDF5 file, mytestfile.hdf5. (To create this file, read Appendix: Creating a file.) The very first thing you'll need to do is open the file. It is opened in read/write mode ('r+') here, because the example below also writes to the dataset; use 'r' if you only need to read:

    >>> import h5py
    >>> import numpy as np
    >>> f = h5py.File('mytestfile.hdf5', 'r+')

The File object is your starting point. What is stored in this file? Remember that h5py.File acts like a Python dictionary, so we can check the keys:

    >>> list(f.keys())
    ['mydataset']

There is one dataset, mydataset, in the file. Let us examine it as a Dataset object:

    >>> dset = f['mydataset']

The object we obtained isn't an array, but an HDF5 dataset. Like NumPy arrays, datasets have both a shape and a data type:

    >>> dset.shape
    (100,)
    >>> dset.dtype
    dtype('int32')

They also support array-style slicing. This is how you read and write data from a dataset in the file:

    >>> dset[...] = np.arange(100)
    >>> dset[0]
    0
    >>> dset[10]
    10
    >>> dset[0:100:10]
    array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

For more, see File Objects and Datasets.
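
The walkthrough above assumes mytestfile.hdf5 already exists. Here is a minimal sketch of a full round trip, assuming a current h5py release. The dataset name and the int32 type are chosen to match the output shown above, and the temporary directory is only there to keep the example self-contained.

```python
import os
import tempfile

import numpy as np
import h5py

# Write the file into a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), "mytestfile.hdf5")

# Mode "w" creates the file (truncating it if it already exists).
with h5py.File(path, "w") as f:
    dset = f.create_dataset("mydataset", (100,), dtype="i4")
    dset[...] = np.arange(100)

# Mode "r" is enough once the data only needs to be read back.
with h5py.File(path, "r") as f:
    keys = list(f.keys())
    dset = f["mydataset"]
    shape, dtype = dset.shape, dset.dtype
    every_tenth = dset[0:100:10]
```

Using the file as a context manager closes it even if an error occurs partway through.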

Appendix: Creating a file…
I am also taking online Python training from CETPA INFOTECH. If you have any updates on this, please share them with me.
