H5glance - terminal HDF5 inspector

I’d like to show h5glance, a terminal-based HDF5 inspector we created at European XFEL. It presents groups and datasets in a format that, we hope, makes it easy to quickly see interesting information. More info: https://github.com/European-XFEL/h5glance

How to use it:

In a shell:

python3 -m pip install h5glance
h5glance --help

h5glance requires Python 3.5 or newer, and h5py will be automatically installed as a dependency.

Why we made it:

We work with a lot of deeply nested HDF5 files over SSH. h5ls doesn’t give enough information by default, and I often forget the -r flag to look into nested groups. h5ls -rv gives more detail than I want, and it’s not always easy to find and understand what I need.

The hdfview GUI can show me what I want, but clicking into the nested structure is inefficient, and if I didn’t use ssh -X, I have to start a new SSH session to use a GUI.

I’ve seen a few other attempts at terminal-based tools for working with HDF5 files, but several projects seemed to have stalled, and nothing was quite what I wanted. I’m happy to hear about things I may have missed!

h5glance is designed to work with lots of nested groups (our files can easily have the datasets 6-7 layers deep). The layout makes the nesting clear without taking up much width. You can also tab-complete paths within the file (run h5glance file.h5 - and it will prompt for a path), which helps to avoid typos.
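
To illustrate the kind of narrow layout described above, here is a minimal sketch that uses one column of indentation per nesting level. It is not h5glance's actual implementation: the function name `tree_lines` and the nested-dict stand-in for an HDF5 file are assumptions made for illustration. A mapping plays the role of a group and anything else plays the role of a dataset.

```python
def tree_lines(node, prefix=""):
    """Yield one line per child of `node`, one column of indent per level.

    `node` is any mapping-like object with .items(); children that also
    have .items() are treated as groups and descended into.
    """
    children = list(node.items())
    for i, (name, child) in enumerate(children):
        last = i == len(children) - 1
        yield prefix + ("└" if last else "├") + name
        if hasattr(child, "items"):
            # Continue the vertical rule only while siblings remain below.
            yield from tree_lines(child, prefix + (" " if last else "│"))


# Hypothetical layout standing in for a file; None marks a dataset.
layout = {
    "parent": {
        "substances": {},
        "signals": {"raw": None, "fit": None},
    },
    "notes": None,
}

print("demo.h5")
for line in tree_lines(layout):
    print(line)
```

Because each level costs a single character of width, even datasets six or seven groups deep stay well within a normal terminal line.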

Hi Thomas!

04.10.2018 11:15, Thomas Kluyver wrote:

I’d like to show h5glance, a terminal-based HDF5 inspector we created
at European XFEL. It presents groups and datasets in a format that, we
hope, makes it easy to quickly see interesting information. More info
https://github.com/European-XFEL/h5glance.

Thank you for your excellent contribution! Your program fills the
long-standing gap between h5dump, h5ls and HDFView for quickly
checking, browsing and debugging HDF5 files. Special thanks for selecting
Python as the implementation language :wink:

I especially liked that h5glance outputs items in creation order if
it is tracked in the file. However, for some reason this is not always
the case. For the attached file, h5dump -q creation_order prints
HDF5 "test.h5" {
GROUP "/" {
   GROUP "parent" {
      GROUP "substances" {
      }
      GROUP "mixtures" {
      }
      GROUP "instructions" {
      }
      GROUP "amounts" {
      }
      GROUP "samples" {
      }
      GROUP "signals" {
      }
      GROUP "peaks" {
      }
      GROUP "assignments" {
      }
   }
}
}

but h5glance outputs
test.h5/parent
├amounts
├substances
├samples
├mixtures
├signals
├instructions
├peaks
└assignments

Could you please take a look?

I will also submit a GitHub issue/proposal for supporting complex data
types.

Thank you again for the awesome code you created,
Andrey Paramonov

test.h5 (7.13 KB)

Thanks Andrey!

I had no idea about the iteration order - it’s just using h5py’s functionality to do that. It looks like h5py should use creation order if that was tracked, and name order otherwise. Here’s the relevant code:

I tried with your sample file and I see similar results. I don’t know how to tell if it’s h5py or h5dump that is getting the order right, though.
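
One way to narrow this down is to ask the file itself whether a group has creation-order tracking switched on, independently of how any tool chooses to iterate. The sketch below does this through h5py's low-level property-list API; the helper names `tracks_creation_order` and `demo`, and the temporary file it writes, are made up for illustration. It only reports the flag stored in the group's creation property list and says nothing about which order a given h5py version actually iterates in.

```python
import os
import tempfile

import h5py
from h5py import h5p


def tracks_creation_order(group):
    """Return True if link creation order is tracked for this group."""
    gcpl = group.id.get_create_plist()
    flags = gcpl.get_link_creation_order()
    return bool(flags & h5p.CRT_ORDER_TRACKED)


def demo():
    """Write two groups, reopen the file read-only, and report the flags."""
    path = os.path.join(tempfile.mkdtemp(), "order_demo.h5")
    with h5py.File(path, "w") as f:
        f.create_group("tracked", track_order=True)
        f.create_group("plain")  # default settings: no order tracking
    with h5py.File(path, "r") as f:
        return {name: tracks_creation_order(f[name])
                for name in ("tracked", "plain")}


print(demo())
```

If the flag is set on a group but iteration still comes back sorted by name, the discrepancy is on the reading side rather than in the file.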

Hi Thomas!

04.10.2018 15:05, Thomas Kluyver wrote:

Thanks Andrey!

I had no idea about the iteration order - it’s just using h5py’s
functionality to do that. It looks like h5py should use creation order
if that was tracked, and name order otherwise.

I tried with your sample file and I see similar results. I don’t know
how to tell if it’s h5py or h5dump that is getting the order right, though.

It’s h5dump and h5py, actually :wink:

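import h5py

# Open for writing explicitly; newer h5py versions default to read-only.
f = h5py.File('test.h5', 'w')
# track_order=True asks HDF5 to record link creation order in this group.
g = f.create_group('parent', track_order=True)
g.create_group('substances')
g.create_group('mixtures')
g.create_group('instructions')
g.create_group('amounts')
g.create_group('samples')
g.create_group('signals')
g.create_group('peaks')
g.create_group('assignments')

# Iterate over the member names of the group that was just created.
for t in g:
    print(t)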

outputs

substances
mixtures
instructions
amounts
samples
signals
peaks
assignments

Best wishes,
Andrey Paramonov

Yes and no. It works when you create the file:

 % python3 << EOF
import h5py

# Open for writing explicitly; newer h5py versions default to read-only.
f = h5py.File('test.h5', 'w')
g = f.create_group('parent', track_order=True)
g.create_group('substances')
g.create_group('mixtures')
g.create_group('instructions')
g.create_group('amounts')
g.create_group('samples')
g.create_group('signals')
g.create_group('peaks')
g.create_group('assignments')

for t in g:
    print(t)
EOF
substances
mixtures
instructions
amounts
samples
signals
peaks
assignments

But not when you read it again:

 % python3 << EOF
import h5py
# Reopen the existing file read-only and list the group's members.
f = h5py.File('test.h5', 'r')
for t in f['parent']:
    print(t)
EOF
amounts
substances
samples
mixtures
signals
instructions
peaks
assignments

I see from GitHub you’ve already worked with the relevant code in h5py, so you can probably debug it better than I can. :slight_smile:

04.10.2018 15:30, Thomas Kluyver wrote:

Yes and no. It works when you create the file:

But not when you read it again:

Hmm, this is really strange. I’ll look into debugging it.

But please be assured that these small glitches in no way diminish the
great job you have done :wink:

Best wishes,
Andrey Paramonov