Newbie question: HDF5 Python plotting utility architecture


#1

Hi HDF5 forum!

Thanks for reading this. I’m new here. It’s my very first post. I want to build something cool and I need your help and advice!

I have an HDF5 file that contains a hierarchy of datasets and metadata. I want a tool that plots the data so I can explore it and track down problems. In particular, I want to make 2-D plots of various combinations of columns. For example, one dataset has 5 columns a, b, c, d, e, and I want to plot a vs c and log(b) vs e. I also want to save a plot setup and plug other data into it, to overlay a vs c from multiple datasets, and to change the axes and reformat any of these plots.

What would be a good choice for app or combination of apps to accomplish this?

For example, I could build a GUI in PySide2 that lets me select which of the available columns to plot, use pandas to build the data frames, and use matplotlib to graph them. Could this work? Can I use one of these tools to create a notebook page with certain frequently used plots and then drop new data into it?
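To make the idea concrete, here is a rough sketch of the pandas + matplotlib part, without the GUI. It assumes each dataset is a compound (table-like) HDF5 dataset with named columns and that the file is read with h5py. The file name, the dataset names run1 and run2, the random data, and the helpers load_frame and plot_pair are all made up for illustration.

```python
import os
import tempfile

import h5py
import matplotlib
matplotlib.use("Agg")  # headless backend for this demo; a GUI would embed a Qt canvas instead
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Transforms a user could pick next to each column (hypothetical menu).
TRANSFORMS = {"identity": lambda s: s, "log": np.log}


def load_frame(path, name):
    """Read one compound (table-like) HDF5 dataset into a DataFrame."""
    with h5py.File(path, "r") as f:
        return pd.DataFrame.from_records(f[name][()])


def plot_pair(ax, frames, y, x, y_transform="identity"):
    """Overlay 'y vs x' on one Axes for every labelled DataFrame."""
    fy = TRANSFORMS[y_transform]
    for label, df in frames.items():
        ax.plot(df[x], fy(df[y]), ".", label=label)
    ax.set_xlabel(x)
    ax.set_ylabel(y if y_transform == "identity" else f"{y_transform}({y})")
    ax.legend()


# --- demo with made-up data: two datasets, each with columns a..e ---
path = os.path.join(tempfile.mkdtemp(), "demo.h5")
dtype = np.dtype([(col, "f8") for col in "abcde"])
rng = np.random.default_rng(0)
with h5py.File(path, "w") as f:
    for name in ("run1", "run2"):
        table = np.zeros(50, dtype=dtype)
        for col in "abcde":
            # Positive values only, so log(b) is defined.
            table[col] = rng.uniform(1.0, 10.0, size=50)
        f.create_dataset(name, data=table)

frames = {name: load_frame(path, name) for name in ("run1", "run2")}
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
plot_pair(ax1, frames, y="a", x="c")                      # a vs c, both datasets
plot_pair(ax2, frames, y="b", x="e", y_transform="log")   # log(b) vs e
fig.savefig(os.path.join(os.path.dirname(path), "plots.png"))
```

Dropping in new data would then mean building a different frames dict and re-running the same plot_pair calls, which is roughly what I picture a notebook page of frequently used plots doing.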

I feel terribly lazy even asking, but is there anything already written, like an all-in-one framework, that does this sort of thing?

Happy coding…


#2

HDF Group posted a blog article on work done by BioSimulations.org that included visualizing HDF5 data (in their case from HSDS) using Vega. I can’t say whether this is applicable for your project, but Vega is definitely worth a look.

From the sound of your query in the CTD session, it will be important to resolve whether your users will want a notebook-style interface, or a more tailored UI.


#3

H5Web is an easy option. Here’s an example of how to use it in a Jupyter notebook. (You need to pip install jupyterlab_h5web first.) G.


#4

My recommendation is Apache Superset.


#5

Thank you for the reference to Superset. I haven’t used it, but a cursory look and an initial search did not turn up a Superset connector (a Python DB-API driver with a SQLAlchemy dialect) for HDF5.

Do you know if HDF5 has a Superset connector? Or do you export/translate HDF5 into a Superset-compatible datastore first (or a compatible intermediate, perhaps PyTables)?


#6

Yes, you can use Apache Drill:

  1. Read http://hdfeos.org/examples/drill.php
  2. Read https://thedataist.com/visualize-anything-with-superset-and-drill/

By the way, you could also start your own project that bypasses Drill.
That is, an h5py -> Superset connector.
Please let me know if you start one and get funding (or an A+ grade on your class project)! :)
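Short of writing a real connector, the export route asked about in #5 can be sketched in a few lines. This is only an illustration, not a Superset connector: it assumes a compound HDF5 dataset whose columns are all numeric, uses h5py to read it, and copies it into a SQLite file with the standard-library sqlite3 module. The function name export_dataset, the file names, and the table name run1 are hypothetical.

```python
import os
import sqlite3
import tempfile

import h5py
import numpy as np


def export_dataset(h5_path, dataset_name, db_path, table_name):
    """Copy one compound HDF5 dataset into a SQLite table; return the row count.

    Assumes every column is numeric and that the names are trusted
    (they are interpolated into the SQL as quoted identifiers).
    """
    with h5py.File(h5_path, "r") as f:
        data = f[dataset_name][()]  # NumPy structured array
    columns = data.dtype.names
    col_defs = ", ".join(f'"{c}" REAL' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(f'CREATE TABLE IF NOT EXISTS "{table_name}" ({col_defs})')
            conn.executemany(
                f'INSERT INTO "{table_name}" VALUES ({placeholders})',
                data.tolist(),  # list of plain Python tuples, one per row
            )
    finally:
        conn.close()
    return len(data)


# --- demo with made-up data ---
workdir = tempfile.mkdtemp()
h5_path = os.path.join(workdir, "demo.h5")
db_path = os.path.join(workdir, "demo.db")
table = np.zeros(20, dtype=[(c, "f8") for c in "abcde"])
table["a"] = np.arange(20)
with h5py.File(h5_path, "w") as f:
    f.create_dataset("run1", data=table)

n_rows = export_dataset(h5_path, "run1", db_path, "run1")
print(n_rows, "rows exported")
```

Superset talks to databases through SQLAlchemy URIs, so the exported file could in principle be registered as a database there. Whether SQLite connections are permitted depends on the Superset configuration, and a server database such as PostgreSQL may be the more practical target for anything beyond a quick test.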