How to get started with HDF5 + Java

Hi community,

I am probably missing something here, but I cannot find working instructions on how to use HDF5 from Java as a library through JNI (I want to open, read, and write H5 files).

My assumption is that I download the HDF5 source, compile it, and end up with a jar file plus native libraries (for Linux, in my case), which together can be used from Java.
Please correct me if there is another way.

I tried to follow these instructions https://portal.hdfgroup.org/display/support/HDF-Java#HDF-Java-build
but:

  • on the hdf_5_10 branch there is no configure executable
  • on the latest branch, hdf_5_13_2, the configure file has Windows line endings (it needs dos2unix), and after compiling no Java libraries are created

Can anyone please point me in the right direction?

Thanks in advance!

Hello! Would you consider using HDFView?
https://www.hdfgroup.org/downloads/hdfview

Thanks for replying.
I am not sure if I understand your suggestion.

How can I use HDFView in a Java project?
Can you point me to instructions on how to do that?

We test both methods, autotools configure and CMake; I like CMake better as it is cross-platform.
We also have pre-compiled binaries available that include Java and the external compression libraries.

The Java library requires that you build SHARED.
If you were using the develop branch you would need to run autogen.sh first, but it looks like you are using a release branch.

Prefer building in a separate build folder, not in-source.

Run configure from within the build folder; for autotools:

    configure --enable-build-mode=production --enable-shared --enable-java

For CMake (choose a generator; config/toolchain/GCC.cmake is one toolchain option):

    cmake -G "<generator type>" -DCMAKE_BUILD_TYPE=Release -DCMAKE_TOOLCHAIN_FILE=config/toolchain/GCC.cmake -DBUILD_SHARED_LIBS=ON -DHDF5_BUILD_JAVA=ON

I suggest you run cpack after the make build to package the binaries.
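
Once the build (or a binary install) is in place, the JNI bindings live in the hdf.hdf5lib package. Here is a minimal read sketch; it assumes the HDF5 Java jar is on the classpath, the native libraries are on java.library.path, and that a file data.h5 with a 1-D double dataset /dset exists (both names are made up):

    import hdf.hdf5lib.H5;
    import hdf.hdf5lib.HDF5Constants;

    public class ReadExample {
        public static void main(String[] args) throws Exception {
            // Open the file read-only
            long fileId = H5.H5Fopen("data.h5",
                    HDF5Constants.H5F_ACC_RDONLY, HDF5Constants.H5P_DEFAULT);
            long datasetId = H5.H5Dopen(fileId, "/dset", HDF5Constants.H5P_DEFAULT);

            // Read the whole dataset into a Java array (assumes 10 doubles)
            double[] data = new double[10];
            H5.H5Dread(datasetId, HDF5Constants.H5T_NATIVE_DOUBLE,
                    HDF5Constants.H5S_ALL, HDF5Constants.H5S_ALL,
                    HDF5Constants.H5P_DEFAULT, data);

            H5.H5Dclose(datasetId);
            H5.H5Fclose(fileId);
        }
    }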

I’m sorry. I didn’t know what I was talking about… Byrn was able to help you better.

Also, the CMake configuration will need the path to the source directory at the end of the command line.

There are also some alternative bindings in a package called JHDF5 that seem popular in the Java community.

https://unlimited.ethz.ch/display/JHDF/JHDF5+FAQ


https://mvnrepository.com/artifact/cisd/jhdf5/19.04.0
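
As a taste of the API, reading an array with JHDF5 looks roughly like this (a sketch; the file and dataset names are made up):

    import ch.systemsx.cisd.hdf5.HDF5Factory;
    import ch.systemsx.cisd.hdf5.IHDF5Reader;

    public class JHDF5Example {
        public static void main(String[] args) {
            // Open an existing HDF5 file read-only
            IHDF5Reader reader = HDF5Factory.openForReading("data.h5");
            double[] values = reader.readDoubleArray("/dset");
            System.out.println(values.length);
            reader.close();
        }
    }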

Hi @dseide,

On the other hand, if you prefer high-level declarative APIs, you may want to try HDFql, which is similar to SQL. Please see some examples that illustrate the usage of HDFql from Java. Besides Java, HDFql also supports the C, C++, C#, Python, Fortran, and R programming languages.
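
As a rough sketch of what that looks like (following the style of the HDFql examples; the file and dataset names are made up):

    import as.hdfql.HDFql;

    public class HDFqlExample {
        public static void main(String[] args) {
            // Create a file and a dataset, then read the value back
            HDFql.execute("CREATE FILE example.h5");
            HDFql.execute("USE FILE example.h5");
            HDFql.execute("CREATE DATASET dset AS INT VALUES(42)");
            HDFql.execute("SELECT FROM dset");
            HDFql.cursorFirst();
            System.out.println(HDFql.cursorGetInt());
        }
    }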

Hope it helps!


Hi Byrn,

Thanks a lot for your comment.

I don’t necessarily want to compile the whole thing myself and would be very grateful for pre-compiled, ready-to-use binaries for Windows, Linux, and macOS. Can you provide a download link?

In the meantime, I will try the CMake approach you proposed.

Hi kittisopikulm,

Thank you for commenting.

Your suggestion sounds very promising (almost too promising, actually); I will give it a try.

I have to admit that I have mixed feelings here. JHDF5 currently uses the 1.10 branch of the HDF5 library. It also rewrites the JNI bindings rather than layering adjacent to, or on top of, the official ones. That said, it takes care of a lot of issues that would otherwise be very difficult to deal with in Java itself.

Reading the subject as “How to get started with HDF5 + Java”, I have to agree with the HDFql suggestion. Before getting into the technical weeds, I think you would want to find out quickly whether HDF5 can help you solve the (non-technical) problem at hand. Under the circumstances, raw JNI looks like the surest path to a non-starter. With HDFql, you can even pivot to another (host) language should Java not be viable in the long run.

G.


https://www.hdfgroup.org/downloads/hdf5/

Thank you everyone for your replies.

@byrn
I am sorry, but there are a lot of links on the website you point to.
Which one is it? The Linux files are specific to certain distributions, which puzzles me: I would expect an hdf.jar plus native libraries that are not tied to particular distributions. The file for Windows contains an MSI installer.

@kittisopikulm
Using the JNI library from ETH Zurich seems to be the better approach. The lack of documentation is a problem, though.

@gheber
The query language approach might be helpful, but we are not looking for that at the moment.

My recommendation is netcdf-java.
No JNI is required, especially when your workflow is read-only.

https://docs.unidata.ucar.edu/netcdf-java/5.3/userguide/building_from_source.html
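
For instance, a pure-Java read could look like this (a sketch; the file and variable names are made up):

    import ucar.ma2.Array;
    import ucar.nc2.NetcdfFile;
    import ucar.nc2.NetcdfFiles;
    import ucar.nc2.Variable;

    public class NetcdfJavaExample {
        public static void main(String[] args) throws Exception {
            // netcdf-java can read HDF5 files without any native library
            try (NetcdfFile ncfile = NetcdfFiles.open("data.h5")) {
                Variable v = ncfile.findVariable("dset");
                Array data = v.read(); // reads the whole variable into memory
                System.out.println(data.getSize());
            }
        }
    }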

Alternatively, use jhdf.

You could also try Drill, which uses jhdf; this might be the simplest solution.

http://hdfeos.org/examples/drill.php

jhdf (jamesmudd) is missing slicing, which is mandatory for us because we want to manage matrices with dimensions of 5M x 100k. Loading this amount of data into memory is not possible on today’s everyday computers.

I finally became friends with the JHDF5 from ETH Zurich; thanks again @kittisopikulm.

Thank you all for your help and support.

The JHDF5 from ETH is commonly called sis-jhdf5 or cisd-jhdf5.

You can find the documentation, including JavaDoc here:
https://unlimited.ethz.ch/pages/viewpage.action?pageId=92865195

@kittisopikulm
Do you know anyone at ETH Zurich?

I ask because they claim that the latest version supports slicing/block reading, although the necessary methods are not actually implemented.

The Javadoc seems to be private, and the source code is not available either.
This is what I was referring to when I said that there is no documentation.

EDIT:
The source and Javadoc are available in the downloadable ZIP, but they are excluded from the Maven artifacts and not visible online.

I contacted the people responsible for the project a few days ago, but they haven’t replied yet.

Doesn’t your matrix require only about 5 TB of memory? (5M x 100k doubles is 5 x 10^11 elements, i.e. roughly 4 TB at 8 bytes each.)
I think you can easily find such a small system on AWS [1].
I don’t know what you mean by slicing capability, but you can request such a feature via GitHub from the jhdf or netCDF-java community.

Anyway, you can use whatever Java solution you like for such small data, because you don’t have to deal with deploying a jar on a cluster of a thousand machines.

[1] https://aws.amazon.com/ec2/instance-types/high-memory/

@hyoklee
I am sorry, what is 5T memory? You’re not referring to 5TB, are you?

We need this solution to run on a work laptop (8-16 GB of RAM).
Slicing or block reading refers to the capability to read only parts of an array stored in HDF5.
For example, if an array of 1 billion strings of length 30 is stored in HDF5 (roughly 30 GB of character data alone), reading it all into memory is simply not possible with that amount of RAM. So it has to be “sliced” and read one piece at a time, as sketched below.
The latest version of sis-jhdf5 claims to offer this functionality, and there is a code example on their webpage, but the actual interface does not offer those methods.
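
To make this concrete, here is roughly what slicing looks like with the plain HDF5 JNI bindings via a hyperslab selection (a sketch using a 1-D double dataset for simplicity; the file name, dataset name, and sizes are made up):

    import hdf.hdf5lib.H5;
    import hdf.hdf5lib.HDF5Constants;

    public class SliceExample {
        public static void main(String[] args) throws Exception {
            long fileId = H5.H5Fopen("big.h5",
                    HDF5Constants.H5F_ACC_RDONLY, HDF5Constants.H5P_DEFAULT);
            long datasetId = H5.H5Dopen(fileId, "/dset", HDF5Constants.H5P_DEFAULT);

            // Select a block of 1000 elements starting at offset 5,000,000
            long[] start = {5_000_000L};
            long[] stride = {1L};
            long[] count = {1000L};
            long[] block = {1L};
            long fileSpace = H5.H5Dget_space(datasetId);
            H5.H5Sselect_hyperslab(fileSpace, HDF5Constants.H5S_SELECT_SET,
                    start, stride, count, block);

            // Memory dataspace the size of one block
            long memSpace = H5.H5Screate_simple(1, count, count);

            // Only these 1000 doubles are read into memory
            double[] slice = new double[1000];
            H5.H5Dread(datasetId, HDF5Constants.H5T_NATIVE_DOUBLE,
                    memSpace, fileSpace, HDF5Constants.H5P_DEFAULT, slice);

            H5.H5Sclose(memSpace);
            H5.H5Sclose(fileSpace);
            H5.H5Dclose(datasetId);
            H5.H5Fclose(fileId);
        }
    }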

I have finally been able to get in touch with them; I will let you know once I learn something new.

Regarding netCDF-java:
I understand it helps to manage netCDF data on top of HDF5, but we are dealing with data that comes from other libraries using plain HDF5 (with non-netCDF data models), which makes me believe it is of no use to us.

Regarding jamesmudd/jhdf:
This is developed as a hobby project (no offense, it is great work!) and lacks a few features such as writing and slicing. I do not want to depend on them to implement those features because I don’t know if or when that would happen.