I am probably missing something here, but I cannot find working instructions on how to use HDF5 in Java as a library through JNI (I want to open, read, and write H5 files).
My assumption is that I download the HDF5 source, compile it, and end up with a jar file plus native libraries (for Linux in this case). Together, they can be used from Java.
Please correct me if there is another way.
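If the build does produce a jar plus a native library, a quick way to check that the JVM can actually see the native part is sketched below. This is only a sanity check, not an HDF5 example: the library base name `hdf5_java` is an assumption about what the build produces, and the class and method names are made up for illustration.

```java
/** Minimal check that a JNI native library is visible to the JVM. */
class NativeLibCheck {

    /** Tries to load a native library by base name; returns false instead of throwing. */
    static boolean canLoad(String baseName) {
        try {
            System.loadLibrary(baseName);
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            return false;
        }
    }

    /** Platform-specific file name the JVM looks for (for example lib<name>.so on Linux). */
    static String fileNameFor(String baseName) {
        return System.mapLibraryName(baseName);
    }

    public static void main(String[] args) {
        // "hdf5_java" is an assumed name; pass the real one as the first argument.
        String name = args.length > 0 ? args[0] : "hdf5_java";
        System.out.println("Looking for " + fileNameFor(name));
        System.out.println("java.library.path = " + System.getProperty("java.library.path"));
        System.out.println(canLoad(name) ? "loaded" : "not found on java.library.path");
    }
}
```

Running it with `-Djava.library.path=` pointing at the directory that holds the compiled native library should tell you whether the jar and the native part can find each other.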
On the other hand, if you prefer high-level declarative APIs, you may want to try HDFql, which is similar to SQL. Please see some examples that illustrate the usage of HDFql in the Java programming language. Besides Java, HDFql also supports C, C++, C#, Python, Fortran and R.
I have to admit that I have mixed feelings here. JHDF5 currently uses the 1.10 branch of the HDF5 library. It also rewrites the JNI bindings rather than layering adjacent to or on top of those bindings. That said, it takes care of a lot of issues that would otherwise be very difficult to deal with in Java itself.
Reading the subject as “How to get started with HDF5 + Java”, I have to agree with the HDFql suggestion. Before getting into the technical weeds, I think you would want to quickly find out whether HDF5 can help you solve the (non-technical) problem at hand. Under the circumstances, JNI looks like the surest way to stall before you even get started. With HDFql, you can even pivot to another (host) language should Java not be viable in the long run.
I am sorry, but there are a lot of links on the website you indicate.
Which one is it? The Linux files are specific to certain distributions, which is suspicious: I would expect hdf.jar plus native libraries that are not compiled separately for each distribution. The file for Windows contains an MSI installer.
Using the JNI library from ETH Zurich seems to be the better approach. The lack of documentation is a problem, though.
The query language approach might be helpful, but we are not looking for that at the moment.
jHDF (jamesmudd) is missing slicing, which is mandatory for us because we want to manage matrices with dimensions of 5M x 100k. Loading this amount of data into memory is not possible on current everyday computers.
I finally became friends with JHDF5 from ETH Zurich, thanks again @kittisopikulm.
Doesn’t your matrix require only 5T of memory?
I think you can easily find such a small system on AWS.
I don’t know what you mean by slicing capability, but you can request such a feature via GitHub from the jHDF or netCDF-Java community.
Anyway, you can use whatever Java solution you like for such small data, because you don’t have to deal with deploying a jar on a cluster of a thousand machines.
I am sorry, what is 5T memory? You’re not referring to 5TB, are you?
We need this solution to run on a work laptop (8 - 16 GB).
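For reference, here is the arithmetic behind those numbers as a small sketch. The 8-byte element size is an assumption (the element type of the matrix is not stated above); the class name is made up for illustration.

```java
/** Back-of-the-envelope memory footprint of a dense matrix. */
class MatrixFootprint {

    /** Bytes needed to hold rows x cols elements; throws ArithmeticException on overflow. */
    static long bytes(long rows, long cols, int bytesPerElement) {
        return Math.multiplyExact(Math.multiplyExact(rows, cols), (long) bytesPerElement);
    }

    public static void main(String[] args) {
        long rows = 5_000_000L;  // 5M, as stated above
        long cols = 100_000L;    // 100k, as stated above
        int elementSize = 8;     // assumption: 8-byte values such as double

        long total = bytes(rows, cols, elementSize);
        long laptopRam = 16L * 1_000_000_000L; // upper end of the 8 - 16 GB laptop

        System.out.println("elements:      " + rows * cols);
        System.out.println("bytes:         " + total);
        System.out.println("terabytes:     " + total / 1_000_000_000_000L);
        System.out.println("x laptop RAM:  " + total / laptopRam);
    }
}
```

With 8-byte elements the dense matrix comes to 4 TB, a few hundred times more than the laptop has, so reading it in one piece is out of the question there.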
Slicing or block reading refers to the capability to read only parts of an array stored in HDF5.
For example, if an array of 1 billion strings of length 30 is stored in HDF5, reading all of it into memory is simply not possible with that amount of RAM. So it has to be “sliced” and read one piece at a time.
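To make the idea concrete, here is a minimal sketch of how such a read could be planned. It only computes the (offset, count) pairs; it does not call any HDF5 library, and the class names, the 256 MiB budget, and the commented-out `reader.readBlock(...)` call are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits a long 1-D dataset into (offset, count) blocks that fit a memory budget. */
class BlockPlan {

    /** One contiguous block: start index and number of elements. */
    static final class Block {
        final long offset;
        final long count;

        Block(long offset, long count) {
            this.offset = offset;
            this.count = count;
        }
    }

    /** Largest number of elements per block that stays within budgetBytes (at least 1). */
    static long elementsPerBlock(long budgetBytes, int bytesPerElement) {
        return Math.max(1L, budgetBytes / bytesPerElement);
    }

    /** Blocks covering [0, total) in order; the last block may be shorter. */
    static List<Block> plan(long total, long blockSize) {
        if (total < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("total must be >= 0 and blockSize > 0");
        }
        List<Block> blocks = new ArrayList<>();
        for (long offset = 0; offset < total; offset += blockSize) {
            blocks.add(new Block(offset, Math.min(blockSize, total - offset)));
        }
        return blocks;
    }

    public static void main(String[] args) {
        long total = 1_000_000_000L;        // 1 billion strings, as in the example above
        int bytesPerString = 30;            // fixed length 30, assuming 1 byte per character
        long budget = 256L * 1024 * 1024;   // hypothetical per-block memory budget

        long blockSize = elementsPerBlock(budget, bytesPerString);
        List<Block> blocks = plan(total, blockSize);
        System.out.println(blocks.size() + " blocks of up to " + blockSize + " strings");

        for (Block b : blocks) {
            // Hypothetical call: read only elements [b.offset, b.offset + b.count).
            // String[] part = reader.readBlock(b.offset, b.count);
        }
    }
}
```

In HDF5 terms, each (offset, count) pair would correspond to the start and count of a hyperslab selection on the dataset, so only that part is transferred into memory.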
The latest version of sis-jhdf5 claims to offer this functionality with a code example on their webpage but the actual interface does not offer those methods.
I have finally been able to get in touch with them; I will let you know once I learn something new.
I understand it helps to manage CDF on top of HDF5, but we are dealing with data that comes from other libraries that use plain HDF5 (with non-CDF data models), which makes me believe that it is of no use to us.
This project is developed as a hobby project (no offense, this is great work!) and lacks a few features such as writing and slicing. I do not want to depend on them to implement those features because I don’t know if and when this would happen.