HDFql 2.0.0 Release - HDF5 made easy and fast

We are happy to announce the release of HDFql 2.0.0!

This version includes:

  • Added support for parallel HDF5 (PHDF5) - thanks to Quincey (Berkeley Lab, USA), Rob (Argonne Lab, USA), Konrad (CNRS, France), Sebastian (Juelich Supercomputing Centre, Germany), Guy (IN2P3, France) and Holger (Fermilab, USA) for the feedback.

  • Added wrapper for the Intel Fortran Compiler (IFORT) - thanks to Intel for sponsoring HDFql with a free IFORT license.

  • Improved performance and memory footprint

  • Added capability to specify the storage allocation when creating a dataset (i.e. early, incremental or late)

  • Bug fixes - thanks to Erik (M+P, The Netherlands), Ryan (KMLabs, USA), Michael (U.S. Army Corps of Engineers, USA), Romain (CEA, France) and Petr (University of West Bohemia, Czech Republic) for reporting these.

  • Updated reference manual - thanks to Jeff (Redwood Center for Theoretical Neuroscience, USA) for the feedback.

(Please check the release notes for further details)

For a heads-up on releases and the latest on HDFql, we welcome you to connect with us at twitter.com/hdfql

This looks like an exciting project! Could you give more information about the downloads? Exactly how were the VS binaries compiled? Are they statically or dynamically linked against the runtime libs? Also, are the macOS builds compatible with the built-in clang from Xcode? Do I need to install Homebrew or some such to get the GCC compiler runtime libs?

Hi Mike,

The HDFql binaries built with VS are dynamically linked to the runtime libraries. These binaries were compiled/linked with the following flags (in VS): /O2 /LD /openmp /link /LTCG /FORCE:MULTIPLE /NODEFAULTLIB:libcmt.lib

Concerning macOS, HDFql was built with GCC (version 4.9). In principle, it should be possible to call the HDFql shared library (compiled with GCC) from a program compiled with clang without issues. Consequently, there should be no need to install GCC runtime libraries to run the program.

Hope this helps!

Hi,

After updating HDFql from 1.5.0 to 2.0.0, the provided chunk sizes aren’t respected anymore, leading to really poor compression. The release notes mention that chunk sizes are automatically computed, but this appears to be a bug, as the explicitly provided ones aren’t respected.

Creating a dataset with 1.5.0:

CREATE CHUNKED(6000,3) DATASET acceleration AS FLOAT(UNLIMITED,3) ENABLE SHUFFLE ZLIB LEVEL 9

and repeatedly increasing its size led to:

> h5ls -v mydata.h5
...
acceleration             Dataset {8204472/Inf, 3/3}
    Location:  1:1664
    Links:     1
    Chunks:    {6000, 3} 72000 bytes
    Storage:   98453664 logical bytes, 13530282 allocated bytes, 727.65% utilization
    Filter-0:  shuffle-2 OPT {4}
    Filter-1:  deflate-1 OPT {9}
    Type:      native float
...

The same with 2.0.0

...
acceleration             Dataset {8204472/Inf, 3/3}
    Location:  1:1664
    Links:     1
    Chunks:    {1, 3} 12 bytes
    Storage:   98453664 logical bytes, 144773425 allocated bytes, 68.01% utilization
    Filter-0:  shuffle-2 OPT {4}
    Filter-1:  deflate-1 OPT {9}
    Type:      native float
...

As you can see, the chunking differs: {6000, 3} with 1.5.0 versus {1, 3} with 2.0.0. (I’m using the Java wrapper on Ubuntu 18.10.)
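The figures in the two h5ls listings can be cross-checked with a little arithmetic. Below is a minimal sketch in C++; the function names are made up for illustration and are not part of HDFql or HDF5, and the utilization formula (logical bytes divided by allocated bytes) is assumed from the numbers shown above rather than taken from the h5ls source.

```cpp
#include <cassert>
#include <cstdint>

// Bytes held by one chunk of a 2-D dataset: rows * columns * element size.
// (Hypothetical helper, for illustration only.)
std::uint64_t chunkBytes(std::uint64_t rows, std::uint64_t cols, std::uint64_t elementSize)
{
    return rows * cols * elementSize;
}

// Number of chunks needed to cover the first dimension (ceiling division).
std::uint64_t chunkCount(std::uint64_t datasetRows, std::uint64_t chunkRows)
{
    assert(chunkRows > 0);
    return (datasetRows + chunkRows - 1) / chunkRows;
}

// Utilization in percent, assumed here to be logical bytes / allocated bytes.
double utilizationPercent(std::uint64_t logicalBytes, std::uint64_t allocatedBytes)
{
    assert(allocatedBytes > 0);
    return 100.0 * static_cast<double>(logicalBytes) / static_cast<double>(allocatedBytes);
}
```

With a 4-byte native float, chunkBytes(6000, 3, 4) gives the 72000 bytes reported for 1.5.0 and chunkBytes(1, 3, 4) gives the 12 bytes reported for 2.0.0. For the 8204472 rows in the listing, chunkCount yields 1368 chunks in the first case and 8204472 chunks in the second. Since filters are applied per chunk, a 12-byte chunk probably leaves shuffle and deflate almost nothing to work with, which would be consistent with the allocated size exceeding the logical size in the 2.0.0 listing.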

Have there been semantic changes that require migration, or is this a plain bug?

Thx,

Hi Gerhard,

Yes, we can confirm this issue is a bug introduced in HDFql version 2.0.0. Thanks for reporting it!

We are currently fixing it and a patch version (2.0.1) with this fix will be released this week. We will post in this thread when the patch version is available for download (at http://www.hdfql.com).

Cheers!

Hi,

Thanks for that update. This looks like a very interesting project to me.
Did I understand correctly that the new version supports parallel HDF5 in C++, too? The reference manual suggests this, but I couldn’t find explicit confirmation.
I am asking because parallel HDF5 is not supported in the HDF5 C++ API.

Thanks and regards,

Chris

Hi Widmannc,

Thanks for the encouraging feedback!

Yes, HDFql supports parallel HDF5 in C++ (besides C, Java, Python, C#, Fortran and R programming languages).

Below is a small example to illustrate how you can use HDFql to work with parallel HDF5 in C++.

Hope this helps!

// assume that the following C++ program is launched in parallel using four MPI processes (e.g. "mpiexec -n 4 my_program")

// include standard headers needed for sprintf and EXIT_SUCCESS
#include <cstdio>
#include <cstdlib>

// include HDFql C++ header file (make sure it can be found by the C++ compiler)
#include "HDFql.hpp"

int main(int argc, char *argv[])
{
    // declare variables
    char script[1024];
    int rank;

    // create an HDF5 file named "my_file.h5" in parallel
    HDFql::execute("CREATE PARALLEL FILE my_file.h5");

    // use (i.e. open) HDF5 file "my_file.h5" in parallel
    HDFql::execute("USE PARALLEL FILE my_file.h5");

    // create an HDF5 dataset named "my_dataset" of data type int of one dimension (size 4)
    HDFql::execute("CREATE DATASET my_dataset AS INT(4)");

    // get number (i.e. rank) of the MPI process (should be between 0 and 3)
    rank = HDFql::mpiGetRank();

    // prepare script to insert (i.e. write) in parallel the values 0, 10, 20 and 30 into positions #0 (by MPI process rank 0), #1 (by MPI process rank 1), #2 (by MPI process rank 2) and #3 (by MPI process rank 3) of dataset "my_dataset" using a point selection
    sprintf(script, "INSERT INTO PARALLEL my_dataset(%d) VALUES(%d)", rank, rank * 10);

    // execute script
    HDFql::execute(script);

    return EXIT_SUCCESS;
}
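To see which statement each MPI process ends up executing, the script-building step of the example can be pulled out on its own. The following is only a sketch: buildInsertScript is a hypothetical helper name (not an HDFql function), and it uses snprintf instead of sprintf so that the buffer size is always respected. It does not need HDFql or MPI to compile.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of HDFql): builds the statement that the
// MPI process with the given rank would pass to HDFql::execute.
std::string buildInsertScript(int rank)
{
    assert(rank >= 0);

    char script[1024];

    // position "rank" of the dataset receives the value rank * 10
    std::snprintf(script, sizeof(script), "INSERT INTO PARALLEL my_dataset(%d) VALUES(%d)", rank, rank * 10);

    return std::string(script);
}
```

For ranks 0 to 3 this produces statements that write 0, 10, 20 and 30 into positions 0 to 3 respectively, so each process targets a distinct element of "my_dataset" through its point selection.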

Hi,

thank you very much for your fast response and the detailed answer including the code example.
This is indeed a very interesting approach to working with parallel HDF5 and I will try it in the coming weeks.

Kind regards,

Chris

Hi Gerhard,

We have released HDFql version 2.0.1, which fixes the issue you have posted in this thread. This version is available for download at http://www.hdfql.com.

Cheers!