H5Datatype with variable length: How to set the values?

Hi, I am trying to create a compound dataset that uses variable-length data, as shown in the last 2 columns here:

For testing, I tried to create a simple compound dataset having only one “column” of variable-length 32-bit integers. I used this test:

@Test
public void testCompoundDataSetIntVariableLength() throws Exception{
    
    String fileName = "./test.h5";
    FileFormat fileFormat = FileFormat.getFileFormat(FileFormat.FILE_TYPE_HDF5);
    H5File file = (H5File) fileFormat.createFile(fileName, FileFormat.FILE_CREATE_DELETE);
    
    // This is the total amount of entries for each type in the compound
    int DIM_SIZE = 2;        
    
    String message = "";
    Group pgroup = null;
    
    // We set the dimensions we want to store the data in - the product of all values has to be the dimension size
    long[] DIMs = { DIM_SIZE, 1 };
    
    // Chunks the data is stored in
    long[] CHUNKs = { 1, 1 };
    
    // We create dummy arrays with default values
    int[] array1 = new int[]{1,2};
    int[] array2 = new int[]{1,2,3,4};
    
    // Set up the data object
    Vector<Object> data = new Vector<>();
    data.add(array1);
    data.add(array2);
    //int[][] array = new int[][]{array1,array2};
    //data.add(array);
    
    // create groups
    String[] mnames = { "int" };

    H5Datatype intValueDataType = (H5Datatype)file.createDatatype(Datatype.CLASS_INTEGER, 4, Datatype.NATIVE, Datatype.NATIVE);        
    H5Datatype intValLengthDataType = (H5Datatype)file.createDatatype(Datatype.CLASS_VLEN, Datatype.NATIVE, Datatype.NATIVE, Datatype.NATIVE,intValueDataType);
    Datatype[] mdtypes = new H5Datatype[1];
    mdtypes[0] = intValLengthDataType;
    
    int[] msizes = new int[]{1};
    
    Dataset dset = file.createCompoundDS("/CompoundDS", pgroup, DIMs, null, CHUNKs, 0, mnames, mdtypes, msizes, data);
}

and tried different ways of filling the data, either as several 1D arrays or as one 2D array. In both cases the result is the same: I get the correct datatype definition in the dataset, but the data itself is empty:

I would expect {(1,2)} to be the content of row 1 and {(1,2,3,4)} that of row 2. What am I doing wrong? Storage: SIZE:0 seems to indicate that no data was actually written to the dataset, or am I wrong? I guess the behavior is somewhat similar to the one explained for Fortran in this thread.

The object library was created as an abstract layer that attempts to hide the storage format underneath. HDF5 has grown more complex, and sometimes the object library can be too abstract to understand how to implement something. Still, compounds in datasets should be workable (although the current development is improving attributes and compounds).
I would suggest trying to use the hdf5 java wrappers first to get something working. Being a wrapper around the C library APIs gives you a direct sync with the existing C documentation. Also check the H5Ex_T_Compound.java example.

So what does that mean exactly? There is nothing wrong with my approach, but the current state of the object library simply does not support that action?

I built my whole code around the object library. I cannot really switch to the Java wrappers, or add some native stuff to the functionality of the object library, without rewriting all my code, can I?

For perspective, or as an alternative, you should consider the HDFql (Java) interface. See the extensive documentation and variable-length examples throughout, e.g., near the beginning of section 4.2.

Bonus: You can try everything in the HDFql shell, before spending too much time writing custom code.

Best, G.

I will have a closer look at it, thx. But before I do that, I would just like some clarification: Did I do something wrong in my example, or is there just no way of doing this operation properly in the current state of the object package? And if not, is there a reasonable chance that this might be fixed in an upcoming version?

I’ll take a look and get back to you. G.

Caution: I’m not a Java programmer.

OK, with that caveat, I looked around the Java code here and here.
Your code has the right intention, but from what I can see, currently, only variable-length strings are supported by the object layer.

It appears that the lower-level API hdf.hdf5lib supports variable-length sequences, but only for scalar base types, such as integers, floats, etc.

In other words, there is a disconnect between the object layer and hdf.hdf5lib for non-string, variable-length sequence types. What’s missing is a chunk of tedious glue code which breaks apart your ragged array and fiddles around with pointers. (Look at the JNI wrapper to get an idea.)
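
To make the "breaks apart your ragged array" step a bit more concrete, here is a minimal sketch in plain Java. It only shows the bookkeeping part: turning a ragged int[][] into one contiguous buffer plus a per-row length array, and back again. RaggedInts and its methods are hypothetical names I made up for illustration; they are not part of hdf.object or hdf.hdf5lib. The pointer handling itself (on the C side each variable-length element is an hvl_t, i.e. a length plus a pointer) happens in the JNI layer and is not shown here.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration only; not part of hdf.object or hdf.hdf5lib. */
final class RaggedInts {
    private RaggedInts() {}

    /** Returns the length of each row, e.g. {{1,2},{1,2,3,4}} gives {2,4}. */
    static int[] lengths(int[][] rows) {
        int[] lens = new int[rows.length];
        for (int i = 0; i < rows.length; i++) {
            lens[i] = rows[i].length;
        }
        return lens;
    }

    /** Concatenates all rows into one contiguous buffer. */
    static int[] flatten(int[][] rows) {
        int total = 0;
        for (int[] row : rows) {
            total += row.length;
        }
        int[] flat = new int[total];
        int pos = 0;
        for (int[] row : rows) {
            System.arraycopy(row, 0, flat, pos, row.length);
            pos += row.length;
        }
        return flat;
    }

    /** Inverse of flatten: cuts the buffer back into rows using the lengths. */
    static int[][] unflatten(int[] flat, int[] lens) {
        int[][] rows = new int[lens.length][];
        int pos = 0;
        for (int i = 0; i < lens.length; i++) {
            rows[i] = Arrays.copyOfRange(flat, pos, pos + lens[i]);
            pos += lens[i];
        }
        return rows;
    }
}
```

With the arrays from the original question, flatten gives one buffer of six integers and lengths gives {2, 4}. A real implementation would still have to hand each (length, data) pair to the native library, which is exactly the glue that seems to be missing.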

Unless you wanna drop down to the hdf.hdf5lib level and make your code look like C-code, I’d look elsewhere. Did I mention HDFql :wink:?

Best, G.

To add a bit more to the discussion…

We are currently enabling HDFql to support variable-length data types in Java - this new feature will be available in our next official release. To give a heads-up, your issue could be solved as follows (as an example and based on the screenshot posted above):

// declare Java class that "mimics" the HDF5 compound dataset
class Data
{
    int myDimension;
    int myShapeType;
    int myInterpolationType;
    int myIntegrationType;
    int myNumberOfNormalComponents;
    int myNumberOfShearComponents;
    ArrayList myConnectivity;
    ArrayList myFaceConnectivity;
}

// declare variables
Data write[] = new Data[1];
Data read[] = new Data[1];

// create HDF5 file 'myFile.h5' and use (i.e. open) it
HDFql.execute("CREATE AND USE FILE myFile.h5");

// create compound dataset 'myDataset'
HDFql.execute("CREATE DATASET myDataset AS COMPOUND(myDimension AS INT, myShapeType AS INT, myInterpolationType AS INT, myIntegrationType AS INT, myNumberOfNormalComponents AS INT, myNumberOfShearComponents AS INT, myConnectivity AS VARINT, myFaceConnectivity AS VARINT)");

// populate variable 'write' with dummy values
write[0] = new Data();
write[0].myDimension = 1;
write[0].myShapeType = 2;
write[0].myInterpolationType = 3;
write[0].myIntegrationType = 4;
write[0].myNumberOfNormalComponents = 5;
write[0].myNumberOfShearComponents = 6;
write[0].myConnectivity = new ArrayList();
write[0].myConnectivity.add(10);
write[0].myConnectivity.add(20);
write[0].myFaceConnectivity = new ArrayList();
write[0].myFaceConnectivity.add(30);
write[0].myFaceConnectivity.add(40);
write[0].myFaceConnectivity.add(50);

// write content of variable 'write' into dataset 'myDataset'
HDFql.execute("INSERT INTO myDataset VALUES FROM MEMORY " + HDFql.variableRegister(write));

// read content of dataset 'myDataset' and populate variable 'read' with it
HDFql.execute("SELECT FROM myDataset INTO MEMORY " + HDFql.variableRegister(read));

// display content of variable 'read'
System.out.println("myDimension: " + read[0].myDimension);
System.out.println("myShapeType: " + read[0].myShapeType);
System.out.println("myInterpolationType: " + read[0].myInterpolationType);
System.out.println("myIntegrationType: " + read[0].myIntegrationType);
System.out.println("myNumberOfNormalComponents: " + read[0].myNumberOfNormalComponents);
System.out.println("myNumberOfShearComponents: " + read[0].myNumberOfShearComponents);
for(int i = 0; i < read[0].myConnectivity.size(); i++)
{
    System.out.println("myConnectivity: " + read[0].myConnectivity.get(i));
}
for(int i = 0; i < read[0].myFaceConnectivity.size(); i++)
{
    System.out.println("myFaceConnectivity: " + read[0].myFaceConnectivity.get(i));
}

Variable length datatypes are tricky to get working in the JNI code because of the pointers. However, the development history of the object library made assumptions that now need to be reversed - it is on our list!
HDFView development depends on it working.

Good to see this support coming @byrn! Which Java primitive data type or class are you thinking to use to represent/store HDF5 variable-length data? At our side, we finished implementing this support in HDFql and opted for the ArrayList class.

Yes, I have worked the issue and got real close a couple of times. So I am very aware of the issue every time I add or fix a feature in HDFView.
ArrayList is an interesting choice, I will keep it in mind. Whatever we do, it will need to apply or work with the JNI code as well.
For the next releases, I have made a big change to the object library to better support attribute data. Hopefully I can get vlen data working correctly too.
Allen

Hi, have there been any developments regarding this issue? Now, the issue becomes pressing for me. Since I wrote a whole code around the hdf object library I am hesitant to switch the whole code to HDFql.

I would even be totally ok with a hacky solution for variable length int[]. So if anybody can point me in the right direction…

Welcome back, Martin!

Just to let you know that since our previous post (January 2022), we have released HDFql version 2.5.0. This version fully supports reading and writing HDF5 datasets and attributes of variable-length types, as well as compound types containing variable-length members, in Java.

On another note, we are currently preparing the next official release of HDFql (version 2.6.0), planned for this autumn. This upcoming release will bring many goodies, including:

  • A MATLAB wrapper for HDFql
  • An enhanced Python wrapper (based on ctypes, enabling compatibility with any Python version)
  • Improved support for parallel HDF5 (MPI)
  • Many smaller improvements and bug fixes.

Stay tuned!

I tried adding support for writing vlen int32[] to my code by means of the lower-level HDF5 Java API. However, I have failed so far, as this seems to require the hvl_t struct from hdf.hdf5lib.structs, which is not present (or not publicly exposed) in the 2 Java HDF Object package versions I tested (from HDFView 3.3.2 and 3.4.1). So, no luck natively in Java so far.

I have now solved my problem with a Python script. Initially, I write the data as Strings to the compound dataset. The script then converts my HDF5 file with the compound datasets: the String data is converted to vlen data as a postprocessing step using h5py and numpy arrays. Not really nice, but it seems to work.
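
For anyone wanting to try the same workaround, the Java half of it might look roughly like the sketch below: each int[] is encoded as a String before it goes into a string member of the compound, and the postprocessing step parses it back. VlenAsString, encode and decode are hypothetical names, and the comma delimiter is an assumption made for this example, not necessarily the format used in my files.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

/** Hypothetical helper for illustration; the comma-separated format is an assumption. */
final class VlenAsString {
    private VlenAsString() {}

    /** Encodes an int[] as a comma-separated String, e.g. {1,2,3,4} becomes "1,2,3,4". */
    static String encode(int[] values) {
        return Arrays.stream(values)
                .mapToObj(Integer::toString)
                .collect(Collectors.joining(","));
    }

    /** Parses a String produced by encode back into an int[]. */
    static int[] decode(String text) {
        if (text.isEmpty()) {
            return new int[0]; // an empty String stands for an empty sequence
        }
        return Arrays.stream(text.split(","))
                .mapToInt(Integer::parseInt)
                .toArray();
    }
}
```

The encoded Strings can then be written through the object library like any other string member, and the conversion script only has to split on the same delimiter.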

Interestingly, the data is represented correctly when shown with h5dump. However, HDFView (3.3.2) shows *ERROR* in the respective vlen columns.

I would still be highly interested in a native Java solution using the HDF object package

I’ve been working on support for nested compound/variable-length data within HDFView. The present version (3.4.1) runs into both read problems on the Java side of HDF5 and display issues within HDFView. Unfortunately, because the HDFView fixes depend on the HDF5 changes, and HDFView builds against specific HDF5 versions, HDFView won’t be able to integrate the fix until HDF5’s next release, 2.2.0, which is slated for late July.

If you’re reading *ERROR* values in a dataset that isn’t a vlen datatype inside of another datatype, then it might be a separate issue. If so, I would appreciate more information so I can replicate and investigate the problem.

The issue is with vlen data in a compound dataset, so if I understand you correctly, the problem should be addressed by your fix in 2.2.0. If you want, I am more than happy to provide you with an example file.

Is writing vlen data in compound datasets also on your agenda?

Reading vlen data within a compound dataset is covered by my set of fixes. Writing to nested vlen/cmpd types from HDFView is out of scope currently.