H5Datatype with variable length: How to set the values?

Hi, I am trying to create a compound dataset that uses variable-length data, as shown in the last 2 columns here:

For testing, I tried to create a simple compound dataset having only one “column” of variable-length 32-bit integers. I used this test:

import java.util.Vector;

import org.junit.Test;

import hdf.object.Dataset;
import hdf.object.Datatype;
import hdf.object.FileFormat;
import hdf.object.Group;
import hdf.object.h5.H5Datatype;
import hdf.object.h5.H5File;

@Test
public void testCompoundDataSetIntVariableLength() throws Exception {
    
    String fileName = "./test.h5";
    FileFormat fileFormat = FileFormat.getFileFormat(FileFormat.FILE_TYPE_HDF5);
    H5File file = (H5File) fileFormat.createFile(fileName, FileFormat.FILE_CREATE_DELETE);
    
    // This is the total number of entries for each member of the compound
    int DIM_SIZE = 2;
    
    String message = "";
    Group pgroup = null;
    
    // We set the dimensions we want to store the data in - the product of all values has to be the dimension size
    long[] DIMs = { DIM_SIZE, 1 };
    
    // Chunks the data is stored in
    long[] CHUNKs = { 1, 1 };
    
    // We create dummy arrays with default values
    int[] array1 = new int[]{1,2};
    int[] array2 = new int[]{1,2,3,4};
    
    // Set up the data object
    Vector<Object> data = new Vector<>();
    data.add(array1);
    data.add(array2);
    //int[][] array = new int[][]{array1,array2};
    //data.add(array);
    
    // member names of the compound
    String[] mnames = { "int" };

    H5Datatype intValueDataType = (H5Datatype) file.createDatatype(Datatype.CLASS_INTEGER, 4, Datatype.NATIVE, Datatype.NATIVE);
    H5Datatype intValLengthDataType = (H5Datatype) file.createDatatype(Datatype.CLASS_VLEN, Datatype.NATIVE, Datatype.NATIVE, Datatype.NATIVE, intValueDataType);
    Datatype[] mdtypes = new H5Datatype[1];
    mdtypes[0] = intValLengthDataType;
    
    int[] msizes = new int[]{1};
    
    Dataset dset = file.createCompoundDS("/CompoundDS", pgroup, DIMs, null, CHUNKs, 0, mnames, mdtypes, msizes, data);
}

and tried different ways of filling data: either as several 1D arrays or as one 2D array. In both cases the result is the same: I get the correct datatype definition in the dataset, but the data itself is empty:

I would expect {(1,2)} to be the content of row 1 and {(1,2,3,4)} of row 2. What am I doing wrong? Storage: SIZE:0 seems to indicate that no data was actually written to the dataset, or am I wrong? I guess the behavior is somewhat similar to the one explained for Fortran in this thread.

The object library was created as an abstraction layer that attempts to hide the storage format underneath. HDF5 has grown more complex, and sometimes that object library can be too abstract to understand and implement something. Still, compounds in datasets should be workable (although current development is focused on improving attributes and compounds).
I would suggest trying the HDF5 Java wrappers first to get something working. Being a wrapper around the C library APIs, they give you a direct correspondence with the existing C documentation. Also check the H5Ex_T_Compound.java example.

So what does that mean exactly? There is nothing wrong with my approach, but the current state of the object library simply does not support that action?

I built my whole code around the object library. I cannot really switch to the Java wrappers or add some native calls to the functionality of the object library without rewriting all my code, can I?

For perspective, or as an alternative, you should consider the HDFql (Java) interface. See the extensive documentation and variable-length examples throughout, e.g., near the beginning of section 4.2.

Bonus: You can try everything in the HDFql shell before spending too much time writing custom code.

Best, G.


I will have a closer look at it, thx. But before I do, I would just like clarification: did I do something wrong in my example, or is there just no way of doing this operation properly in the current state of the object package? And if not, is there a reasonable chance that this might be fixed in an upcoming version?

I’ll take a look and get back to you. G.

Caution: I’m not a Java programmer.

OK, with that caveat, I looked around the Java code here and here.
Your code has the right intention but, from what I can see, currently only variable-length strings are supported by the object layer.

It appears that the lower-level API hdf.hdf5lib supports variable-length sequences, but only for scalar base types, such as integers, floats, etc.

In other words, there is a disconnect between the object layer and hdf.hdf5lib for non-string, variable-length sequence types. What’s missing is a chunk of tedious glue code which breaks apart your ragged array and fiddles around with pointers. (Look at the JNI wrapper to get an idea.)
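To make that glue concrete, here is a hypothetical pure-Java sketch (all names are mine, nothing from the library) of the first half of the job: breaking a ragged int[][] into per-row lengths plus one flat element buffer, which is the Java-side counterpart of the array of C hvl_t {len, p} descriptors that the JNI layer would hand to H5Dwrite:

```java
import java.util.Arrays;

public class VlenGlueSketch {

    // Per-row element counts: the "len" half of each hvl_t descriptor.
    static int[] rowLengths(int[][] ragged) {
        int[] lengths = new int[ragged.length];
        for (int i = 0; i < ragged.length; i++) {
            lengths[i] = ragged[i].length;
        }
        return lengths;
    }

    // One contiguous buffer holding all rows back to back; the native
    // side would carve it into per-row pointers (the "p" half of hvl_t).
    static int[] flatten(int[][] ragged) {
        int total = 0;
        for (int[] row : ragged) total += row.length;
        int[] flat = new int[total];
        int offset = 0;
        for (int[] row : ragged) {
            System.arraycopy(row, 0, flat, offset, row.length);
            offset += row.length;
        }
        return flat;
    }

    public static void main(String[] args) {
        // The two rows from the test posted above
        int[][] ragged = { {1, 2}, {1, 2, 3, 4} };
        System.out.println(Arrays.toString(rowLengths(ragged)));  // [2, 4]
        System.out.println(Arrays.toString(flatten(ragged)));     // [1, 2, 1, 2, 3, 4]
    }
}
```

The remaining, harder half - pinning each row and storing real pointers into hvl_t structs - can only happen on the native side, which is why this support has to land in the JNI code too.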

Unless you wanna drop down to the hdf.hdf5lib level and make your code look like C-code, I’d look elsewhere. Did I mention HDFql :wink:?

Best, G.

To add a bit more to the discussion…

We are currently enabling HDFql to support variable-length datatypes in Java - this new feature will be available in our next official release. To give you a heads-up, here is how your issue could be solved (as an example, based on the screenshot posted above):

import java.util.ArrayList;

// declare Java class that "mimics" the HDF5 compound dataset
class Data
{
    int myDimension;
    int myShapeType;
    int myInterpolationType;
    int myIntegrationType;
    int myNumberOfNormalComponents;
    int myNumberOfShearComponents;
    ArrayList myConnectivity;
    ArrayList myFaceConnectivity;
}

// declare variables
Data write[] = new Data[1];
Data read[] = new Data[1];

// create HDF5 file 'myFile.h5' and use (i.e. open) it
HDFql.execute("CREATE AND USE FILE myFile.h5");

// create compound dataset 'myDataset'
HDFql.execute("CREATE DATASET myDataset AS COMPOUND(myDimension AS INT, myShapeType AS INT, myInterpolationType AS INT, myIntegrationType AS INT, myNumberOfNormalComponents AS INT, myNumberOfShearComponents AS INT, myConnectivity AS VARINT, myFaceConnectivity AS VARINT)");

// populate variable 'write' with dummy values
write[0] = new Data();
write[0].myDimension = 1;
write[0].myShapeType = 2;
write[0].myInterpolationType = 3;
write[0].myIntegrationType = 4;
write[0].myNumberOfNormalComponents = 5;
write[0].myNumberOfShearComponents = 6;
write[0].myConnectivity = new ArrayList();
write[0].myConnectivity.add(10);
write[0].myConnectivity.add(20);
write[0].myFaceConnectivity = new ArrayList();
write[0].myFaceConnectivity.add(30);
write[0].myFaceConnectivity.add(40);
write[0].myFaceConnectivity.add(50);

// write content of variable 'write' into dataset 'myDataset'
HDFql.execute("INSERT INTO myDataset VALUES FROM MEMORY " + HDFql.variableRegister(write));

// read content of dataset 'myDataset' and populate variable 'read' with it
HDFql.execute("SELECT FROM myDataset INTO MEMORY " + HDFql.variableRegister(read));

// display content of variable 'read'
System.out.println("myDimension: " + read[0].myDimension);
System.out.println("myShapeType: " + read[0].myShapeType);
System.out.println("myInterpolationType: " + read[0].myInterpolationType);
System.out.println("myIntegrationType: " + read[0].myIntegrationType);
System.out.println("myNumberOfNormalComponents: " + read[0].myNumberOfNormalComponents);
System.out.println("myNumberOfShearComponents: " + read[0].myNumberOfShearComponents);
for(int i = 0; i < read[0].myConnectivity.size(); i++)
{
    System.out.println("myConnectivity: " + read[0].myConnectivity.get(i));
}
for(int i = 0; i < read[0].myFaceConnectivity.size(); i++)
{
    System.out.println("myFaceConnectivity: " + read[0].myFaceConnectivity.get(i));
}

Variable-length datatypes are tricky to get working in the JNI code because of the pointers. On top of that, the development history of the object library made assumptions that now need to be reversed - it is on our list!
HDFView development depends on it working.

Good to see this support coming, @byrn! Which Java primitive data type or class are you thinking of using to represent/store HDF5 variable-length data? On our side, we finished implementing this support in HDFql and opted for the ArrayList class.
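For readers weighing the same representation question in plain Java (this is generic Java, neither HDFql nor object-library API): ragged rows can be held either as a list of primitive int[] rows, which avoids per-element boxing, or as a list of boxed Integer lists, the ArrayList shape mentioned above. A minimal comparison:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RaggedRepresentations {
    public static void main(String[] args) {
        // Primitive rows: compact, no boxing, but each row has a fixed length.
        List<int[]> primitiveRows = new ArrayList<>();
        primitiveRows.add(new int[]{1, 2});
        primitiveRows.add(new int[]{1, 2, 3, 4});

        // Boxed rows: growable and generics-friendly, at the cost of
        // one Integer object per element.
        List<List<Integer>> boxedRows = new ArrayList<>();
        boxedRows.add(new ArrayList<>(Arrays.asList(1, 2)));
        boxedRows.add(new ArrayList<>(Arrays.asList(1, 2, 3, 4)));

        System.out.println(primitiveRows.get(1).length);  // 4
        System.out.println(boxedRows.get(1).size());      // 4
    }
}
```

For large datasets the boxing overhead can matter, which is one trade-off to weigh when picking a container for variable-length numeric data.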

Yes, I have worked on the issue and got really close a couple of times, so I am very aware of it every time I add or fix a feature in HDFView.
ArrayList is an interesting choice; I will keep it in mind. Whatever we do, it will need to work with the JNI code as well.
Next releases; I have made a big change to the object library to better support attribute data. Hopefully I can get vlen data working correctly too.
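Whichever container is chosen on the Java side, the JNI boundary ultimately needs primitive buffers. A hypothetical sketch (not actual HDFView or JNI code) of unboxing one ArrayList row into the int[] that native code could then access, e.g. via GetIntArrayElements:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UnboxForJni {
    // Copy a boxed row into a primitive array so the native side can
    // read it as a contiguous int buffer.
    static int[] toPrimitive(List<Integer> row) {
        int[] out = new int[row.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = row.get(i);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> row = new ArrayList<>(Arrays.asList(10, 20, 30));
        System.out.println(Arrays.toString(toPrimitive(row)));  // [10, 20, 30]
    }
}
```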
Allen
