H5Datatype with variable length: How to set the values?


#1

Hi, I am trying to create a compound dataset that uses variable-length data, as shown in the last two columns here:

For testing, I tried to create a simple compound dataset with only one “column” of variable-length 32-bit integers. I used this test:

```java
import java.util.Vector;

import org.junit.Test;

import hdf.object.Dataset;
import hdf.object.Datatype;
import hdf.object.FileFormat;
import hdf.object.Group;
import hdf.object.h5.H5Datatype;
import hdf.object.h5.H5File;

public class CompoundVlenTest {

    @Test
    public void testCompoundDataSetIntVariableLength() throws Exception {

        String fileName = "./test.h5";
        FileFormat fileFormat = FileFormat.getFileFormat(FileFormat.FILE_TYPE_HDF5);
        H5File file = (H5File) fileFormat.createFile(fileName, FileFormat.FILE_CREATE_DELETE);
        file.open(); // open the newly created file before adding objects to it

        // Number of rows (entries) stored for each member of the compound
        int DIM_SIZE = 2;

        Group pgroup = null;

        // Dataset dimensions: the product of all values must equal DIM_SIZE
        long[] DIMs = { DIM_SIZE, 1 };

        // Chunk dimensions
        long[] CHUNKs = { 1, 1 };

        // Dummy rows of different lengths (the variable-length values to store)
        int[] array1 = new int[]{1, 2};
        int[] array2 = new int[]{1, 2, 3, 4};

        // Set up the data object
        Vector<Object> data = new Vector<>();
        data.add(array1);
        data.add(array2);
        //int[][] array = new int[][]{array1, array2};
        //data.add(array);

        // Compound member names
        String[] mnames = { "int" };

        // Member datatype: variable-length sequence of native 32-bit integers
        H5Datatype intValueDataType = (H5Datatype) file.createDatatype(Datatype.CLASS_INTEGER, 4, Datatype.NATIVE, Datatype.NATIVE);
        H5Datatype intValLengthDataType = (H5Datatype) file.createDatatype(Datatype.CLASS_VLEN, Datatype.NATIVE, Datatype.NATIVE, Datatype.NATIVE, intValueDataType);
        Datatype[] mdtypes = new H5Datatype[1];
        mdtypes[0] = intValLengthDataType;

        // Number of elements in each member
        int[] msizes = new int[]{1};

        Dataset dset = file.createCompoundDS("/CompoundDS", pgroup, DIMs, null, CHUNKs, 0, mnames, mdtypes, msizes, data);

        file.close();
    }
}
```

and tried different ways of filling the data, either as several 1D arrays or as one 2D array. In both cases the result is the same: the dataset gets the correct datatype definition, but the data itself is empty:

I would expect {(1,2)} to be the content of row 1 and {(1,2,3,4)} the content of row 2. What am I doing wrong? `Storage: SIZE:0` seems to indicate that no data was actually written to the dataset, or am I wrong? I guess the behavior is similar to the one explained for Fortran in this thread.
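For reference, the object library takes compound data as a `List` (the `Vector` above) holding one entry per compound member, where each entry is an array with that member's value for every row. With a single member named "int" and two rows, both ragged rows therefore belong in one `int[][]` entry, which is what the commented-out lines in the test build. A minimal pure-Java sketch of that layout, assuming the one-entry-per-member convention; `CompoundLayout` and `toMemberColumns` are hypothetical names, and no HDF5 call is made:

```java
import java.util.List;
import java.util.Vector;

public class CompoundLayout {
    // Hypothetical helper for a compound with a single member ("int") whose
    // value in each row is a variable-length sequence of ints. The data list
    // holds one entry per member, so all ragged rows go into ONE int[][] entry.
    static List<Object> toMemberColumns(int[]... rows) {
        Vector<Object> data = new Vector<>();
        data.add(rows); // entry 0: values of member "int" for rows 0..n-1
        return data;
    }

    public static void main(String[] args) {
        List<Object> data = toMemberColumns(new int[] {1, 2}, new int[] {1, 2, 3, 4});
        int[][] column = (int[][]) data.get(0);
        // One member entry holding two rows (of lengths 2 and 4)
        System.out.println(data.size() + " member(s), " + column.length + " rows"); // prints "1 member(s), 2 rows"
    }
}
```

Per the replies further down, getting this layout right is not expected to be enough on its own, because the object layer does not currently write non-string variable-length members.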


#2

The object library was created as an abstraction layer that attempts to hide the underlying storage format. HDF5 has grown more complex, and sometimes the object library is too abstract to map a particular feature onto. Still, compounds in datasets should be workable (although current development is focused on improving attributes and compounds).
I would suggest trying the HDF5 Java wrappers first to get something working. Because they wrap the C library APIs directly, they map one-to-one onto the existing C documentation. Also check the H5Ex_T_Compound.java example.


#3

So what does that mean exactly? There is nothing wrong with my approach, but the current state of the object library simply does not support that action?

I built my whole code base around the object library. I can't really switch to the Java wrappers, or add native calls alongside the object library, without rewriting all my code, can I?


#4

For perspective, or as an alternative, consider the HDFql (Java) interface. See its extensive documentation and the variable-length examples throughout, e.g., near the beginning of section 4.2.

Bonus: you can try everything in the HDFql shell before spending too much time writing custom code.

Best, G.


#5

I will have a closer look at it, thanks. But before I do, I would like some clarification: did I do something wrong in my example, or is there simply no way to do this properly with the current object package? And if not, is there a reasonable chance that this will be fixed in an upcoming version?


#6

I’ll take a look and get back to you. G.


#7

Caution: I’m not a Java programmer.

OK, with that caveat, I looked around the Java code here and here.
Your code has the right intent, but from what I can see, the object layer currently supports only variable-length strings.

It appears that the lower-level API hdf.hdf5lib supports variable-length sequences, but only for scalar base types, such as integers, floats, etc.

In other words, there is a disconnect between the object layer and hdf.hdf5lib for non-string variable-length sequence types. What’s missing is a chunk of tedious glue code that breaks your ragged array apart and fiddles around with pointers. (Look at the JNI wrapper to get an idea.)
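To give a rough idea of the bookkeeping such glue code has to do: in C, each variable-length element is an `hvl_t`, i.e. a length plus a pointer to the values. The pure-Java sketch below is not the JNI wrapper's code; it only breaks a ragged `int[][]` apart into one contiguous buffer plus per-row lengths and offsets, where each offset plays the role of the pointer. `RaggedFlatten`, `Flattened`, `flatten`, and `unflatten` are hypothetical names chosen for illustration:

```java
import java.util.Arrays;

public class RaggedFlatten {
    // Flattened form of a ragged int[][]: one contiguous buffer plus, for each
    // row, its sequence length and its start offset within that buffer.
    static final class Flattened {
        final int[] buffer;
        final int[] lengths;
        final int[] offsets;

        Flattened(int[] buffer, int[] lengths, int[] offsets) {
            this.buffer = buffer;
            this.lengths = lengths;
            this.offsets = offsets;
        }
    }

    // Break the ragged rows apart into the flattened form.
    static Flattened flatten(int[][] rows) {
        int[] lengths = new int[rows.length];
        int[] offsets = new int[rows.length];
        int total = 0;
        for (int i = 0; i < rows.length; i++) {
            offsets[i] = total;          // where row i starts in the buffer
            lengths[i] = rows[i].length; // how many values row i holds
            total += rows[i].length;
        }
        int[] buffer = new int[total];
        for (int i = 0; i < rows.length; i++) {
            System.arraycopy(rows[i], 0, buffer, offsets[i], lengths[i]);
        }
        return new Flattened(buffer, lengths, offsets);
    }

    // Inverse: rebuild the ragged rows from buffer, lengths, and offsets.
    static int[][] unflatten(Flattened f) {
        int[][] rows = new int[f.lengths.length][];
        for (int i = 0; i < rows.length; i++) {
            rows[i] = Arrays.copyOfRange(f.buffer, f.offsets[i], f.offsets[i] + f.lengths[i]);
        }
        return rows;
    }

    public static void main(String[] args) {
        // The two rows from the original test: {1, 2} and {1, 2, 3, 4}
        Flattened f = flatten(new int[][] { {1, 2}, {1, 2, 3, 4} });
        System.out.println(Arrays.toString(f.buffer));         // [1, 2, 1, 2, 3, 4]
        System.out.println(Arrays.toString(f.lengths));        // [2, 4]
        System.out.println(Arrays.toString(f.offsets));        // [0, 2]
        System.out.println(Arrays.deepToString(unflatten(f))); // [[1, 2], [1, 2, 3, 4]]
    }
}
```

A real binding would presumably also have to allocate native memory for each sequence and store actual pointers in the `hvl_t` structs, which is the part that needs JNI rather than plain Java.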

Unless you want to drop down to the hdf.hdf5lib level and make your code look like C code, I’d look elsewhere. Did I mention HDFql :wink:?

Best, G.