extending string datatype byte size?

Hi,

Is there a way to extend the byte size of an existing string datatype? For example, I have a string column where the byte size is 3, but I extend the dataset in order to add more values, and now want to write a value that is 5 bytes. I've looked through the documentation to see if this is possible but haven't had any luck. I do know that I could instead use the variable length string data type to handle this. However, I am working with very large datasets, and there seems to be a very significant read/write performance penalty using variable length string data type.

I have an example test case below illustrating what I am trying to do. Any pointers would be greatly appreciated.

Thanks,
Chris

@Test
  public void testHdfStringWriter() throws Exception {
    // create and open a temp file
    File file = new File("c:\\temp.h5");
    int fileId = H5.H5Fcreate(file.getAbsolutePath(),
        HDF5Constants.H5F_ACC_TRUNC, HDF5Constants.H5P_DEFAULT,
        HDF5Constants.H5P_DEFAULT);

    // define dimensions and create dataspace
    int rowCount = 3;
    long[] dataset_dims = { rowCount };
    long[] max_dims = { HDF5Constants.H5S_UNLIMITED };
    long[] chunk_dims = { rowCount };
    int dataspaceId = H5.H5Screate_simple(1, dataset_dims, max_dims);
    int dcplId = H5.H5Pcreate(HDF5Constants.H5P_DATASET_CREATE);
    H5.H5Pset_deflate(dcplId, 9);
    H5.H5Pset_chunk(dcplId, 1, chunk_dims);

    // create 3 byte string datatype, then create the dataset
    int strtypeId = H5.H5Tcopy(HDF5Constants.H5T_C_S1);
    H5.H5Tset_size(strtypeId, 3);
    int datasetId = H5.H5Dcreate(fileId, "/dataset1", strtypeId,
        dataspaceId, dcplId);

    // write the data
    String[] strings = { "111", "222","333" };
    byte[][] stringData = new byte[rowCount][3];
    for(int i=0; i<strings.length; i++){
      stringData[i] = strings[i].getBytes();
    }
    H5.H5Dwrite(datasetId, strtypeId, HDF5Constants.H5S_ALL,
        HDF5Constants.H5S_ALL, HDF5Constants.H5P_DEFAULT, stringData);
    
    //try to increase byte size to 5
    H5.H5Tset_size(strtypeId, 5);
    
    //write longer strings
    String[] newStrings = { "11111", "22222","33333" };
    stringData = new byte[rowCount][5];
    for(int i=0; i<newStrings.length; i++){
      stringData[i] = newStrings[i].getBytes();
    }
    H5.H5Dwrite(datasetId, strtypeId, HDF5Constants.H5S_ALL,
        HDF5Constants.H5S_ALL, HDF5Constants.H5P_DEFAULT, stringData);
    
    H5.H5Tclose(strtypeId);
    H5.H5Sclose(dataspaceId);
    H5.H5Dclose(datasetId);
    H5.H5Fclose(fileId);
  }

Hello Chris,

Is there a way to extend the byte size of an existing string datatype?

You can't change the datatype of a dataset, but you can read your data in, convert your data to a new datatype (using H5Tconvert), and write that data to another dataset with the new datatype.

Attached is an example of that. (I'll include it at the bottom of this
message, too.)

Will that work for you? (You could then delete the original dataset
(see H5Ldelete) and use the h5repack utility to get rid of the unused
space left by the deleted object.)
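
On the Java side, the buffer handling for this read/convert/write approach might look roughly like the sketch below. It is plain Java and makes no HDF5 calls; it only illustrates how a flat buffer of fixed-width strings is laid out and re-padded from 3 to 5 bytes per element. The class and method names (FixedWidthStrings, pack, widen) are made up for illustration, and it assumes ASCII strings padded with zero bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HDF-Java: shows the flat, fixed-width
// layout of a string buffer and how it changes when the width grows.
class FixedWidthStrings {

    /** Packs strings into one flat buffer, each zero-padded to `width` bytes. */
    static byte[] pack(String[] values, int width) {
        byte[] flat = new byte[values.length * width]; // zero-filled by default
        for (int i = 0; i < values.length; i++) {
            byte[] raw = values[i].getBytes(StandardCharsets.US_ASCII);
            if (raw.length > width) {
                throw new IllegalArgumentException(
                        "value longer than " + width + " bytes: " + values[i]);
            }
            System.arraycopy(raw, 0, flat, i * width, raw.length);
        }
        return flat;
    }

    /** Copies each element of width oldWidth into a new buffer of width newWidth. */
    static byte[] widen(byte[] flat, int oldWidth, int newWidth) {
        if (newWidth < oldWidth) {
            throw new IllegalArgumentException("newWidth must be >= oldWidth");
        }
        int count = flat.length / oldWidth;
        byte[] out = new byte[count * newWidth]; // extra bytes stay zero
        for (int i = 0; i < count; i++) {
            System.arraycopy(flat, i * oldWidth, out, i * newWidth, oldWidth);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] old = pack(new String[] { "111", "222", "333" }, 3);
        byte[] wide = widen(old, 3, 5);
        System.out.println(wide.length); // prints 15
    }
}
```

The widened buffer is the kind of data you would then write to the new dataset created with the 5-byte string datatype. The actual conversion rules of H5Tconvert (padding and termination) depend on the string datatype settings, so treat this only as a picture of the layout.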

-Barbara Jones
HDF Helpdesk


/********************************************************************/
/*
    Create a dataset with an array of strings of size 5.
    Convert the data to an array of strings of size 8, and write it
    to a new dataset of that size.
*/
/********************************************************************/

#include "hdf5.h"
#include <stdio.h>
#include <string.h>
#define FILE "cnvstr.h5"

int main(void) {

    hid_t file_id, dataset_id, dataspace_id, dataset_id1;
    herr_t status;
    char buf[]={"test left call mesh"};
    char newbuf[50];

    hsize_t dims[2] = {2,2};
    hid_t dtype, dtype1;
    size_t size;

    file_id = H5Fcreate(FILE, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    printf ("H5Fcreate returns: %i\n", file_id);
    dataspace_id = H5Screate_simple (2, dims, NULL);
    printf ("H5Screate_simple returns: %i\n", dataspace_id);

    dtype = H5Tcopy (H5T_C_S1);
    size = 5;
    status = H5Tset_size (dtype, size);

    dataset_id = H5Dcreate(file_id, "StrData", dtype, dataspace_id,
                    H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    printf ("H5Dcreate returns: %i\n", dataset_id);
    status = H5Dwrite (dataset_id, dtype, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);
    printf ("H5Dwrite returns: %i\n", status);

    dtype1 = H5Tcopy (H5T_C_S1);
    size = 8;
    status = H5Tset_size (dtype1, size);

    dataset_id1 = H5Dcreate(file_id, "StrDatLong", dtype1, dataspace_id,
                             H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    printf ("H5Dcreate returns: %i\n", dataset_id1);

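    /* Copy the data to a buffer large enough for the converted strings
       (4 elements * 8 bytes = 32 bytes) and convert it in place.
       The third argument is the number of elements (4), not the string size. */
    strncpy (newbuf, buf, 20);
    newbuf[20] = '\0';
    status = H5Tconvert (dtype, dtype1, 4, newbuf, NULL, H5P_DEFAULT);
    printf ("H5Tconvert returns: %i\n", status);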

    status = H5Dwrite (dataset_id1, dtype1, H5S_ALL, H5S_ALL, H5P_DEFAULT, newbuf);
    printf ("H5Dwrite returns: %i\n", status);

    status = H5Tclose (dtype);
    status = H5Tclose (dtype1);
    status = H5Dclose(dataset_id);
    status = H5Dclose(dataset_id1);
    status = H5Sclose(dataspace_id);
    status = H5Fclose(file_id);
    return 0;
}


Hi Barbara,

Thanks, this is exactly the information that I was looking for. Assuming these calls are exposed in the Java API, this solution should be acceptable for what I am trying to do. Thanks again for your help, I appreciate it.

Regards,
Chris

Hi Chris,

Thanks, this is exactly the information that I was looking for. Assuming these calls are exposed in the Java API, this solution should be acceptable for what I am trying to do.

H5Tconvert is available in HDF-Java 2.7. However, it does not work the
same way as the C API: Java does not support casting an arbitrary buffer
the way C does, so the Java method operates on the data as a byte array.

You can obtain HDF-Java 2.7 Beta from here, if you want to look at this:

    http://www.hdfgroup.org/ftp/HDF5/hdf-java-2.7/

-Barbara Jones

Hi Chris,

H5Tconvert() was not supported in hdf-java 2.6 and is partially supported
in hdf-java 2.7 (beta release is out as Barbara mentioned).

Right now, H5Tconvert() in hdf-java only takes byte buffers as input and
output, i.e.

void H5Tconvert(int src_id, int dst_id, long nelmts, byte[] buf,
     byte[] background, int plist_id)

For example, if you want to convert unsigned short (2-byte integer) to
a 4-byte integer, you have to copy your short values into a byte array
that is large enough to hold the converted output and pass it to H5Tconvert.
You then convert the bytes returned by H5Tconvert to a 4-byte integer array.
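
The packing and unpacking steps around that call could be sketched as follows. This is only an illustration under some assumptions: the class and method names (ConvertBuffers, shortsToBytes, bytesToInts) are hypothetical, native byte order is assumed to match the native HDF5 types, and the H5Tconvert call itself appears only as a comment because it needs the HDF-Java library.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helpers, not part of HDF-Java.
class ConvertBuffers {

    /**
     * Packs shorts contiguously at the start of a byte array that has room
     * for values.length elements of the larger of the two element sizes.
     */
    static byte[] shortsToBytes(short[] values, int dstElemSize) {
        int elemSize = Math.max(Short.BYTES, dstElemSize);
        ByteBuffer bb = ByteBuffer.allocate(values.length * elemSize)
                .order(ByteOrder.nativeOrder());
        for (short v : values) {
            bb.putShort(v);
        }
        return bb.array();
    }

    /** Reads `count` 4-byte integers from the start of the byte array. */
    static int[] bytesToInts(byte[] buf, int count) {
        int[] out = new int[count];
        ByteBuffer.wrap(buf).order(ByteOrder.nativeOrder()).asIntBuffer().get(out);
        return out;
    }

    public static void main(String[] args) {
        short[] data = { 1, 2, 3 };
        byte[] buf = shortsToBytes(data, Integer.BYTES);

        // With HDF-Java on the classpath, the conversion would go here,
        // using the signature quoted in this thread, for example:
        // H5.H5Tconvert(srcTypeId, dstTypeId, data.length, buf, null,
        //         HDF5Constants.H5P_DEFAULT);
        // After that call, bytesToInts(buf, data.length) would unpack the result.

        System.out.println(buf.length); // prints 12
    }
}
```

Sizing the array for the destination type up front avoids allocating a second buffer, on the assumption that the Java wrapper converts in place as the C function does.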

A better approach would be to implement a method that takes an Object
as the buffer, i.e.

void H5Tconvert(int src_id, int dst_id, long nelmts, Object buf,
     Object background, int plist_id)

Since there is no direct mapping between a Java Object and a C void*,
implementing the Java method above would require a lot of work.

Also, some datatypes, e.g. compound datatypes, would be extremely hard
to support in Java. So we provide the byte-array method as a middle way
for our users.

Thanks
--pc

On 10/5/2010 12:51 PM, Brown, Chris wrote:

Hi Barbara,

Thanks, this is exactly the information that I was looking for. Assuming these calls are exposed in the Java API, this solution should be acceptable for what I am trying to do. Thanks again for your help, I appreciate it.

Regards,

Chris
