storing a string dataset in HDF5 using Java

I'll ask a simplified set of questions to rephrase my original question
about using the high-level Object API in Java.

I would like to create a gzip-compressed string dataset in an HDF5 file
using Java, based on a datatype that is H5T_C_S1 with a known string
length. I have done this in C/C++ and am just trying to port it to Java.
The datatype code is initialized roughly as follows.

    int tid = H5.H5Tcopy(HDF5Constants.H5T_C_S1);
    H5.H5Tset_size(tid, x);
    Datatype dtype = new H5Datatype(tid);

    FileFormat ff = g.getFileFormat();
    ncsa.hdf.object.Dataset ds = ff.createScalarDS(
        "ESPDF", g, dtype, dims, maxdims, chunks, gzip, null);

1) How do I use FileFormat.createScalarDS with string data? The examples
at http://www.hdfgroup.org/hdf-java-html/hdf-object/use.html#read_dataset
cover only arrays of integers.
2) Is there any way to incrementally write the dataset, rather than
having to get all the data in memory ahead of time?

···

_________________________________________________________________________________________

This e-mail and the information, including any attachments, it contains are intended to be a confidential communication only to the person or entity to whom it is addressed and may contain information that is privileged. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender and destroy the original message.

Thank you.

Please consider the environment before printing this email.

p.s. The problem I'm running into seems to be related to the fact that
in C, H5Dwrite takes a void * and works just fine. In Java the library
seems to be too smart: it accepts only a String array and does not
allow a byte array.

This seems like a bug. Java certainly does not allow typecasting
from String to byte arrays, but from the standpoint of an HDF5 file they
are equivalent.

Is there a way to convert a byte array into a String that does the
right thing with respect to HDF5's equivalence? If I do the following,
based on a naive interpretation of the String constructor:

   byte[] b;
   /* assign data to b */
   String[] s = {new String(b)}; // looks fine in the debugger
   ncsa.hdf.object.Dataset ds = ff.createScalarDS( ... );
   ds.write(s); // crashes here

then it crashes with an EXCEPTION_ACCESS_VIOLATION.
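One plausible trigger for the access violation is a length mismatch: the JNI layer copies each String into a slot of exactly the size declared with H5Tset_size. A defensive workaround is to normalize every string to that byte length before writing; a minimal sketch (toFixedLength is a hypothetical helper, not part of the hdf-java API):

```java
import java.nio.charset.StandardCharsets;

public class FixedLenString {
    /**
     * Pad (with NULs) or truncate s so its byte length matches the
     * declared H5T_C_S1 size. strLen must equal the value passed to
     * H5Tset_size (x in the snippet above).
     */
    static String toFixedLength(String s, int strLen) {
        byte[] src = s.getBytes(StandardCharsets.US_ASCII);
        byte[] dst = new byte[strLen]; // zero-filled, i.e. NUL-padded
        System.arraycopy(src, 0, dst, 0, Math.min(src.length, strLen));
        return new String(dst, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // "abc" becomes "abc" followed by five NUL characters
        System.out.println(toFixedLength("abc", 8).length()); // 8
    }
}
```

Writing s[i] = toFixedLength(new String(b), x) ensures each element's byte length matches the declared datatype size, which removes one variable when chasing the crash.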

···

-----Original Message-----
Sent: Monday, July 13, 2009 11:16 AM
To: hdf-forum@hdfgroup.org
Subject: Re: [Hdf-forum] storing a string dataset in HDF5 using Java


Hi Jason,

Below is the test program. It works fine for me.

2) Is there any way to incrementally write the dataset, rather than
having to get all the data in memory ahead of time?

You can set the max dimension size to unlimited, or to some size larger
than the current dimension size; then you can write the data
incrementally. There are some C examples of this.

--Peter

···

========================
    public static final void testStrings(String fname) throws Exception
    {
        final int strLen = 20;
        final long[] dims = {5};
        final String[] data = new String[(int)dims[0]];

        // retrieve an instance of H5File
        final FileFormat fileFormat = FileFormat.getFileFormat(FileFormat.FILE_TYPE_HDF5);

        if (fileFormat == null)
        {
            System.err.println("Cannot find HDF5 FileFormat.");
            return;
        }

        // create a new file with a given file name.
        final H5File testFile = (H5File)fileFormat.create(fname);

        if (testFile == null)
        {
            System.err.println("Failed to create file:"+fname);
            return;
        }
        for (int i = 0; i < data.length; i++) {
            data[i] = "test string " + i;
        }

        // open the file and retrieve the root group
        testFile.open();
        final Group root = (Group)((javax.swing.tree.DefaultMutableTreeNode)testFile.getRootNode()).getUserObject();

        Datatype dtype = testFile.createDatatype(
            Datatype.CLASS_STRING, strLen, Datatype.NATIVE, Datatype.NATIVE);
        Dataset dataset = testFile.createScalarDS
            ("dset", root, dtype, dims, null, null, 0, data);

        testFile.close();
    }
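Peter's answer to question 2 boils down to a loop: extend the dataset, select a hyperslab starting where the old data ended, write the new block. A sketch of that bookkeeping in pure Java, with the corresponding low-level HDF5 calls left as comments (identifiers like did and fsid are hypothetical; check exact signatures against the hdf-java javadoc):

```java
import java.util.ArrayList;
import java.util.List;

public class IncrementalWrite {
    long extent = 0; // current size of the (1-D) dataset in the file

    /**
     * Bookkeeping for one incremental append. In a real program each
     * commented step would be a low-level HDF5 call.
     * Returns {start, count} for the hyperslab just written.
     */
    long[] append(int nRecords) {
        long start = extent;  // new data begins where the old data ended
        extent += nRecords;   // H5Dset_extent(did, new long[]{extent})
        // H5Sselect_hyperslab(fsid, SELECT_SET, {start}, null, {nRecords}, null)
        // H5Dwrite(did, tid, msid, fsid, H5P_DEFAULT, buffer)
        return new long[]{start, nRecords};
    }

    public static void main(String[] args) {
        IncrementalWrite w = new IncrementalWrite();
        List<long[]> slabs = new ArrayList<>();
        slabs.add(w.append(5));
        slabs.add(w.append(3));
        System.out.println(w.extent);        // 8
        System.out.println(slabs.get(1)[0]); // second slab starts at 5
    }
}
```

The dataset must be created with chunked storage and maxdims larger than dims (or unlimited) for the extend step to be legal.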

Jason Sachs wrote:


_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org

Thanks Peter, but I was looking for an example of strings with
compression, and I also don't understand how the array of strings
(String[]) maps to a single string in the file (are they just
concatenated?).
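On the layout question: for a fixed-length string datatype, each element occupies exactly strLen bytes in the file, NUL-padded, and elements sit back to back, so in that sense they are concatenated. A pure-Java sketch of that flat buffer (pack is a hypothetical helper, not a library call):

```java
import java.nio.charset.StandardCharsets;

public class StringLayout {
    /**
     * Pack strings the way a fixed-length HDF5 string dataset stores
     * them: one strLen-byte, NUL-padded slot per element, back to back.
     */
    static byte[] pack(String[] data, int strLen) {
        byte[] buf = new byte[data.length * strLen]; // zero-filled
        for (int i = 0; i < data.length; i++) {
            byte[] b = data[i].getBytes(StandardCharsets.US_ASCII);
            System.arraycopy(b, 0, buf, i * strLen,
                             Math.min(b.length, strLen));
        }
        return buf;
    }

    public static void main(String[] args) {
        byte[] buf = pack(new String[]{"ab", "cd"}, 4);
        System.out.println(buf.length);    // 8: two 4-byte slots
        System.out.println((char) buf[4]); // 'c': slot 1 starts at offset 4
    }
}
```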

I got it working by translating my C code more or less verbatim into
Java using the low-level H5.* functions.

···

-----Original Message-----
From: hdf-forum-bounces@hdfgroup.org
[mailto:hdf-forum-bounces@hdfgroup.org] On Behalf Of Peter Cao
Sent: Tuesday, July 14, 2009 10:27 AM
To: hdf-forum@hdfgroup.org
Subject: Re: [Hdf-forum] storing a string dataset in HDF5 using Java


Is there any way to search the archives of this mailing list? They
don't show up as searchable in Google at all, and the archive gateway
http://mail.hdfgroup.org/pipermail/hdf-forum_hdfgroup.org/ doesn't have
any search features.

···


Hi Jason,

In order to use compression, you have to specify the chunks.
Change the code I sent you as follows. The file size will
be 7KB; without compression it is 128KB.

Thanks
--pc

···

===============
    public static final void testStrings(String fname) throws Exception
    {
        final int strLen = 128;
        final long[] dims = {1000};
        final long[] chunks = {100};
................
        Dataset dataset = testFile.createScalarDS(
            "dset", root, dtype, dims, null, chunks, 6, data);

        testFile.close();
    }
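The 128KB figure above is just the raw element storage: 1000 fixed-length slots of 128 bytes each, before any compression or file overhead.

```java
public class SizeCheck {
    public static void main(String[] args) {
        long nStrings = 1000; // dims[0] in the example above
        long strLen = 128;    // bytes per fixed-length string
        long rawBytes = nStrings * strLen;
        System.out.println(rawBytes); // 128000 bytes, i.e. ~128 KB
    }
}
```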
