storing a string dataset in HDF5 using Java

I'll ask a simplified set of questions to rephrase my original question
about using the high-level Object API in Java.

I would like to create a gzip-compressed string dataset in an HDF5 file
using Java, based on a datatype that is H5T_C_S1 with a known string
length. I have done this in C/C++ and am just trying to port it to Java.
The datatype code is initialized roughly as follows.

    int tid = H5.H5Tcopy(HDF5Constants.H5T_C_S1);
    H5.H5Tset_size(tid, x);
    Datatype dtype = new H5Datatype(tid);

    FileFormat ff = g.getFileFormat();
    ncsa.hdf.object.Dataset ds = ff.createScalarDS(
        "ESPDF", g, dtype, dims, maxdims, chunks, gzip, null);

1) How do I use FileFormat.createScalarDS with string data? The examples
at http://www.hdfgroup.org/hdf-java-html/hdf-object/use.html#read_dataset
cover only arrays of integers.
2) Is there any way to incrementally write the dataset, rather than
having to get all the data in memory ahead of time?

···

_________________________________________________________________________________________

This e-mail and the information, including any attachments, it contains are intended to be a confidential communication only to the person or entity to whom it is addressed and may contain information that is privileged. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please immediately notify the sender and destroy the original message.

Thank you.

Please consider the environment before printing this email.

p.s. The problem I'm running into seems to be related to the fact that
in C, H5Dwrite takes a void * and works just fine. In Java the library
seems to be too smart: it accepts only a String array and does not
allow a byte array.

This seems like a bug. Java certainly does not allow typecasting
from String to byte arrays, but from the standpoint of an HDF5 file they
are equivalent.

Is there a way to convert a byte array into a String that does the
right thing with respect to HDF5's equivalence? If I do the following,
based on a naive interpretation of the String constructor:

   byte[] b;
   /* assign data to b */
   String[] s = {new String(b)}; // looks fine in the debugger
   ncsa.hdf.object.Dataset ds = ff.createScalarDS( ... );
   ds.write(s); // crashes here

then it crashes with an EXCEPTION_ACCESS_VIOLATION.
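One plausible trigger for the access violation is a length mismatch: the JNI layer copies each String into a slot of exactly the size declared with H5Tset_size. A defensive workaround is to normalize every string to that byte length before writing; a minimal sketch (toFixedLength is a hypothetical helper, not part of the hdf-java API):

```java
import java.nio.charset.StandardCharsets;

public class FixedLenString {
    /**
     * Pad (with NULs) or truncate s so its byte length matches the
     * declared H5T_C_S1 size. strLen must equal the value passed to
     * H5Tset_size (x in the snippet above).
     */
    static String toFixedLength(String s, int strLen) {
        byte[] src = s.getBytes(StandardCharsets.US_ASCII);
        byte[] dst = new byte[strLen]; // zero-filled, i.e. NUL-padded
        System.arraycopy(src, 0, dst, 0, Math.min(src.length, strLen));
        return new String(dst, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // "abc" becomes "abc" followed by five NUL characters
        System.out.println(toFixedLength("abc", 8).length()); // 8
    }
}
```

Writing s[i] = toFixedLength(new String(b), x) ensures each element's byte length matches the declared datatype size, which removes one variable when chasing the crash.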

···

-----Original Message-----
Sent: Monday, July 13, 2009 11:16 AM
To: hdf-forum@hdfgroup.org
Subject: Re: [Hdf-forum] storing a string dataset in HDF5 using Java


Hi Jason,

Below is the test program. It works fine for me.

2) Is there any way to incrementally write the dataset, rather than
having to get all the data in memory ahead of time?

You can set the max dimension size to unlimited, or to some size larger
than the current dimension size; then you can write the data
incrementally. There are some C examples of this.

--Peter

···

========================
    public static final void testStrings(String fname) throws Exception
    {
        final int strLen = 20;
        final long[] dims = {5};
        final String[] data = new String[(int)dims[0]];

        // retrieve an instance of H5File
        final FileFormat fileFormat = FileFormat.getFileFormat(FileFormat.FILE_TYPE_HDF5);

        if (fileFormat == null)
        {
            System.err.println("Cannot find HDF5 FileFormat.");
            return;
        }

        // create a new file with a given file name.
        final H5File testFile = (H5File)fileFormat.create(fname);

        if (testFile == null)
        {
            System.err.println("Failed to create file:"+fname);
            return;
        }
        for (int i = 0; i < data.length; i++) {
            data[i] = "test string " + i;
        }

        // open the file and retrieve the root group
        testFile.open();
        final Group root = (Group)((javax.swing.tree.DefaultMutableTreeNode)testFile.getRootNode()).getUserObject();

        Datatype dtype = testFile.createDatatype(
            Datatype.CLASS_STRING, strLen, Datatype.NATIVE, Datatype.NATIVE);
        Dataset dataset = testFile.createScalarDS
            ("dset", root, dtype, dims, null, null, 0, data);

        testFile.close();
    }
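Peter's answer to question 2 boils down to a loop: extend the dataset, select a hyperslab starting where the old data ended, write the new block. A sketch of that bookkeeping in pure Java, with the corresponding low-level HDF5 calls left as comments (identifiers like did and fsid are hypothetical; check exact signatures against the hdf-java javadoc):

```java
import java.util.ArrayList;
import java.util.List;

public class IncrementalWrite {
    long extent = 0; // current size of the (1-D) dataset in the file

    /**
     * Bookkeeping for one incremental append. In a real program each
     * commented step would be a low-level HDF5 call.
     * Returns {start, count} for the hyperslab just written.
     */
    long[] append(int nRecords) {
        long start = extent;  // new data begins where the old data ended
        extent += nRecords;   // H5Dset_extent(did, new long[]{extent})
        // H5Sselect_hyperslab(fsid, SELECT_SET, {start}, null, {nRecords}, null)
        // H5Dwrite(did, tid, msid, fsid, H5P_DEFAULT, buffer)
        return new long[]{start, nRecords};
    }

    public static void main(String[] args) {
        IncrementalWrite w = new IncrementalWrite();
        List<long[]> slabs = new ArrayList<>();
        slabs.add(w.append(5));
        slabs.add(w.append(3));
        System.out.println(w.extent);        // 8
        System.out.println(slabs.get(1)[0]); // second slab starts at 5
    }
}
```

The dataset must be created with chunked storage and maxdims larger than dims (or unlimited) for the extend step to be legal.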

Jason Sachs wrote:


_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org

Thanks Peter, but I was looking for an example of strings with
compression, and I also don't understand how the array of strings
(String[]) maps to a single string in the file (are they just
concatenated?).
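On the layout question: for a fixed-length string datatype, each element occupies exactly strLen bytes in the file, NUL-padded, and elements sit back to back, so in that sense they are concatenated. A pure-Java sketch of that flat buffer (pack is a hypothetical helper, not a library call):

```java
import java.nio.charset.StandardCharsets;

public class StringLayout {
    /**
     * Pack strings the way a fixed-length HDF5 string dataset stores
     * them: one strLen-byte, NUL-padded slot per element, back to back.
     */
    static byte[] pack(String[] data, int strLen) {
        byte[] buf = new byte[data.length * strLen]; // zero-filled
        for (int i = 0; i < data.length; i++) {
            byte[] b = data[i].getBytes(StandardCharsets.US_ASCII);
            System.arraycopy(b, 0, buf, i * strLen,
                             Math.min(b.length, strLen));
        }
        return buf;
    }

    public static void main(String[] args) {
        byte[] buf = pack(new String[]{"ab", "cd"}, 4);
        System.out.println(buf.length);    // 8: two 4-byte slots
        System.out.println((char) buf[4]); // 'c': slot 1 starts at offset 4
    }
}
```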

I got it working by translating my C code more or less verbatim into
Java using the low-level H5.* functions.

···

-----Original Message-----
From: hdf-forum-bounces@hdfgroup.org
[mailto:hdf-forum-bounces@hdfgroup.org] On Behalf Of Peter Cao
Sent: Tuesday, July 14, 2009 10:27 AM
To: hdf-forum@hdfgroup.org
Subject: Re: [Hdf-forum] storing a string dataset in HDF5 using Java


Is there any way to search the archives of this mailing list? They
don't show up as searchable in Google at all, and the archive gateway
http://mail.hdfgroup.org/pipermail/hdf-forum_hdfgroup.org/ doesn't have
any search features.

···


Hi Jason,

In order to use compression, you have to specify the chunks.
Change the code I sent you as follows. The file size will
be 7KB; without compression it is 128KB.

Thanks
--pc

···

===============
    public static final void testStrings(String fname) throws Exception
    {
        final int strLen = 128;
        final long[] dims = {1000};
        final long[] chunks = {100};
................
        Dataset dataset = testFile.createScalarDS(
            "dset", root, dtype, dims, null, chunks, 6, data);

        testFile.close();
    }
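The 128KB figure above is just the raw element storage: 1000 fixed-length slots of 128 bytes each, before any compression or file overhead.

```java
public class SizeCheck {
    public static void main(String[] args) {
        long nStrings = 1000; // dims[0] in the example above
        long strLen = 128;    // bytes per fixed-length string
        long rawBytes = nStrings * strLen;
        System.out.println(rawBytes); // 128000 bytes, i.e. ~128 KB
    }
}
```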
