HDF5 string dataset compression

Hi all,

I'm very new to HDF5 and am trying to find an example of how to compress a
string written to an HDF5 file. I've been using the C++ API and writing
scalar strings to my dataset, but I can't figure out how to use data
chunking so that I can enable compression. I tried to adapt the example
provided for Java
(http://www.hdfgroup.org/hdf-java-html/javadocs/ncsa/hdf/object/h5/H5ScalarDS.html#create(java.lang.String, ncsa.hdf.object.Group, ncsa.hdf.object.Datatype, long[], long[], long[], int, java.lang.Object)):

    std::string my_string = function_that_populates_string();
    std::string dataset_name = function_that_names_dataset();

    StrType datatype(0, H5T_VARIABLE);
    DataSpace dataspace(H5S_SCALAR);

    DSetCreatPropList plist;
    hsize_t chunk_dims[1] = {1024};
    plist.setChunk(1, chunk_dims);

    DataSet dataset = my_group.createDataSet(dataset_name, datatype, dataspace, plist);
    dataset.write(my_string, datatype);

but it throws errors at runtime:

HDF5-DIAG: Error detected in HDF5 (1.8.6) thread 0:
  #000: H5D.c line 170 in H5Dcreate2(): unable to create dataset
    major: Dataset
    minor: Unable to initialize object
  #001: H5Dint.c line 431 in H5D_create_named(): unable to create and link to dataset
    major: Dataset
    minor: Unable to initialize object
  #002: H5L.c line 1640 in H5L_link_object(): unable to create new link to object
    major: Links
    minor: Unable to initialize object
  #003: H5L.c line 1866 in H5L_create_real(): can't insert link
    major: Symbol table
    minor: Unable to insert object
  #004: H5Gtraverse.c line 929 in H5G_traverse(): internal path traversal failed
    major: Symbol table
    minor: Object not found
  #005: H5Gtraverse.c line 718 in H5G_traverse_real(): traversal operator failed
    major: Symbol table
    minor: Callback failed
  #006: H5L.c line 1686 in H5L_link_cb(): unable to create object
    major: Object header
    minor: Unable to initialize object
  #007: H5O.c line 3005 in H5O_obj_create(): unable to open object
    major: Object header
    minor: Can't open object
  #008: H5Doh.c line 295 in H5O_dset_create(): unable to create dataset
    major: Dataset
    minor: Unable to initialize object
  #009: H5Dint.c line 1035 in H5D_create(): unable to construct layout information
    major: Dataset

If I get rid of the provided plist (and hence the chunking), then the code
works without problem. Can anyone point me in the right direction?

Thanks!

Nathan

Hi Nathan,

On Tue, Dec 11, 2012 at 11:20 AM, Nathan Smith <nathanjsmith@gmail.com> wrote:

Hi all,

I'm very new to HDF5 and am trying to find an example of how to compress a
string written to an HDF5 file. I've been using the C++ API and writing
scalar strings to my dataset, but I can't figure out how to use data
chunking so that I can enable compression.

Variable-length data cannot be compressed due to the way we store it in the
HDF5 library.

Dana
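
A side note on the first snippet, separate from the variable-length question: the truncated trace ends at "unable to construct layout information", which is consistent with a chunk shape that does not fit the dataspace. A scalar dataspace (H5S_SCALAR) has rank 0, while setChunk(1, ...) declares a rank-1 chunk. The sketch below is plain C++ and does not call HDF5. The name chunk_shape_is_valid is hypothetical, and the rules it encodes (at least one dimension, equal ranks, each chunk extent positive and no larger than a fixed dataspace extent) are assumptions about the library's checks, not something stated in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the HDF5 API): returns true when a chunk
// shape looks usable for a fixed-size dataspace with the given dimensions.
// An empty `dataspace_dims` stands in for a scalar dataspace (rank 0).
bool chunk_shape_is_valid(const std::vector<unsigned long long>& dataspace_dims,
                          const std::vector<unsigned long long>& chunk_dims)
{
    // Chunking needs at least one dimension, and both ranks must agree.
    if (dataspace_dims.empty() || chunk_dims.size() != dataspace_dims.size())
        return false;

    for (std::size_t i = 0; i < chunk_dims.size(); ++i) {
        // Each chunk extent must be positive and must not exceed the
        // (fixed) dataspace extent in that dimension.
        if (chunk_dims[i] == 0 || chunk_dims[i] > dataspace_dims[i])
            return false;
    }
    return true;
}
```

Under these assumed rules, a rank-1 chunk of 1024 against a scalar dataspace is rejected, while a rank-1 dataspace of one element with a chunk of one element is accepted.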

I know the length of the string a priori. Can you point me to an example
of writing out a fixed length string that uses compression?

Thanks!

Nathan

On Tue, Dec 11, 2012 at 12:29 PM, Dana Robinson <derobins@hdfgroup.org> wrote:

Variable-length data cannot be compressed due to the way we store it in
the HDF5 library.

Dana

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org

Hi Nathan,

Does the example c++/examples/h5group.cpp in our distribution help?

Binh-Minh

For fixed length, just replace H5T_VARIABLE with the length you want to set.
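
With a fixed-length string type, every element occupies exactly the declared number of bytes, so the memory buffer handed to a write has to be at least that large. The sketch below is plain C++ with no HDF5 calls. It shows one way to prepare such a buffer, assuming null-terminated padding; to_fixed_length is a hypothetical helper name, not a library function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: copy `text` into a buffer of exactly `size` bytes,
// truncating if necessary and filling the remainder with '\0'.
std::vector<char> to_fixed_length(const std::string& text, std::size_t size)
{
    std::vector<char> buffer(size, '\0');
    if (size == 0)
        return buffer;

    // Keep the last byte as a terminator so the result is a valid C string.
    const std::size_t n = std::min(text.size(), size - 1);
    text.copy(buffer.data(), n);
    return buffer;
}
```

A buffer built this way could be passed to a write together with a string type of the same size. Choosing the size as the string length plus one leaves room for the terminator without truncating anything.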

Hi all,

Thanks for your suggestions. I'm still missing something, though. Using
the h5group.cpp example, I can get my code to create a chunked dataset, but
I can't write anything to it.

            std::string dataset_name = function_that_returns_name();
            std::string dataset_value = function_that_returns_value();

            // Is the size the size of the buffer, or the length of the string?
            // (Should I include the null terminator?)
            hsize_t dataset_length = dataset_value.length() + 1;
            StrType datatype(0, dataset_length);
            hsize_t sdims[1];
            sdims[0] = dataset_length;
            DataSpace dataspace(1, sdims);

            DSetCreatPropList plist;
            hsize_t chunk_dims[1];
            chunk_dims[0] = 24;
            plist.setChunk(1, chunk_dims);

            DataSet sim_details = group.createDataSet(dataset_name, datatype, dataspace, plist);

            // This raises a bad pointer exception.
            sim_details.write(dataset_value, datatype);

  #009: H5FDsec2.c line 846 in H5FD_sec2_write(): file write failed: time = Tue Dec 11 16:59:46 2012
, filename = 'test.h5', file descriptor = 3, errno = 14, error message = 'Bad address', buf = 0x7f9be0b5a1f8, size = 4115328, offset = 184320
    major: Low-level I/O
    minor: Write failed
terminate called after throwing an instance of 'H5::DataSetIException'

Nathan

On Tue, Dec 11, 2012 at 4:26 PM, Peter Cao <xcao@hdfgroup.org> wrote:

For fixed length, just replace H5T_VARIABLE with the length you want
to set.

Hi Nathan,

I copied your code below to try it and didn't get an error. The dataset was
created and written. One difference is that, because I didn't have your
function_that_returns_name[value]() functions, I defined mine as below:

        std::string dataset_name("my dataset");

        std::string dataset_value("this dataset has one string");

Binh-Minh

Thanks for the update. Using Binh-Minh's strings let me figure out the
problem. My dataset and chunk sizes were wrong. I was specifying that I
had dataset_value.length()+1 *strings* in the output. In fact, I have just
one. That Binh-Minh's example worked was largely a matter of luck. HDF5 was
reading past the bounds of the array, which is why I was getting a bad
address exception. Changing the dataset and chunk sizes accordingly made
everything start working. Thanks everyone for your help!
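
To make the over-read concrete, here is a small sketch in plain C++ (no HDF5 calls) of the arithmetic involved: the bytes a write needs from the memory buffer are the number of elements in the dataspace times the size of one element. The helper names bytes_needed and element_size_for are hypothetical and used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: bytes a write needs in the memory buffer, computed as
// (number of elements in the dataspace) x (size of one element in bytes).
std::size_t bytes_needed(const std::vector<std::size_t>& dataspace_dims,
                         std::size_t element_size)
{
    std::size_t elements = 1;
    for (std::size_t extent : dataspace_dims)
        elements *= extent;
    return elements * element_size;
}

// Size of the fixed-length string type used in this thread:
// the string length plus one byte for the terminator.
std::size_t element_size_for(const std::string& value)
{
    return value.length() + 1;
}
```

With the 27-character test string "this dataset has one string", the element size is 28 bytes. Dimensions of {28} describe 28 x 28 = 784 bytes, while the std::string only supplies 28. Dimensions of {1} describe exactly the 28 bytes that exist.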

For posterity, the following is the correct way to compress strings:

    std::string dataset_name = function_that_returns_name();
    std::string dataset_value = function_that_returns_value();

    // Number of strings being written (not the string length).
    hsize_t dataspace_dims[1] = {1};
    hsize_t chunk_dims[1] = {1};

    // One fixed-length element, sized to hold the string plus its terminator.
    StrType datatype(0, dataset_value.length()+1);
    DataSpace dataspace(1, dataspace_dims);

    DSetCreatPropList plist;
    plist.setChunk(1, chunk_dims);
    plist.setDeflate(6);

    DataSet sim_details = group.createDataSet(dataset_name, datatype, dataspace, plist);
    sim_details.write(dataset_value, datatype);

Nathan

On Tue, Dec 11, 2012 at 9:11 PM, Binh-Minh Ribler <bmribler@hdfgroup.org> wrote:

Hi Nathan,

I copied your code below to try it and didn’t get an error. The dataset was
created and written. One difference is that, because I didn’t have your
function_that_returns_name[value]() functions, I defined mine as below:

        std::string dataset_name("my dataset");

        std::string dataset_value("this dataset has one string");

Binh-Minh
