Best way to repackage a dataset? (C program)

Hello everyone,

I am working with an HDF5 file that has a 5D dataset. I want to write a C
program that reads this dataset into memory and then writes it out into a
newly created file containing only that dataset (perhaps at the root of the
file tree). What I don't understand is how to read this entire 5D array
using H5Dread into a 5D buffer that has been previously allocated on the
heap (note I cannot use an array allocated on the stack; it would be too
large and would cause segfaults).

What is the general process I need to employ to do this, and is there
perhaps a more elegant solution than reading the entire dataset into
memory? This process seems easy to me for a 1D or 2D array, but I am lost
with higher-dimensional arrays. Thanks.

Regards,
Landon

You might look at h5copy as a reference, or just use that tool to do the work for you.

Jarom
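To expand on that suggestion: if all you need is the dataset moved into a new file, h5copy can do it in one command. A minimal sketch, where the file names and the dataset path `/g1/dset5d` are placeholders for your own:

```shell
# Copy a single dataset out of one HDF5 file into the root of a new file.
# -s is the source object path, -d the destination path in the new file.
h5copy -i input.h5 -o repackaged.h5 -s /g1/dset5d -d /dset5d
```

The output file is created if it does not exist, and the dataset lands at the root with whatever name you give to `-d`.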

···

From: Hdf-forum [mailto:hdf-forum-bounces@lists.hdfgroup.org] On Behalf Of Landon Clipp
Sent: Friday, September 02, 2016 11:56 AM
To: hdf-forum@lists.hdfgroup.org
Subject: [Hdf-forum] Best way to repackage a dataset? (C program)


You can take a look at the following source files. They are meant for C++ and use templates, but assuming you know a bit of C++ you can convert them back to pure "C" without any issues. The template parameter is the POD type.

https://github.com/BlueQuartzSoftware/SIMPL/tree/develop/Source/H5Support

Take a look at H5Lite.h and H5Lite.cpp. There are functions in there called readPointerDataset(), writePointerDataset() and getDatasetInfo().

The basic flow would be the following (using some pure "C").

// Open the file and get the "Location ID"
hid_t fileId = ...

char* datasetName = ...
// Since you know it is a 5D array:
hsize_t dims[5];
H5T_class_t classType;
size_t type_size;
H5Lite::getDatasetInfo(fileId, datasetName, dims, classType, type_size);

// Multiply the dims[] values to get the total number of elements
// you need to allocate; let's assume they are 32-bit signed ints
size_t totalElements = dims[0] * dims[1] * dims[2] * dims[3] * dims[4];
// Allocate the data
int32_t* dataPtr = malloc(totalElements * sizeof(int32_t));

herr_t err = H5Lite::readPointerDataset(fileId, datasetName, dataPtr);
// Check error
if (err < 0) { ... }

// Open a new file for writing
hid_t outFileId = ...
int rank = 5;
err = H5Lite::writePointerDataset(outFileId, datasetName, rank, dims, dataPtr);
// Check error
if (err < 0) { ... }

This assumes that you take the code from GitHub and convert the necessary functions into pure "C", which should be straightforward to do.

The code referenced above is BSD licensed.
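For reference, that same flow written against the plain HDF5 C API might look roughly like this. This is a sketch, not tested code: it assumes the dataset holds 32-bit signed ints, hard-closes error handling down to a few checks, and needs to be compiled and linked against libhdf5 (e.g. `h5cc`):

```c
#include <hdf5.h>
#include <stdint.h>
#include <stdlib.h>

/* Sketch: read a 5D dataset of 32-bit ints into a flat heap buffer,
 * then write it into the root of a brand-new file. */
int repackage(const char *inPath, const char *outPath, const char *dsetName)
{
    hid_t inFile = H5Fopen(inPath, H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t dset   = H5Dopen2(inFile, dsetName, H5P_DEFAULT);
    hid_t space  = H5Dget_space(dset);

    hsize_t dims[5];
    H5Sget_simple_extent_dims(space, dims, NULL);

    size_t total = (size_t)(dims[0] * dims[1] * dims[2] * dims[3] * dims[4]);
    int32_t *buf = malloc(total * sizeof(int32_t));
    if (!buf) return -1;

    /* H5Dread fills the flat 1D heap buffer in row-major order */
    if (H5Dread(dset, H5T_NATIVE_INT32, H5S_ALL, H5S_ALL,
                H5P_DEFAULT, buf) < 0) { free(buf); return -1; }

    /* Create the destination file and an identically shaped dataset at "/" */
    hid_t outFile  = H5Fcreate(outPath, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t outSpace = H5Screate_simple(5, dims, NULL);
    hid_t outDset  = H5Dcreate2(outFile, dsetName, H5T_NATIVE_INT32, outSpace,
                                H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    herr_t err = H5Dwrite(outDset, H5T_NATIVE_INT32, H5S_ALL, H5S_ALL,
                          H5P_DEFAULT, buf);

    free(buf);
    H5Dclose(outDset); H5Sclose(outSpace); H5Fclose(outFile);
    H5Sclose(space);   H5Dclose(dset);     H5Fclose(inFile);
    return err < 0 ? -1 : 0;
}
```

The key point for the original question: the in-memory buffer is one flat allocation, not a 5D array of pointers, and H5Dread/H5Dwrite handle the multidimensional layout from the dataspace.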

···

--
Michael A. Jackson
BlueQuartz Software, LLC
[e]: mike.jackson@bluequartz.net

Hello,

Thank you everyone for your help. I figured out the problem: I was just
misunderstanding how the functions work. I was able to successfully read
the dataset into a buffer. I did not realize that a 1D array was
sufficient; for some reason I was thinking that it had to be a contiguous
multidimensional array, but it turns out the functions know how to
interpret the array if you give them the rank and the size of each dimension.

It turns out I have another problem, however. I am now trying to write this
buffer into a new file, and the error happens when I try to create the new
dataset. When I ran my code, I got errors such as: "H5D.c line 194 in
H5Dcreate2(): unable to create dataset." I looked online, and it turns out
that there is a size limit to the buffer, which mine most certainly
exceeds, so the solution is to create a dataset creation property list and
set the layout to chunked. But even after setting a reasonable chunk size,
I still get the same errors. I will attach my code and the errors I am
receiving; the relevant code starts at line 122. Thank you SO MUCH for your
help, I'm still trying to learn all of this.
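(For reference, a minimal sketch of the chunked-dataset setup described above. The chunk shape is a placeholder; note that H5Dcreate2 will fail if any chunk dimension exceeds the corresponding fixed dataset dimension, which is a common cause of "unable to create dataset".)

```c
#include <hdf5.h>

/* Sketch: create a chunked 5D int32 dataset via a dataset creation
 * property list. chunkDims values are placeholders; each must be
 * <= the matching entry in dims. */
hid_t make_chunked_dset(hid_t fileId, const char *name, const hsize_t dims[5])
{
    hsize_t chunkDims[5] = {1, 1, 64, 64, 64};   /* placeholder chunk shape */

    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 5, chunkDims);

    hid_t space = H5Screate_simple(5, dims, NULL);
    hid_t dset  = H5Dcreate2(fileId, name, H5T_NATIVE_INT32, space,
                             H5P_DEFAULT, dcpl, H5P_DEFAULT);
    H5Sclose(space);
    H5Pclose(dcpl);
    return dset;   /* caller closes with H5Dclose; negative on failure */
}
```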

Landon

error.txt (1017 Bytes)

testcode.c (5.05 KB)


Hi,
Maybe I misunderstood the requirements, but if you just want to copy a
dataset to another file, why not use H5Ocopy? It allows you to use a
different file as the destination, and it could be a lot faster and simpler
than loading the data into memory.

https://www.hdfgroup.org/HDF5/doc/RM/RM_H5O.html#Object-Copy
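A minimal sketch of that approach (file names and the dataset path are placeholders):

```c
#include <hdf5.h>

/* Sketch: copy one dataset, attributes and all, into the root of a
 * brand-new file, without ever reading the raw data into user memory. */
int copy_dataset(void)
{
    hid_t src = H5Fopen("input.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t dst = H5Fcreate("repackaged.h5", H5F_ACC_TRUNC,
                          H5P_DEFAULT, H5P_DEFAULT);

    herr_t err = H5Ocopy(src, "/g1/dset5d",   /* source object path      */
                         dst, "/dset5d",      /* destination object path */
                         H5P_DEFAULT, H5P_DEFAULT);

    H5Fclose(dst);
    H5Fclose(src);
    return err < 0 ? -1 : 0;
}
```

HDF5 streams the raw data between the files internally, so the 5D dataset never has to fit in a user-allocated buffer at all.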

Cheers,
Martijn


_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5