There is a third and maybe a fourth way to handle this…
3. Do the dynamic multi-dim array as you normally would, but when you turn around and write the beast to HDF5, unravel it into a temporary contiguous buffer just before H5Dwrite. Do the opposite just after H5Dread. That involves a data copy, but it can work just fine if the arrays are small. It's just a bit more work to write and read. This is similar to the previous respondent's suggestion to "do the indexing yourself", except you don't change anything in *your* client code except the places where you interface to HDF5. (There is a sketch of this below, after item 4.)
4. You may be able to do something more elegant using either HDF5 datatypes and custom type conversion routines, or HDF5 filters. My first thought is a "filter", but it would be a bit of a kluge too. You define a custom filter (see https://www.hdfgroup.org/HDF5/doc/RM/RM_H5Z.html#Compression-Register) and you *ensure* that the chunk size you specify for the filter is large enough to at least cover the top-level array of pointers in your arrays. That might be a somewhat large chunk size, but so what. Then, *assuming* HDF5 always sends chunks to the filter moving through memory starting with the pointer it was handed in the H5Dwrite call, upon the first entry to your filter you would "see" the top-level set of pointers. You would have to cache those away for safe keeping inside the filter somehow. Then, with each successive chunk request that comes through the filter, you would use the cached pointer structure to go find the actual chunk being processed in memory, and then turn around and pass that chunk at the output of the filter. This is kinda sorta like a "streaming copy". You never have more than a single chunk's worth of your array copied at any moment, so it's better than #3 (which is a full copy of the array), but it's also a bit klugey. And I haven't given any thought to how you would do the read back either; I'm just assuming it's possible. (A bare registration skeleton follows below.)

If you go the datatype route, then you would define a custom datatype (probably for each instance of such an object) and then also register your own data conversion routine (see https://www.hdfgroup.org/HDF5/doc/RM/RM_H5T.html#Datatype-Register) for it. It would work somewhat similarly, I think, and might even be a better way to go than a filter. However, I've never worked with that aspect of HDF5.
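For #3, here is a minimal sketch of the pack/unpack idea, written against the C++ API from the original post below (the `packed` buffer and the NATIVE_DOUBLE memory type are my choices, not anything mandated by HDF5):

    #include <vector>
    // ... given the dims[] and the double** data from the program below ...
    std::vector<double> packed(dims[0] * dims[1]);   // temporary contiguous buffer
    for ( size_t i = 0; i < dims[0]; ++i )
        for ( size_t j = 0; j < dims[1]; ++j )
            packed[i * dims[1] + j] = data[i][j];    // unravel just before writing
    dataset.write(packed.data(), PredType::NATIVE_DOUBLE);
    // For reading, do the reverse: dataset.read(packed.data(), ...),
    // then copy packed[i * dims[1] + j] back into data[i][j].

For #4, the bare registration skeleton for a custom filter looks roughly like this (the filter id 306 is a placeholder from the 256-511 range that is open for testing, and the pointer-caching logic is left as a comment since, as I said, I haven't verified the chunk-ordering assumption it depends on):

    #include "hdf5.h"

    /* The filter callback; the cached-pointer bookkeeping would live here. */
    static size_t my_filter(unsigned int flags, size_t cd_nelmts,
                            const unsigned int cd_values[], size_t nbytes,
                            size_t *buf_size, void **buf)
    {
        /* ... on first entry, cache the top-level pointers seen in *buf;
           on later entries, swap in the real chunk data ... */
        return nbytes;  /* pass-through for the sketch */
    }

    const H5Z_class2_t MY_FILTER[1] = {{
        H5Z_CLASS_T_VERS,     /* H5Z_class_t version */
        (H5Z_filter_t)306,    /* filter id (placeholder) */
        1, 1,                 /* encoder present, decoder present */
        "pointer-chasing filter",
        NULL,                 /* can_apply callback */
        NULL,                 /* set_local callback */
        my_filter             /* the actual filter function */
    }};
    /* then: H5Zregister(MY_FILTER); and enable it on the dataset's
       creation property list with H5Pset_filter(dcpl, 306, ...). */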
Hope that helps.
Mark
From: Hdf-forum <hdf-forum-bounces@lists.hdfgroup.org> on behalf of huebbe <nathanael.huebbe@informatik.uni-hamburg.de>
Reply-To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>
Date: Monday, May 9, 2016 6:13 AM
To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>
Subject: Re: [Hdf-forum] Dynamically allocated multidimensional arrays C++
Of course, you get garbage output: You are storing the array of pointers instead of the data,
along with whatever garbage happens to be after those pointers in memory.
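Concretely: with dims = {4,6}, H5Dwrite copies 4*6*sizeof(double) = 192 bytes starting at the address of the pointer array, but that array itself occupies only 4*sizeof(double*) = 32 bytes (on a typical 64-bit machine). So the first four "values" written are your row pointers reinterpreted as doubles, and the remaining 160 bytes are whatever happens to sit after them on the heap.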
Trouble is, C++ simply can't do true multidimensional arrays of dynamic size.
It's not part of the language. So you basically have two options:
1. Do the indexing yourself. Declare your multidimensional array as a 1D array, and access its elements
via `data[i*dims[1] + j]`. This is a nuisance, but still feasible (there is a sketch of this right after option 2).
2. Use C. C99 allows true multidimensional arrays of dynamic size. So, in C, you can just write

    double (*data)[dims[1]] = malloc(dims[0] * sizeof(*data));
    for ( size_t i = 0; i < dims[0]; ++i )
        for ( size_t j = 0; j < dims[1]; ++j )
            data[i][j] = i + j;

This will lay out your data in memory the way HDF5 expects it, but it's not legal C++ code of any standard.
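For option 1, a minimal sketch against the C++ API from the program further down (std::vector is my choice here; a plain `new double[dims[0]*dims[1]]` works just as well):

    #include <vector>
    // ... given the dims[] and dataset from the program below ...
    std::vector<double> data(dims[0] * dims[1]);  // one contiguous block
    for ( size_t i = 0; i < dims[0]; ++i )
        for ( size_t j = 0; j < dims[1]; ++j )
            data[i * dims[1] + j] = i + j;        // manual 2D indexing
    dataset.write(data.data(), PredType::NATIVE_DOUBLE);  // layout matches the 2D dataspace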
Of course, you can also use your pointer array, and read/write the data line by line. Or you can allocate
your data as a 1D array and alias it with a pointer array to be able to access it via `data[i][j]`.
But either way, it gets dirty.
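For the last variant, a minimal sketch (all names are mine):

    double *block = new double[dims[0] * dims[1]];  // the real, contiguous storage
    double **data = new double*[dims[0]];           // row pointers aliasing the block
    for ( size_t i = 0; i < dims[0]; ++i )
        data[i] = block + i * dims[1];
    // data[i][j] now works for convenient access, but it's `block`
    // (not `data`!) that you hand to dataset.write().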
Cheers,
Nathanael Hübbe
On 05/06/2016 10:29 PM, Steven Walton wrote:
So I am noticing some interesting behavior and am wondering if there is a way around it.
I am able to assign a rank 1 array dynamically and write it to an HDF5 file, but I do not seem to be able to do this with higher-order arrays. I would like to be able to write a PPx array to h5 and retain the data integrity. More specifically, I am trying to create an easy-to-use vector-to-array library <https://github.com/stevenwalton/H5Easy> that can handle multidimensional data (it works with rank 1).
Let me give some examples. I will also show the typenames of the arrays.
Works:
double *a = new double[numPts];  // typename: Pd
double a[numPts];                // typename: A#pts_d
double a[num1][num2];            // typename: Anum1_Anum2_d
What doesn't work:
double **a = new double*[num1];
for ( size_t i = 0; i < num1; ++i )
    a[i] = new double[num2];
// typename: PPd
Testing the saved arrays with h5dump (and loading and reading them directly), I find that if I have typename PPx (not necessarily double) I get garbage stored. Here is example code, and the output from h5dump showing the behavior.
------------------------------------------------------------
compiled with h5c++ -std=c++11
------------------------------------------------------------
#include "H5Cpp.h"
using namespace H5;
#define FILE "multi.h5"
int main()
{
    hsize_t dims[2];
    herr_t status;
    H5File file(FILE, H5F_ACC_TRUNC);

    dims[0] = 4;
    dims[1] = 6;
    double **data = new double*[dims[0]];
    for ( size_t i = 0; i < dims[0]; ++i )
        data[i] = new double[dims[1]];
    for ( size_t i = 0; i < dims[0]; ++i )
        for ( size_t j = 0; j < dims[1]; ++j )
            data[i][j] = i + j;

    DataSpace dataspace = DataSpace(2, dims);
    DataSet dataset( file.createDataSet( "test", PredType::IEEE_F64LE, dataspace ) );
    dataset.write(data, PredType::IEEE_F64LE);

    dataset.close();
    dataspace.close();
    file.close();
    return 0;
}
------------------------------------------------------------
h5dump
------------------------------------------------------------
HDF5 "multi.h5" {
GROUP "/" {
DATASET "test" {
DATATYPE H5T_IEEE_F64LE
DATASPACE SIMPLE { ( 4, 6 ) / ( 4, 6 ) }
DATA {
(0,0): 1.86018e-316, 1.86018e-316, 1.86018e-316, 1.86019e-316, 0,
(0,5): 3.21143e-322,
(1,0): 0, 1, 2, 3, 4, 5,
(2,0): 0, 3.21143e-322, 1, 2, 3, 4,
(3,0): 5, 6, 0, 3.21143e-322, 2, 3
}
}
}
}
------------------------------------------------------------------
As can be seen, the (0,0) row is absolute garbage (except for one value, which is the first number of the actual array), and the (0,5) entry holds garbage as well. The (1,0) row has always contained real data (though it should be located at (0,0)). So this seems like some addressing problem.
Is this a bug in the h5 libraries that allows me to read and write Pd data as well as Ax0_...Axn_t data, but not P...Pt data? Or is this for some reason intentional? As using new is a fairly standard way to allocate arrays, making P...Pt type data common, I have a hard time seeing this as intentional. In the meantime, is anyone aware of a workaround? The data I am taking in will be dynamically allocated, so I do not see a way to get Ax_... type data.
Thank you,
Steven