Continuous while-loop data set extension and recording

manguita · March 25, 2020, 1:26pm

Hello,

I am trying to implement a while loop that continuously adds data to a three-dimensional cube as it becomes available from a parallel thread. Because the program can run for an indefinite period, the cube has no preset size and must keep expanding. Each row consists of 2 vectors of 256 data points, and there are N (indefinite) rows.

I know that my data set extension is working well, because it will generate a cube of whatever size I choose when I leave out the “writing” portion of my code. (In the code I have limited the size to simply 10 rows using i = 9).

However, when I try to write data, it throws an exception, and the console prints the following:

HDF5-DIAG: Error detected in HDF5 (1.10.6) thread 0:
#000: C:\autotest\hdf5110-StdRelease-dist-10vs16\build\hdfsrc\src\H5Dio.c line 336 in H5Dwrite(): can’t write data
major: Dataset
minor: Write failed

#001: C:\autotest\hdf5110-StdRelease-dist-10vs16\build\hdfsrc\src\H5Dio.c line 722 in H5D__write(): src and dest dataspaces have different number of elements selected
major: Invalid arguments to routine
minor: Bad value

My code is as follows:

//Creating 3D data cube
const int rank = 3; 

// Create a new file using the default property lists. 
H5File file(name, H5F_ACC_TRUNC);

// Modify dataset creation properties to enable chunking
DSetCreatPropList prop;
hsize_t      chunk_dims[rank] = { 1, 256, 1};
prop.setChunk(rank, chunk_dims);

// Create the data space for the dataset of raw voltages (large cube dset).
hsize_t dims[rank];     //initial dataset dimensions
dims[0] = 1;		    //i dimension: number of runs
dims[1] = 256;			//j dimension: number of data points per run
dims[2] = 2;			//k dimension: on/off runs

hsize_t maxdims[3] = { H5S_UNLIMITED, 256, 2};
DataSpace dataspace(rank, dims, maxdims);

// Create the dataset for raw voltages.      
DataSet dataset = file.createDataSet(dataName, PredType::NATIVE_DOUBLE, dataspace, prop);

//Location to insert data
hsize_t		offset[3];
offset[0] = 0;
offset[1] = 0;
offset[2] = 0;

//Size of dataset as it is extending
hsize_t		size[3];
size[0] = 1;
size[1] = 256; 
size[2] = 2;

//Dimensions of hyperslab to be written to
hsize_t		dims2[3];
dims2[0] = 1;
dims2[1] = 256;
dims2[2] = 1;

DataSpace fspace(2, dims2, NULL);
int i = 0;
while (i < 9) {
	
	//break if ESCAPE is hit
	if (GetAsyncKeyState(VK_ESCAPE))
	{
		return 1;
		break;
	}
	
	//Extend dataset size to accommodate incoming data
	dataset.extend(size);

	//Select a hyperslab.
	fspace = dataset.getSpace();		
	fspace.selectHyperslab(H5S_SELECT_SET, dims2, offset);

	//Write the data to the hyperslab.
	dataset.write(data, PredType::NATIVE_DOUBLE, fspace, dataspace);
	
	//Increase size and offset by 1 to create and write to a new row
	size[0] += 1;
	offset[0] += 1;
	i += 1;
}

I appreciate your help!

Marty

steven · March 26, 2020, 1:15am

Hi Marty,

You might want to check out H5CPP, here are the ISC’19 presentation slides, and the example you asked for is added to the repo, profiled and tuned:

#include <Eigen/Dense> // armadillo, blitz,blaze, itpp, dlib [...] also supported 
#include <h5cpp/all>

template<class T> using Matrix  
   = Eigen::Matrix<T, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor>;

int main(){

   h5::fd_t fd = h5::create("example.h5",H5F_ACC_TRUNC);
   size_t nrows = 2, ncols=256, nframes=100;
   // create a dataset h5::ds_t, which is convertible to h5::pt_t packet table 
   h5::pt_t pt = h5::create<double>(fd, "stream of matrices",
      h5::max_dims{H5S_UNLIMITED,nrows,ncols}, // you asked for extendable dataset
      h5::chunk{1,nrows,ncols} ); // chunks specify packet table internal buffering

   Matrix<double> M(nrows,ncols); // <-- this is your frame

   // actual code, you may insert arbitrary number of frames: nrows x ncols
   for( int i = 0; i < nframes; i++)
   // when internal buffer reaches specified chunk, dataset is extended and the bucket 
   // is dumped
      h5::append( pt, M); 
   // RAII enabled descriptors close when leaving codeblock
}

best: steve

manguita · March 26, 2020, 1:32am

OK, awesome, thanks a lot. From a quick read, I understand that you can append a matrix “M” to a data set “pt” instead of extending the dataset and then selecting a hyperslab.

Thanks for the quick response, I will give this a try!

Marty

steven · March 26, 2020, 3:57am

Correct: the h5::pt_t descriptor hides something similar to what you posted: it extends the underlying dataset when required and does the correct selection for you.
Most popular linear algebra packages are supported, as well as std::vector<T>. With compiler-assisted reflection, arbitrarily deep POD types can be persisted without writing additional code.

The zero-copy mechanism minimises unnecessary I/O and data transfer. Since all linear algebra systems built on BLAS/LAPACK behave similarly, grabbing the read/write pointer to their containers is trivial.
Of course, the mechanism works with typed pointers as well.
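
For reference, the selection bookkeeping that has to line up can be checked by hand. The HDF5 diagnostic in the original post says the memory and file dataspaces have different numbers of elements selected. Assuming the standard C++ argument order of DataSet::write (buffer, memory type, memory space, file space), the posted call passes the 1 x 256 x 1 hyperslab as the memory space and the creation-time 1 x 256 x 2 dataspace as the file space. The sketch below only reproduces that arithmetic in plain C++; element_count and selections_match are hypothetical helpers, not HDF5 calls.

```cpp
#include <array>
#include <cstddef>

// Shape of a rank-3 block: extent along the i, j and k axes.
using Shape3 = std::array<std::size_t, 3>;

// Hypothetical helper (not part of HDF5): number of elements in a block.
std::size_t element_count(const Shape3& count) {
    return count[0] * count[1] * count[2];
}

// Hypothetical helper: true when the memory block and the file selection
// hold the same number of elements, the condition the diagnostic names.
bool selections_match(const Shape3& mem_count, const Shape3& file_count) {
    return element_count(mem_count) == element_count(file_count);
}

// Shapes copied from the original post.
const Shape3 slab = {1, 256, 1};          // dims2: hyperslab written per iteration
const Shape3 initial_cube = {1, 256, 2};  // dims: dataspace used to create the dataset
```

With these shapes, element_count(slab) is 256 and element_count(initial_cube) is 512, so selections_match(slab, initial_cube) is false. A write is only accepted when both sides select the same count, for example a 256-element memory space paired with the 256-element hyperslab selected in the file. Note also that one full row of the cube is 256 x 2 values, so a single 1 x 256 x 1 slab covers only half of it.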

H5CPP is tested on Intel DPC++ v2021 (beta), Intel 19.1.0.166, g++-7, g++-8, g++-9, clang++-6.0, clang++-7, clang++-8, clang++-9 and clang++-10; PGI support is coming soon.

Link to the new documentation

steve
