Writing to an extendable dataset in a loop (C/C++)

Hello,
I wanted to get some advice on writing to an extendable dataset in a loop. Here is a description of the scenario I have.

I have a data source that sends me large arrays (5,242,880 elements per array, to be precise).

Let’s say I have created an extendable dataset of rank 2, with dim0 = H5S_UNLIMITED and dim1 = 5,242,880.
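
For concreteness, here is roughly how I create it with the C API; the chunk shape and the handle names (file, dset) are my own illustrative choices. An unlimited dimension requires a chunked layout:

	hsize_t dims[2]     = { 0, 5242880 };             /* start with zero rows */
	hsize_t max_dims[2] = { H5S_UNLIMITED, 5242880 };
	hsize_t chunk[2]    = { 1, 5242880 };             /* one incoming array   */
	hid_t space = H5Screate_simple(2, dims, max_dims);
	hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
	H5Pset_chunk(dcpl, 2, chunk);
	hid_t dset  = H5Dcreate2(file, "acquisition", H5T_NATIVE_INT, space,
			H5P_DEFAULT, dcpl, H5P_DEFAULT);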

Then I incrementally write these in a loop: in each iteration, I fetch an array of 5,242,880 elements from my source and write it to the dataset.

In Python (with h5py), this would look rather straightforward (i is my iteration variable):
dataset.resize(i + 1, axis=0)  # extend the dataset along dimension 0
dataset[i, :] = incoming_data

Coming to the C/C++ case, I would call H5Dset_extent, then perform a hyperslab selection, and then write the new data to the dataset.
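
In rough outline (continuing from the creation sketch above; incoming is a buffer holding the fetched array, n_iterations is a placeholder):

	for (hsize_t i = 0; i < n_iterations; ++i) {
		/* ... fetch 5,242,880 elements into incoming ... */
		hsize_t new_dims[2] = { i + 1, 5242880 };
		H5Dset_extent(dset, new_dims);                /* grow dim0 by one row  */

		hid_t file_space = H5Dget_space(dset);        /* refresh the dataspace */
		hsize_t start[2] = { i, 0 };
		hsize_t count[2] = { 1, 5242880 };
		H5Sselect_hyperslab(file_space, H5S_SELECT_SET, start, NULL, count, NULL);

		hsize_t mem_dims[1] = { 5242880 };
		hid_t mem_space = H5Screate_simple(1, mem_dims, NULL);
		H5Dwrite(dset, H5T_NATIVE_INT, mem_space, file_space, H5P_DEFAULT, incoming);

		H5Sclose(mem_space);
		H5Sclose(file_space);
	}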

I wanted to know if there is a better/faster way to do this, without having to call H5Dset_extent and do the hyperslab selection in every iteration.

In H5CPP, an easy-to-use, high-performance persistence library for modern C++, this and many other cases are handled in a Pythonic way, profiled and tuned. Currently the compiler-assisted reflection works on Linux/POSIX hosts; Windows support is work in progress. Here is a short example to demonstrate the h5::append capability; please see the examples directory for more.
If you have any questions, shoot me a line.

steve

	#include <h5cpp/all>   // H5CPP single header
	#include <numeric>
	#include <vector>
	#include <iostream>

	int main() {
		h5::fd_t fd = h5::create("example.h5", H5F_ACC_TRUNC);

		// SCALAR: integral
		try { // centrally used error handling
			std::vector<int> stream(83);
			std::iota(std::begin(stream), std::end(stream), 1);
			// the leading dimension is extended once a chunk is full; chunks fill in row-major order
			// zero copy: writes go directly into the chunk buffer, then through the filter chain if one is specified
			// works up to H5CPP_MAX_RANK, which defaults to 7
			// a partially filled last chunk is padded with h5::fill_value<T>( some_value )
			h5::pt_t pt = h5::create<int>(fd, "stream of integral",
					h5::max_dims{H5S_UNLIMITED,3,5}, h5::chunk{2,3,5} | h5::gzip{9} | h5::fill_value<int>(3) );
			for (auto record : stream)
				h5::append(pt, record);
			//auto M = h5::read<arma::mat>(fd, "stream of integral"); // requires Armadillo
		} catch (const h5::error::any& e) {
			std::cerr << "ERROR: " << e.what();
		}
	}
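
For your rank-2 scenario specifically, here is a sketch of how this could map onto h5::append. I am relying here on the fill behaviour noted in the comments above (elements fill the current chunk in row-major order, and the unlimited leading dimension grows chunk by chunk); fetch_from_source, n_iterations, and the dataset name are placeholders:

	constexpr hsize_t NCOLS = 5242880;
	h5::pt_t pt = h5::create<int>(fd, "acquisition",
			h5::max_dims{H5S_UNLIMITED, NCOLS}, h5::chunk{1, NCOLS});
	for (size_t i = 0; i < n_iterations; ++i) {
		std::vector<int> incoming = fetch_from_source(); // placeholder for your source
		for (int value : incoming)       // no H5Dset_extent, no hyperslab selection
			h5::append(pt, value);       // zero copy into the chunk buffer
	}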

Thanks for your quick reply, Steven.
H5CPP sure does look like something I would heavily use. Any idea how soon the Windows version will be out?
My acquisition tool specifically needs to work in a Windows environment, so I do the development on Windows too.
But I will definitely try out H5CPP for my personal projects.

Thanks for the interest. Gerd Heber (The HDF Group) and I are currently preparing for an ISC’19 BoF with a related project that opens up a way to run H5CPP in a massively parallel environment: pHDF5 on the MPI platform. So expect this feature by the end of July.

Chris and others have already worked out the details to make the H5CPP header-only part work on Windows, which is a small modification of the header files. I am currently away from that laptop, but will share their results once I get home (expect a day).
As for the compiler, you could use a Linux host to generate the header files for you: it takes arbitrary valid C and C++ headers. You can trigger the compiler with a mock call, then use the generated C++-compliant files on any platform you desire.
Please also make sure to use the most recent 1.10 HDF5 C API and libraries; this gives access to direct chunk write. Performance will generally be on par with, or better than, what you, as an expert in POSIX I/O, could achieve, because of the optimised I/O calls and clever caching.
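
For reference, here is a sketch of the direct chunk write pattern in the C API (H5Dwrite_chunk, available in HDF5 1.10.2 and later; dset is assumed to be an open chunked dataset with chunk shape {1, 5242880} and no filters, so buf is just the raw chunk):

	hsize_t new_dims[2] = { i + 1, 5242880 };
	H5Dset_extent(dset, new_dims);            /* make room for the new chunk  */
	hsize_t offset[2]  = { i, 0 };            /* chunk origin, dataset coords */
	uint32_t filter_mask = 0;                 /* 0 = all filters were applied */
	H5Dwrite_chunk(dset, H5P_DEFAULT, filter_mask, offset,
			5242880 * sizeof(int), buf);      /* whole chunk, no hyperslab    */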
Should you use the 1.8 series, comment out the offending lines in the property list headers.
Hope it helps,
Steve

Chris Drozdowski has provided a fix for Windows VS2017 on this branch: https://github.com/steven-varga/h5cpp/tree/vs2017-windows. I will review this addition, as well as all other activity, in July.

Thanks Chris, and everyone else who is pushing forward on the Windows front.

steve