Using GZIP compression does not compress, and the dataset cannot be opened in HDFView

This is a strange issue as I have been able to do this correctly in the past.

I can create a dataset with GZIP compression enabled, like so:

const std::vector<hsize_t> chunkSize = { 1, (hsize_t)nY, (hsize_t)nX };
// ...
dcpl_id = H5Pcreate(H5P_DATASET_CREATE);
status = H5Pset_deflate(dcpl_id, compressLevel); // This is 0-9, default 4
status = H5Pset_chunk(dcpl_id, 3, chunkSize.data());
// ...
dapl_id = H5Pcreate(H5P_DATASET_ACCESS);
hid_t dset_id = H5Dcreate(file_id, datasetPath.c_str(), H5T_NATIVE_FLOAT, dspace_id, lcpl_id, dcpl_id, dapl_id);

// Close handles. Will need to return later to actually write the data
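
For reference, the parts elided above (the dataspace and the link-creation property list) look roughly like this; nFrames is a placeholder for however many frames the archive holds:

const std::vector<hsize_t> dims = { (hsize_t)nFrames, (hsize_t)nY, (hsize_t)nX };
hid_t dspace_id = H5Screate_simple(3, dims.data(), NULL);

hid_t lcpl_id = H5Pcreate(H5P_LINK_CREATE);
H5Pset_create_intermediate_group(lcpl_id, 1); // create intermediate groups in datasetPath if needed

// ... then dcpl_id / dapl_id / H5Dcreate as above ...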

I then close these handles. Later, I come back, open the dataset, and write data to it:

const std::vector<hsize_t> chunk_offset = { (hsize_t)frame_idx, (hsize_t)0, (hsize_t)0 };

// Read the data from this frame
std::vector<float> data;
data.reserve(nX * nY);
for (int j = 0; j < nY * gridSize; j = j + gridSize) {
	float* rowPtr = (float*)GetBufferRowPtr(in_buffer, j);
	for (int i = 0; i < nX * gridSize; i = i + gridSize) {
		data.push_back(rowPtr[i]);
	}
}

herr_t status;
hid_t file_id, dset_id;
// Get the file
file_id = KFH5_GetHDF5File(archivePath, false); // Opens the specified file, optionally creating it if it doesn't exist
dset_id = H5Dopen(file_id, datasetPath.c_str(), H5P_DEFAULT);

status = H5Dwrite_chunk(dset_id, H5P_DEFAULT, 0, chunk_offset.data(), nX * nY * sizeof(float), data.data());

This works fine if I comment out the line enabling GZIP compression. I can open the resulting file in HDFView and view everything as expected.

However, with the deflate line uncommented, HDFView shows that the dataset is GZIP compressed, but I have two issues. First, the dataset shows a compression ratio of 1, regardless of the compression level selected (HDFView does report the level I chose correctly). Second, HDFView throws a “Filter not available” exception when trying to view the data. These screenshots show the issue.

[screenshot: HDFView dataset properties showing GZIP compression with a ratio of 1]
[screenshot: HDFView “Filter not available” exception]

This is under Windows 10, and the application creating the file is a DLL built with Visual Studio and run through a third-party application.

Am I missing a requirement for GZIP compression? I was under the impression the HDF5 library ships with everything needed for GZIP compression, so it should work out of the box. I’m guessing I’ve done something silly here, but I’m at a loss as to what.

Just adding that I have verified that the same HDF5 dataset also cannot be read using, e.g., h5dump, though uncompressed datasets in the same file can.

Coordinates are stored uncompressed and can be read:

[screenshot: h5dump output for the uncompressed coordinates dataset]

But the compressed datasets cannot:

[screenshot: h5dump failing on the compressed dataset]

Solved!

Replying to my own post in case anybody else runs into this issue and, like me, can’t find a solution.

It turns out I was wrong to use H5Dwrite_chunk to write out to my file. Had I read closer, the documentation even says that it bypasses most of the dataset I/O machinery, including the filter pipeline. So my uncompressed bytes were written directly into chunks that the file’s metadata claims are deflated, which is presumably why readers choke on them.

The correct solution is to use H5Dwrite. You can retrieve the dataspace associated with your dataset using H5Dget_space, select the region to write with H5Sselect_hyperslab, and pass that dataspace as the file_space_id argument to H5Dwrite.
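
In case it helps anyone, the per-frame write now looks roughly like this (same nX, nY, and frame_idx as above):

hid_t dset_id = H5Dopen(file_id, datasetPath.c_str(), H5P_DEFAULT);

// Select the one-frame hyperslab in the file dataspace
hid_t fspace_id = H5Dget_space(dset_id);
const std::vector<hsize_t> start = { (hsize_t)frame_idx, 0, 0 };
const std::vector<hsize_t> count = { 1, (hsize_t)nY, (hsize_t)nX };
herr_t status = H5Sselect_hyperslab(fspace_id, H5S_SELECT_SET, start.data(), NULL, count.data(), NULL);

// Memory dataspace describing the in-memory buffer
hid_t mspace_id = H5Screate_simple(3, count.data(), NULL);

// H5Dwrite runs the filter pipeline, so the chunk actually gets deflated
status = H5Dwrite(dset_id, H5T_NATIVE_FLOAT, mspace_id, fspace_id, H5P_DEFAULT, data.data());

H5Sclose(mspace_id);
H5Sclose(fspace_id);
H5Dclose(dset_id);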

In order to use the direct chunk write you have to bring your own filtering pipeline or, as you figured out, just use the plain write.
The code below is from the H5CPP custom pipeline; notice that each chunk is written with either NO_FILTER, SINGLE_FILTER, or MULTIPLE_FILTERS applied. An efficient implementation of the blocking mechanism is non-trivial: it should consider CPU L1/L2/L3 cache sizes, incomplete chunk edges, etc. When you don’t have the budget to develop a good implementation, you are likely better off with the off-the-shelf H5Dwrite and a BLOSC filter.

inline void h5::impl::basic_pipeline_t::write_chunk_impl( const hsize_t* offset, size_t nbytes, const void* data ){

	size_t length = nbytes;                        // filter may change this, think of compression
	void *in = chunk0, *out=chunk1, *tmp = chunk0; // invariant: out must point to data block written
	uint32_t mask = 0x0;                           // filter mask = 0x0 all filters applied
	switch( tail ){ // tail = number of filters queued
		case 0: // no filters: ( if blocking ) data == chunk0, otherwise directly from container
			H5Dwrite_chunk( ds, dxpl, 0x0, offset, nbytes, data );
			break;
		default: // one or more filters (the single-filter case takes the same path)
			length = filter[0]( out, data, nbytes, flags[0], cd_size[0], cd_values[0] );
			if( !length ) // filter could not be applied: record it in the mask
				mask = 1 << 0;
			for( hsize_t j = 1; j < tail; j++ ){ // invariant: out == buffer holding final result
				tmp = in, in = out, out = tmp; // ping-pong between the two chunk buffers
				length = filter[j]( out, in, length, flags[j], cd_size[j], cd_values[j] );
				if( !length )
					mask |= 1 << j;
			}
			// direct chunk write is available from HDF5 1.10.4 on
			H5Dwrite_chunk( ds, dxpl, mask, offset, length, out );
	}
}
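
For a deflate-only pipeline, the minimal do-it-yourself version is to compress each chunk before the direct write. A sketch with zlib (the helper name is mine; error handling trimmed):

#include <hdf5.h>
#include <zlib.h>
#include <vector>

// Deflate one chunk ourselves, then hand the compressed bytes to H5Dwrite_chunk;
// filter mask 0x0 tells HDF5 that every registered filter was applied.
herr_t write_deflated_chunk( hid_t dset_id, const hsize_t* offset,
                             const void* data, size_t nbytes, int level ){
	uLongf out_len = compressBound( (uLong)nbytes );
	std::vector<Bytef> out( out_len );
	if( compress2( out.data(), &out_len, (const Bytef*)data, (uLong)nbytes, level ) != Z_OK )
		return -1;
	return H5Dwrite_chunk( dset_id, H5P_DEFAULT, 0x0, offset, (size_t)out_len, out.data() );
}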
