Replace 64-bit floats with 32-bit floats using h5tools

Hi there!

I’ve made a bash script that uses h5tools 1.14.2 (h5repack and h5ls) to generate a new HDF5 file with a new chunk layout. Is it possible to use h5tools (like h5repack) to also replace 64-bit floats with 32-bit floats, so that the data takes up less space?

Thank you!

Hi @ailer.gen,

Not sure how the conversion can be done using one of the h5tools, but you could easily do this using HDFql, a high-level (declarative) programming language. Here is an example that illustrates how this could be done in C using HDFql:

// declare variables
char script[1024];
char *dataset_name;
int data_type;

// use (i.e. open) HDF5 file "test.h5"
hdfql_execute("USE FILE test.h5");

// get all datasets stored in the HDF5 file (in a recursive fashion)
hdfql_execute("SHOW DATASET LIKE **");

// register variable "data_type" for subsequent use (by HDFql)
hdfql_variable_register(&data_type);

// loop through cursor (containing all the datasets found)
while(hdfql_cursor_next(NULL) == HDFQL_SUCCESS)
{
	// get dataset name from cursor
	dataset_name = hdfql_cursor_get_char(NULL);

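	// prepare script to get the data type of the dataset currently being processed
	// (snprintf, from <stdio.h>, formats the string; strcpy cannot take format arguments)
	snprintf(script, sizeof(script), "SHOW DATA TYPE \"%s\" INTO MEMORY 0", dataset_name);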

	// execute script
	hdfql_execute(script);

	// check if data type is double (i.e. 64 bit)
	if (data_type == HDFQL_DOUBLE)
	{
		// prepare script to read dataset currently being processed, convert its data into a float (i.e. 32 bit), and overwrite the dataset with the result of the conversion
		snprintf(script, sizeof(script), "SELECT FROM CAST(\"%s\" AS FLOAT) INTO TRUNCATE \"%s\"", dataset_name, dataset_name);

		// execute script
		hdfql_execute(script);
	}
}

// unregister variable "data_type" as it is no longer used (by HDFql)
hdfql_variable_unregister(&data_type);

// close HDF5 file
hdfql_execute("CLOSE FILE");

Unfortunately, the WHERE option of the SHOW operation, which would have simplified the above example, is currently broken. This will be fixed in the next release of HDFql. With this fix, the example becomes as follows:

// declare variables
char script[1024];
char *dataset_name;

// use (i.e. open) HDF5 file "test.h5"
hdfql_execute("USE FILE test.h5");

// get all datasets stored in the HDF5 file (in a recursive fashion) that are of data type double (i.e. 64 bit)
hdfql_execute("SHOW DATASET LIKE ** WHERE DATA TYPE == DOUBLE");

// loop through cursor (containing all the datasets found)
while(hdfql_cursor_next(NULL) == HDFQL_SUCCESS)
{
	// get dataset name from cursor
	dataset_name = hdfql_cursor_get_char(NULL);

	// prepare script to read dataset currently being processed, convert its data into a float (i.e. 32 bit), and overwrite the dataset with the result of the conversion
	// (snprintf, from <stdio.h>, formats the string; strcpy cannot take format arguments)
	snprintf(script, sizeof(script), "SELECT FROM CAST(\"%s\" AS FLOAT) INTO TRUNCATE \"%s\"", dataset_name, dataset_name);

	// execute script
	hdfql_execute(script);
}

// close HDF5 file
hdfql_execute("CLOSE FILE");

FYI, besides C, HDFql also supports C++, C#, Java, Python, Fortran and R.

Hope it helps!

I don’t know if I’ll be able to use HDFql, since we need to keep things simple and install as little as possible; that’s why we’re using a bash script. But if there’s no other way, I’ll talk to my colleagues about it. Thank you for your reply.

Hi @ailer.gen,

h5repack doesn’t currently have an option to do this, but I believe it’s a perfectly reasonable use case. I’d suggest opening a feature request at HDFGroup/hdf5 · Discussions · GitHub. In the meantime, you could create a very small program that preprocesses the file: open the file, open a dataset you want to convert, create a new dataset with a 32-bit float datatype, use H5Dread and H5Dwrite to transfer the data from the old dataset into the new one, and then delete the old dataset. If you want to keep the same dataset name, you can use H5Lmove either to rename the old dataset before creating the new one, or to rename the new dataset after deleting the old one. Running h5repack on the file after this preprocessing should free up the extra space used by the old dataset.

Thanks a lot for your reply. I’ll gladly open a feature request at the link provided. I’ll talk with my colleagues about using H5Dread and H5Dwrite, but we’ll probably wait for the feature to be implemented. I suppose there isn’t another tool besides h5repack which is also part of h5tools and is able to convert from 64-bit to 32-bit floats, right?

Correct, there is no tool currently that can accomplish what you’re looking to do, but this seems like a reasonable use case for h5repack to include.

# Assuming your double variable is named 'data':
ncap2 -s 'data=float(data)' input.nc output.nc

Credit: Google Gemini

See also NCO netCDF Operators / Discussion / Help: ncap2 convert type double to float error (sourceforge.net).

I’m not familiar with ncap2, but I’ll look into it. Thanks for your reply.