Compression filter that employs multiple datasets?


I may have a compression filter that works for UNstructured data. So, not so much for a nice 2D/3D array, but for a linearized list of values representing the nodes of a 2D/3D UNstructured mesh, for example. The knowledge of which points in the linearized list are “next to” each other comes from a second list (like an integer nodelist).

My first thought is that the compression filter would need to interrogate another dataset in the file (the dataset with the integer nodelist) to do its work. Is that possible?
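As far as I know, an HDF5 filter callback only sees the chunk buffer plus the integer `cd_values` fixed at dataset-creation time via `H5Pset_filter`; it gets no handle back into the file, so reading a second dataset from inside the filter isn't directly supported. One workaround is to bake the companion-derived information into the filter's own parameters. Here is a toy Python stand-in (no HDF5 involved; `make_filter` and the permutation are hypothetical) where the encode/decode callbacks close over a nodelist-derived traversal order, the way `cd_values` would freeze it:

```python
import struct
import zlib

def make_filter(perm):
    """Build encode/decode callbacks that close over side
    information (a traversal order derived from a nodelist),
    mimicking how cd_values fix filter parameters at
    dataset-creation time."""
    inv = [0] * len(perm)
    for i, p in enumerate(perm):
        inv[p] = i  # inverse permutation, for readback

    def encode(values):
        # Reorder so mesh neighbours sit together, then
        # delta-encode and deflate.
        ordered = [values[p] for p in perm]
        deltas = [ordered[0]] + [ordered[i] - ordered[i - 1]
                                 for i in range(1, len(ordered))]
        return zlib.compress(struct.pack(f"{len(deltas)}i", *deltas))

    def decode(blob):
        deltas = struct.unpack(
            f"{len(zlib.decompress(blob)) // 4}i", zlib.decompress(blob))
        ordered = [deltas[0]]
        for d in deltas[1:]:
            ordered.append(ordered[-1] + d)
        # Undo the permutation to restore storage order.
        return [ordered[inv[i]] for i in range(len(inv))]

    return encode, decode

# Values in storage order, and a hypothetical nodelist-derived order.
values = [10, 50, 11, 51, 12, 52]
perm = [0, 2, 4, 1, 3, 5]
enc, dec = make_filter(perm)
assert dec(enc(values)) == values
```

The catch, of course, is that `cd_values` is a short array of unsigned ints, so for a large nodelist you could at most encode a reference to it there, not the list itself.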

Another thought I had is that the memory type in an H5Dwrite call could include both the data to be written and the companion (integer nodelist) data; the compressor would do its work on the data and discard the companion. But how would readback work in that case?
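One way readback can work is if the encoder embeds the companion data in the compressed stream itself, so the decoder is self-contained. A minimal sketch (plain Python, not HDF5; assumes you accept storing a copy of the permutation with each dataset):

```python
import struct
import zlib

def encode_with_companion(values, perm):
    # Prefix the payload with the companion data (here, the
    # permutation) so decode needs nothing outside the stream.
    ordered = [values[p] for p in perm]
    payload = (struct.pack(f"{len(perm)}I", *perm)
               + struct.pack(f"{len(ordered)}i", *ordered))
    return struct.pack("I", len(perm)) + zlib.compress(payload)

def decode_with_companion(blob):
    n = struct.unpack_from("I", blob)[0]
    payload = zlib.decompress(blob[4:])
    perm = struct.unpack_from(f"{n}I", payload)
    ordered = struct.unpack_from(f"{n}i", payload, 4 * n)
    out = [0] * n
    for j, p in enumerate(perm):
        out[p] = ordered[j]  # scatter back to storage order
    return out

values = [7, -3, 7, -3]
blob = encode_with_companion(values, [0, 2, 1, 3])
assert decode_with_companion(blob) == values
```

The downside is exactly the duplication concern raised below: every dataset's stream carries its own copy of the companion.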

I suppose both the data to be compressed and the companion data could be treated as a single, aggregate dataset as well. That might be OK. But I might have many datasets all using the same companion data, and I wouldn't want to store that companion data once per dataset.
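To avoid duplicating the companion, each compressed stream could instead record only a short key that refers to companion data stored once (in a real file, the companion could live in its own dataset). A toy sketch of that indirection, with hypothetical names:

```python
import hashlib
import struct
import zlib

# Shared companion store: the nodelist-derived permutation is
# kept once; each stream records only an 8-byte key to it.
companions = {}

def register_companion(perm):
    key = hashlib.sha1(struct.pack(f"{len(perm)}I", *perm)).digest()[:8]
    companions[key] = list(perm)
    return key

def encode(values, key):
    perm = companions[key]
    ordered = struct.pack(f"{len(perm)}i", *(values[p] for p in perm))
    return key + zlib.compress(ordered)

def decode(blob):
    perm = companions[bytes(blob[:8])]
    ordered = struct.unpack(f"{len(perm)}i", zlib.decompress(blob[8:]))
    out = [0] * len(perm)
    for j, p in enumerate(perm):
        out[p] = ordered[j]
    return out

key = register_companion([0, 2, 1, 3])
for vals in ([1, 2, 3, 4], [9, 9, 0, 0]):  # many datasets, one companion
    assert decode(encode(vals, key)) == vals
```

In HDF5 terms the "store" would be the nodelist dataset itself, and the key something like its path or an object reference; the open question remains how a filter callback would resolve that key at read time.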

Just curious if anyone else has tackled this issue?


The implementation of HDF5-UDF shows how a dataset (variable) can “depend” on other datasets.

I would like to better understand your use case. You have two datasets, say, nodes (a point list) and connectivity (integer tuples?). What exactly are you trying to achieve? Do you want to compress one or both? Are you trying to set up a “coupled” compression problem, where a joint ordering gives you the best joint compression ratio (with standard compression methods)? Do you have any publication on that?
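To make the “coupled” idea concrete, here is a toy demonstration (plain Python, deliberately simplified): a smooth field is stored in scrambled node order, and the companion permutation restores spatial order, after which delta coding plus deflate does far better than compressing the storage order directly:

```python
import random
import struct
import zlib

n = 2000
random.seed(0)
scramble = list(range(n))
random.shuffle(scramble)

# A smooth field in spatial order; storage order is scrambled,
# as in a linearized unstructured-mesh node list.
smooth = [i // 4 for i in range(n)]          # slowly varying values
stored = [smooth[s] for s in scramble]

def deflated_size(seq):
    # Delta-encode, then deflate, and report the byte count.
    deltas = [seq[0]] + [seq[i] - seq[i - 1] for i in range(1, len(seq))]
    return len(zlib.compress(struct.pack(f"{len(deltas)}i", *deltas), 9))

# Spatial order recovered from the companion (inverse permutation).
spatial = sorted(range(n), key=lambda i: scramble[i])
reordered = [stored[i] for i in spatial]

assert reordered == smooth                   # companion restores locality
assert deflated_size(reordered) < deflated_size(stored)
```

This is only meant to illustrate why the companion dataset matters for the ratio; whether it pays off on real mesh data with real coordinates is exactly the kind of question a publication on coupled ordering would have to answer.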