Compression performance for compound datatype

Hi Zane,

The compression ratio depends largely on the redundancy in the data. For example, setting all the integers to 0 will help a great deal. If your data is noisy, the ratio will be poor at best. Sometimes sorting or some custom preprocessing (such as shuffling) will help.

My guess is that a compound datatype will yield a worse compression ratio than separate arrays of int, long, and double. For compounds, HDF5 does offer a shuffle filter, but I've never tried it.
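
To illustrate why byte shuffling can matter for a compound layout, here is a minimal sketch in plain Python, not HDF5 itself. It assumes a packed 12-byte record like Dataset2 from the question below, filled with made-up, slowly varying values; `make_records`, `byte_shuffle`, and `byte_unshuffle` are hypothetical helper names. In HDF5 the corresponding filter is enabled with H5Pset_shuffle on the dataset creation property list; as far as I understand, it regroups the bytes of whole dataset elements, which is what the sketch imitates.

```python
import struct
import zlib

# Hypothetical record layout mirroring Dataset2 from the question:
# a packed 4-byte int followed by an 8-byte double (12 bytes, no padding).
RECORD = struct.Struct("<id")


def make_records(n):
    """Build n packed records with slowly varying, illustrative values."""
    return b"".join(RECORD.pack(i, i * 0.5) for i in range(n))


def byte_shuffle(buf, width):
    """Group byte 0 of every record, then byte 1 of every record, and so on."""
    return b"".join(buf[k::width] for k in range(width))


def byte_unshuffle(buf, width):
    """Invert byte_shuffle so the original record stream is recovered."""
    n = len(buf) // width
    out = bytearray(len(buf))
    for k in range(width):
        out[k::width] = buf[k * n:(k + 1) * n]
    return bytes(out)


raw = make_records(10000)
shuffled = byte_shuffle(raw, RECORD.size)

# Compare deflate at level 9 on the interleaved and the shuffled stream.
print("raw bytes:           ", len(raw))
print("deflate, interleaved:", len(zlib.compress(raw, 9)))
print("deflate, shuffled:   ", len(zlib.compress(shuffled, 9)))
```

The sizes printed depend entirely on the data. With noisy values, neither layout is likely to compress well.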

With kind regards,

Pieter

"Zhengying Wang" <zhengying.wang@oxam.com> 08/08/08 11:01 AM >>>

Hello,

I just wrote a test program to measure the compression performance of an
HDF file.
There are 2 datasets in the test file. The datasets are defined as
follows:
Dataset1
{
  unsigned long long item1;
  unsigned int item2;
  int item3;
  int item4;
  unsigned long long item5;
};

Dataset2
{
  int item1;
  double item2;
};

In the test file there are 62847260 records in Dataset1 and 831136075
records in Dataset2. The datasets are chunked with a chunk size of 262144
and compressed at level 9. The file size is 200204605 bytes.

In theory, the struct size is 28 bytes for Dataset1 and 12 bytes for
Dataset2. The uncompressed size of the datasets should be:

62847260*28 + 831136075*12 = 3143421588

The compression ratio seems to be just 1.57?
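
The arithmetic above can be checked with a short sketch. It takes the record counts, packed struct sizes, and file size exactly as quoted, and assumes no padding and no HDF5 metadata overhead. The variable names are illustrative only. Taken at face value, the quoted counts give a larger product than the total shown in the message. That total equals the product modulo 2^32, which may point to a 32-bit overflow in the original calculation, though that is only a guess.

```python
# Figures as quoted in the message; packed struct sizes are assumed.
n_dataset1, size_dataset1 = 62847260, 28
n_dataset2, size_dataset2 = 831136075, 12
file_size = 200204605

# Python integers do not overflow, so this is the exact product.
raw_size = n_dataset1 * size_dataset1 + n_dataset2 * size_dataset2
print("raw size (exact):      ", raw_size)

# The same value truncated to an unsigned 32-bit integer.
print("raw size (mod 2**32):  ", raw_size % 2**32)

# Compression ratio = uncompressed size / compressed file size.
print("ratio from exact size: ", round(raw_size / file_size, 2))
print("ratio from 32-bit size:", round((raw_size % 2**32) / file_size, 2))
```

Neither ratio printed matches the 1.57 quoted, so the record counts or the file size in the message may also contain a typo.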

Any ideas what's going on here? How does a compound datatype affect
compression performance?

Any help would be appreciated.

Thanks a lot,
Zane

----------------------------------------------------------------------
This mailing list is for HDF software users discussion.
To subscribe to this list, send a message to hdf-forum-subscribe@hdfgroup.org.
To unsubscribe, send a message to hdf-forum-unsubscribe@hdfgroup.org.
