GZIP Compression and h5repack

Hi,

I'm trying to create an HDF5 file with some compound datatypes with GZIP
compression. The development is done in C# using the HDF5DotNet dll.
I need these compression options: shuffle & gzip=9, and I would like to
achieve the same compression ratio as h5repack.

The problem, however, is that the compressed file is the same size as the
uncompressed file. If I use h5repack on that file, the result is 10 times
smaller. Can someone see what I am doing wrong?

Part of my implementation:

// We want to write a compound datatype, which is a struct containing an int and some bytes
DataStruct[] data = new DataStruct[] { ... }; // data has been filled

// Create the compound datatype for memory
H5DataTypeId dataTypeMem = H5T.create(H5T.CreateClass.COMPOUND, (int)Marshal.SizeOf(default(DataStruct)));
H5T.insert(dataTypeMem, "A", (int)Marshal.OffsetOf(typeof(DataStruct), "A"), H5T.H5Type.NATIVE_INT);
H5T.insert(dataTypeMem, "B", (int)Marshal.OffsetOf(typeof(DataStruct), "B"), H5T.H5Type.NATIVE_UCHAR);
H5T.insert(dataTypeMem, "C", (int)Marshal.OffsetOf(typeof(DataStruct), "C"), H5T.H5Type.NATIVE_UCHAR);
H5T.insert(dataTypeMem, "D", (int)Marshal.OffsetOf(typeof(DataStruct), "D"), H5T.H5Type.NATIVE_UCHAR);
H5T.insert(dataTypeMem, "E", (int)Marshal.OffsetOf(typeof(DataStruct), "E"), H5T.H5Type.NATIVE_UCHAR);

// Create the compound datatype for the file. Because the standard
// types we are using for the file may have different sizes than
// the corresponding native types, we must manually calculate the
// offset of each member.
int offset = 0;
H5DataTypeId dataTypeFile = H5T.create(H5T.CreateClass.COMPOUND, 4 + 1 + 1 + 1 + 1);
H5T.insert(dataTypeFile, "A", offset, H5T.H5Type.STD_U32BE);
offset += 4;
H5T.insert(dataTypeFile, "B", offset, H5T.H5Type.STD_U8BE);
offset += 1;
H5T.insert(dataTypeFile, "C", offset, H5T.H5Type.STD_U8BE);
offset += 1;
H5T.insert(dataTypeFile, "D", offset, H5T.H5Type.STD_U8BE);
offset += 1;
H5T.insert(dataTypeFile, "E", offset, H5T.H5Type.STD_U8BE);
offset += 1;

long[] dims = { (long) data.Count() };

try
{
    // Create dataspace, with maximum = current
    H5DataSpaceId dataSpace = H5S.create_simple(1, dims);

    // Create compression properties
    long[] chunk = dims; // What value should be used as chunk?
    H5PropertyListId compressProperty = H5P.create(H5P.PropertyListClass.DATASET_CREATE);
    H5P.setShuffle(compressProperty);
    H5P.setDeflate(compressProperty, 9);
    H5P.setChunk(compressProperty, chunk);

    // Create the data set
    H5DataSetId dataSet = H5D.create(fileId, "NAME", dataTypeFile, dataSpace,
        new H5PropertyListId(H5P.Template.DEFAULT), compressProperty,
        new H5PropertyListId(H5P.Template.DEFAULT));

    // Write data to it
    H5D.write(dataSet, dataTypeMem, new H5DataSpaceId(H5S.H5SType.ALL),
        new H5DataSpaceId(H5S.H5SType.ALL), new H5PropertyListId(H5P.Template.DEFAULT),
        new H5Array<DataStruct>(data));

    // Cleanup
    H5T.close(dataTypeMem);
    H5T.close(dataTypeFile);
    H5D.close(dataSet);
    H5P.close(compressProperty);
    H5S.close(dataSpace);
}
catch
{
    ...
}

To summarize the steps: creating the datatypes for both file and memory, creating
the dataspace, creating the dataset with the shuffle and compression creation
properties, and finally writing the data to the file.
It might be a bit difficult to check this code, but are there any steps
missing or incorrect?

Help appreciated.

Best regards,

Bas Schoen

Hi Bas,

I have a couple of questions.
   1. About the size difference between h5repack and your code, is it the size of the HDF5 file or of the dataset?
   2. About the chunk, what chunk size was used for h5repack and for your code?

Jonathan


Hi Jonathan,

Thanks for your reply.

1. The size difference is between the file sizes of two HDF5 files: one
compressed with my code, and one with h5repack. (There is also an original,
uncompressed file, which is the same size as the one produced by my compression code.)
2. The chunk size I used is the count of items written to the dataset, which
in this case was 32509. If I open the two files with HDFView, this value is
shown in the properties of both files.

When creating the dataset I am not really sure whether to use
dataTypeFile or dataTypeMem. I tried both and the results are the same
(at least the difference between my code and h5repack stays the same; the file
size of both files does, however, change).

Regards,

Bas
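
(A note on this point, added for clarity: in the HDF5 data model the dataset is
created with the *file* datatype and written with the *memory* datatype; the
library converts between the two layouts during I/O. A minimal sketch using the
identifiers from the code above:)

// The dataset on disk uses the packed, big-endian file type.
H5DataSetId dataSet = H5D.create(fileId, "NAME", dataTypeFile, dataSpace,
    new H5PropertyListId(H5P.Template.DEFAULT), compressProperty,
    new H5PropertyListId(H5P.Template.DEFAULT));
// The write call describes the in-memory struct layout instead, so the
// library converts each element from memory layout to file layout as it writes.
H5D.write(dataSet, dataTypeMem, new H5DataSpaceId(H5S.H5SType.ALL),
    new H5DataSpaceId(H5S.H5SType.ALL), new H5PropertyListId(H5P.Template.DEFAULT),
    new H5Array<DataStruct>(data));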


Hi Bas,

It sounds like h5repack did the work, but your code didn't do what you expected
if the result files have the same size. Just to mention, h5repack can potentially
reduce the size further when dealing with the entire file, as it rewrites all the
objects and reclaims space left over from previous changes.

According to your reply there are 3 HDF5 files: 1. the original, 2. the result from h5repack, 3. the result from your code.
Could you send us either the outputs from "h5dump -p -H <HDF5 file>" or snapshots from HDFView's 'show properties' pop-up window for the 3 files?

Also, could you send me how you ran h5repack?

Regards,

Jonathan


Hi Jonathan,

I've attached the output from h5dump in 3 files: original, compressed (using
my code), and repack.

I ran h5repack with the following command: "h5repack -f SHUF -f GZIP=9
<input.h5> <output.h5>".

Just to make sure: the problems I'm having are not really related to
h5repack. My implementation of gzip compression just doesn't compress the
HDF5 file at all. Or even worse, if I use a small chunk size (say, 20) the
file size increases compared to the original HDF5 file. (With 32508 elements,
a chunk size of 20 means more than 1600 chunks, each carrying its own chunk-index
and filter overhead, which can easily outweigh any compression gain.)

I've tried to make things easier and written a small test function which
doesn't write a compound datatype but just an int array. But my compression
still doesn't work. I've got the feeling I am missing an important step in
the compression process.

This is what I tried:

Random rand = new Random();
int[] data = new int[32508];

// Fill some dummy data
for (int i = 0; i < 32508; i++)
    data[i] = rand.Next();

// Create DataSpace
H5DataSpaceId dataSpace = H5S.create_simple(1, new long[] { data.Length });

// Create Creation Property List
H5PropertyListId compressProperty = H5P.create(H5P.PropertyListClass.DATASET_CREATE);
H5P.setShuffle(compressProperty);
H5P.setDeflate(compressProperty, 9);
H5P.setChunk(compressProperty, new long[] { data.Length });

// Create DataSet with compression enabled
H5DataSetId dataSet = H5D.create(fileId, "Test", H5T.H5Type.STD_I32LE, dataSpace,
    new H5PropertyListId(H5P.Template.DEFAULT), compressProperty,
    new H5PropertyListId(H5P.Template.DEFAULT)); // this line has been used to turn compression on
//H5DataSetId dataSet = H5D.create(fileId, "Test", H5T.H5Type.STD_I32LE, dataSpace); // this line has been used to turn compression off

// Write data to file
H5D.write(dataSet, new H5DataTypeId(H5T.H5Type.NATIVE_INT),
    new H5DataSpaceId(H5S.H5SType.ALL), new H5DataSpaceId(H5S.H5SType.ALL),
    new H5PropertyListId(H5P.Template.DEFAULT), new H5Array<int>(data));

H5P.close(compressProperty);
H5D.close(dataSet);
H5S.close(dataSpace);

Regards,

Bas

Original.txt (668 Bytes)

Compressed.txt (731 Bytes)

Repack.txt (733 Bytes)


Bas, how are you? Attached is an IronPython
(http://ironpython.codeplex.com/) script that you might want
to run and recreate in C#. (IronPython uses HDF5DotNet.dll the same way you
do from C#.)

It creates a compressed, chunked dataset of a compound type (int, float,
double).

'h5dump -p -H SDScompound.h5' yields:

HDF5 "SDScompound.h5" {

GROUP "/" {

   DATASET "ArrayOfStructures" {

      DATATYPE H5T_COMPOUND {

         H5T_STD_I32LE "a_name";

         H5T_IEEE_F32LE "b_name";

         H5T_IEEE_F64LE "c_name";

      }

      DATASPACE SIMPLE { ( 1024 ) / ( 1024 ) }

      STORAGE_LAYOUT {

         CHUNKED ( 128 )

         SIZE 4944 (3.314:1 COMPRESSION)

       }

     FILTERS {

         COMPRESSION DEFLATE { LEVEL 9 }

      }

      FILLVALUE {

         FILL_TIME H5D_FILL_TIME_IFSET

         VALUE {

         0,

         0,

         0

      }

      }

      ALLOCATION_TIME {

         H5D_ALLOC_TIME_INCR

      }

   }

}

}

Can you reproduce that?

Best, G.

compound_compressed.py (2.55 KB)
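
(For readers without the attachment, here is a minimal C# sketch of the same
setup, matching the h5dump output above. The struct name Triple, the fill
values, and the H5F.create/H5F.close calls and NATIVE_FLOAT/NATIVE_DOUBLE enum
members are assumptions about the HDF5DotNet build, not taken from the attached
script. Requires: using HDF5DotNet; using System.Runtime.InteropServices;)

[StructLayout(LayoutKind.Sequential)]
struct Triple
{
    public int a;
    public float b;
    public double c;
}

// Create the file and some dummy data (1024 elements, as in the dump above).
H5FileId fileId = H5F.create("SDScompound.h5", H5F.CreateMode.ACC_TRUNC);
Triple[] data = new Triple[1024];
for (int i = 0; i < data.Length; i++)
{
    data[i].a = i;
    data[i].b = i;
    data[i].c = 1.0 / (i + 1);
}

// Compound datatype matching the struct layout in memory.
H5DataTypeId memType = H5T.create(H5T.CreateClass.COMPOUND, Marshal.SizeOf(typeof(Triple)));
H5T.insert(memType, "a_name", (int)Marshal.OffsetOf(typeof(Triple), "a"), H5T.H5Type.NATIVE_INT);
H5T.insert(memType, "b_name", (int)Marshal.OffsetOf(typeof(Triple), "b"), H5T.H5Type.NATIVE_FLOAT);
H5T.insert(memType, "c_name", (int)Marshal.OffsetOf(typeof(Triple), "c"), H5T.H5Type.NATIVE_DOUBLE);

// Chunked layout (128 elements per chunk) with gzip level 9, as in the dump above.
H5DataSpaceId space = H5S.create_simple(1, new long[] { data.Length });
H5PropertyListId dcpl = H5P.create(H5P.PropertyListClass.DATASET_CREATE);
H5P.setChunk(dcpl, new long[] { 128 });
H5P.setDeflate(dcpl, 9);

H5DataSetId dset = H5D.create(fileId, "ArrayOfStructures", memType, space,
    new H5PropertyListId(H5P.Template.DEFAULT), dcpl,
    new H5PropertyListId(H5P.Template.DEFAULT));
H5D.write(dset, memType, new H5DataSpaceId(H5S.H5SType.ALL),
    new H5DataSpaceId(H5S.H5SType.ALL), new H5PropertyListId(H5P.Template.DEFAULT),
    new H5Array<Triple>(data));

// Cleanup
H5D.close(dset);
H5P.close(dcpl);
H5S.close(space);
H5T.close(memType);
H5F.close(fileId);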


Hi Gerd,

Thank you very much for your reply. I managed to reproduce your sample in C#
with the same results. I am not really sure what I have been doing wrong the
whole time, but at least I've got a working sample now. This even gives the
same result as h5repack, which is great!

I'll try to implement this in the real project, and see if it works. Thanks
to both of you (Jonathan & Gerd).

Once I have figured out my mistake I will post it here, so others can
benefit from it.

Regards,

Bas


Hi,

I've figured out what the problem with the GZIP compression was. It turned
out I was using a wrong version of hdf5dll.dll. Replacing this file solved the
issues I was having.
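
(For anyone hitting the same issue, a hypothetical sketch for checking which
native HDF5 library version is actually loaded, via P/Invoke of the C API's
H5get_libversion; the DLL name "hdf5dll.dll" is taken from this thread and may
differ in other deployments.)

using System;
using System.Runtime.InteropServices;

static class H5VersionCheck
{
    // H5get_libversion is part of the HDF5 C API; it returns a negative
    // value on failure and fills in major/minor/release on success.
    [DllImport("hdf5dll.dll", CallingConvention = CallingConvention.Cdecl)]
    private static extern int H5get_libversion(out uint majnum, out uint minnum, out uint relnum);

    public static void Print()
    {
        uint maj, min, rel;
        if (H5get_libversion(out maj, out min, out rel) >= 0)
            Console.WriteLine("Native HDF5 library: {0}.{1}.{2}", maj, min, rel);
        else
            Console.WriteLine("H5get_libversion failed.");
    }
}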

Thanks for your help guys!

Regards,

Bas


Bas,

Glad to know that you solved the problem.

It is strange that your program did not give you any error when the
wrong dll was used.

Thanks
--pc
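
(One possible explanation for the silence, offered as an assumption rather than
a confirmed diagnosis: in the HDF5 C library, H5Pset_deflate registers deflate
as an *optional* filter, so when the filter is unavailable the chunks can be
written uncompressed without raising an error. A hypothetical P/Invoke check
for whether the loaded library actually has deflate available:)

using System.Runtime.InteropServices;

static class DeflateCheck
{
    private const int H5Z_FILTER_DEFLATE = 1; // filter id from the HDF5 C headers

    // H5Zfilter_avail returns a positive value if the filter is available,
    // zero if it is not, and a negative value on error.
    [DllImport("hdf5dll.dll", CallingConvention = CallingConvention.Cdecl)]
    private static extern int H5Zfilter_avail(int filterId);

    public static bool DeflateAvailable()
    {
        return H5Zfilter_avail(H5Z_FILTER_DEFLATE) > 0;
    }
}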
