Bas, how are you? Attached is an IronPython (http://ironpython.codeplex.com/) script that you might want to run and recreate in C#. (IronPython uses HDF5DotNet.dll the same way you do from C#.) It creates a compressed, chunked dataset of a compound type (int, float, double).
"h5dump -p -H SDScompound.h5" yields:
HDF5 "SDScompound.h5" {
GROUP "/" {
   DATASET "ArrayOfStructures" {
      DATATYPE H5T_COMPOUND {
         H5T_STD_I32LE "a_name";
         H5T_IEEE_F32LE "b_name";
         H5T_IEEE_F64LE "c_name";
      }
      DATASPACE SIMPLE { ( 1024 ) / ( 1024 ) }
      STORAGE_LAYOUT {
         CHUNKED ( 128 )
         SIZE 4944 (3.314:1 COMPRESSION)
      }
      FILTERS {
         COMPRESSION DEFLATE { LEVEL 9 }
      }
      FILLVALUE {
         FILL_TIME H5D_FILL_TIME_IFSET
         VALUE {
            0,
            0,
            0
         }
      }
      ALLOCATION_TIME {
         H5D_ALLOC_TIME_INCR
      }
   }
}
}
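As a quick sanity check, the 3.314:1 ratio in the dump is consistent with the element size: each compound element is 4 bytes (int32) + 4 bytes (float32) + 8 bytes (float64) = 16 bytes, and 1024 elements divided by the 4944-byte stored size gives the reported ratio.

```python
# Verify the compression ratio reported by h5dump for SDScompound.h5.
element_size = 4 + 4 + 8          # int32 + float32 + float64, in bytes
n_elements = 1024
raw_size = element_size * n_elements   # 16384 bytes uncompressed
stored_size = 4944                      # SIZE line from h5dump

ratio = raw_size / stored_size
print(f"{ratio:.3f}:1")                 # -> 3.314:1
```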
Can you reproduce that?
Best, G.
From: hdf-forum-bounces@hdfgroup.org [mailto:hdf-forum-bounces@hdfgroup.org] On Behalf Of Bas Schoen
Sent: Friday, July 08, 2011 11:01 AM
To: HDF Users Discussion List
Subject: Re: [Hdf-forum] Fwd: GZIP Compression and h5repack
Hi Jonathan,
I've attached the output from h5dump in three files: original, compressed (using my code), and repacked.
I ran h5repack with the following command: "h5repack -f SHUF -f GZIP=9 <input.h5> <output.h5>"
Just to make sure: the problems I'm having are not really related to h5repack. My implementation of gzip compression just doesn't compress the HDF5 file at all. Or even worse, if I use a small chunk size (say, 20), the file size increases compared to the original HDF5 file.
To make things easier, I've written a small test function that writes just an int array instead of a compound datatype. But the compression still doesn't work. I've got the feeling I am missing an important step in the compression process.
This is what I tried:
Random rand = new Random();
int[] data = new int[32508];

// Fill with some dummy data
for (int i = 0; i < 32508; i++)
    data[i] = rand.Next();

// Create dataspace
H5DataSpaceId dataSpace = H5S.create_simple(1, new long[] { data.Length });

// Create dataset creation property list with shuffle and deflate
H5PropertyListId compressProperty = H5P.create(H5P.PropertyListClass.DATASET_CREATE);
H5P.setShuffle(compressProperty);
H5P.setDeflate(compressProperty, 9);
H5P.setChunk(compressProperty, new long[] { data.Length });

// Create dataset with compression enabled
H5DataSetId dataSet = H5D.create(fileId, "Test", H5T.H5Type.STD_I32LE, dataSpace,
    new H5PropertyListId(H5P.Template.DEFAULT), compressProperty,
    new H5PropertyListId(H5P.Template.DEFAULT));

// This line was used instead to turn compression off:
//H5DataSetId dataSet = H5D.create(fileId, "Test", H5T.H5Type.STD_I32LE, dataSpace);

// Write data to file
H5D.write(dataSet, new H5DataTypeId(H5T.H5Type.NATIVE_INT),
    new H5DataSpaceId(H5S.H5SType.ALL), new H5DataSpaceId(H5S.H5SType.ALL),
    new H5PropertyListId(H5P.Template.DEFAULT), new H5Array<int>(data));

H5P.close(compressProperty);
H5D.close(dataSet);
H5S.close(dataSpace);
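One thing worth noting about this particular test (an observation about the test data, not a diagnosis of the HDF5 setup): the array is filled with rand.Next(), and pseudo-random data is essentially incompressible, so even a correctly configured DEFLATE filter will show little or no size reduction on it. With small chunks, the per-chunk metadata overhead can then make the file larger than the original. A quick stdlib-only demonstration:

```python
import random
import zlib

# 32508 pseudo-random 32-bit ints, like rand.Next() in the C# snippet,
# serialized to the bytes they would occupy in the dataset.
random.seed(0)
random_bytes = b"".join(
    random.getrandbits(32).to_bytes(4, "little") for _ in range(32508)
)
# The same number of zero-valued ints, which are highly compressible.
repetitive_bytes = bytes(4) * 32508

print(len(random_bytes))                        # 130032 bytes raw
print(len(zlib.compress(random_bytes, 9)))      # roughly the input size
print(len(zlib.compress(repetitive_bytes, 9)))  # a few hundred bytes
```

So to see the filter actually shrink the file, it may help to test with structured or repetitive data rather than rand.Next().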
Regards,
Bas
On Fri, Jul 8, 2011 at 5:03 PM, Jonathan Kim <jkm@hdfgroup.org> wrote:
Hi Bas,
It sounds like h5repack did the work but your code didn't do as expected, if the result files are the same size. Just to mention, h5repack can potentially reduce the size more when dealing with the entire file, as it rewrites all the objects, reclaiming space left over from previous changes.
According to your reply there are 3 HDF5 files: 1. the original, 2. the result from h5repack, 3. the result from your code.
Could you send us either the output of "h5dump -p -H <HDF5 file>" or snapshots of HDFView's 'Show Properties' window for the 3 files? Also, could you send me how you ran h5repack?
Regards,
Jonathan
On 7/8/2011 2:44 AM, Bas Schoen wrote:
Hi Jonathan,
Thanks for your reply.
1. The size difference is between the file sizes of two HDF5 files: one compressed with my code, and one with h5repack. (There is also an original, uncompressed file, which is the same size as the one produced by my compression code.)
2. The chunk size I used is the number of items written to the dataset, which in this case was 32509. If I open the two files with the HDF5 viewer, this value is shown in the properties of both files.
When creating the dataset I am not really sure whether to use dataTypeFile or dataTypeMem. I tried both and the results are the same (at least the difference between my code and h5repack stays the same; the file sizes of both files do change, however).
Regards,
Bas
On Thu, Jul 7, 2011 at 6:15 PM, Jonathan Kim <jkm@hdfgroup.org> wrote:
Hi Bas,
I have a couple of questions.
1. About the size difference between h5repack and your code: is it the size of the HDF5 file or of the dataset?
2. About the chunking: what chunk size was used for h5repack and for your code?
Jonathan
On 7/7/2011 10:00 AM, Bas Schoen wrote:
Hi,
I'm trying to create an HDF5 file with some compound datatypes with GZIP compression. The development is done in C# using the HDF5DotNet DLL.
I need these compression options: shuffle & gzip=9, and I would like to achieve the same compression ratio as h5repack.
The problem, however, is that the compressed file is the same size as the uncompressed file. If I use h5repack on that file, the size is 10 times smaller. Can someone see what I am doing wrong?
Part of my implementation:
// We want to write a compound datatype: a struct containing an int and some bytes
DataStruct[] data = new DataStruct[]{...}; // data has been filled

// Create the compound datatype for memory
H5DataTypeId dataTypeMem = H5T.create(H5T.CreateClass.COMPOUND, (int)Marshal.SizeOf(default(DataStruct)));
H5T.insert(dataTypeMem, "A", (int)Marshal.OffsetOf(typeof(DataStruct), "A"), H5T.H5Type.NATIVE_INT);
H5T.insert(dataTypeMem, "B", (int)Marshal.OffsetOf(typeof(DataStruct), "B"), H5T.H5Type.NATIVE_UCHAR);
H5T.insert(dataTypeMem, "C", (int)Marshal.OffsetOf(typeof(DataStruct), "C"), H5T.H5Type.NATIVE_UCHAR);
H5T.insert(dataTypeMem, "D", (int)Marshal.OffsetOf(typeof(DataStruct), "D"), H5T.H5Type.NATIVE_UCHAR);
H5T.insert(dataTypeMem, "E", (int)Marshal.OffsetOf(typeof(DataStruct), "E"), H5T.H5Type.NATIVE_UCHAR);
// Create the compound datatype for the file. Because the standard
// types we are using for the file may have different sizes than
// the corresponding native types, we must manually calculate the
// offset of each member.
int offset = 0;
H5DataTypeId dataTypeFile = H5T.create(H5T.CreateClass.COMPOUND, 4 + 1 + 1 + 1 + 1);
H5T.insert(dataTypeFile, "A", offset, H5T.H5Type.STD_U32BE);
offset += 4;
H5T.insert(dataTypeFile, "B", offset, H5T.H5Type.STD_U8BE);
offset += 1;
H5T.insert(dataTypeFile, "C", offset, H5T.H5Type.STD_U8BE);
offset += 1;
H5T.insert(dataTypeFile, "D", offset, H5T.H5Type.STD_U8BE);
offset += 1;
H5T.insert(dataTypeFile, "E", offset, H5T.H5Type.STD_U8BE);
offset += 1;
long[] dims = { (long)data.Count() };
try
{
    // Create dataspace, with maximum = current
    H5DataSpaceId dataSpace = H5S.create_simple(1, dims);

    // Create compression properties
    long[] chunk = dims; // What value should be used as chunk?
    H5PropertyListId compressProperty = H5P.create(H5P.PropertyListClass.DATASET_CREATE);
    H5P.setShuffle(compressProperty);
    H5P.setDeflate(compressProperty, 9);
    H5P.setChunk(compressProperty, chunk);

    // Create the dataset
    H5DataSetId dataSet = H5D.create(fileId, "NAME", dataTypeFile, dataSpace,
        new H5PropertyListId(H5P.Template.DEFAULT), compressProperty,
        new H5PropertyListId(H5P.Template.DEFAULT));

    // Write data to it
    H5D.write(dataSet, dataTypeMem, new H5DataSpaceId(H5S.H5SType.ALL),
        new H5DataSpaceId(H5S.H5SType.ALL), new H5PropertyListId(H5P.Template.DEFAULT),
        new H5Array<DataStruct>(data));

    // Cleanup
    H5T.close(dataTypeMem);
    H5T.close(dataTypeFile);
    H5D.close(dataSet);
    H5P.close(compressProperty);
    H5S.close(dataSpace);
}
catch
{
    ...
}
To summarize, the steps are: creating datatypes for both file and memory, creating the dataspace, creating the dataset with shuffle and deflate set on the dataset creation property list, and finally writing the data to the file.
It might be a bit difficult to check this code, but are there any steps missing or incorrect?
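For context on the chunk-size question in the code comment above, some back-of-the-envelope arithmetic (assuming the 8-byte record layout built for the file type): one chunk holding all 32509 records is about 254 KiB, which is a reasonable chunk, whereas chunks of 20 records are only 160 bytes each, and each chunk carries fixed chunk-index (B-tree) overhead in the file, so very small chunks can make the file grow even when the filter itself works.

```python
import math

record_size = 4 + 1 + 1 + 1 + 1   # bytes per compound record (fields A..E)
n_records = 32509

one_big_chunk = record_size * n_records   # 260072 bytes in a single chunk
n_tiny_chunks = math.ceil(n_records / 20) # 1626 chunks of <= 20 records (160 bytes)

print(one_big_chunk)    # 260072
print(n_tiny_chunks)    # 1626 -> each adds fixed index overhead
```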
Help appreciated.
Best regards,
Bas Schoen
_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org