Is an unlimited dimension the reason for the file size difference?

Hi,

  I created two files from the same source with different dataspace settings:

  lucy.cpm is created with fixed dataset dimensions.
  ilucy.cpm is created with one of the dataset dimensions set to unlimited; I
use hyperslab selections to incrementally append data by extending the
dataspace.

  lucy.cpm file size is 673,656,116 bytes
  ilucy.cpm file size is 2,472,905,160 bytes

  Here is the h5dump -H output for each file:

HDF5 "lucy.cpm" {
GROUP "/" {
   DATASET "bounding_box" {
      DATATYPE H5T_IEEE_F32LE
      DATASPACE SIMPLE { ( 6 ) / ( 6 ) }
   }
   DATASET "face_indices" {
      DATATYPE H5T_STD_U32LE
      DATASPACE SIMPLE { ( 28055742, 3 ) / ( 28055742, 3 ) }
   }
   DATASET "vertex_positions" {
      DATATYPE H5T_IEEE_F32LE
      DATASPACE SIMPLE { ( 14028019, 3 ) / ( 14028019, 3 ) }
   }
   DATASET "voxel_bounding_boxes" {
      DATATYPE H5T_IEEE_F32LE
      DATASPACE SIMPLE { ( 64, 6 ) / ( 64, 6 ) }
   }
   DATASET "voxel_face_indices" {
      DATATYPE H5T_VLEN { H5T_STD_U32LE }
      DATASPACE SIMPLE { ( 64 ) / ( 64 ) }
   }
   DATASET "voxel_vertex_indices" {
      DATATYPE H5T_VLEN { H5T_STD_U32LE }
      DATASPACE SIMPLE { ( 64 ) / ( 64 ) }
   }
}
}

HDF5 "ilucy.cpm" {
GROUP "/" {
   DATASET "bounding_box" {
      DATATYPE H5T_IEEE_F32LE
      DATASPACE SIMPLE { ( 6 ) / ( 6 ) }
   }
   DATASET "face_indices" {
      DATATYPE H5T_STD_U32LE
      DATASPACE SIMPLE { ( 28055742, 3 ) / ( H5S_UNLIMITED, 3 ) }
   }
   DATASET "vertex_positions" {
      DATATYPE H5T_IEEE_F32LE
      DATASPACE SIMPLE { ( 14027872, 3 ) / ( H5S_UNLIMITED, 3 ) }
   }
}
}
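
  For reference, the append pattern I use looks roughly like this. It is a
simplified sketch of the C API calls, not my actual code: file, cur_rows,
nrows, chunk_rows and buf stand in for my real variables.

#include <hdf5.h>

/* Create a 2-D dataset that is extendible along the first dimension. */
hsize_t dims[2]    = {0, 3};
hsize_t maxdims[2] = {H5S_UNLIMITED, 3};
hid_t   space      = H5Screate_simple(2, dims, maxdims);

/* An unlimited dimension requires chunked storage. */
hid_t   dcpl     = H5Pcreate(H5P_DATASET_CREATE);
hsize_t chunk[2] = {chunk_rows, 3};        /* chunk_rows = whatever I picked */
H5Pset_chunk(dcpl, 2, chunk);

hid_t dset = H5Dcreate2(file, "vertex_positions", H5T_IEEE_F32LE,
                        space, H5P_DEFAULT, dcpl, H5P_DEFAULT);

/* For each batch of nrows rows read from the source: */
hsize_t newdims[2] = {cur_rows + nrows, 3};
H5Dset_extent(dset, newdims);              /* grow the unlimited dimension */

hid_t   fspace   = H5Dget_space(dset);     /* re-read the file dataspace */
hsize_t start[2] = {cur_rows, 0};
hsize_t count[2] = {nrows, 3};
H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);

hid_t mspace = H5Screate_simple(2, count, NULL);
H5Dwrite(dset, H5T_NATIVE_FLOAT, mspace, fspace, H5P_DEFAULT, buf);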

Cheers


--
Nicholas Yue
Graphics - Arnold, Alembic, RenderMan, OpenGL, HDF5
Custom Dev - C++ porting, OSX, Linux, Windows
http://au.linkedin.com/in/nicholasyue

Nick, the unlimited dimension per se is not the reason.
I suspect your chunk size is way too small.
(That will bloat your chunk index, the B-tree that maps chunk coordinates
to file offsets, and that per-chunk metadata is probably what's going on.)

What are the chunk sizes on /face_indices and /vertex_positions?
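
You can check from the command line with h5dump's -p flag, which prints
the dataset creation properties, including a STORAGE_LAYOUT section with
the chunk dimensions:

h5dump -p -H ilucy.cpm

Or from code, a rough sketch (assuming dset is an open handle to one of
your 2-D datasets):

hid_t   dcpl = H5Dget_create_plist(dset);
hsize_t chunk[2];
if (H5Pget_chunk(dcpl, 2, chunk) == 2)     /* returns the rank on success */
    printf("chunk = %llu x %llu\n",
           (unsigned long long)chunk[0], (unsigned long long)chunk[1]);
H5Pclose(dcpl);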

G.


Hi Gerd,

     The chunk size is indeed the problem (I am new to chunking).

     I set it to '1'. :(

     I went back to the documentation on chunking and B-trees and figured I'd better get a good understanding of how chunking is implemented.

     When I set it to something I felt was reasonable for my usage pattern, i.e. 8196, the HDF5 file became smaller than both the original input binary file and the previous non-unlimited version of the same HDF5 file.
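
     For anyone who hits the same thing, the only change was the chunk
dimensions passed to H5Pset_chunk; a sketch with the same 2-D layout as
before:

hsize_t chunk[2] = {8196, 3};  /* ~96 KiB per chunk for rows of 3 x 4-byte values */
H5Pset_chunk(dcpl, 2, chunk);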

     All is well now.

Cheers


--
Nicholas Yue
Graphics - RenderMan, Visualization, OpenGL, HDF5
Custom Dev - C++ porting, OSX, Linux, Windows
http://au.linkedin.com/in/nicholasyue