Allocated file space for chunked datasets

Hello all,

I have a question concerning the allocated file space of chunked datasets. I use a chunked dataset with a fill value and incremental allocation time, as follows:

    hsize_t chunk_dims[3] = {10, 10, 10};
    const int rank = 3;

    H5::DSetCreatPropList cparms;
    cparms.setChunk( rank, chunk_dims );

    /* Set fill value for the dataset. */
    H5::DataType datatype = H5::PredType::NATIVE_DOUBLE; /* assumed; must match fill_val */
    double fill_val = -999.999;
    cparms.setFillValue( datatype, &fill_val );

    /* Set allocation time. */
    cparms.setAllocTime( H5D_ALLOC_TIME_INCR );

    /*
     * Create the dataspace with initial and maximum dimensions.
     */
    hsize_t min_dims[] = {10000, 1000, 1000};

    hsize_t max_dims[] = {
       H5S_UNLIMITED,
       H5S_UNLIMITED,
       H5S_UNLIMITED
    };

    H5::DataSpace dataspace( rank, min_dims, max_dims );

    ....

As I understand it, memory is only allocated for chunks that data is actually written to; in other words, no space is allocated for chunks that contain only fill values. My question is: is this also true for the file space on disk? What I observe is that space for the whole dataset (including "empty" chunks) is allocated on disk. I compared sparse matrices with full matrices, and the allocated space is nearly identical. Is there a way to reduce the size of sparse matrices on disk? I am thinking of using compression. Is that a common way to achieve this, or do you recommend something different?
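
For what it's worth, compression in HDF5 is a per-chunk filter set on the same creation property list as above, so enabling it is one extra call. A minimal sketch; the deflate level 6 is an arbitrary choice, and the filter requires an HDF5 build with zlib support:

    /* Sketch: add the deflate (zlib) filter to the creation property list.
     * The filter works per chunk, so it can only shrink chunks that are
     * actually allocated and written. */
    cparms.setDeflate( 6 );   /* level 1 (fastest) .. 9 (smallest) */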

Thank you in advance,

Jannis

It seems like simply enabling compression does not change anything: the files for sparse and dense matrices still end up the same size.

Can anyone give me a hint on how to work this out?

···

Hi Jan,

> It seems like simply enabling compression does not change anything: the files for sparse and dense matrices still end up the same size.
>
> Can anyone give me a hint on how to work this out?

  Hmm, I would think that you are correct in your expectations. Can you try without setting the fill value and see what happens?
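
In code, "without setting the fill value" would be the same property list minus the setFillValue() call; a minimal sketch:

    H5::DSetCreatPropList cparms;
    cparms.setChunk( rank, chunk_dims );
    cparms.setAllocTime( H5D_ALLOC_TIME_INCR );
    /* no setFillValue() call: unallocated chunks then read back as the
     * library default fill value, which is zero */
    /* a related knob, not the same thing as leaving the fill value unset:
     * cparms.setFillTime( H5D_FILL_TIME_NEVER ) tells the library never
     * to write fill values into allocated chunks */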

  Quincey

···

Hi Quincey,

thank you for your answer. I tried not setting a fill value, but I think the dataset is not valid then: I could not figure out how to identify the invalid (never written) chunks, and HDFView was not able to read those files either. Therefore I suppose this is not the common way to use chunked datasets, is it?
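
One way to tell written from unwritten regions, given a sentinel fill value like the -999.999 above, is to read a region back and compare against the sentinel; a sketch, assuming a dataset created with the property list from the first message:

    /* Sketch: read back one chunk-sized region and test whether anything
     * was ever written there; unallocated chunks read back as the fill
     * value. */
    std::vector<double> buf( 10 * 10 * 10 );
    H5::DataSpace filespace = dataset.getSpace();
    hsize_t start[3] = {0, 0, 0};
    filespace.selectHyperslab( H5S_SELECT_SET, chunk_dims, start );
    H5::DataSpace memspace( rank, chunk_dims );
    dataset.read( buf.data(), H5::PredType::NATIVE_DOUBLE, memspace, filespace );

    bool untouched = true;
    for ( size_t i = 0; i < buf.size(); ++i )
        if ( buf[i] != fill_val ) { untouched = false; break; }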

Jan

···

Hi Jan,

> Hi Quincey,
>
> thank you for your answer. I tried not setting a fill value, but I think the dataset is not valid then: I could not figure out how to identify the invalid (never written) chunks, and HDFView was not able to read those files either. Therefore I suppose this is not the common way to use chunked datasets, is it?

  When you didn't set the fill value, was the file smaller (because the chunks weren't allocated)? If that's true, then I think this is a bug in the HDF5 library, and we should fix it so that it doesn't allocate chunks (when the allocation time is incremental or late) even when a fill value is defined.

  Quincey

···

Hi Quincey,

I tried again: I did not set the fill value, but set the allocation time to incremental. In this case the fill value equals 0, and the file size is still the same.
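
One measurement that may help separate the two effects: the total file size also includes metadata and any free space, while getStorageSize() (H5Dget_storage_size in the C API) reports only the space actually allocated for the dataset's raw data. A sketch, assuming dataset and file objects as above:

    /* Sketch: compare the raw-data storage actually allocated for the
     * dataset with the total size of the file on disk. */
    hsize_t raw_bytes  = dataset.getStorageSize();  /* allocated chunks only */
    hsize_t file_bytes = file.getFileSize();        /* includes all metadata */
    std::cout << raw_bytes << " of " << file_bytes
              << " bytes are raw chunk data" << std::endl;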

Jan

···

Hi Jan,

> Hi Quincey,
>
> I tried again: I did not set the fill value, but set the allocation time to incremental. In this case the fill value equals 0, and the file size is still the same.

  Hmm, that's odd. Can you send me a C program that shows the issue?

  Quincey

···

Hi Quincey,

I prepared a little program to show you the issue, and now it works like a charm. In this example every second chunk is empty, and the file size seems to be half that of the completely filled dataset. Is this reasonable? Or did I possibly make a mistake again? Maybe I messed up the chunk size in my real-world example. I'm going to find out what the difference between the two is and then report back to you.
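
Jan's program itself was not posted to the list; the following is a self-contained C++ sketch of the kind of test he describes, writing every second chunk along the first axis and printing the allocated storage afterwards (file and dataset names are invented):

    /* Sketch of a test like the one described above: a chunked dataset
     * with a fill value and incremental allocation, where only every
     * second chunk along the first axis is written. The allocated storage
     * should come out at roughly half of the dense case. */
    #include "H5Cpp.h"
    #include <iostream>
    #include <vector>

    int main()
    {
        const int rank = 3;
        hsize_t dims[3]       = {100, 100, 100};
        hsize_t chunk_dims[3] = {10, 10, 10};

        H5::DSetCreatPropList cparms;
        cparms.setChunk( rank, chunk_dims );
        double fill_val = -999.999;
        cparms.setFillValue( H5::PredType::NATIVE_DOUBLE, &fill_val );
        cparms.setAllocTime( H5D_ALLOC_TIME_INCR );

        H5::H5File file( "sparse_test.h5", H5F_ACC_TRUNC );
        H5::DataSpace filespace( rank, dims );
        H5::DataSet dataset = file.createDataSet( "data",
            H5::PredType::NATIVE_DOUBLE, filespace, cparms );

        /* One chunk's worth of data. */
        std::vector<double> chunk( 10 * 10 * 10, 1.0 );
        H5::DataSpace memspace( rank, chunk_dims );

        /* Write every second chunk along the first axis (500 of 1000). */
        for ( hsize_t i = 0; i < dims[0]; i += 2 * chunk_dims[0] )
            for ( hsize_t j = 0; j < dims[1]; j += chunk_dims[1] )
                for ( hsize_t k = 0; k < dims[2]; k += chunk_dims[2] )
                {
                    hsize_t start[3] = {i, j, k};
                    filespace.selectHyperslab( H5S_SELECT_SET, chunk_dims, start );
                    dataset.write( chunk.data(), H5::PredType::NATIVE_DOUBLE,
                                   memspace, filespace );
                }

        std::cout << "allocated storage: " << dataset.getStorageSize()
                  << " bytes" << std::endl;
        return 0;
    }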

Thank you very, very much so far.

Jan

···

Hi Jan,

> Hi Quincey,
>
> I prepared a little program to show you the issue, and now it works like a charm. In this example every second chunk is empty, and the file size seems to be half that of the completely filled dataset. Is this reasonable?

  Yes, that's what I would expect.

> Or did I possibly make a mistake again? Maybe I messed up the chunk size in my real-world example. I'm going to find out what the difference between the two is and then report back to you.

  Yes, please let us know what the difference is.

    Quincey

···

Hi Quincey,

I managed to work it out. It seems I had messed up the chunk size; now it works as expected. Thank you so much for your support!!
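
For anyone hitting the same symptom: one plausible way to "mess up" the chunk size (a guess, since the details were never posted) is writing hyperslabs that are not aligned to the chunk grid, so that even a sparse write pattern touches, and therefore allocates, far more chunks than intended. A quick sketch of the arithmetic:

    /* Sketch: how many chunks does a single slab write touch along one
     * axis? Aligned to the chunk grid it is one; shifted by half a chunk
     * it is two, i.e. 2*2*2 = 8 times as many chunks in 3-D. */
    hsize_t chunk_edge = 10, slab_edge = 10, offset = 5;
    hsize_t first_chunk = offset / chunk_edge;                      /* 0 */
    hsize_t last_chunk  = (offset + slab_edge - 1) / chunk_edge;    /* 1 */
    hsize_t chunks_touched = last_chunk - first_chunk + 1;          /* 2 */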

Jan

···

Hi Jan,

> Hi Quincey,
>
> I managed to work it out. It seems I had messed up the chunk size; now it works as expected. Thank you so much for your support!!

  Excellent! I'm glad things are working correctly now, cheers,
    Quincey
