setting chunk dimensions

Hi!
I'm trying to write my data via hyperslabs, and there is also a nice example of how to do it on the HDF5.org webpage. I just don't understand how to set the chunk_dims, or, more precisely, what do these chunking dimensions do?
Here is the part of the example code using the chunk-dims:

int
main (void)
{
    hid_t file; /* handles */
    hid_t dataspace, dataset;
    hid_t filespace;
    hid_t cparms;
    hsize_t dims[2] = { 3, 3}; /*
                         * dataset dimensions
                         * at the creation time
                         */
    hsize_t dims1[2] = { 3, 3}; /* data1 dimensions */
    hsize_t dims2[2] = { 7, 1}; /* data2 dimensions */
    
    hsize_t dims3[2] = { 2, 2}; /* data3 dimensions */

    hsize_t maxdims[2] = {H5S_UNLIMITED, H5S_UNLIMITED};
    hsize_t chunk_dims[2] ={2, 5};
    hsize_t size[2];
    hsize_t offset[2];

    herr_t status;

    int data1[3][3] = { {1, 1, 1}, /* data to write */
                {1, 1, 1},
                {1, 1, 1} };

    int data2[7] = { 2, 2, 2, 2, 2, 2, 2};

    int data3[2][2] = { {3, 3},
                {3, 3} };
    int fillvalue = 0;

    /*
     * Create the data space with unlimited dimensions.
     */
    dataspace = H5Screate_simple(RANK, dims, maxdims);

    /*
     * Create a new file. If file exists its contents will be overwritten.
     */
    file = H5Fcreate(H5FILE_NAME, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

    /*
     * Modify dataset creation properties, i.e. enable chunking.
     */
    cparms = H5Pcreate(H5P_DATASET_CREATE);
    status = H5Pset_chunk( cparms, RANK, chunk_dims);
    status = H5Pset_fill_value (cparms, H5T_NATIVE_INT, &fillvalue );


    /*
     * Create a new dataset within the file using cparms
     * creation properties.
     */
    dataset = H5Dcreate2(file, DATASETNAME, H5T_NATIVE_INT, dataspace, H5P_DEFAULT,
            cparms, H5P_DEFAULT);

chunk_dims is set to {2,5}, which I don't understand, because the initial dataset is 3x3 and is then extended to 10x3 - why the {2,5}?

thx,
NH

···


Hi Natalie,

On Tuesday 21 October 2008, Natalie Happenhofer wrote:

Hi!
I'm trying to write my data via hyperslabs, and there is also a nice
example of how to do it on the HDF5.org webpage. I just don't understand
how to set the chunk_dims, or, more precisely, what do these chunking
dimensions do? Here is the part of the example code using the
chunk-dims:

[clip]

hsize_t chunk_dims[2] ={2, 5};

[clip]

chunk_dims is set to {2,5}, which I don't understand, because the
initial dataset is 3x3 and is then extended to 10x3 - why the {2,5}?

Because you told HDF5 that your chunk dimensions are {2,5} (see above).
Chunk sizes have nothing to do with the dimensions of your
dataset; they determine the way the I/O is done.

I'd recommend that you carefully read the "Datasets" section of the User's
Guide at:

http://www.hdfgroup.org/HDF5/doc/UG/UG_frame10Datasets.html

and particularly the section labeled as "Chunked".

···

--
Francesc Alted
Freelance Developer & Consultant

----------------------------------------------------------------------
This mailing list is for HDF software users discussion.
To subscribe to this list, send a message to hdf-forum-subscribe@hdfgroup.org.
To unsubscribe, send a message to hdf-forum-unsubscribe@hdfgroup.org.

Hi Natalie,

You can think of the hyperslabs as the way you logically access (write or read) subsets of a complete dataset from your application's perspective. By specifying different hyperslabs you can access different subsets of the dataset. You can also access the entire dataset -- it just depends on what you specify in the write or read.

Chunked storage defines how the dataset is physically written to / read from disk. The chunk size is set when the dataset is created and remains constant. Typically you want to choose a chunk layout that will perform well for the most frequent logical access pattern -- or for the access pattern that you want the best performance with.

So hyperslabs are about logical access and chunks are about physical storage organization on disk. Both hyperslabs and chunks will have the same number of dimensions as the dataset. But, the dimension *sizes* for both hyperslabs and chunks may be (and usually are) different than your dataset's dimension sizes.

The interaction of chunk sizes, hyperslab selections, and various other factors can dramatically impact performance.

You may be interested in sections 4.1 and 5 of the NetCDF-4 Performance Report found at www.hdfgroup.org/pubs/papers. They give some explanation about hyperslabs and chunked storage, and how performance may vary, as well as how chunked storage may impact filesize.

-Ruth

···

On Oct 21, 2008, at 3:10 AM, Natalie Happenhofer wrote:

[clip]

------------------------------------------------------------
Ruth Aydt
The HDF Group
1901 South First Street, Suite C-2
Champaign, IL 61820

aydt@hdfgroup.org (217)265-7837
------------------------------------------------------------