Rookie question

I feel silly asking such a basic question, but I need to be sure.

Can I define a chunk size larger than the amount of data I am storing?
I have code that creates a dataset that is appended to many times, and
for reasons unknown to me right now, the first call is truncating the
default chunk size (64k/sizeof) to the number of elements in the first
call, so I am winding up with a chunk size of 1. What would be the
negatives of defining too large a chunk size? If I always define the
chunk size to be 64K bytes, will that much space always be used in the
file?

George Lewandowski
(314)777-7890
Mail Code S270-2204
Building 270-E Level 2E Room 20E
P-8A

Hi George!

I feel silly asking such a basic question, but I need to be sure.

Can I define a chunk size larger than the amount of data I am storing?

  Yes, definitely.

I have code that creates a dataset that is appended to many times, and
for reasons unknown to me right now, the first call is truncating the
default chunk size (64k/sizeof) to the number of elements in the first
call, so I am winding up with a chunk size of 1. What would be the
negatives of defining too large a chunk size? If I always define the
chunk size to be 64K bytes, will that much space always be used in the
file?

  The HDF5 library currently always stores full chunks, even if they are only partially covered by the current size of the dataspace in a dimension. We will be changing that for the 1.10.0 release, however, so it should eventually be more space-efficient.
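To make the space arithmetic concrete, here is a rough sketch in plain Python (illustrative only, not the HDF5 API; the 64K target and 4-byte element size are just the numbers from this thread):

```python
import math

KIB = 1024

def default_chunk_elems(element_size, target_bytes=64 * KIB):
    # Intended default: as many elements as fit in ~64K, i.e. 64K/sizeof.
    return target_bytes // element_size

def allocated_bytes(n_elements, element_size, chunk_elems):
    # Pre-1.10 behavior: every chunk is written in full, even a
    # partially covered "edge" chunk.
    n_chunks = math.ceil(n_elements / chunk_elems)
    return n_chunks * chunk_elems * element_size

chunk = default_chunk_elems(4)           # 16384 elements for 4-byte values
print(allocated_bytes(1, 4, chunk))      # 65536 -> one 4-byte value costs a full 64K chunk
```

So a too-large chunk size wastes space only in proportion to the number of partially filled chunks; for a dataset that keeps growing, that is at most one trailing chunk per dimension.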

  Quincey


On Sep 15, 2009, at 1:47 PM, Lewandowski, George wrote:

Okay, that tells me what I need to know, thanks!

But, I am wondering what this change in 1.10 will do. If I store a
single 4 byte integer into a dataset that has a chunk size of, say 16K,
will the library store only 4 bytes and reallocate every time I append,
or will there be some mechanism to allow efficient appending?

George Lewandowski
(314)777-7890
Mail Code S270-2204
Building 270-E Level 2E Room 20E
P-8A


-----Original Message-----
From: Quincey Koziol [mailto:koziol@hdfgroup.org]
Sent: Tuesday, September 15, 2009 1:54 PM
To: hdf-forum@hdfgroup.org
Subject: Re: [Hdf-forum] Rookie question

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org

Hi George,

Okay, that tells me what I need to know, thanks!

But, I am wondering what this change in 1.10 will do. If I store a
single 4 byte integer into a dataset that has a chunk size of, say 16K,
will the library store only 4 bytes and reallocate every time I append,
or will there be some mechanism to allow efficient appending?

  It will probably depend on whether the dimension is "unlimited" or not. I'm currently thinking that chunks will have the "full" chunk dimension for unlimited dimensions, but will store partial "edge" chunks for fixed dimensions. (There will be new API calls/properties to choose the desired behavior, which should give some flexibility.)

  Also, the plan is to provide an API for determining whether "edge" chunks (in unlimited or fixed dimensions) get I/O filters (like compression) applied to them. This will speed things up for applications that append one record at a time...
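A rough model of the two behaviors for a 1-D dataset, in plain Python (illustrative only; the `partial_edge_chunks` flag is made up for this sketch and is not an HDF5 API):

```python
def allocated_bytes(n_elements, element_size, chunk_elems, partial_edge_chunks):
    # partial_edge_chunks=False: every chunk stored in full (current behavior,
    # and the plan for unlimited dimensions).
    # partial_edge_chunks=True: the trailing "edge" chunk stored partially
    # (the plan for fixed dimensions in 1.10).
    full_chunks, remainder = divmod(n_elements, chunk_elems)
    size = full_chunks * chunk_elems * element_size
    if remainder:
        edge_elems = remainder if partial_edge_chunks else chunk_elems
        size += edge_elems * element_size
    return size

# One 4-byte integer in a dataset with a 16K chunk (4096 4-byte elements):
print(allocated_bytes(1, 4, 4096, partial_edge_chunks=False))  # 16384
print(allocated_bytes(1, 4, 4096, partial_edge_chunks=True))   # 4
```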

  Quincey


On Sep 15, 2009, at 2:07 PM, Lewandowski, George wrote:


Okay, thanks. I'm good to go now.

George Lewandowski
(314)777-7890
Mail Code S270-2204
Building 270-E Level 2E Room 20E
P-8A


-----Original Message-----
From: Quincey Koziol [mailto:koziol@hdfgroup.org]
Sent: Tuesday, September 15, 2009 2:17 PM
To: hdf-forum@hdfgroup.org
Subject: Re: [Hdf-forum] Rookie question
