How does HDF5 read OR'ed hyperslabs?

Hi,

I'm trying to reduce, as much as possible, the time lost to disk seeks
during multi-chunk read operations on chunked datasets.

The first thing I've tried is to programmatically 'join' contiguous
chunks into a single H5Sselect_hyperslab() call. This seems to
significantly improve the read times, which is good news indeed.
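
To make the 'joining' idea concrete, here is a minimal sketch (not my
actual code; dset_id, space_id, mem_type and chunksize are placeholder
names): instead of one selection and one H5Dread() per chunk, two
adjacent chunks are fetched with a single selection spanning both:

#include "hdf5.h"

/* Sketch only: read two adjacent chunks of a 1-D dataset with a single
   selection instead of two.  All identifiers here are placeholders. */
static herr_t read_two_adjacent_chunks(hid_t dset_id, hid_t space_id,
                                       hid_t mem_type, hsize_t chunksize,
                                       void *buf)
{
    hsize_t offset[1], count[1];
    hid_t   mspace;
    herr_t  status;

    offset[0] = 0;               /* start of the first chunk */
    count[0]  = 2 * chunksize;   /* span both chunks at once */
    if (H5Sselect_hyperslab(space_id, H5S_SELECT_SET,
                            offset, NULL, count, NULL) < 0)
        return -1;
    mspace = H5Screate_simple(1, count, NULL);
    status = H5Dread(dset_id, mem_type, mspace, space_id, H5P_DEFAULT, buf);
    H5Sclose(mspace);
    return status;
}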

The next thing has been asking the disk subsystem for a complete set of
chunks, possibly discontiguous, in just one shot. To that end, I've
been doing some tests with the H5S_SELECT_OR operator in the
H5Sselect_hyperslab() call. However, although this delivers the
correct data, I'm a bit disappointed because, instead of getting a
speed-up, I'm observing a small but noticeable slowdown.

My guess is that the cause might well be the H5S_SELECT_OR selection
operation taking a significant time to compute the binary OR between
the different hyperslices, and, in addition, that the HDF5 library is
not sending a single request to the operating system for the complete
set of OR'ed hyperslices (as I initially thought), in which case this
is not a better option for reducing disk latency. Do you think this is
correct? If so, can anybody suggest a better strategy (if there is
one)?

Thanks!


--
Francesc Altet - Cárabos Coop. V. - "Enjoy Data"
http://www.carabos.com/


Hi Francesc,

Hi,

I'm trying to reduce, as much as possible, the time lost to disk seeks
during multi-chunk read operations on chunked datasets.

The first thing I've tried is to programmatically 'join' contiguous
chunks into a single H5Sselect_hyperslab() call. This seems to
significantly improve the read times, which is good news indeed.

The next thing has been asking the disk subsystem for a complete set of
chunks, possibly discontiguous, in just one shot. To that end, I've
been doing some tests with the H5S_SELECT_OR operator in the
H5Sselect_hyperslab() call. However, although this delivers the
correct data, I'm a bit disappointed because, instead of getting a
speed-up, I'm observing a small but noticeable slowdown.

My guess is that the cause might well be the H5S_SELECT_OR selection
operation taking a significant time to compute the binary OR between
the different hyperslices, and, in addition, that the HDF5 library is
not sending a single request to the operating system for the complete
set of OR'ed hyperslices (as I initially thought), in which case this
is not a better option for reducing disk latency. Do you think this is
correct? If so, can anybody suggest a better strategy (if there is
one)?

  It's possible that there's some small overhead in computing the binary OR, but it shouldn't be too much. If you have a chance to profile your benchmark with Purify (from IBM/Rational) or a similar tool, let me know where you are seeing any issues and I'll see what we can do about speeding things up.

  We do try to be smart about the I/O requests that we pass along to the operating system, but since the readv/writev POSIX I/O calls aren't sufficiently general, we have to perform single I/O accesses instead of giving the operating system all the information at once. If you've got some alternate solutions here, I'm interested in hearing about them also.
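
  Just to illustrate the limitation (a sketch only, not the code in our virtual file driver layer): readv() scatters a single contiguous byte range of the file into several memory buffers, so when the file extents themselves are discontiguous we still end up issuing one call per extent, along these lines:

#include <sys/types.h>
#include <unistd.h>

/* Sketch only - not the actual HDF5 code.  Each 'extent' is one
   contiguous piece of the file touched by the selection. */
struct file_extent {
    off_t  file_off;   /* where the piece lives in the file */
    void  *mem_buf;    /* where it should land in memory    */
    size_t len;        /* how many bytes                    */
};

static int read_extents(int fd, struct file_extent *ext, int n)
{
    int i;

    for (i = 0; i < n; i++) {
        /* readv() cannot take a separate file offset per buffer, so
           every discontiguous extent becomes its own system call. */
        if (pread(fd, ext[i].mem_buf, ext[i].len, ext[i].file_off)
                != (ssize_t)ext[i].len)
            return -1;
    }
    return 0;
}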

  Quincey


On Sep 26, 2007, at 8:19 AM, Francesc Altet wrote:

Hi Quincey,

On Wednesday, 26 September 2007, Quincey Koziol wrote:

> My guess is that the cause might well be the H5S_SELECT_OR
> selection operation taking a significant time to compute the binary
> OR between the different hyperslices, and, in addition, that the
> HDF5 library is not sending a single request to the operating
> system for the complete set of OR'ed hyperslices (as I initially
> thought), in which case this is not a better option for reducing
> disk latency. Do you think this is correct? If so, can anybody
> suggest a better strategy (if there is one)?

  It's possible that there's some small overhead in computing the
binary OR, but it shouldn't be too much. If you have a chance to
profile your benchmark with Purify (from IBM/Rational) or a similar
tool, let me know where you are seeing any issues and I'll see what
we can do about speeding things up.

First of all, thanks for looking into this. I've run my small benchmark
through oprofile [1]. On my machine, oprofile does a heuristic count of
CPU ticks spent on each task, so the counts are not exact, but they are
precise enough.

Here you have the top binary consumers when running the original
benchmark (using HDF5 1.8.0beta3 here, but 1.6.6 timings are similar):

  samples  %        image name
------------------
     6032 74.8945 no-vmlinux
      558 6.9282 libhdf5.so.1.3.4
      309 3.8366 liblzo2.so.2.0.0
      259 3.2158 python2.5
      221 2.7440 interpreter.so
      192 2.3839 libc-2.5.so
      146 1.8128 multiarray.so
      128 1.5893 libfb.so
       84 1.0430 _sort.so
       70 0.8691 umath.so

The first entry is 'no-vmlinux', which is the Linux kernel. It
reflects the fact that most of the time is spent doing I/O, which is
normal. Next follow the HDF5 library, the LZO2 decompressor and the
Python interpreter. You can ignore the other libraries (they are
called by PyTables for other kinds of computations).

Next, the same program but using H5_SELECT_OR for gathering several
discontiguous chunks at once:

  samples  %        image name
------------------
     6457 60.8347 no-vmlinux
     2783 26.2201 libhdf5.so.1.3.4
      245 2.3083 liblzo2.so.2.0.0
      224 2.1104 interpreter.so
      209 1.9691 libc-2.5.so
      205 1.9314 python2.5
      144 1.3567 multiarray.so
      141 1.3284 libfb.so
       88 0.8291 _sort.so
       61 0.5747 umath.so

As you can see, most of the times for the libraries are more or less the
same (remember, the counts are not exact), except for HDF5 itself, which
takes around 5x more than in the original version. Also noticeable is
that the kernel spends about 7% more time here as well (I don't know if
this is significant; probably not).

If we look into the details of the routines consuming more cpu inside
HDF5, this is what we get for the original version:

samples % symbol name
417 68.3607 H5Z_filter_shuffle
10 1.6393 .plt
8 1.3115 H5D_create_chunk_map
8 1.3115 H5S_select_hyperslab
6 0.9836 H5D_istore_cmp3
6 0.9836 H5FD_sec2_read
6 0.9836 H5T_cmp
5 0.8197 H5B_find
5 0.8197 H5C_protect
5 0.8197 H5C_unprotect
5 0.8197 H5D_istore_lock
5 0.8197 H5FL_reg_malloc

And for the H5_SELECT_OR version:

samples % symbol name
652 23.4280 __udivdi3
543 19.5113 __umoddi3
393 14.1215 H5Z_filter_shuffle
186 6.6834 H5V_array_calc
149 5.3539 H5S_select_iterate
134 4.8149 H5D_chunk_mem_cb
128 4.5994 H5S_hyper_add_span_element
116 4.1682 H5S_all_iter_next
95 3.4136 H5V_chunk_index
92 3.3058 H5V_array_offset_pre
67 2.4075 H5S_select_iter_next
37 1.3295 H5S_all_iter_coords
30 1.0780 __i686.get_pc_thunk.bx
22 0.7905 H5S_select_iter_coords
21 0.7546 .plt
9 0.3234 H5S_hyper_get_seq_list
6 0.2156 H5D_read

I don't know the internals of HDF5, but it seems that __udivdi3 and
__umoddi3 (computing the binary OR?) are using a lot of CPU. In
addition, there are other routines (mainly H5V_array_calc,
H5S_select_iterate, H5D_chunk_mem_cb, H5S_hyper_add_span_element,
H5S_all_iter_next, H5V_chunk_index and H5S_all_iter_coords) that did
not appear before and which consume quite a bit of CPU.

  We do try to be smart about the I/O requests that we pass along to
the operating system, but since the readv/writev POSIX I/O calls
aren't sufficiently general, we have to perform single I/O accesses
instead of giving the operating system all the information at once.
If you've got some alternate solutions here, I'm interested in
hearing about them also.

I'm not really an expert on POSIX I/O, sorry. However, looking at
the profiles above, it doesn't seem like chunk OR'ing is going to
deliver better performance, at least for the time being. Nonetheless,
it would be great to optimize this aspect in the long term.

[1] http://oprofile.sourceforge.net

--
Francesc Altet - Cárabos Coop. V. - "Enjoy Data"
http://www.carabos.com/


Hi Francesc,

Hi Quincey,

On Wednesday, 26 September 2007, Quincey Koziol wrote:

My guess is that the cause might well be the H5S_SELECT_OR selection
operation taking a significant time to compute the binary OR between
the different hyperslices, and, in addition, that the HDF5 library is
not sending a single request to the operating system for the complete
set of OR'ed hyperslices (as I initially thought), in which case this
is not a better option for reducing disk latency. Do you think this is
correct? If so, can anybody suggest a better strategy (if there is
one)?

  It's possible that there's some small overhead in computing the
binary OR, but it shouldn't be too much. If you have a chance to
profile your benchmark with Purify (from IBM/Rational) or a similar

  Sorry, I meant profile with Quantify here, not Purify. (Probably not important to you, but might be for those on the mailing list :-)

tool, let me know where you are seeing any issues and I'll see what
we can do about speeding things up.

First of all, thanks for looking into this. I've run my small benchmark
through oprofile [1]. On my machine, oprofile does a heuristic count of
CPU ticks spent on each task, so the counts are not exact, but they are
precise enough.

Here you have the top binary consumers when running the original
benchmark (using HDF5 1.8.0beta3 here, but 1.6.6 timings are similar):

  samples  %        image name
------------------
     6032 74.8945 no-vmlinux
      558 6.9282 libhdf5.so.1.3.4
      309 3.8366 liblzo2.so.2.0.0
      259 3.2158 python2.5
      221 2.7440 interpreter.so
      192 2.3839 libc-2.5.so
      146 1.8128 multiarray.so
      128 1.5893 libfb.so
       84 1.0430 _sort.so
       70 0.8691 umath.so

The first entry is 'no-vmlinux', which is the Linux kernel. It
reflects the fact that most of the time is spent doing I/O, which is
normal. Next follow the HDF5 library, the LZO2 decompressor and the
Python interpreter. You can ignore the other libraries (they are
called by PyTables for other kinds of computations).

Next, the same program but using H5_SELECT_OR for gathering several
discontiguous chunks at once:

  samples  %        image name
------------------
     6457 60.8347 no-vmlinux
     2783 26.2201 libhdf5.so.1.3.4
      245 2.3083 liblzo2.so.2.0.0
      224 2.1104 interpreter.so
      209 1.9691 libc-2.5.so
      205 1.9314 python2.5
      144 1.3567 multiarray.so
      141 1.3284 libfb.so
       88 0.8291 _sort.so
       61 0.5747 umath.so

As you can see, most of the times for the libraries are more or less the
same (remember, the counts are not exact), except for HDF5 itself, which
takes around 5x more than in the original version. Also noticeable is
that the kernel spends about 7% more time here as well (I don't know if
this is significant; probably not).

  Hmm, yes, that's a bit more than I'd expect, although if you are doing complicated selections, having the extra time in the HDF5 routines might be reasonable.

If we look into the details of the routines consuming more cpu inside
HDF5, this is what we get for the original version:

samples % symbol name
417 68.3607 H5Z_filter_shuffle
10 1.6393 .plt
8 1.3115 H5D_create_chunk_map
8 1.3115 H5S_select_hyperslab
6 0.9836 H5D_istore_cmp3
6 0.9836 H5FD_sec2_read
6 0.9836 H5T_cmp
5 0.8197 H5B_find
5 0.8197 H5C_protect
5 0.8197 H5C_unprotect
5 0.8197 H5D_istore_lock
5 0.8197 H5FL_reg_malloc

  Hmm, I wonder why the shuffle filter is taking so much CPU time... :-?

And for the H5_SELECT_OR version:

samples % symbol name
652 23.4280 __udivdi3
543 19.5113 __umoddi3
393 14.1215 H5Z_filter_shuffle
186 6.6834 H5V_array_calc
149 5.3539 H5S_select_iterate
134 4.8149 H5D_chunk_mem_cb
128 4.5994 H5S_hyper_add_span_element
116 4.1682 H5S_all_iter_next
95 3.4136 H5V_chunk_index
92 3.3058 H5V_array_offset_pre
67 2.4075 H5S_select_iter_next
37 1.3295 H5S_all_iter_coords
30 1.0780 __i686.get_pc_thunk.bx
22 0.7905 H5S_select_iter_coords
21 0.7546 .plt
9 0.3234 H5S_hyper_get_seq_list
6 0.2156 H5D_read

I don't know the internals of HDF5, but it seems that __udivdi3 and
__umoddi3 (computing the binary OR?) are using a lot of CPU. In
addition, there are other routines (mainly H5V_array_calc,
H5S_select_iterate, H5D_chunk_mem_cb, H5S_hyper_add_span_element,
H5S_all_iter_next, H5V_chunk_index and H5S_all_iter_coords) that did
not appear before and which consume quite a bit of CPU.

  The __udivdi3 and __umoddi3 routines are for 64-bit integer division and 'mod' operations. Computing the offsets for "complicated" hyperslab selections involves some division & mod operations, so it's reasonable to see them in the profile, although I do think it's possible there's a problem with the algorithm that they are consuming so much CPU time.
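
  As a rough illustration of where those symbols come from (a sketch, not the actual library routines): turning a linear element offset into a chunk index plus an offset inside the chunk takes one 64-bit division and one modulus, which gcc on 32-bit x86 lowers to __udivdi3/__umoddi3 calls:

#include "hdf5.h"

/* Sketch only.  With hsize_t being 64 bits, the '/' and '%' below are
   what gcc turns into __udivdi3/__umoddi3 on 32-bit x86. */
static void locate_element(hsize_t elem, hsize_t chunk_len,
                           hsize_t *chunk_idx, hsize_t *off_in_chunk)
{
    *chunk_idx    = elem / chunk_len;   /* -> __udivdi3 */
    *off_in_chunk = elem % chunk_len;   /* -> __umoddi3 */
}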

  Can you describe the hyperslab operations you are performing in order to generate the profile above? Also, if you could translate a sample program showing the problem from Python into C and send it to me, that would make it easier for me to work on speedups/fixes at this end.

  We do try to be smart about the I/O requests that we pass along to
the operating system, but since the readv/writev POSIX I/O calls
aren't sufficiently general, we have to perform single I/O accesses
instead of giving the operating system all the information at once.
If you've got some alternate solutions here, I'm interested in
hearing about them also.

I'm not really an expert on POSIX I/O, sorry. However, looking at
the profiles above, it doesn't seem like chunk OR'ing is going to
deliver better performance, at least for the time being. Nonetheless,
it would be great to optimize this aspect in the long term.

  Sure, no problem.

[1] http://oprofile.sourceforge.net

  Nice tool for profiling - thanks for the pointer. :-)

    Quincey


On Sep 26, 2007, at 12:25 PM, Francesc Altet wrote:

Hi Quincey,

On Thursday, 27 September 2007, Quincey Koziol wrote:

> As you can see, most of the times for the libraries are more or
> less the same (remember, the counts are not exact), except for HDF5
> itself, which takes around 5x more than in the original version.
> Also noticeable is that the kernel spends about 7% more time here as
> well (I don't know if this is significant; probably not).

  Hmm, yes, that's a bit more than I'd expect, although if you are
doing complicated selections, having the extra time in the HDF5
routines might be reasonable.

Yeah, in general I guess so. It's just that I wanted to give logical
combinations of chunks a chance to speed up my own stuff ;-)

> If we look into the details of the routines consuming more cpu
> inside HDF5, this is what we get for the original version:
>
> samples % symbol name
> 417 68.3607 H5Z_filter_shuffle
> 10 1.6393 .plt
> 8 1.3115 H5D_create_chunk_map
> 8 1.3115 H5S_select_hyperslab
> 6 0.9836 H5D_istore_cmp3
> 6 0.9836 H5FD_sec2_read
> 6 0.9836 H5T_cmp
> 5 0.8197 H5B_find
> 5 0.8197 H5C_protect
> 5 0.8197 H5C_unprotect
> 5 0.8197 H5D_istore_lock
> 5 0.8197 H5FL_reg_malloc

  Hmm, I wonder why the shuffle filter is taking so much CPU time...
:-?

I'll tell you why: I work with pretty large chunksizes (64 KB in this
example), with compound elements (24 bytes each). I've noticed that
shuffle times increase quite a lot with the chunksize (which is
expected anyway).

  The __udivdi3 and __umoddi3 routines are for 64-bit integer division
and 'mod' operations. Computing the offsets for "complicated"
hyperslab selections involves some division & mod operations, so it's
reasonable to see them in the profile, although I do think it's
possible there's a problem with the algorithm that they are consuming
so much CPU time.

  Can you describe the hyperslab operations you are performing in
order to generate the profile above? Also, if you could translate a
sample program showing the problem from Python into C and send it to
me, that would make it easier for me to work on speedups/fixes at
this end.

My setup is pretty simple: it reads 5000 chunks (64 KB in size, as said
before) randomly chosen from a compound (24 bytes/element) dataset (100
Melements, for a total of 2.4 GB) using compression (a compression
efficiency of 225%, for a grand total of 1.1 GB). Also, the access
order is sorted so that the disk heads don't have to travel too much
going from one cylinder to another. Another optimization is that, for
reading purposes, the chunks are grouped in buffers of 8 chunks, and if
two or more chunks are contiguous, they are joined into a bigger
hyperslice so that a single I/O request is issued (but this kind of
join only happens for about 10% of the chunks, so you can disregard
this optimization for your benchmarking purposes).

For your reference, here is the basic loop that builds the dataspace
selection with the 8 OR'ed chunks to be read:

/* ****************************** */
/* Fill the dataspace with the hyperslices to be read */
nrecout = 0;
for (n = 0; n < nhyperslices; n++) {
   /* Define a hyperslab in the dataset of the size of the records */
   offset[0] = starts[n];
   count[0] = nrecords[n];
   H5Sselect_hyperslab(space_id,
                       (n == 0 ? H5S_SELECT_SET : H5S_SELECT_OR),
                       offset, NULL, count, NULL);
   nrecout += nrecords[n];
}
/* ****************************** */
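
In case it is useful, the coalescing step that runs before that loop
looks conceptually like this (a simplified sketch, not the code I
actually run; starts[] is assumed to be sorted already):

/* ****************************** */
/* Sketch of the coalescing step: merge ranges that touch each other.
   Returns the new number of hyperslices. */
static int coalesce_ranges(hsize_t *starts, hsize_t *nrecords, int n)
{
    int i, m = 0;

    if (n == 0)
        return 0;
    for (i = 1; i < n; i++) {
        if (starts[i] == starts[m] + nrecords[m]) {
            /* contiguous with the previous range: grow it */
            nrecords[m] += nrecords[i];
        }
        else {
            m++;
            starts[m] = starts[i];
            nrecords[m] = nrecords[i];
        }
    }
    return m + 1;
}
/* ****************************** */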

I'm a bit swamped right now, so I won't be able to provide a sample
program in C anytime soon. I'll try to do this after I finish my
current duties if you still need it.

At any rate, I'd be more interested (and I guess you would be as well)
in knowing whether it is possible to pack an N-chunk sparse read into a
single request to the operating system than in optimizing the CPU
times. As you can guess, the bottleneck for this kind of read is the
seek times rather than the CPU overhead, and this will become more and
more important as time goes by (CPUs get more powerful every year, but
disk seek times are barely better now than 10 years ago).
Unfortunately, it is bad news that POSIX doesn't seem to provide an
easy solution for this.

> [1] http://oprofile.sourceforge.net

  Nice tool for profiling - thanks for the pointer. :-)

It works pretty well, and I use it quite a lot. Besides, it's free :-)


--
Francesc Altet - Cárabos Coop. V. - "Enjoy Data"
http://www.carabos.com/

Hi Francesc,

Hi Quincey,

On Thursday, 27 September 2007, Quincey Koziol wrote:

  The __udivdi3 and __umoddi3 routines are for 64-bit integer division
and 'mod' operations. Computing the offsets for "complicated"
hyperslab selections involves some division & mod operations, so it's
reasonable to see them in the profile, although I do think it's
possible there's a problem with the algorithm that they are consuming
so much CPU time.

  Can you describe the hyperslab operations you are performing in
order to generate the profile above? Also, if you could translate a
sample program showing the problem from Python into C and send it to
me, that would make it easier for me to work on speedups/fixes at
this end.

My setup is pretty simple: it reads 5000 chunks (64 KB in size, as said
before) randomly chosen from a compound (24 bytes/element) dataset (100
Melements, for a total of 2.4 GB) using compression (a compression
efficiency of 225%, for a grand total of 1.1 GB). Also, the access
order is sorted so that the disk heads don't have to travel too much
going from one cylinder to another. Another optimization is that, for
reading purposes, the chunks are grouped in buffers of 8 chunks, and if
two or more chunks are contiguous, they are joined into a bigger
hyperslice so that a single I/O request is issued (but this kind of
join only happens for about 10% of the chunks, so you can disregard
this optimization for your benchmarking purposes).

For your reference, here is the basic loop that builds the dataspace
selection with the 8 OR'ed chunks to be read:

/* ****************************** */
/* Fill the dataspace with the hyperslices to be read */
nrecout = 0;
for (n = 0; n < nhyperslices; n++) {
   /* Define a hyperslab in the dataset of the size of the records */
   offset[0] = starts[n];
   count[0] = nrecords[n];
   H5Sselect_hyperslab(space_id,
                       (n == 0 ? H5S_SELECT_SET : H5S_SELECT_OR),
                       offset, NULL, count, NULL);
   nrecout += nrecords[n];
}
/* ****************************** */

I'm a bit swamped right now, so I won't be able to provide a sample
program in C anytime soon. I'll try to do this after I finish my
current duties if you still need it.

  OK, I'll file a bug report about the issue and try to come back to it when I'm working on optimizations for benchmarks.

At any rate, I'd be more interested (and I guess you would be as well)
in knowing whether it is possible to pack an N-chunk sparse read into a
single request to the operating system than in optimizing the CPU
times. As you can guess, the bottleneck for this kind of read is the
seek times rather than the CPU overhead, and this will become more and
more important as time goes by (CPUs get more powerful every year, but
disk seek times are barely better now than 10 years ago).
Unfortunately, it is bad news that POSIX doesn't seem to provide an
easy solution for this.

  Well, eventually, there will be POSIX routines to support this, but until there are, it's very difficult to address. One possibility would be to lay out the chunks in the file as contiguously as possible and then make I/O requests for more than one chunk at a time. That would take some work in the HDF5 library to make happen, but could improve times in some access patterns.

  Quincey


On Sep 27, 2007, at 7:33 AM, Francesc Altet wrote:

Quincey,

On Thursday, 27 September 2007, Quincey Koziol wrote:
[snip]

> I'm a bit swamped right now, so I won't be able to provide a sample
> program in C anytime soon. I'll try to do this after I finish my
> current duties if you still need it.

  OK, I'll file a bug report about the issue and try to come back to
it when I'm working on optimizations for benchmarks.

OK. Sounds good.

> At any rate, I'd be more interested (and I guess you would be as
> well) in knowing whether it is possible to pack an N-chunk sparse
> read into a single request to the operating system than in
> optimizing the CPU times. As you can guess, the bottleneck for this
> kind of read is the seek times rather than the CPU overhead, and
> this will become more and more important as time goes by (CPUs get
> more powerful every year, but disk seek times are barely better now
> than 10 years ago). Unfortunately, it is bad news that POSIX doesn't
> seem to provide an easy solution for this.

  Well, eventually, there will be POSIX routines to support this, but
until there are, it's very difficult to address. One possibility
would be to lay out the chunks in the file as contiguously as possible
and then make I/O requests for more than one chunk at a time. That
would take some work in the HDF5 library to make happen, but could
improve times in some access patterns.

I had no idea that chunks are not placed as contiguously as possible.
Mmm, there seem to be several fronts for optimizing access to chunks
(even those that are logically contiguous). Very interesting.


--
Francesc Altet - Cárabos Coop. V. - "Enjoy Data"
http://www.carabos.com/

Hi,
Could anyone please suggest the best practice for specifying the chunk size
when compressing an arbitrarily sized dataset? Currently I am taking chunk
sizes equal to the dataset dimensions, but I guess this may generally be a bad idea.
regards,
Dominik


--
Dominik Szczerba, Ph.D.
Computer Vision Lab CH-8092 Zurich
http://www.vision.ee.ethz.ch/~domi


Hi,

There is no H5LTread_dataset_unsigned_char in the Lite interface but there is
H5T_NATIVE_UCHAR for H5LTmake_dataset. How do I read unsigned chars then?

Thanks, Dominik


--
Dominik Szczerba, Ph.D.
Computer Vision Lab CH-8092 Zurich
http://www.vision.ee.ethz.ch/~domi


Hi Dominik,

Could anyone please suggest the best practice for specifying the chunk size
when compressing an arbitrarily sized dataset? Currently I am taking chunk
sizes equal to the dataset dimensions, but I guess this may generally be a bad idea.

  It really depends on the access pattern for your chunks. You should tune your chunk size/dimensions to how you will be writing/reading the chunks. If you don't know (perhaps you are writing a generic library or application), you should probably try to make the chunks as "square" as possible (try to get all the chunk dimensions about the same size) and try to not make the chunks smaller than 1MB or so.
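
  For example (just a sketch with invented numbers, and assuming file_id is an already-open file): an 8000 x 8000 dataset of doubles with 512 x 512 chunks gives roughly square chunks of 2MB each:

#include "hdf5.h"

/* Sketch with invented sizes: 512*512*8 bytes = 2 MB per chunk. */
static hid_t create_chunked_dset(hid_t file_id)
{
    hsize_t dims[2]  = {8000, 8000};
    hsize_t chunk[2] = {512, 512};
    hid_t   space, dcpl, dset;

    space = H5Screate_simple(2, dims, NULL);
    dcpl  = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 2, chunk);
    H5Pset_deflate(dcpl, 6);          /* optional: gzip compression */
    dset  = H5Dcreate(file_id, "data", H5T_NATIVE_DOUBLE, space, dcpl);
    H5Sclose(space);
    H5Pclose(dcpl);
    return dset;
}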

  Quincey


On Nov 24, 2007, at 10:05 AM, Dominik Szczerba wrote:

regards,
Dominik
--
Dominik Szczerba, Ph.D.
Computer Vision Lab CH-8092 Zurich
http://www.vision.ee.ethz.ch/~domi


Thanks a lot for the answer.
I see your point. What do I do in case the dataset size is not a multiple of
an easy number? (if my dataset is 1100 bytes and I set the chunk size to 128
bytes for example)
-- Dominik


On Saturday 24 November 2007 17.13:17 Quincey Koziol wrote:

Hi Dominik,

On Nov 24, 2007, at 10:05 AM, Dominik Szczerba wrote:
> Could anyone please suggest the best practice to specify the chunk
> size for
> compressing an arbitrary sized dataset? Currently I am taking chunk
> sizes
> equal to dataset dimensions but I guess this may be generally a bad
> idea.

  It really depends on the access pattern for your chunks. You should
tune your chunk size/dimensions to how you will be writing/reading
the chunks. If you don't know (perhaps you are writing a generic
library or application), you should probably try to make the chunks
as "square" as possible (try to get all the chunk dimensions about
the same size) and try to not make the chunks smaller than 1MB or so.

  Quincey

> regards,
> Dominik
> --
> Dominik Szczerba, Ph.D.
> Computer Vision Lab CH-8092 Zurich
> http://www.vision.ee.ethz.ch/~domi
>

--
Dominik Szczerba, Ph.D.
Computer Vision Lab CH-8092 Zurich
http://www.vision.ee.ethz.ch/~domi


Hi Quincey,

On Saturday, 24 November 2007, Quincey Koziol wrote:


On Nov 24, 2007, at 10:05 AM, Dominik Szczerba wrote:
> Could anyone please suggest the best practice to specify the chunk
> size for
> compressing an arbitrary sized dataset? Currently I am taking chunk
> sizes
> equal to dataset dimensions but I guess this may be generally a bad
> idea.

  It really depends on the access pattern for your chunks. You should
tune your chunk size/dimensions to how you will be writing/reading
the chunks. If you don't know (perhaps you are writing a generic
library or application), you should probably try to make the chunks
as "square" as possible (try to get all the chunk dimensions about
the same size) and try to not make the chunks smaller than 1MB or so.

                                                ^^^^^^^
Er, I think you meant to write *smaller* than 1MB, no?

--
Francesc Altet - Cárabos Coop. V. - "Enjoy Data"
http://www.carabos.com/


hello, Dominik

Hi,

There is no H5LTread_dataset_unsigned_char in the Lite interface but there is
H5T_NATIVE_UCHAR for H5LTmake_dataset. How do I read unsigned chars then?

you can just use H5LTread_dataset with H5T_NATIVE_UCHAR as the type parameter. Typical usage would be:

call H5LTmake_dataset with H5T_NATIVE_UCHAR to make and optionally write a dataset (passing NULL as the buffer does not write anything, it just creates the dataset)
call H5LTread_dataset with H5T_NATIVE_UCHAR to read it back

there are a couple of examples here

hl\test\test_lite.c

and docs here

http://www.hdfgroup.uiuc.edu/HDF5/doc_1.8pre/doc/HL/RM_H5LT.html
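
putting the two calls together, something like this (an untested sketch):

#include "hdf5.h"
#include "hdf5_hl.h"

int main(void)
{
    hsize_t       dims[1] = {6};
    unsigned char wbuf[6] = {1, 2, 3, 4, 5, 6};
    unsigned char rbuf[6];
    hid_t         file_id;

    file_id = H5Fcreate("uchar.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    /* make (and write) the dataset with the native unsigned char type */
    H5LTmake_dataset(file_id, "/dset", 1, dims, H5T_NATIVE_UCHAR, wbuf);
    /* read it back with the same type */
    H5LTread_dataset(file_id, "/dset", H5T_NATIVE_UCHAR, rbuf);
    H5Fclose(file_id);
    return 0;
}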

Pedro


At 02:36 PM 1/28/2008, Dominik Szczerba wrote:

Thanks, Dominik

--
Dominik Szczerba, Ph.D.
Computer Vision Lab CH-8092 Zurich
http://www.vision.ee.ethz.ch/~domi


Hi Pedro,

Thanks a lot for the quick response. I was already trying H5LTread_dataset,
but with H5T_INTEGER (because, according to the specs, a char in HDF5 is an
integer of size 1), of course with no luck. As you say, using
H5T_NATIVE_UCHAR works instantly. Datatypes are a bit unclear to me at times,
but your hints have again proven very helpful.

Thanks again,
Dominik

On Monday 28 January 2008 22.30:10, Pedro Vicente Nunes wrote:
[snip]

--
Dominik Szczerba, Ph.D.
Computer Vision Lab CH-8092 Zurich
http://www.vision.ee.ethz.ch/~domi


Thanks a lot for the answer.
I see your point. What do I do in case the dataset size is not a multiple of
an easy number? (if my dataset is 1100 bytes and I set the chunk size to 128
bytes for example)

  The HDF5 library will handle "partially full" chunks around the dataset "edges" correctly, so just do what makes sense (under the suggestions I gave below) and things should be OK.
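
  With the numbers from your example (treating them as element counts, just for illustration):

hsize_t dset_len = 1100, chunk_len = 128;
hsize_t nchunks  = (dset_len + chunk_len - 1) / chunk_len;  /* = 9  */
hsize_t leftover = dset_len % chunk_len;                    /* = 76 */
/* the first 8 chunks are completely full; the last "edge" chunk only
   holds 76 elements and the library keeps track of that for you */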

  Quincey


On Nov 24, 2007, at 11:07 AM, Dominik Szczerba wrote:

-- Dominik

On Saturday 24 November 2007 17.13:17 Quincey Koziol wrote:

Hi Dominik,

On Nov 24, 2007, at 10:05 AM, Dominik Szczerba wrote:

Could anyone please suggest the best practice to specify the chunk
size for
compressing an arbitrary sized dataset? Currently I am taking chunk
sizes
equal to dataset dimensions but I guess this may be generally a bad
idea.

  It really depends on the access pattern for your chunks. You should
tune your chunk size/dimensions to how you will be writing/reading
the chunks. If you don't know (perhaps you are writing a generic
library or application), you should probably try to make the chunks
as "square" as possible (try to get all the chunk dimensions about
the same size) and try to not make the chunks smaller than 1MB or so.

  Quincey

regards,
Dominik
--
Dominik Szczerba, Ph.D.
Computer Vision Lab CH-8092 Zurich
http://www.vision.ee.ethz.ch/~domi


--
Dominik Szczerba, Ph.D.
Computer Vision Lab CH-8092 Zurich
http://www.vision.ee.ethz.ch/~domi


Thanks a lot for the clarifications.
regards,
-- Dominik


On Saturday 24 November 2007 18.10:40 Quincey Koziol wrote:

On Nov 24, 2007, at 11:07 AM, Dominik Szczerba wrote:
> Thanks a lot for the answer.
> I see your point. What do I do in case the dataset size is not a
> multiple of
> an easy number? (if my dataset is 1100 bytes and I set the chunk
> size to 128
> bytes for example)

  The HDF5 library will handle "partially full" chunks around the
dataset "edges" correctly, so just do what makes sense (under the
suggestions I gave below) and things should be OK.

  Quincey

> -- Dominik
>
> On Saturday 24 November 2007 17.13:17 Quincey Koziol wrote:
>> Hi Dominik,
>>
>> On Nov 24, 2007, at 10:05 AM, Dominik Szczerba wrote:
>>> Could anyone please suggest the best practice to specify the chunk
>>> size for
>>> compressing an arbitrary sized dataset? Currently I am taking chunk
>>> sizes
>>> equal to dataset dimensions but I guess this may be generally a bad
>>> idea.
>>
>> It really depends on the access pattern for your chunks. You should
>> tune your chunk size/dimensions to how you will be writing/reading
>> the chunks. If you don't know (perhaps you are writing a generic
>> library or application), you should probably try to make the chunks
>> as "square" as possible (try to get all the chunk dimensions about
>> the same size) and try to not make the chunks smaller than 1MB or so.
>>
>> Quincey
>>
>>> regards,
>>> Dominik
>>> --
>>> Dominik Szczerba, Ph.D.
>>> Computer Vision Lab CH-8092 Zurich
>>> http://www.vision.ee.ethz.ch/~domi
>>>
>>
>
> --
> Dominik Szczerba, Ph.D.
> Computer Vision Lab CH-8092 Zurich
> http://www.vision.ee.ethz.ch/~domi
>

--
Dominik Szczerba, Ph.D.
Computer Vision Lab CH-8092 Zurich
http://www.vision.ee.ethz.ch/~domi


And I would also be interested to know what bad things can happen if I use
the size of the dataset as the chunk size (so, one big chunk). Will it
possibly lead to memory problems, and if so, how will HDF5 deal with them?

Thanks and regards,
-- Dominik


On Monday 26 November 2007 21.00:47 Francesc Altet wrote:

Hi Quincey,

On Saturday, 24 November 2007, Quincey Koziol wrote:
> On Nov 24, 2007, at 10:05 AM, Dominik Szczerba wrote:
> > Could anyone please suggest the best practice to specify the chunk
> > size for
> > compressing an arbitrary sized dataset? Currently I am taking chunk
> > sizes
> > equal to dataset dimensions but I guess this may be generally a bad
> > idea.
>
> It really depends on the access pattern for your chunks. You should
> tune your chunk size/dimensions to how you will be writing/reading
> the chunks. If you don't know (perhaps you are writing a generic
> library or application), you should probably try to make the chunks
> as "square" as possible (try to get all the chunk dimensions about
> the same size) and try to not make the chunks smaller than 1MB or so.

                                                ^^^^^^^
Er, I think you meant to write *smaller* than 1MB, no?

--
Dominik Szczerba, Ph.D.
Computer Vision Lab CH-8092 Zurich
http://www.vision.ee.ethz.ch/~domi


Hi Francesc,


On Nov 26, 2007, at 2:00 PM, Francesc Altet wrote:

Hi Quincey,

On Saturday, 24 November 2007, Quincey Koziol wrote:

On Nov 24, 2007, at 10:05 AM, Dominik Szczerba wrote:

Could anyone please suggest the best practice to specify the chunk
size for
compressing an arbitrary sized dataset? Currently I am taking chunk
sizes
equal to dataset dimensions but I guess this may be generally a bad
idea.

  It really depends on the access pattern for your chunks. You should
tune your chunk size/dimensions to how you will be writing/reading
the chunks. If you don't know (perhaps you are writing a generic
library or application), you should probably try to make the chunks
as "square" as possible (try to get all the chunk dimensions about
the same size) and try to not make the chunks smaller than 1MB or so.

                                                ^^^^^^^
Er, I think you meant to write *smaller* than 1MB, no?

  No, making chunks too small will make the overhead for the B-tree that indexes them too large and impact both the performance and size of the file. Even 1MB chunks may be too small for many applications, on modern computers...

  Quincey

On Monday, 26 November 2007, Francesc Altet wrote:

> It really depends on the access pattern for your chunks. You
> should tune your chunk size/dimensions to how you will be
> writing/reading the chunks. If you don't know (perhaps you are
> writing a generic library or application), you should probably try
> to make the chunks as "square" as possible (try to get all the
> chunk dimensions about the same size) and try to not make the
> chunks smaller than 1MB or so.

                                                ^^^^^^^
Er, I think you meant to write *smaller* than 1MB, no?

Oh my! I meant to write *larger* than 1MB, i.e. the recommended chunksize
would not exceed 1MB in size, IMO. My apologies for the confusion.


--
Francesc Altet - Cárabos Coop. V. - "Enjoy Data"
http://www.carabos.com/


Hi Dominik,

Hi Quincey,

On Saturday, 24 November 2007, Quincey Koziol wrote:

Could anyone please suggest the best practice to specify the chunk
size for
compressing an arbitrary sized dataset? Currently I am taking chunk
sizes
equal to dataset dimensions but I guess this may be generally a bad
idea.

  It really depends on the access pattern for your chunks. You should
tune your chunk size/dimensions to how you will be writing/reading
the chunks. If you don't know (perhaps you are writing a generic
library or application), you should probably try to make the chunks
as "square" as possible (try to get all the chunk dimensions about
the same size) and try to not make the chunks smaller than 1MB or so.

                                                ^^^^^^^
Er, I think you meant to write *smaller* than 1MB, no?

And I would also be interested to know what bad things can happen if I use
the size of the dataset as the chunk size (so, one big chunk). Will it
possibly lead to memory problems, and if so, how will HDF5 deal with them?

  Yes, you shouldn't make the chunks too big either. :-) The HDF5 library will bring each chunk into memory in order to read/write it (unless you are operating on non-compressed chunks using MPI-I/O). So if the dataset is large and stored as a single chunk, that will probably be a lot of extra I/O. If you are reading/writing the entire chunk for the I/O, it would be OK though...
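
  One related knob (a sketch only; the values are invented and the defaults vary between releases) is the raw data chunk cache on the file access property list. If a chunk doesn't fit in that cache, every partial access to it can mean reading and decompressing the whole chunk again:

#include "hdf5.h"

/* Sketch: enlarge only the raw-data chunk cache, keeping the other
   cache settings at whatever they currently are. */
static hid_t open_with_bigger_chunk_cache(const char *name)
{
    int    mdc_nelmts;
    size_t rdcc_nelmts, rdcc_nbytes;
    double rdcc_w0;
    hid_t  fapl, file_id;

    fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pget_cache(fapl, &mdc_nelmts, &rdcc_nelmts, &rdcc_nbytes, &rdcc_w0);
    rdcc_nbytes = 8 * 1024 * 1024;   /* room for several 1 MB chunks */
    H5Pset_cache(fapl, mdc_nelmts, rdcc_nelmts, rdcc_nbytes, rdcc_w0);
    file_id = H5Fopen(name, H5F_ACC_RDONLY, fapl);
    H5Pclose(fapl);
    return file_id;
}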

  Quincey


On Nov 26, 2007, at 2:13 PM, Dominik Szczerba wrote:

On Monday 26 November 2007 21.00:47 Francesc Altet wrote:

On Nov 24, 2007, at 10:05 AM, Dominik Szczerba wrote:
