Dear Knut,
Does the limit 6553/6554 depend on the size of the data item in the chunk? If the number is larger for smaller data items, it could be a matter of page swapping (RAM <-> hard disk) when processing large chunks.
Regards,
Vesa Paatero
···
On Fri, 16 Apr 2010 18:32:31 -0400, John Knutson <jkml@arlut.utexas.edu> wrote:
Chunking puzzles me. At first, I thought it was a number of bytes (I
haven't been able to find any documentation that explicitly says whether
it's a number of bytes, a number of records, or what), but now I'm not
sure. Again, I did some experiments and found that there was a bit of
extra overhead with a chunk size of 1, but there really wasn't much
difference between a chunk size of 128, 512, or 2048 (in terms of
writing speed, mind you; there is definitely a difference in file size).
That said, when I tried the same test with a chunk size of 10240, it
slowed down enough that I didn't bother letting it finish. After
playing a bit more, it seems the largest chunk size I can pick (in
whatever units it happens to be in) is 6553, with it completing in a
reasonable time frame (processing time increases by two orders of
magnitude going from 6553 to 6554).
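
[Editor's note: in the HDF5 C API the chunk dimensions passed to H5Pset_chunk
are counted in dataset elements, not bytes, so for a 1-D dataset of compound
records a "chunk size of 6553" means 6553 records per chunk. A minimal sketch
follows; the function name and the sizes in the comments are illustrative,
not from the original posts.]

    #include <stdio.h>
    #include "hdf5.h"

    /* Build a dataset creation property list for a 1-D chunked dataset.
     * records_per_chunk is a count of elements (records), not bytes. */
    hid_t make_chunked_dcpl(hsize_t records_per_chunk, size_t record_size)
    {
        hid_t   dcpl          = H5Pcreate(H5P_DATASET_CREATE);
        hsize_t chunk_dims[1] = { records_per_chunk };  /* elements, not bytes */
        H5Pset_chunk(dcpl, 1, chunk_dims);

        /* Bytes per chunk = records per chunk * record size;
         * e.g. 6553 * 160 = 1,048,480 bytes, just under 1 MiB. */
        printf("chunk is %zu bytes\n",
               (size_t)(records_per_chunk * record_size));
        return dcpl;
    }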
It does seem to, yes. I ran a test with a slightly larger compound type (360 bytes vs. 160 or 170), and the processing time started increasing slowly at around a chunk size of 4K records (as opposed to the 6554 with the 160-byte records).
I fiddled around with it a bit more while monitoring paging (via both vmstat and a graphical tool), and I didn't see any correlation between paging and performance. My best guess at this point is that it's cache misses, or that the amount of data is hitting the limit of the cache size at that point.
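
[Editor's note: the default HDF5 raw-data chunk cache is 1 MiB (1,048,576
bytes) per open dataset, and with 160-byte records 6553 * 160 = 1,048,480
bytes just fits while 6554 * 160 = 1,048,640 bytes does not, which is
consistent with the guess above. One way to test that guess is to enlarge
the cache on the dataset access property list with H5Pset_chunk_cache
(HDF5 1.8.3 and later). The file name, dataset name, and sizes below are
made up for illustration.]

    #include "hdf5.h"

    int main(void)
    {
        hid_t file = H5Fopen("test.h5", H5F_ACC_RDWR, H5P_DEFAULT);

        /* Dataset access property list with a 16 MiB chunk cache and a
         * prime number of hash slots; H5D_CHUNK_CACHE_W0_DEFAULT keeps
         * the default preemption policy. */
        hid_t dapl = H5Pcreate(H5P_DATASET_ACCESS);
        H5Pset_chunk_cache(dapl, 12421, 16 * 1024 * 1024,
                           H5D_CHUNK_CACHE_W0_DEFAULT);

        hid_t dset = H5Dopen2(file, "/records", dapl);

        /* ... same reads/writes as before; chunks up to 16 MiB now fit
         * in the cache ... */

        H5Dclose(dset);
        H5Pclose(dapl);
        H5Fclose(file);
        return 0;
    }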