HDF read/write performance tuning

Dear Knut,

Does the 6553/6554 limit depend on the size of the data items in the chunk? If the threshold is higher for smaller data items, the slowdown could be caused by page swapping (RAM <-> hard disk) when processing large chunks.

Regards,
Vesa Paatero

On Fri, 16 Apr 2010 18:32:31 -0400, John Knutson <jkml@arlut.utexas.edu> wrote:

Chunking puzzles me. At first I thought the chunk size was a number of
bytes (I haven't found any documentation that explicitly says whether it's
a number of bytes, a number of records, or something else), but now I'm
not sure. I ran some experiments and found a bit of extra overhead with a
chunk size of 1, but not much difference between chunk sizes of 128, 512,
and 2048 in terms of writing speed (there is definitely a difference in
file size). However, when I tried the same test with a chunk size of
10240, it slowed down enough that I didn't bother letting it finish. After
playing around a bit more, it seems the largest chunk size I can pick (in
whatever units it happens to be in) that still completes in a reasonable
time frame is 6553; processing time increases by two orders of magnitude
going from 6553 to 6554.
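
For what it's worth, the chunk dimensions passed to H5Pset_chunk are given in dataset elements rather than bytes, so for a one-dimensional dataset of compound records a chunk size of 6553 means 6553 records per chunk (just under 1 MiB at 160 bytes per record). The sketch below is not the actual test program; the file name, dataset name, and record layout are invented purely to show where the chunk size goes.

/* Minimal sketch with a made-up 160-byte record type: H5Pset_chunk takes
 * chunk dimensions in elements (records), not bytes. */
#include "hdf5.h"

typedef struct {
    double t;
    char   payload[152];                    /* pads the record to 160 bytes */
} record_t;

int main(void)
{
    hsize_t curdims[1] = {0};
    hsize_t maxdims[1] = {H5S_UNLIMITED};
    hsize_t chunk[1]   = {6553};            /* records per chunk, ~1 MiB at 160 B */
    hsize_t pdims[1]   = {152};

    hid_t rectype = H5Tcreate(H5T_COMPOUND, sizeof(record_t));
    hid_t ptype   = H5Tarray_create2(H5T_NATIVE_CHAR, 1, pdims);
    H5Tinsert(rectype, "t", HOFFSET(record_t, t), H5T_NATIVE_DOUBLE);
    H5Tinsert(rectype, "payload", HOFFSET(record_t, payload), ptype);

    hid_t space = H5Screate_simple(1, curdims, maxdims);
    hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 1, chunk);           /* rank 1, size in elements */

    hid_t file = H5Fcreate("chunk_test.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t dset = H5Dcreate2(file, "records", rectype, space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);

    H5Dclose(dset);
    H5Pclose(dcpl);
    H5Sclose(space);
    H5Tclose(ptype);
    H5Tclose(rectype);
    H5Fclose(file);
    return 0;
}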

Hi John,

  Hmm, can you send me your test program which demonstrates this issue? I'd like to give it a try and see if I can reproduce your problem, so I can determine what's going on.

  Thanks,
    Quincey

It does seem to, yes. I ran a test with a slightly larger compound type (360 bytes vs. 160 or 170), and the processing time started increasing slowly at around a chunk size of 4K records (as opposed to 6554 with the 160-byte records).

I fiddled around with it a bit more while monitoring paging (via both vmstat and a graphical tool) and I didn't see any correlation between paging and performance. My best guess at this point is that it's a cache miss, or that the amount of data is hitting the limit of the cache size at that point.
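
The cache-size guess lines up with one HDF5 default worth checking: the raw data chunk cache defaults to 1 MiB per open dataset, and 6553 records at 160 bytes each is just under 1 MiB while 6554 is just over it, which may not be a coincidence. If that is what's happening, one experiment would be to enlarge the chunk cache and re-run the timings. A sketch of that (placeholder file and dataset names; H5Pset_chunk_cache needs HDF5 1.8.3 or later):

/* Enlarge the per-dataset raw chunk cache via a dataset access property
 * list, then repeat the timed writes with the same chunk sizes. */
#include "hdf5.h"

int main(void)
{
    hid_t dapl = H5Pcreate(H5P_DATASET_ACCESS);

    /* 1601 hash slots (a prime, roughly 100x the number of 1 MiB chunks that
       fit in the cache), 16 MiB of cache, default preemption policy. */
    H5Pset_chunk_cache(dapl, 1601, 16 * 1024 * 1024,
                       H5D_CHUNK_CACHE_W0_DEFAULT);

    hid_t file = H5Fopen("chunk_test.h5", H5F_ACC_RDWR, H5P_DEFAULT);
    hid_t dset = H5Dopen2(file, "records", dapl);

    /* ... perform the same timed writes here and compare ... */

    H5Dclose(dset);
    H5Fclose(file);
    H5Pclose(dapl);
    return 0;
}

On releases before 1.8.3, similar cache settings can be applied file-wide with H5Pset_cache on the file access property list.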

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org