suggestion re: benchmarks

I've been using h5perf_serial to test the iRODS VFD I've been working on, and it ultimately got me thinking it might be time to revisit my choices of chunk and transfer sizes for our data. Along the way I remembered that throughput isn't the only, or even the primary, measure of performance in our situation.

My suggestion is that there also be a performance benchmark that measures latency, i.e., the amount of time it takes to do a single write or read. When you're streaming data like we are, the chunk size that gives the best latency may yield very low throughput.
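To make that concrete, here's roughly the kind of measurement I have in mind - just a sketch that times each H5Dwrite individually rather than only the aggregate. The file name, dataset name, sizes, and chunk shape are all made up for illustration, and individual times can still be skewed by HDF5's own chunk cache, which is why the min/max spread is worth recording:

/*
 * Minimal sketch of a per-write latency measurement: time each
 * H5Dwrite call individually instead of just the aggregate transfer.
 * All names and sizes here are made up.  Build with: h5cc -o wlat wlat.c
 */
#include <stdio.h>
#include <time.h>
#include "hdf5.h"

#define NWRITES 10
#define XFER    (1 << 20)                 /* elements per write */

int main(void)
{
    static int buf[XFER];                 /* one transfer buffer */
    hsize_t dims[1]  = { (hsize_t)NWRITES * XFER };
    hsize_t chunk[1] = { XFER };
    hsize_t count[1] = { XFER }, start[1];

    hid_t file   = H5Fcreate("latency.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t fspace = H5Screate_simple(1, dims, NULL);
    hid_t mspace = H5Screate_simple(1, count, NULL);
    hid_t dcpl   = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 1, chunk);
    hid_t dset   = H5Dcreate2(file, "data", H5T_NATIVE_INT, fspace,
                              H5P_DEFAULT, dcpl, H5P_DEFAULT);

    double min = 1e30, max = 0.0, total = 0.0;
    for (int i = 0; i < NWRITES; i++) {
        struct timespec t0, t1;
        start[0] = (hsize_t)i * XFER;
        H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);
        clock_gettime(CLOCK_MONOTONIC, &t0);
        H5Dwrite(dset, H5T_NATIVE_INT, mspace, fspace, H5P_DEFAULT, buf);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double dt = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        if (dt < min) min = dt;
        if (dt > max) max = dt;
        total += dt;
    }
    printf("per-write latency: min %.4f s  avg %.4f s  max %.4f s\n",
           min, total / NWRITES, max);

    H5Dclose(dset); H5Pclose(dcpl);
    H5Sclose(mspace); H5Sclose(fspace); H5Fclose(file);
    return 0;
}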

Additionally, while h5perf_serial supports the option of only doing write tests, it would be really handy to have a similar option for only doing read tests - that is, decouple reading and writing, to have a better chance of getting read throughput measurements that aren't really just showing the throughput of a cache (disk, system, whatever it may be).
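Along the same lines, a decoupled read pass could be a separate program run after the caches are cleared (on Linux, e.g., "echo 3 > /proc/sys/vm/drop_caches" as root). A sketch, reusing the made-up names from above:

/*
 * Sketch of a decoupled read-only pass over the file written above.
 * Running it as a separate process after dropping caches gives read
 * numbers that aren't just replaying the page cache.  Names and
 * sizes match the write sketch and are equally made up.
 * Build with: h5cc -o rlat rlat.c
 */
#include <stdio.h>
#include <time.h>
#include "hdf5.h"

#define NREADS 10
#define XFER   (1 << 20)                  /* elements per read */

int main(void)
{
    static int buf[XFER];
    hsize_t count[1] = { XFER }, start[1];

    hid_t file   = H5Fopen("latency.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t dset   = H5Dopen2(file, "data", H5P_DEFAULT);
    hid_t fspace = H5Dget_space(dset);
    hid_t mspace = H5Screate_simple(1, count, NULL);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < NREADS; i++) {
        start[0] = (hsize_t)i * XFER;
        H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);
        H5Dread(dset, H5T_NATIVE_INT, mspace, fspace, H5P_DEFAULT, buf);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double dt = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("read pass: %.3f s total, %.4f s per read\n", dt, dt / NREADS);

    H5Dclose(dset); H5Sclose(mspace); H5Sclose(fspace); H5Fclose(file);
    return 0;
}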

Hi John,


  We're planning to update the parallel I/O performance benchmark and will keep this in mind; we'll try to migrate the changes back into the serial version as well.

  Thanks,
    Quincey

Hi, Quincey,

Quincey Koziol wrote:

We're planning to update the parallel I/O performance benchmark and will keep this in mind; we'll try to migrate the changes back into the serial version as well.

I actually found that h5perf_serial had some undocumented command-line options that would get me 80% of the way there on per-write latency, give or take. Using "-D t" on the command line causes h5perf_serial to print out the total time taken by the do_write function; divide that by the number of transfers (dataset size / transfer buffer size) and you get the average time per write. I think.

example:
h5perf_serial -D t -w -e 1051200,2,128 -x 105120,2,128 -v irods -c 105120,2,128 -t

output:
Dataset size=1051200 2 128
Transfer buffer size=105120 2 128
                Average Throughput: 34.30 MB/s ( 7.481 s)

7.481 s / (1051200 / 105120) = 7.481 s / 10 = 0.7481 s per write

approximately :)

Another piece of information that would be useful from the h5perf benchmark/test is the resulting file size for the write operation. From that, one could get a clear idea of just how much overhead a particular dataset layout costs, both in a relative sense (by comparing one HDF5 layout to another) and in an "absolute" sense (by comparing an HDF5 layout to the size of the same data written as a raw POSIX file).
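To illustrate the kind of report I mean, here's a rough sketch (the file and dataset names are placeholders carried over from the sketches above; one could equally just stat() the file):

/*
 * Sketch of an overhead report: compare the HDF5 file's actual size
 * against the raw size of the data it holds.  File and dataset names
 * are placeholders.  Build with: h5cc -o overhead overhead.c
 */
#include <stdio.h>
#include "hdf5.h"

int main(void)
{
    hid_t file = H5Fopen("latency.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t dset = H5Dopen2(file, "data", H5P_DEFAULT);

    /* raw payload: element count times element size */
    hid_t space = H5Dget_space(dset);
    hid_t dtype = H5Dget_type(dset);
    unsigned long long raw =
        (unsigned long long)H5Sget_simple_extent_npoints(space) *
        H5Tget_size(dtype);

    /* size of the HDF5 file itself */
    hsize_t fsize = 0;
    H5Fget_filesize(file, &fsize);

    printf("raw data: %llu bytes, HDF5 file: %llu bytes, overhead: %.2f%%\n",
           raw, (unsigned long long)fsize,
           100.0 * ((double)fsize - (double)raw) / (double)raw);

    H5Tclose(dtype); H5Sclose(space); H5Dclose(dset); H5Fclose(file);
    return 0;
}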


Hi John,

On May 11, 2011, at 1:47 PM, John Knutson wrote:

I actually found that h5perf_serial had some undocumented command-line options that would get me 80% of the way there on per-write latency, give or take. Using "-D t" on the command line causes h5perf_serial to print out the total time taken by the do_write function; divide that by the number of transfers (dataset size / transfer buffer size) and you get the average time per write. I think.

  Ah, good find - I forgot about those options. :)

    Quincey
