Local HSDS performance vs local HDF5 files

Hi Dave,

I have run the tests you asked for. Here are the results:

Request data and write the response to a text file

Command:

curl -g --request GET 'http://localhost:5101/datasets/d-6ceed2e2-57cf5573-2fab-877cc5-3770b4/value?select=[0:4000000,:]&domain=/home/test_user1/testFile_fromPython01.h5' --header 'Accept: application/octet-stream' --header 'Authorization: Basic dGVzdF91c2VyMTp0ZXN0' -o ~/receivedData.txt -w "@curlFormat.txt"
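The command reads its -w output template from curlFormat.txt, which is not included in this post. As a hypothetical reconstruction (not my actual file), a template like the following, built only from standard curl --write-out variables, produces output laid out like the results below:

```shell
# Hypothetical reconstruction of curlFormat.txt; the original file is not shown.
# Each %{...} is a standard curl --write-out variable; curl expands \n into a newline.
cat > curlFormat.txt <<'EOF'
\n
        http_version:  %{http_version}\n
       response_code:  %{response_code}\n
       size_download:  %{size_download} bytes\n
      speed_download:  %{speed_download} bytes/s\n
     time_namelookup:  %{time_namelookup} s\n
        time_connect:  %{time_connect} s\n
     time_appconnect:  %{time_appconnect} s\n
    time_pretransfer:  %{time_pretransfer} s\n
       time_redirect:  %{time_redirect} s\n
  time_starttransfer:  %{time_starttransfer} s\n
                     ----------\n
          time_total:  %{time_total} s\n
EOF
```

It is then used with -w "@curlFormat.txt", exactly as in the command above.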

Results:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  305M  100  305M    0     0   113M      0  0:00:02  0:00:02 --:--:--  113M
        
        http_version:  1.1
       response_code:  200
       size_download:  320000000 bytes
      speed_download:  119136262,000 bytes/s
     time_namelookup:  0,000992 s
        time_connect:  0,001099 s
     time_appconnect:  0,000000 s
    time_pretransfer:  0,001134 s
       time_redirect:  0,000000 s
  time_starttransfer:  1,078983 s
                     ----------
          time_total:  2,686430 s

The transfer speed of the download phase alone, i.e. size_download / (time_total - time_starttransfer), is:

320 MB / (2.6864 s - 1.0789 s) ≈ 199 MB/s
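The same calculation can be scripted so that it is not done by hand for every run. This is a minimal sketch: download_rate and curl_timings are hypothetical helper names, and it assumes that running curl with LC_ALL=C makes it print '.' instead of ',' as the decimal separator (my curl output above uses ','), so awk can parse the numbers.

```shell
# Download-phase rate in MB/s (1 MB = 10^6 bytes):
#   size_download / (time_total - time_starttransfer)
# Arguments: size_download (bytes), time_starttransfer (s), time_total (s).
download_rate() {
  LC_ALL=C awk -v b="$1" -v ts="$2" -v tt="$3" \
    'BEGIN { printf "%.0f MB/s\n", b / (tt - ts) / 1e6 }'
}

# Hypothetical helper: run curl (extra options and the URL are passed through)
# and print only the three values download_rate needs.
curl_timings() {
  LC_ALL=C curl -sg -w '%{size_download} %{time_starttransfer} %{time_total}\n' "$@"
}
```

For example, download_rate 320000000 1.078983 2.686430 prints 199 MB/s for the text-file run. With a live server, download_rate $(curl_timings -o /dev/null -H 'Accept: application/octet-stream' -H 'Authorization: Basic ...' "$URL") measures and computes in one step, where URL is the dataset URL used above.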

Request data and discard the response (send it to /dev/null)

Command:

curl -g --request GET 'http://localhost:5101/datasets/d-6ceed2e2-57cf5573-2fab-877cc5-3770b4/value?select=[0:4000000,:]&domain=/home/test_user1/testFile_fromPython01.h5' --header 'Accept: application/octet-stream' --header 'Authorization: Basic dGVzdF91c2VyMTp0ZXN0' -o /dev/null -w "@curlFormat.txt"

Results:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  305M  100  305M    0     0   148M      0  0:00:02  0:00:02 --:--:--  148M
        
        http_version:  1.1
       response_code:  200
       size_download:  320000000 bytes
      speed_download:  156173743,000 bytes/s
     time_namelookup:  0,000875 s
        time_connect:  0,000999 s
     time_appconnect:  0,000000 s
    time_pretransfer:  0,001050 s
       time_redirect:  0,000000 s
  time_starttransfer:  1,002960 s
                     ----------
          time_total:  2,049349 s

The transfer speed of the download phase alone, i.e. size_download / (time_total - time_starttransfer), is:

320 MB / (2.0493 s - 1.0030 s) ≈ 306 MB/s
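A single curl run is a noisy measurement, so it helps to repeat each variant and look at the spread. This is a sketch under a few assumptions: URL and AUTH are hypothetical variables holding the dataset URL and the Basic credential used in the commands above, rate_runs and summarize are illustrative names, and LC_ALL=C is assumed to make curl print '.' as the decimal separator.

```shell
# Repeat one curl variant N times and print one download-phase rate (MB/s) per run.
# Arguments: number of runs, output target (a file path or /dev/null).
# Assumes URL and AUTH are set in the environment.
rate_runs() {
  n=$1; out=$2; i=1
  while [ "$i" -le "$n" ]; do
    LC_ALL=C curl -sg "$URL" -H 'Accept: application/octet-stream' \
      -H "Authorization: Basic $AUTH" -o "$out" \
      -w '%{size_download} %{time_starttransfer} %{time_total}\n'
    i=$((i + 1))
  done | LC_ALL=C awk '{ printf "%.1f\n", $1 / ($3 - $2) / 1e6 }'
}

# Read one rate per line on stdin and print min, median and max.
summarize() {
  LC_ALL=C sort -n | LC_ALL=C awk '
    { r[NR] = $1 }
    END {
      med = (NR % 2) ? r[(NR + 1) / 2] : (r[NR / 2] + r[NR / 2 + 1]) / 2
      printf "min %s  median %s  max %s MB/s\n", r[1], med, r[NR]
    }'
}
```

Usage: rate_runs 10 /dev/null | summarize and rate_runs 10 ~/receivedData.txt | summarize compare both variants over ten runs each.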

Benchmarking

The response times shown above vary considerably: when I run each command several times, the differences between runs are large. Still, the /dev/null variant is faster in every repetition.

Therefore, I used an HTTP benchmarking tool, ApacheBench (ab, installed with apt-get install apache2-utils), to obtain more reliable results.

Command:

$ ab -n 50 -c 1 -k -H 'Accept: application/octet-stream' -H 'Authorization: Basic dGVzdF91c2VyMTp0ZXN0' 'http://localhost:5101/datasets/d-6ceed2e2-57cf5573-2fab-877cc5-3770b4/value?select=[0:4000000,:]&domain=/home/test_user1/testFile_fromPython01.h5'

This command sends 50 GET requests one after another (-n 50, -c 1) over a keep-alive connection (-k) and measures the elapsed times.

Results:

Server Software:        Python/3.8
Server Hostname:        localhost
Server Port:            5101

Document Path:          /datasets/d-6ceed2e2-57cf5573-2fab-877cc5-3770b4/value?select=[0:4000000,:]&domain=/home/test_user1/testFile_fromPython01.h5
Document Length:        320000000 bytes

Concurrency Level:      1
Time taken for tests:   134.357 seconds
Complete requests:      50
Failed requests:        0
Keep-Alive requests:    50
Total transferred:      16000017500 bytes
HTML transferred:       16000000000 bytes
Requests per second:    0.37 [#/sec] (mean)
Time per request:       2687.132 [ms] (mean)
Time per request:       2687.132 [ms] (mean, across all concurrent requests)
Transfer rate:          116295.13 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       0
Processing:  1614 2687 706.5   2675    4424
Waiting:      829 1822 577.1   1731    3321
Total:       1614 2687 706.5   2675    4424

Percentage of the requests served within a certain time (ms)
  50%   2675
  66%   2801
  75%   2968
  80%   3218
  90%   3855
  95%   4111
  98%   4424
  99%   4424
 100%   4424 (longest request)
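Note that ab's "Transfer rate" covers the whole request, including the time HSDS spends preparing the data, and ab counts 1 Kbyte as 1024 bytes. A small conversion to decimal MB/s (ab_rate_to_mbps is a hypothetical helper name) makes it comparable with curl's speed_download, which is also averaged over the whole request:

```shell
# Convert ab's "Transfer rate" (Kbytes/sec, 1 Kbyte = 1024 bytes) to MB/s (10^6 bytes).
ab_rate_to_mbps() {
  LC_ALL=C awk -v k="$1" 'BEGIN { printf "%.1f MB/s\n", k * 1024 / 1e6 }'
}
```

For the run above, ab_rate_to_mbps 116295.13 prints 119.1 MB/s.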

Explanation of the Connection Times above:

  • Connect: time needed to open the TCP connection, which reflects the network latency. It is almost 0 because we are working locally (localhost).
  • Processing: time from opening the connection until the full response has been received. Since the Connect time is 0 here, Processing is equal to Total.
  • Waiting: time to first byte after the request was sent. It therefore includes the time HSDS needs to obtain and prepare the data. The median shows that HSDS needs around 1.73 seconds to read the data from disk and prepare the response.
  • The time needed to transfer (download) the data is the Total time minus the Waiting time. Using the medians, HSDS sends the data at a transfer rate of about:
320 MB / (2.675 s - 1.731 s) ≈ 339 MB/s
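Subtracting the median Waiting time from the median Total time is a good estimate, but the median of the per-request differences can be slightly different. ab can write per-request timings with its -g option; assuming the tab-separated layout it uses (header starttime, seconds, ctime, dtime, ttime, wait, all times in ms), the per-request transfer time can be computed directly. median_transfer_rate and timings.tsv are hypothetical names.

```shell
# Median per-request transfer time (ttime - wait, in ms) from an ab -g file,
# converted to MB/s for a response of the given size in bytes.
# Arguments: gnuplot file written by ab -g, response size in bytes.
median_transfer_rate() {
  LC_ALL=C awk -F'\t' 'NR > 1 { print $5 - $6 }' "$1" \
    | LC_ALL=C sort -n \
    | LC_ALL=C awk -v b="$2" '
        { t[NR] = $1 }
        END {
          m = (NR % 2) ? t[(NR + 1) / 2] : (t[NR / 2] + t[NR / 2 + 1]) / 2
          printf "%.0f MB/s (median transfer time %.0f ms)\n", b / (m / 1000) / 1e6, m
        }'
}
```

Usage: add -g timings.tsv to the ab command above, then run median_transfer_rate timings.tsv 320000000.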

In the next post I will show the results when a different data service is used instead of HSDS.