Hey,
Just to add another datapoint, I wrote up a little socket performance test that you can find here: hsds/tests/perf/socket at master · HDFGroup/hsds · GitHub. The test measures the maximum throughput we can get writing to sockets with a Python client and server.
There are a few options that can be configured: how many bytes to write, how many bytes per batch, TCP vs. Unix domain sockets, and whether or not to use shared memory.
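In case it helps to picture what's being measured, here's a rough sketch of the idea (the real code is in the repo linked above; the constant names and values here are just illustrative):

```python
import os
import socket
import time
from multiprocessing import Process

# Illustrative knobs mirroring the options described above
# (the actual test parameterizes these; names here are made up)
TOTAL_BYTES = 1024 * 1024 * 1024   # how many bytes to write (1 GiB)
BATCH_SIZE = 64 * 1024             # bytes per send/recv call
USE_UNIX_SOCKET = False            # TCP vs. Unix domain socket
TCP_ADDR = ("127.0.0.1", 5678)
UDS_PATH = "/tmp/perf_test.sock"

def new_socket():
    family = socket.AF_UNIX if USE_UNIX_SOCKET else socket.AF_INET
    return socket.socket(family, socket.SOCK_STREAM)

def server():
    # Read everything the client sends, BATCH_SIZE bytes per recv() call
    if USE_UNIX_SOCKET and os.path.exists(UDS_PATH):
        os.unlink(UDS_PATH)
    s = new_socket()
    s.bind(UDS_PATH if USE_UNIX_SOCKET else TCP_ADDR)
    s.listen(1)
    conn, _ = s.accept()
    received = 0
    while received < TOTAL_BYTES:
        data = conn.recv(BATCH_SIZE)
        if not data:
            break
        received += len(data)
    conn.close()
    s.close()

def client():
    # Stream TOTAL_BYTES to the server and report client-side throughput
    c = new_socket()
    c.connect(UDS_PATH if USE_UNIX_SOCKET else TCP_ADDR)
    payload = b"\0" * BATCH_SIZE
    sent = 0
    start = time.perf_counter()
    while sent < TOTAL_BYTES:
        c.sendall(payload)
        sent += len(payload)
    elapsed = time.perf_counter() - start
    c.close()
    print(f"{sent / elapsed / 1e9:.2f} GB/s")

if __name__ == "__main__":
    p = Process(target=server)
    p.start()
    time.sleep(0.5)   # crude wait for the server to start listening
    client()
    p.join()
```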
Here are the results I got:
Observations:
- With regular TCP sockets, the max throughput was around 1 GB/s (compared with the iperf3 numbers of ~40 GB/s). I don't know if this is a Python limitation or something else, but I expect it puts an upper bound on HSDS throughput to a single client.
- Unix domain sockets turned out to be quite a bit slower than TCP sockets (for localhost connections). This was surprising, since I was seeing about a 20% performance improvement in HSDS tests using domain sockets. Maybe there's more latency in setting up TCP connections, which would matter for HSDS's many short requests but not for this one long transfer?
- Increasing the batch size (the number of bytes the client reads from the socket in one call) can improve performance by 2x. I'm not sure what value is being used now in HSDS (it's controlled by aiohttp and the Python high-level socket interface); I'll look into it.
- Passing data through a shared memory buffer (localhost only) increases performance by about 4x for these larger transfers. The socket is used just to communicate the name of the shared memory buffer to the client (see the sketch below).
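For anyone curious, here's a minimal sketch of that shared memory trick using Python's multiprocessing.shared_memory. Again, this is illustrative rather than the actual test code; the address, payload size, and handshake are assumptions:

```python
import socket
import time
from multiprocessing import Process, shared_memory

ADDR = ("127.0.0.1", 5679)        # illustrative localhost address
NBYTES = 256 * 1024 * 1024        # 256 MiB payload

def server():
    # Put the payload in a shared memory block instead of the socket
    shm = shared_memory.SharedMemory(create=True, size=NBYTES)
    shm.buf[:NBYTES] = b"\0" * NBYTES
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(ADDR)
    s.listen(1)
    conn, _ = s.accept()
    conn.sendall(shm.name.encode())   # only the block name crosses the socket
    conn.recv(1)                      # wait for the client's "done" byte
    conn.close()
    s.close()
    shm.close()
    shm.unlink()                      # safe to free once the client is done

def client():
    c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    c.connect(ADDR)
    name = c.recv(256).decode()       # learn the block name from the socket
    start = time.perf_counter()
    shm = shared_memory.SharedMemory(name=name)
    data = bytes(shm.buf[:NBYTES])    # one copy out of shared memory
    elapsed = time.perf_counter() - start
    print(f"{len(data) / elapsed / 1e9:.2f} GB/s")
    shm.close()
    c.sendall(b"x")                   # tell the server we're done
    c.close()

if __name__ == "__main__":
    p = Process(target=server)
    p.start()
    time.sleep(0.5)                   # crude wait for the server to start
    client()
    p.join()
```

The payload never touches the socket, so the cost is basically one memory copy, which is presumably where the ~4x gain on large transfers comes from.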