HDF5 In-Transit Support


#1

Hi,
We’re building an In-Situ system requiring data transfer through the network, between a simulation and a data analysis tool running alongside it. Does HDF5 have support for this?


#2

Can you tell us a little more about the basics of your setup?

  1. Does your simulation produce data in HDF5? At what rate? Do you need to retain a copy of the data in situ?
  2. Is it streaming data or bursts of structured data?
  3. How, if at all, do the simulation and the analysis tool communicate? Through a file system? (What kind?)
  4. What’s the network connectivity?
  5. How do you define throughput and latency (from the analyst’s perspective) and what do you expect?

Maybe as a conversation starter, have a look at the “Splitter and mirror VFD section” of this video.
There are obviously many more moving parts here than just HDF5. For certain workloads, HSDS is a fine solution. Custom solutions, e.g., based on ZeroMQ, give you more control. It really depends on the specifics of your problem. G.


#3

Hi gheber,

We have control over what format we use, and don’t need to retain a copy. I think the data will be mostly bursts of structured data. There will be two modes in which the analysis tool and simulation communicate. In the first mode, the simulation runs in a special post-processing mode, waiting for requests to load, process, and transfer data to the analysis tool. In the second mode, the simulation is running, and at each time step data may be transferred to the analysis tool depending on the dynamic workflow. The analysis tool will run on the same supercomputer or on a connected analysis cluster attached to the same local network. The user will interact with the analysis tool remotely, and the analysis tool will interact with the simulation locally. The analysis tool will operate at low latency on smaller processed data that has already arrived (while retaining some amount of it, depending on the user). The data coming to the analysis tool from the simulation should be optimized for throughput.
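
To make the second mode more concrete, here is a minimal sketch of what one per-time-step burst could look like on the wire. Everything in it is an assumption for illustration only: the length-prefixed framing, the JSON header fields (`step`, `name`, `count`, `nbytes`), and the helper names `send_burst` / `recv_burst` are hypothetical, and it uses a plain socket from the Python standard library rather than HDF5, HSDS, ZeroMQ, or Regent. The payload is raw float64 in native byte order, so it assumes both ends run on the same architecture.

```python
import json
import socket
import struct
from array import array

# Hypothetical framing: 4-byte big-endian header length, JSON header, raw payload.
HEADER_LEN = struct.Struct("!I")


def send_burst(sock, step, name, values):
    """Send one burst of float64 values tagged with a time step and a field name."""
    payload = array("d", values).tobytes()  # native byte order (assumption)
    header = json.dumps(
        {"step": step, "name": name, "count": len(values), "nbytes": len(payload)}
    ).encode("utf-8")
    sock.sendall(HEADER_LEN.pack(len(header)) + header + payload)


def recv_exact(sock, n):
    """Read exactly n bytes; recv() alone may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before the full message arrived")
        buf.extend(chunk)
    return bytes(buf)


def recv_burst(sock):
    """Receive one burst and return (header dict, list of floats)."""
    (header_len,) = HEADER_LEN.unpack(recv_exact(sock, HEADER_LEN.size))
    header = json.loads(recv_exact(sock, header_len))
    values = array("d")
    values.frombytes(recv_exact(sock, header["nbytes"]))
    return header, values.tolist()


def demo():
    """Stand-in for simulation and analysis tool: a connected local socket pair."""
    sim, analysis = socket.socketpair()
    try:
        # "Simulation" side: push one small burst per time step.
        for step in range(3):
            send_burst(sim, step, "temperature", [step + 0.5 * i for i in range(4)])
        # "Analysis" side: drain the three bursts in arrival order.
        return [recv_burst(analysis) for _ in range(3)]
    finally:
        sim.close()
        analysis.close()
```

Calling `demo()` returns the three `(header, values)` pairs in the order they were sent. The first (request-driven) mode could reuse the same framing with a request message going the other way, but that part is not sketched here.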

We have considered ADIOS, but the simulation may soon be rewritten in Regent (https://regent-lang.org/), which currently has HDF5 built in as its only I/O functionality.

I’ll check out your links.

Thanks!