Independent datasets for MPI processes. Progress?


#1

TL;DR

A previous post mentioned that it might become possible to have independent datasets for independent processes, reducing the need for collective calls such as when extending a dataset.

That post is now 10 years old. Has anything changed on this front? The link in that post is no longer working.

Full story

I have two separate processes generating data: one generates values rapidly from an oscilloscope, the other generates larger data more slowly from a camera. As a result, they loop at different speeds.

I would like each process to write its data into its own dataset, which includes extending the dataset, a collective call. Since the two processes never touch each other’s dataset, can this be made independent?

The main goal is to save time by preventing the processes from waiting on each other.


#2

The behaviour hasn’t changed as of HDF5 v1.10.6; here is a test case built with H5CPP and a combination of C API calls.

Did you consider using ZeroMQ to collect your events and record them? If you’re interested, let me know…
best: steve

MWE: mpi-extend

Creates a container and rank-many datasets with random sizes using collective calls. Once the rig is set up, by default it extends a single dataset with a collective call to demonstrate that the rig works, but not the actual problem. By adjusting/uncommenting the lines marked with NOTE: one can trigger the code relevant to the OP’s question.
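
For orientation, here is a rough sketch of what the rig does, written against the plain PHDF5 C API rather than H5CPP; it is not the actual mpi-extend.cpp, and the dataset sizes are fixed instead of random to keep it short.

#include <hdf5.h>
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // one shared file, opened through the MPI-IO VFD
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("mpi-extend.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    // rank-many datasets: {rows, 0} current, {rows, unlimited} maximum,
    // chunked so they can grow; creation is collective, every rank takes part
    hsize_t rows = 346, cur[2] = {rows, 0}, max[2] = {rows, H5S_UNLIMITED};
    hsize_t chunk[2] = {rows, 64};
    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 2, chunk);

    std::vector<hid_t> dsets(size);
    for (int i = 0; i < size; i++) {
        char name[16];
        std::snprintf(name, sizeof name, "io-%02d", i);
        hid_t space = H5Screate_simple(2, cur, max);
        dsets[i] = H5Dcreate2(file, name, H5T_NATIVE_DOUBLE, space,
                              H5P_DEFAULT, dcpl, H5P_DEFAULT);
        H5Sclose(space);
    }

    // baseline: every rank extends the *same* dataset -> collective, works
    hsize_t grown[2] = {rows, 400};
    H5Dset_extent(dsets[0], grown);
    // NOTE: to reproduce the problem, have each rank extend its *own* dataset:
    //   H5Dset_extent(dsets[rank], grown);   // ranks diverge -> program hangs

    for (int i = 0; i < size; i++) H5Dclose(dsets[i]);
    H5Pclose(dcpl); H5Pclose(fapl); H5Fclose(file);
    MPI_Finalize();
    return 0;
}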

identification:

The MWE indeed shows that with PHDF5 v1.10.6 all processes must participate in the H5Dset_extent call, otherwise the program hangs indefinitely.

output:

mpic++ -o mpi-extend.o   -std=c++17 -O3  -I/usr/local/include -Wno-narrowing -c mpi-extend.cpp
mpic++ mpi-extend.o -lz -ldl -lm  -lhdf5 -o mpi-extend	
srun -n 4 -w io ./mpi-extend
[rank]	2	[total elements]	0
[dimensions]	current: {346,0}	maximum: {346,inf}
[selection]	start: {0,0}	end:{345,inf}
[rank]	2	[total elements]	0
[dimensions]	current: {346,0}	maximum: {346,inf}
[selection]	start: {0,0}	end:{345,inf}
[rank]	2	[total elements]	0
[dimensions]	current: {346,0}	maximum: {346,inf}
[selection]	start: {0,0}	end:{345,inf}
[rank]	2	[total elements]	0
[dimensions]	current: {346,0}	maximum: {346,inf}
[selection]	start: {0,0}	end:{345,inf}
{346,0}{346,0}{346,0}{346,0}
h5ls -r mpi-extend.h5
/                        Group
/io-00                   Dataset {346, 400/Inf}
/io-01                   Dataset {465, 0/Inf}
/io-02                   Dataset {136, 0/Inf}
/io-03                   Dataset {661, 0/Inf}

workarounds:

  • ZeroMQ-based solution with a single writer thread (see the sketch at the end of this post)

requirements:

  • h5cpp v1.10.6-1
  • PHDF5 C base library (parallel build); no high-level API or built-in C++ API is needed
  • C++17 or later compiler
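
Regarding the ZeroMQ workaround listed above: a minimal sketch of the single-writer pattern, assuming a PUSH/PULL socket pair; the endpoint, the message layout (at most 1024 doubles per message, an empty message as stop signal) and the single dataset are placeholders, not part of the actual H5CPP example. Producers zmq_connect() a ZMQ_PUSH socket to the same endpoint and zmq_send() their samples; because one process does all the HDF5 calls, the file is opened with the default serial driver and no collective calls are needed.

#include <zmq.h>
#include <hdf5.h>
#include <vector>

int main() {
    // single writer: PULLs samples from any number of producers and appends
    // them to a serial HDF5 file, so no collective calls are ever needed
    void* ctx  = zmq_ctx_new();
    void* pull = zmq_socket(ctx, ZMQ_PULL);
    zmq_bind(pull, "tcp://*:5555");                 // producers connect here

    hid_t file = H5Fcreate("rec.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hsize_t cur[1] = {0}, max[1] = {H5S_UNLIMITED}, chunk[1] = {1024};
    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 1, chunk);
    hid_t space = H5Screate_simple(1, cur, max);
    hid_t dset  = H5Dcreate2(file, "scope-data", H5T_NATIVE_DOUBLE, space,
                             H5P_DEFAULT, dcpl, H5P_DEFAULT);
    H5Sclose(space);

    std::vector<double> buf(1024);                  // producers send <= 1024 doubles
    for (;;) {
        int nbytes = zmq_recv(pull, buf.data(), buf.size() * sizeof(double), 0);
        if (nbytes <= 0) break;                     // empty message = stop signal
        hsize_t n = nbytes / sizeof(double);

        hsize_t start[1] = {cur[0]};                // append: grow, select tail, write
        cur[0] += n;
        H5Dset_extent(dset, cur);
        hid_t fspace = H5Dget_space(dset);
        H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, nullptr, &n, nullptr);
        hid_t mspace = H5Screate_simple(1, &n, nullptr);
        H5Dwrite(dset, H5T_NATIVE_DOUBLE, mspace, fspace, H5P_DEFAULT, buf.data());
        H5Sclose(mspace); H5Sclose(fspace);
    }

    H5Dclose(dset); H5Pclose(dcpl); H5Fclose(file);
    zmq_close(pull); zmq_ctx_destroy(ctx);
    return 0;
}

A real recorder would keep one dataset per source and tag each message accordingly; this sketch appends everything to a single dataset to keep the event loop visible.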

#3

I’m not sure I understand the Full story. When you say

do you mean processes in general, or MPI processes, separate MPI applications, or something else?

Are the processes independent in the sense that no synchronization is needed?

OK, they acquire data at different rates, and you would like to put the data into different datasets.
What kind of data rates are you looking at, and what’s the connectivity? Are the devices hooked up to the same host? Local storage or NAS?

Dataset extension à la H5Dset_extent is collective, provided we are using MPI. But then we are jumping to threads:

Threads, processes,…, which one?

If they are independent, why not write the datasets into different HDF5 files and then stitch them together with a stub that contains just two external links? Or you can merge/repack if you need a single physical file (and compress, or make them contiguous, depending on what you are after). If you don’t like the (temporary) duplication, then Steven’s ZMQ suggestion is a fine option.
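
For what it’s worth, a minimal sketch of the stub-file idea, assuming the two writers produce scope.h5 and camera.h5 (all names are placeholders):

#include <hdf5.h>

int main() {
    // stub file with two external links; scope.h5 and camera.h5 are written
    // completely independently by the two processes
    hid_t stub = H5Fcreate("rec1.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

    // /scope-data in rec1.h5 resolves to /data inside scope.h5, and so on;
    // the targets do not even have to exist when the links are created
    H5Lcreate_external("scope.h5",  "/data", stub, "scope-data",
                       H5P_DEFAULT, H5P_DEFAULT);
    H5Lcreate_external("camera.h5", "/data", stub, "cam-data",
                       H5P_DEFAULT, H5P_DEFAULT);

    H5Fclose(stub);
    return 0;
}

Readers open rec1.h5 and see both datasets as if they lived in a single file; if a single physical file is needed later, the merge/repack step mentioned above can produce it.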

If you have time, this sounds like another good topic for our HDF Clinic (next Tuesday).

G.


#4

I am talking about MPI processes. Please ignore my slip-up mentioning threads.

Here is a flow chart of the kind of design I have in mind:

Each dataset is written to by a single process, even if the file is shared by both processes.

[Figure Combined-1: flow chart of the two MPI processes writing to their own datasets in a shared file]


#5

Nice figure! (yEd?) Assuming I understand it correctly, my main concern would be the tight coupling between the two processes. (You were planning to use the MPI-IO VFD, right?) For example, at the moment, the dataset extension must be a collective operation. In other words, the ‘Oscilloscope’ rank(s) must participate in the extension of the ‘Cam Data’ dataset and the ‘Camera 1’ rank(s) must participate in the extension of the ‘Scope Data’ dataset, at least as long as those datasets reside in the same file. You could get around this by placing those datasets into separate HDF5 files and then just have two external links in ‘Rec1.h5’. In that case, you wouldn’t need the MPI-IO VFD at all, unless there were multiple ‘oscilloscope’ or ‘camera’ ranks, in which case you should consider opening those files on separate MPI communicators. Does that make sense? G.
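
To illustrate that last suggestion, a hedged sketch of opening per-device files on separate communicators; the even/odd colour rule and the file names are made up for the example:

#include <hdf5.h>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // made-up rule: even ranks drive the scope, odd ranks drive the camera
    int color = rank % 2;
    MPI_Comm devcomm;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &devcomm);

    // each group opens its own file on its own communicator, so collective
    // calls such as H5Dset_extent involve only the ranks sharing that file
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, devcomm, MPI_INFO_NULL);
    const char* name = (color == 0) ? "scope.h5" : "camera.h5";
    hid_t file = H5Fcreate(name, H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    // ... create/extend/write datasets here; only the ranks in devcomm
    //     have to participate in the collective calls on this file ...

    H5Fclose(file);
    H5Pclose(fapl);
    MPI_Comm_free(&devcomm);
    MPI_Finalize();
    return 0;
}

With this split, extending a dataset in scope.h5 involves only the scope ranks, and a stub with two external links (as in #3) can still give readers a single logical entry point.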