Does the MPI driver lock the file?


I implemented data generation using Python, h5py and mpi4py, with a master/slave approach: first I create a slave communicator which excludes the master, then a file manager takes that slave communicator and opens the file. The master never touches the file; only the slaves open it and write to it. It works just fine, but I noticed something that irritates me a bit.
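For reference, the split-then-open pattern described above might look like the sketch below. The file name, dataset shape, and `slave_color` helper are illustrative (not from the original code), and the `mpio` driver requires an MPI-enabled HDF5/h5py build:

```python
def slave_color(rank, master_rank=0):
    """Which Split() group a rank belongs to: None means 'excluded'
    (mapped to MPI.UNDEFINED below), 0 means 'slave group'."""
    return None if rank == master_rank else 0

def main():
    # mpi4py / h5py are imported lazily so the helper above can be
    # read and tested without a parallel-HDF5 build installed.
    from mpi4py import MPI
    import h5py

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    c = slave_color(rank)
    slave_comm = comm.Split(MPI.UNDEFINED if c is None else c, key=rank)

    if c is not None:
        # Only the slaves open the file, collectively, on their own communicator.
        with h5py.File("data.h5", "w", driver="mpio", comm=slave_comm) as f:
            f.create_dataset("entries", (400_000,), dtype="f8")
            # ... each slave writes its assigned slices here ...

# main() would be called from the script's entry point and launched
# under the MPI runner, e.g.: mpiexec -n 5 python generate.py
```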

I implemented a stride that tells the generation script how much work the master sends to each slave at a time. I have 4 datasets, each with roughly 400k entries, and every slave rank writes to all 4 datasets.
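To make concrete how the stride changes the dispatch pattern (the function name is hypothetical, the numbers are from the setup above): the master hands out contiguous batches of at most `stride` entries, so a small stride means many more, smaller batches per slave:

```python
def batches(n_entries, stride):
    """Yield (start, stop) index pairs covering n_entries in
    contiguous batches of at most `stride` entries each."""
    for start in range(0, n_entries, stride):
        yield start, min(start + stride, n_entries)

# With 400k entries per dataset: stride 10 gives 40_000 batches,
# stride 1024 gives 391 much larger batches (the last one partial).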

Now if I set the stride to a low value (10), the generation is way faster than if I set it to a big value (1024).

I wasn’t able to find out how the parallelism is actually implemented. From the behaviour above it looks like the file is being locked, which then blocks my whole program, especially with a big stride (the other ranks have more time to run into the lock and sit idle in between). Is that really the case? I write data contiguously, so in theory there is no need for a lock. Is it possible to tell the driver “don’t lock the file”?
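On the “don’t lock the file” question: HDF5 itself (1.10 and later) has a file-locking switch that is independent of the MPI-IO layer. Whether it changes anything here depends on the HDF5 build and filesystem, so treat this as something to try rather than a guaranteed fix:

```python
import os

# Must be set before the HDF5 library is loaded for the first time,
# i.e. before the first `import h5py` anywhere in the program.
os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"

# import h5py  # import h5py only after the variable is set
```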

Which MPI are you using, and what does the parallel FS setup look like? The excerpt below is from /etc/openmpi-mca-params.conf for OMPI >= v4.0.x with OrangeFS:

# individual | sm | lockedfile
sharedfp = %
io_ompio_bytes_per_agg = 1MB # for jumbo frame ethernet
io_ompio_num_aggregators = %
fs_pvfs2_stripe_size = %
fs_pvfs2_stripe_width = %

Then again, I could be completely wrong about this.
cheers: steve

I load MPI 4.0.2. I’m on a small university cluster, and as a free user you don’t get access to a parallel FS, so I work on a regular one. Forgot to mention that.

Thanks for pointing that out.