I implemented the data generation in Python using h5py and mpi4py, with a master/slave approach. First I create a new slave communicator which excludes the master. A file manager takes that slave communicator and opens the file; the master never touches the file, so only the slaves ever open or write to it. This works just fine, but I noticed something that irritates me a bit.
I implemented a stride that tells the generation script how much work the master sends to each slave at a time. I have 4 datasets, each with roughly 400k entries, and every slave rank writes to all 4 datasets.
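Roughly, the stride chunks the index range like this (`make_work_chunks` is a hypothetical helper to illustrate the scheme, not my actual code):

```python
def make_work_chunks(n_entries, stride):
    """Split the index range [0, n_entries) into chunks of at most `stride`
    indices; the master hands one chunk at a time to the next idle slave."""
    return [range(start, min(start + stride, n_entries))
            for start in range(0, n_entries, stride)]

chunks = make_work_chunks(400_000, 1024)
# A slave that finishes chunk c writes the corresponding rows into all
# 4 datasets, e.g. f[name][c.start:c.stop] = results[name].
```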
Now if I set the stride to a low value (10), the generation is way faster than if I set it to a big value (1024).
I wasn’t able to find out how the parallelism is actually implemented. From the behaviour above it looks like the file is being locked, which then blocks my whole program, especially if the stride is big (more time for the other ranks to run into the lock and sit idle in between). Is that really the case? I write data continuously to non-overlapping regions, so theoretically there is no need for a lock. Is it possible to tell the driver “don’t lock the file”?