HDF5: parallel writing to a single file in different groups from different Python programs

We have started using HDF5 files for saving our data.
Data is received from several Python programs; each program runs on different hardware, but all machines are connected over an Ethernet network.
We want to write all the received data into a single HDF5 file, creating a separate, independent group for each Python program.
We are using the mpi4py package for this purpose.
The problem is that we are unable to write the data in parallel: only the group that was created first/last is able to write to the file.
Is it possible to write to a single .h5 file from different Python programs?
If so, how should we do it? Can you share an example of how to achieve this?

  Software versions used:
   Ubuntu : 16.04
   Python : 3.5
   MPI : mpi4py (is there any alternative other than MPI?)
   HDF5 : 1.8.20 (built with --enable-parallel, --enable-shared, --enable-threadsafe; did we miss any?)
   h5py : 2.7.1
Is parallel writing to a single file possible?

We need to append data from multiple Python programs to an .h5 file.
Please suggest the best way to do it.

Hello Nagendar,

It looks like you need multiple independent writers to one file, which is something HDF5 doesn't support yet.

If you use MPI, you will need to have one Python program and use the HDF5 parallel programming model (see the HDF5 Parallel Tutorial) to write your data.

Thank you!

Elena

Dear Elena,
Thanks for your reply.
Can you share an example program for parallel writing of an .h5 file?

Dear Elena,

Can you provide a sample program in Python?

Thank you…

Hi Elena,
Thanks for your reply.
I have tried the example you mentioned; below is the problem I faced when a single Python program tries to access an .h5 file using multi-threading.
The dataset in the first group is updated properly, but the datasets in the second and third groups are corrupted.
Can you share example code for parallel writing to an .h5 file from a single Python program?

Dear Elena,

Is it possible to create the datasets dynamically?
In my case the number of groups is not fixed; it will change every time. The number of datasets created in each group is also not fixed; datasets are created based on time and data reception.
So we have to create the datasets dynamically and write data into them. Is that possible?
Please help.
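
(For reference, h5py does let you create groups and datasets on the fly, at the moment data arrives; they do not have to be declared up front. A minimal serial sketch, with made-up group/dataset names and payloads:)

```python
import h5py
import numpy as np

# Sketch only: names and payloads are invented. Each incoming message gets a
# group (created on demand) and a new dataset within that group.
incoming = {'program_1': np.arange(5), 'program_2': np.arange(3)}  # simulated payloads

with h5py.File('dynamic.h5', 'a') as f:
    for source, payload in incoming.items():
        grp = f.require_group(source)             # creates the group only if it is absent
        name = 'dataset_{}'.format(len(grp))      # next free index within this group
        grp.create_dataset(name, data=payload)
```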

Dear Nagendar,

Unfortunately, I don’t have any examples of MPI Python code. Maybe someone on this list could provide you with an example, or perhaps you could post your question on the h5py mailing list?

Sorry!

Elena

Hi Nagendar,

I am not sure how multi-threaded Python works. Could you please share your program? If you were using a multi-threaded C program and a thread-safe build of the HDF5 library, you should be able to do what you are doing. Once again, maybe people on the h5py mailing list will be more helpful. Sorry!

Elena
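
One pattern that matches what is described above, if you stay with a single Python process, is to let many threads produce data but have only one thread call into h5py, so the HDF5 library is never entered concurrently. A hedged sketch (all names and payloads made up):

```python
import queue
import threading
import numpy as np
import h5py

# Sketch: several producer threads, but only ONE writer thread ever touches
# the HDF5 file, so no thread-safety in the h5py layer is relied upon.
work_q = queue.Queue()
STOP = object()  # sentinel telling the writer to finish

def producer(group_name, n_items):
    for _ in range(n_items):
        work_q.put((group_name, np.random.rand(10)))  # simulate received data

def writer(path):
    counters = {}
    with h5py.File(path, 'w') as f:
        while True:
            item = work_q.get()
            if item is STOP:
                break
            group_name, data = item
            grp = f.require_group(group_name)                       # group created on demand
            idx = counters.get(group_name, 0)
            grp.create_dataset('sample_{}'.format(idx), data=data)  # one dataset per message
            counters[group_name] = idx + 1

producers = [threading.Thread(target=producer, args=('program_{}'.format(i), 5))
             for i in range(3)]
writer_thread = threading.Thread(target=writer, args=('threaded.h5',))

writer_thread.start()
for t in producers:
    t.start()
for t in producers:
    t.join()
work_q.put(STOP)       # producers are done; let the writer drain the queue and exit
writer_thread.join()
```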

I believe the example you are looking for, in Python using MPI, can be found here:
http://docs.h5py.org/en/latest/mpi.html
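
The example on that page opens one shared file from every MPI rank with the mpio driver and has each rank write its own slice. A slightly expanded sketch along the same lines, with one group per rank (group/dataset names are made up, and this requires h5py built against a parallel, MPI-enabled HDF5):

```python
# Run with, e.g.:  mpiexec -n 4 python demo_mpi.py
from mpi4py import MPI
import h5py

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# One shared file, opened collectively by every rank through the MPI-IO driver.
f = h5py.File('parallel_test.hdf5', 'w', driver='mpio', comm=comm)

# Metadata operations (creating groups/datasets) are collective:
# every rank must make the same calls in the same order.
for r in range(size):
    f.create_group('program_{}'.format(r)).create_dataset('values', (10,), dtype='f8')

# Raw data writes can be independent: each rank fills only its own group.
f['program_{}/values'.format(rank)][:] = float(rank)

f.close()
```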

The Python Global Interpreter Lock, or GIL, is in simple terms a mutex (a lock) that allows only one thread to hold control of the Python interpreter. All the GIL does is make sure only one thread is executing Python code at a time; control still switches between threads. What the GIL prevents, then, is using more than one CPU core, or separate CPUs, to run threads in parallel.
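
A toy (non-HDF5) demonstration of that behaviour, if it helps: on CPython, running a CPU-bound loop in two threads is typically no faster than running it twice sequentially.

```python
import threading
import time

# CPU-bound busy loop: threads gain nothing here because the GIL lets only
# one thread execute Python bytecode at a time.
def count(n):
    while n > 0:
        n -= 1

N = 10 * 1000 * 1000

start = time.perf_counter()
count(N)
count(N)
print('sequential :', time.perf_counter() - start)

start = time.perf_counter()
t1 = threading.Thread(target=count, args=(N,))
t2 = threading.Thread(target=count, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
print('two threads:', time.perf_counter() - start)
```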

Python threading is great for creating a responsive GUI, or for handling multiple short web requests where I/O is the bottleneck rather than the Python code. It is not suitable for parallelizing computationally intensive Python code: because of the GIL, Python threads are interleaved but effectively executed serially, so they only help when I/O-bound tasks can overlap. For real parallelism you should use the multiprocessing module, which forks multiple processes that run truly in parallel, or delegate the heavy work to a dedicated external library.
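
As an illustration of that split, here is a hedged sketch (function and file names are invented) in which the CPU-heavy work runs in a pool of worker processes, while only the parent process writes the collected results to a single HDF5 file, so no parallel HDF5 build is needed:

```python
import multiprocessing as mp
import numpy as np
import h5py

def heavy_compute(seed):
    # Stand-in for an expensive, CPU-bound computation.
    rng = np.random.RandomState(seed)
    data = rng.rand(100000)
    return seed, float(data.sum())

if __name__ == '__main__':
    # The worker processes bypass the GIL; each can run on its own core.
    with mp.Pool(processes=4) as pool:
        results = pool.map(heavy_compute, range(8))

    # Only the parent process touches the HDF5 file.
    with h5py.File('results.h5', 'w') as f:
        for seed, value in results:
            f.create_dataset('job_{}'.format(seed), data=value)
```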