Hello, I would like to use h5py to save a large number of CSVs (over 900,000) into one h5 dataset.
I have been trying to figure out how to enable parallel HDF5 for the past couple of days and I haven't figured it out yet. I am on Windows and normally use Anaconda, but I've switched to Linux (using the Windows Subsystem for Linux) since all the resources I've seen use Linux.
I followed these instructions to install HDF5: http://depts.washington.edu/cssuwb/wiki/linux_hdf5_installation
I ran this to get MPI:
sudo apt install mpich
From what I can tell, I need to do this:
$ ./configure --enable-parallel --enable-shared
But I don't know where to run it or how it works.
I am a student, pretty new to Linux, and really frustrated that I haven't figured this out yet, since it is probably really easy to do. Can someone please help?
To install HDF5 built with MPI on Ubuntu or Debian, run
sudo apt install libhdf5-mpi-dev
which will give you a version to use. h5py is also packaged by Debian, and the version built with MPI support can be installed via
sudo apt install python3-h5py-mpi
If you want to use pip to install Python code, see https://docs.h5py.org/en/stable/build.html#custom-installation. The docs about using MPI with h5py are at https://docs.h5py.org/en/stable/mpi.html.
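For a rough idea of what MPI code with h5py looks like, here is a minimal sketch modelled on the example in the h5py MPI docs. It assumes mpi4py is installed and h5py was built against parallel HDF5; the function name write_ranks and the file name parallel_test.hdf5 are only illustrative.

```python
def write_ranks(filename="parallel_test.hdf5"):
    """Each MPI process writes its own rank into one shared dataset."""
    # Imported inside the function so the module still loads on machines
    # without mpi4py or without an MPI-enabled h5py.
    from mpi4py import MPI
    import h5py

    comm = MPI.COMM_WORLD
    # Opening the file and creating the dataset are collective operations:
    # every process must make the same calls with the same arguments.
    with h5py.File(filename, "w", driver="mpio", comm=comm) as f:
        dset = f.create_dataset("test", (comm.size,), dtype="i")
        # Writing the data itself can be done independently by each process.
        dset[comm.rank] = comm.rank
```

You would call write_ranks() at the bottom of a script and launch it with something like mpiexec -n 4 python3 script.py, so that four copies of the script run at once and each fills in one element.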
However, unless you're familiar with MPI (and are aware of the constraints it puts on your code), I would suggest avoiding it. Depending on exactly what you are trying to achieve, splitting your set of CSV files into N chunks, writing each chunk to its own HDF5 file, and then merging those files might be a better strategy.
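A rough sketch of that split/write/merge strategy, using ordinary (serial) h5py and the standard multiprocessing module instead of MPI. Everything here is an assumption for illustration: the function names, the part_N.h5 file names, that each CSV is purely numeric with no header row, that each CSV becomes its own dataset named after the file, and that no two CSVs share a file name.

```python
import csv
from pathlib import Path


def split_into_chunks(items, n_chunks):
    """Split a list into n_chunks contiguous parts of nearly equal size."""
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)  # first `extra` chunks get one more
        chunks.append(items[start:start + size])
        start += size
    return chunks


def read_csv_rows(path):
    """Read one CSV into a list of float rows (assumes numeric data, no header)."""
    with open(path, newline="") as fh:
        return [[float(x) for x in row] for row in csv.reader(fh) if row]


def write_chunk(csv_paths, out_path):
    """Worker: store every CSV in csv_paths as a dataset in its own HDF5 file."""
    import h5py  # serial h5py is enough here, no MPI build required
    with h5py.File(out_path, "w") as out:
        for p in csv_paths:
            out.create_dataset(Path(p).stem, data=read_csv_rows(p))


def merge_files(part_paths, merged_path):
    """Copy every dataset from the part files into a single HDF5 file."""
    import h5py
    with h5py.File(merged_path, "w") as merged:
        for part in part_paths:
            with h5py.File(part, "r") as src:
                for name in src:
                    src.copy(name, merged)


def convert_all(csv_dir, out_dir, merged_path, n_workers=4):
    """Split the CSVs into n_workers chunks, write them in parallel, then merge."""
    from multiprocessing import Pool
    paths = sorted(str(p) for p in Path(csv_dir).glob("*.csv"))
    chunks = split_into_chunks(paths, n_workers)
    parts = [str(Path(out_dir) / f"part_{i}.h5") for i in range(n_workers)]
    with Pool(n_workers) as pool:
        pool.starmap(write_chunk, zip(chunks, parts))
    merge_files(parts, merged_path)
```

Each worker process only ever touches its own file, so none of the MPI constraints apply. out_dir has to exist already, and convert_all should be called from under an if __name__ == "__main__": guard so that multiprocessing can start the workers safely.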
Thank you for responding, @aragilar! I got it to work!! I was about to give up.
I am using a fresh install of Ubuntu 20.04.1 LTS (GNU/Linux 4.4.0-19041-Microsoft x86_64)
sudo apt update
#refreshes the list of available packages
-
sudo apt install libhdf5-mpi-dev
#installs HDF5 built with parallel (MPI) support
-
h5pcc -showconfig
#checks the configuration. The p in h5pcc is important (it is the parallel compiler wrapper)
The output shows me that parallel HDF5 is turned on, yay!
-
sudo apt install python3-pip
#Installing pip3
-
sudo apt-get install -y pkg-config
#needed to build h5py from source
-
export CC=mpicc
export HDF5_MPI="ON"
pip install --no-binary=h5py h5py
#builds h5py from source with the variables above set
Just in case there are other newbies out there, this is what I did.
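To double-check from Python that the h5py build actually picked up MPI, something like the following should work. It relies on h5py.get_config().mpi, which reports whether h5py was built with MPI support; the helper name h5py_mpi_status is just made up for this sketch.

```python
def h5py_mpi_status():
    """Report whether the installed h5py was built with MPI support."""
    try:
        import h5py
    except ImportError:
        return "h5py is not installed"
    # get_config().mpi is True only for MPI-enabled builds of h5py.
    if h5py.get_config().mpi:
        return "h5py was built with MPI support"
    return "h5py was built without MPI support"


print(h5py_mpi_status())
```

If it reports a build without MPI, the most likely cause is that pip installed a prebuilt wheel instead of compiling against the parallel HDF5 library.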