Parallel I/O does not support filters yet

Hi,

I’ve got this error message (in the subject of this topic) when I try to set up compression on datasets in an MPI environment.
Is there any workaround to compress datasets with parallel I/O, or are there any plans to support this in the C API?

Regards,
Rafal

I thought 1.10.1 does support filters in parallel. If not, I know THG has been working to develop this and it should be available in a release soon. That said, I’ve also heard it starts to run into scaling issues at around 10,000 MPI ranks.

Hello,

No, it will be available in HDF5 1.10.2. Please download the latest snapshot of 1.10 from ftp://gamma.hdfgroup.org/pub/outgoing/hdf5/snapshots/v110/ or check out the source from the Bitbucket develop or 1_10 branch.
See https://bitbucket.hdfgroup.org/projects/HDFFV/repos/hdf5/browse

Thank you!

Elena

We didn’t have a chance to test with more than 6,000 MPI ranks. Scaling issues are unknown at this point.

As with any new feature, one should expect that tuning will be required :slight_smile:

Thank you!

Elena

Thank you for the advice. It’s nice to hear this feature is planned for a production release soon.

I’ve just cloned the “hdf5_1_10” branch from the Bitbucket repo and tried to compile it. First of all, I had to turn ON the “parallel” switch on the cmake command with:

-DHDF5_ENABLE_PARALLEL:BOOL=ON -DMPIEXEC_MAX_NUMPROCS:STRING=4

But then I got an error message:

“Parallel and C++ options are mutually exclusive”

So I decided to compile it with one more cmake option:

-DHDF5_BUILD_CPP_LIB:BOOL=OFF

But then I got the following error from the cmake command:

CMake Error at config/cmake_ext_mod/FindMPI.cmake:338 (separate_arguments):
separate_arguments given unknown argument NATIVE_COMMAND

I couldn’t find this NATIVE_COMMAND keyword anywhere except in this FindMPI module…
How can I fix this problem and successfully build this pre-release HDF5 library?

Regards,
Rafal

I’ve also tried the first link mentioned, i.e. the FTP repo and the latest snapshot found there. Unfortunately I hit the same problem during cmake configuration; after the command:

cmake -DHDF5_ENABLE_PARALLEL:BOOL=ON -DMPIEXEC_MAX_NUMPROCS:STRING=4 -DHDF5_BUILD_CPP_LIB:BOOL=OFF <path_to_source>

I have an error:

CMake Error at config/cmake_ext_mod/FindMPI.cmake:338 (separate_arguments):
separate_arguments given unknown argument NATIVE_COMMAND

Could you please help me fix this and successfully build a custom 1.10.2 version of the HDF5 library?
Is there perhaps a workaround for this configuration problem?

Regards,
Rafal

Dear Rafal,

We will investigate the issue.

Would it be possible for you to build parallel HDF5 using autoconf, following the commands below?

setenv CC mpicc (or the appropriate command for the shell you use) and then

./configure --enable-parallel

Thank you!

Elena

Rafal wrote on March 9:

CMake Error at config/cmake_ext_mod/FindMPI.cmake:338 (separate_arguments):
separate_arguments given unknown argument NATIVE_COMMAND
You have an older version of CMake: replace NATIVE_COMMAND with UNIX_COMMAND if you’re on UNIX, or WINDOWS_COMMAND if you’re on Windows. If you support both, you’ll need a conditional based on CMAKE_SYSTEM_NAME, something like:

if (NOT 3.9 VERSION_GREATER CMAKE_VERSION)
  set(sep_arg_comm NATIVE_COMMAND)
elseif (CMAKE_SYSTEM_NAME STREQUAL "Windows")
  set(sep_arg_comm WINDOWS_COMMAND)
else()
  set(sep_arg_comm UNIX_COMMAND)
endif()
...
separate_arguments(VAR ${sep_arg_comm} ...)

Best,

Chris.

If you are using CMake 3.10, just remove the FindMPI.cmake file from the cmake_ext_mod folder in the hdf5 source.
This was a temporary solution for older CMake versions.

That is correct - it was an issue with an old version of CMake.
I’m working in a Cygwin environment on Windows 10, and the latest version of CMake available in Cygwin by default is 3.6, so I had to manually install CMake 3.10. Now I can successfully configure, build and install HDF5 1.10.2, BUT…

Unfortunately, there is still something wrong that prevents me from using dataset compression in HDF5 files in a parallel (MPI) environment.

So far (without chunks and filters) I have been following the usual parallel scenario: I create a given dataset with a collective call in all processes and then call H5Dwrite() independently in each process, so that each process writes its own data (different and unique in each process) to this dataset.
But when I turn on chunking and a compression filter for this dataset, I get the following errors:

H5Dwrite(): can’t prepare for writing data
H5D__write(): unable to adjust I/O info for parallel I/O
H5D__ioinfo_adjust(): Can’t perform independent write with filters in pipeline.

Is there any change needed in this scenario?
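
For reference, the setup I’m using looks roughly like this (a minimal sketch; the file name, dataset name, sizes and deflate level are just illustrative):

#include <hdf5.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Open the file with the MPI-IO driver (collective). */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("example.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* Chunking + deflate on the dataset creation property list. */
    hsize_t dims[1]  = {1024};
    hsize_t chunk[1] = {128};
    hid_t space = H5Screate_simple(1, dims, NULL);
    hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 1, chunk);
    H5Pset_deflate(dcpl, 6);

    /* Dataset creation is collective: every rank makes this call. */
    hid_t dset = H5Dcreate(file, "/data", H5T_NATIVE_INT, space,
                           H5P_DEFAULT, dcpl, H5P_DEFAULT);

    /* ... each rank then calls H5Dwrite() on its own portion ... */

    H5Dclose(dset);
    H5Pclose(dcpl);
    H5Sclose(space);
    H5Fclose(file);
    H5Pclose(fapl);
    MPI_Finalize();
    return 0;
}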

Regards,
Rafal

Hi,

Did you select collective writes by calling H5Pset_dxpl_mpio(propid, H5FD_MPIO_COLLECTIVE) on a property list of class H5P_DATASET_XFER
that you passed as part of the H5Dwrite() call?
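
Something along these lines (just a sketch; the dataset handle, dataspaces and buffer are assumed to be set up elsewhere):

#include <hdf5.h>

/* Write this rank's own selection using a collective MPI-IO transfer. */
static herr_t write_collective(hid_t dset, hid_t memspace,
                               hid_t filespace, const int *buf)
{
    /* Transfer property list requesting a collective write. */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);

    /* memspace/filespace describe this rank's portion of the dataset. */
    herr_t status = H5Dwrite(dset, H5T_NATIVE_INT, memspace, filespace,
                             dxpl, buf);

    H5Pclose(dxpl);
    return status;
}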

Best,

Chris.

Well… I did not, but I’m a bit confused now…
I don’t want to use collective dataset writes, because that would mean every process needs to know all the other processes’ data that should be written to the given dataset - is that correct?
According to the collective calling requirements (https://support.hdfgroup.org/HDF5/doc/RM/CollectiveCalls.html), the H5Dwrite() function is NOT required to be called collectively, which makes sense, because that way each process can have its own data to write and know nothing about the other processes’ data.
H5Dwrite() does not operate on the metadata of the HDF5 file (as H5Dcreate() does), so why should I call it collectively?

When I modified my code as you propose, i.e. added this property list and used it for the write call, all processes hung in H5Dwrite(), and I guess that is because it is not exactly a collective call, since each process passes its own data to write.

How can I solve this logic/scenario without sharing ALL data between ALL processes (which would defeat the purpose of parallel I/O…)?

Rafal

Hi,

As I understand it, every rank must issue a collective write call to the same dataset, but they don’t have to write the same data to the same place, or even any data at all (e.g. NULL dataspace). Collective writes are, however, a prerequisite for using filters with parallel I/O.

As an addendum: when we implemented collective writes, we moved to a scheme in which, rather than deciding iteratively (is my rank due to write these data or not?), every rank calculates the correct numerology so that the desired data are written (or not) by each rank in lockstep. It’s trickier to set up, but it is possible.

Unfortunately, I cannot achieve what you suggest, and I’m sorry, but I don’t fully understand what you mean in your “addendum” (could you please provide an example?).

What I want is to be able to write a portion of data to a different compressed dataset in each rank, but even when I call H5Dwrite() collectively, it seems it has to have the same memspace and filespace parameters - otherwise the call blocks in each rank (so it is treated as not collective).
Let me explain with an example:

Let’s say we have two ranks: #1 and #2.
Rank #1 is going to write its data to a compressed dataset named /rank1_data.
Rank #2 is going to write its data to a compressed dataset named /rank2_data.
Rank #1 does not know anything about the data that has to be written by rank #2, but rank #1 knows that the /rank2_data dataset has to be created collectively. And vice versa…
So both ranks have all the information about metadata, but no information about any data other than their own.

So now both ranks create both datasets collectively:
/rank1_data
/rank2_data
And now rank #1 tries to write its data to the /rank1_data dataset, and for this operation (to make it collective, as required when we want to apply a compression filter) rank #2 creates an H5S_NULL dataspace and uses it as both “memspace” and “filespace” in its H5Dwrite() call.
Unfortunately (I’ve just checked) this call seems to be seen by the HDF5 library as NOT collective (since it has different memspace and filespace in different ranks), because the call hangs…

Can anyone help me and provide a working solution for this scenario?
The point is NOT to share the data that each rank has to write with all the other ranks, as that would defeat the purpose of parallel I/O, I guess…

Regards,
Rafal

Hi,

First, I should say that I misspoke: an H5S_NULL dataspace is not allowed and will trigger an assert when compiling against a debug HDF5 library. According to an HDF5 expert, a collective NOP write to a dataset from a rank should use H5Sselect_none() on the file dataspace, and the memory dataspace should be created with H5Screate_simple() with an empty shape (0,…) and maxshape.

In the scenario you describe, all ranks should create both datasets as you already do, and call H5Dwrite() in collective mode for both datasets, but the ranks with data to write to dataset 1 only should still call H5Dwrite() on dataset 2 with the dataspaces configured as above. Vice versa for those ranks writing only to dataset 2.
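
In code, the NOP participation would look roughly like this (a sketch, not the attached example; the handle names are illustrative, and dxpl is assumed to be a transfer property list with H5FD_MPIO_COLLECTIVE set):

#include <hdf5.h>

/* A rank with nothing to write to `dset` still takes part in the
 * collective H5Dwrite() by selecting nothing. */
static herr_t nop_collective_write(hid_t dset, hid_t dxpl)
{
    /* File dataspace: the dataset's dataspace with no selection. */
    hid_t filespace = H5Dget_space(dset);
    H5Sselect_none(filespace);

    /* Memory dataspace: zero elements (empty shape and maxshape). */
    hsize_t zero = 0;
    hid_t memspace = H5Screate_simple(1, &zero, &zero);

    /* No elements are transferred, so the buffer may be NULL. */
    herr_t status = H5Dwrite(dset, H5T_NATIVE_INT, memspace, filespace,
                             dxpl, NULL);

    H5Sclose(memspace);
    H5Sclose(filespace);
    return status;
}

Each rank would call something like this for the dataset it is not writing to, and its usual selective H5Dwrite() (also with the collective transfer property list) for the dataset it does write to.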

Attached you will find test-no-write-ranks.c, a C-language, single-dataset example that, when run with more than one rank, performs a NOP write to the dataset from rank 1. For example:

mpirun -np 5 test-no-write-ranks
h5dump test-no-write-ranks.hdf5

will show you that a section of the dataset has been left empty, while the rest has been filled with formulaic data according to an easily verifiable pattern.

You may also find the HDF5 support page on collective operations useful when verifying which operations need to be collective (i.e. issued from all ranks), and which do not. Collective writes are not referenced on that page, however.

Finally, my addendum regarding pre-calculating does not (I think) apply to your case, but rather to cases where different ranks wish to write to different parts of the same dataset, and occasionally a rank may not need to write to the dataset at all, which is the case for our scenario: a parallel file concatenation program that concatenates identically shaped datasets (to within the outermost extent) from different files to a single extensible dataset in the output file. If you are interested in the numerology involved there as an example, I would be happy to provide an annotated example under separate cover.

Hi Chris,

Thank you very much for your help! It works now!
That’s correct: when a given rank has nothing to write to a given dataset and we still need to call H5Dwrite() collectively, this rank should call H5Sselect_none() on the filespace and use zero dimensions for the memspace. Then the last parameter of H5Dwrite() (the data buffer) can be anything, even NULL - no data is written and the function is properly called collectively.
Thank you!

Regards,
Rafal