Parallel HDF5 problem in 1.8.0 through 1.8.2

Dear all

I'm having problems with parallel HDF5, specifically when writing data by chunk using a collective write. Simply compiling and running the example:

http://www.hdfgroup.org/HDF5/Tutor/examples/parallel/Hyperslab_by_chunk.c

fails with an MPI error:

Fatal error in MPI_Type_free: Invalid datatype, error stack:
MPI_Type_free(145): MPI_Type_free(datatype_p=0x520c20) failed
MPI_Type_free(96).: Cannot free permanent data type [cli_2]: aborting job:

I'm using mpich2-1.0.8 under OS X 10.5.5 on a single computer (the same error occurs with openmpi-1.3).

A little investigation shows that the problem occurs in H5Dmpio.c, in the MPI_Type_free calls after line 976 in the routine H5D_link_chunk_collective_io. The problem is that HDF5 attempts to free a permanent (i.e. predefined, not derived) datatype, specifically MPI_BYTE, which causes MPI to abort.
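To make the failure concrete, here is a minimal sketch (my own illustration, not the HDF5 source) of the guard that seems to be missing at that point; MPI_Type_free may only be called on derived datatypes, never on predefined ones such as MPI_BYTE:

/* Sketch only -- not the HDF5 code. Freeing a predefined datatype is an
 * error, and mpich2 aborts with "Cannot free permanent data type". */
#include <mpi.h>

static void free_if_derived(MPI_Datatype *type)
{
    /* MPI_BYTE (and the null handle) must never be freed */
    if (*type != MPI_BYTE && *type != MPI_DATATYPE_NULL)
        MPI_Type_free(type);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Datatype contig;
    MPI_Type_contiguous(4, MPI_BYTE, &contig);   /* a genuine derived type */
    MPI_Type_commit(&contig);

    MPI_Datatype plain = MPI_BYTE;               /* what HDF5 ends up holding */

    free_if_derived(&contig);    /* freed normally */
    free_if_derived(&plain);     /* skipped instead of aborting the job */

    MPI_Finalize();
    return 0;
}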

If I change the example to an independent write, the code runs fine. Does anyone know whether this is an OS X-specific problem and/or whether there are any workarounds?
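For reference, the only change needed to switch the example to an independent write is the data transfer property list, something like the fragment below (the variable names are illustrative, not copied verbatim from the tutorial):

/* Independent instead of collective transfer for the chunked write */
hid_t plist_id = H5Pcreate(H5P_DATASET_XFER);
H5Pset_dxpl_mpio(plist_id, H5FD_MPIO_INDEPENDENT);  /* was H5FD_MPIO_COLLECTIVE */

herr_t status = H5Dwrite(dset_id, H5T_NATIVE_INT, memspace, filespace,
                         plist_id, data);

H5Pclose(plist_id);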

All the best,
Ricardo

···

---
Prof. Ricardo Fonseca

GoLP - Grupo de Lasers e Plasmas
Instituto de Plasmas e Fusão Nuclear
Instituto Superior Técnico
Av. Rovisco Pais
1049-001 Lisboa
Portugal

tel: +351 21 8419202
fax: +351 21 8464455
web: http://cfp.ist.utl.pt/golp/

Hi Ricardo. I ran this on BlueGene, whose MPI is roughly based on
mpich2-1.0.8, compiled against HDF5 1.8.2, and did not get this error. I
hope other OS X users can report their experiences.

==rob

···

--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
A215 0178 EA2D B059 8CDF  B29D F333 664A 4280 315B


Dear all

The problem is indeed Apple-specific, namely in the config/apple file. Just adding:

hdf5_mpi_complex_derived_datatype_works='no'

at the end of this file fixes it.

The problem is that the complex MPI derived datatypes that HDF5 tries to use are not supported by mpich2 (or openmpi). The Linux config files know this and automatically set the above flag; the Apple config file didn't, so HDF5 tries to call H5D_link_chunk_collective_io (which breaks) rather than H5D_multi_chunk_collective_io_no_opt (which works).

I hope this helps other people trying to use parallel hdf5 under OS X.

All the best,
Ricardo

P.S. (rob) Thanks for the input; this work is actually in preparation for a BlueGene system. Could you tell me whether that flag is set in your configuration? If you look into $H5DIR/include/H5pubconf.h, around line 431, you can check the definition of the H5_MPI_COMPLEX_DERIVED_DATATYPE_WORKS macro.
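Alternatively, a tiny test program against your install shows the same thing; this is just a sketch, assuming the parallel compiler wrapper h5pcc is on your path (hdf5.h pulls in H5pubconf.h):

/* Report whether this HDF5 build trusts complex MPI derived datatypes. */
#include <stdio.h>
#include "hdf5.h"

int main(void)
{
#ifdef H5_MPI_COMPLEX_DERIVED_DATATYPE_WORKS
    puts("H5_MPI_COMPLEX_DERIVED_DATATYPE_WORKS is defined");
#else
    puts("H5_MPI_COMPLEX_DERIVED_DATATYPE_WORKS is not defined");
#endif
    return 0;
}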


···

This flag is indeed unset in my HDF5 config.

I am quite interested in knowing more about just what datatype handling
HDF5 wants to do that MPICH2 cannot cope with. If, as you observe, HDF5
tries to free MPI_BYTE, then maybe there's work to be done on both ends.

Also, since I'm poking around in nearby code: the check for whether
collective I/O works seems... weird to me. By the MPI standard, all
processes in a communicator must call a collective routine if any of
them does. If some of those processes have zero bytes of I/O, that
should be just fine; if it isn't, you've found another bug in the MPI-IO
implementation.
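To be concrete, something like the following is perfectly legal by the standard, zero-byte ranks and all (a sketch with a made-up file name, not the actual configure test):

/* Every rank makes the collective call; ranks with nothing to write
 * just pass count = 0. An MPI-IO implementation must accept this. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "testfile",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    char byte = (char)rank;
    int count = (rank == 0) ? 1 : 0;   /* only rank 0 contributes data */

    MPI_File_write_at_all(fh, 0, &byte, count, MPI_BYTE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}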

Note how I'm not rushing off to test these settings myself. I fully
understand time and resource constraints. Just making a note of
things to look at "one day" if we need more parallel I/O performance.

==rob

···

--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
A215 0178 EA2D B059 8CDF  B29D F333 664A 4280 315B


A couple more notes for HDF5 on BlueGene

- BlueGene requires cross-compiling, so you'll need to set quite a few
  environment variables to get configure to work correctly.

- The configure tests for the Fortran and C++ bindings are run-time-only
  tests, and so will only work if you trick configure into executing its
  tests on the compute node. At least one person at Argonne knows how to
  do this, but I'm still learning myself.

- Your choice of file system will make a big difference in your
  experience. I hope your BlueGene has either GPFS or PVFS, and does not
  have Lustre. (This is not an issue with HDF5, but rather with the
  MPI-IO library; it is an active area of effort, though.)

If you or your site runs into problems, it's probably best to post
here. I'll be happy to share what I know.

==rob

···

--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
A215 0178 EA2D B059 8CDF  B29D F333 664A 4280 315B
