Fatal error in MPI_Type_free: Invalid datatype, error stack:
MPI_Type_free(145): MPI_Type_free(datatype_p=0x520c20) failed
MPI_Type_free(96).: Cannot free permanent data type [cli_2]: aborting job:
I'm using mpich2-1.0.8 under OS X 10.5.5 on a single computer. (The same error occurs with openmpi-1.3.)
A little investigation shows that the problem occurs in H5Dmpio.c, in the MPI_Type_free calls after line 976 of the routine H5D_link_chunk_collective_io. The problem is that HDF5 attempts to free a permanent (not derived) datatype, specifically MPI_BYTE, which causes MPI to abort.
If I change the example to an independent write, the code runs fine. Does anyone know whether this is an OS X-specific problem, and/or whether there are any workarounds?
All the best,
Ricardo
···
---
Prof. Ricardo Fonseca
GoLP - Grupo de Lasers e Plasmas
Instituto de Plasmas e Fusão Nuclear
Instituto Superior Técnico
Av. Rovisco Pais
1049-001 Lisboa
Portugal
Hi Ricardo. I ran this on BlueGene, whose MPI is roughly based on
MPICH2 1.0.8, compiled against HDF5-1.8.2, and did not get this error.
I hope other OS X users can report their experiences.
==rob
···
On Thu, Mar 05, 2009 at 06:13:24PM +0000, Ricardo Fonseca wrote:
Dear all
I'm having problems with parallel HDF5, specifically writing chunked
data using a collective write. Simply compiling and running the example yields:
Fatal error in MPI_Type_free: Invalid datatype, error stack:
MPI_Type_free(145): MPI_Type_free(datatype_p=0x520c20) failed
MPI_Type_free(96).: Cannot free permanent data type [cli_2]: aborting
job:
I'm using mpich2-1.0.8 under OS X 10.5.5 on a single computer. (the same
error occurs with openmpi-1.3)
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
A215 0178 EA2D B059 8CDF  B29D F333 664A 4280 315B
----------------------------------------------------------------------
This mailing list is for HDF software users discussion.
To subscribe to this list, send a message to hdf-forum-subscribe@hdfgroup.org.
To unsubscribe, send a message to hdf-forum-unsubscribe@hdfgroup.org.
The problem is indeed Apple-specific, namely in the config/apple file. Just adding:
hdf5_mpi_complex_derived_datatype_works='no'
at the end of this file fixes it.
The problem is that the complex derived MPI datatypes HDF5 tries to use are not supported by mpich2 (or openmpi). The Linux config files know this and automatically set the above flag; the Apple config file didn't, so HDF5 calls H5D_link_chunk_collective_io (which breaks) rather than H5D_multi_chunk_collective_io_no_opt (which works).
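In practice the workaround is a one-line append to config/apple followed by a fresh configure and rebuild. A minimal sketch of the append step (the hdf5-src/ path is a stand-in for wherever the HDF5 source tree is unpacked; a scratch copy of config/apple is created here so the snippet is self-contained):

```shell
# Stand-in for the HDF5 source tree; use your real unpack location instead.
mkdir -p hdf5-src/config
printf '# existing apple settings\n' > hdf5-src/config/apple

# Append the flag telling configure that complex derived datatypes do not
# work, mirroring what the Linux config files already do.
echo "hdf5_mpi_complex_derived_datatype_works='no'" >> hdf5-src/config/apple

# Sanity check: the flag must be present before re-running configure.
grep "hdf5_mpi_complex_derived_datatype_works" hdf5-src/config/apple
```

After this, re-run configure and rebuild so the setting is picked up.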
I hope this helps other people trying to use parallel hdf5 under OS X.
All the best,
Ricardo
P.S. -> (rob) Thanks for the input; this work is actually in preparation for a BlueGene system. Could you tell me if that flag is set on your configuration? If you look in $H5DIR/include/H5pubconf.h around line 431 you can check the definition of the H5_MPI_COMPLEX_DERIVED_DATATYPE_WORKS macro.
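Checking for that macro can be done with a quick grep of the installed header. A sketch (a mock H5pubconf.h stands in here for the real $H5DIR/include/H5pubconf.h so the snippet is self-contained; on a real install, grep the installed header directly):

```shell
# Mock of the installed header; on a real system use "$H5DIR/include/H5pubconf.h".
mkdir -p h5dir/include
cat > h5dir/include/H5pubconf.h <<'EOF'
/* Define if MPI-IO can handle complex derived datatypes */
/* #undef H5_MPI_COMPLEX_DERIVED_DATATYPE_WORKS */
EOF

# A commented-out (#undef) or absent macro means HDF5 takes the safe path
# and avoids the H5D_link_chunk_collective_io code that triggers the crash.
grep "H5_MPI_COMPLEX_DERIVED_DATATYPE_WORKS" h5dir/include/H5pubconf.h
```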
I am quite interested in knowing more about just what datatype stuff
HDF5 wants to do that MPICH2 cannot handle. If, as you observe, HDF5
tries to free an MPI_BYTE then maybe there's work to be done on both
ends.
Also, since I'm poking around in nearby code, the check for
collective I/O working seems... weird to me? By the MPI standard, all
processes in a communicator must call a collective routine if any of
them call a collective routine. If any of those processes have zero
bytes of I/O, that should be just fine. If not, you've found another
bug in the MPI-IO implementation.
Note how I'm not rushing off to test these settings myself. I fully
understand time and resource constraints. Just making a note of
things to look at "one day" if we need more parallel I/O performance.
==rob
···
On Fri, Mar 06, 2009 at 06:17:37PM +0000, Ricardo Fonseca wrote:
P.S -> (rob) Thanks for the input, this work is actually in preparation
for a BlueGene system. Could you tell me if that flag is set on your
configuration? If you just look into $H5DIR/include/H5pubconf.h around
line 431 you can check the definition of the
H5_MPI_COMPLEX_DERIVED_DATATYPE_WORKS macro.
- BlueGene requires cross compiling, so you'll need to set quite a few
environment variables to get configure to work correctly
- The configure tests for the Fortran and C++ bindings are run-time
only tests, and so will only work if you trick configure into
executing its tests on the compute node. At least one person at
Argonne knows how to do this, but I'm still learning myself.
- Your choice of file system will make a big difference to your
experience. I hope your BlueGene has either GPFS or PVFS, and does
not have Lustre. (This is not an issue with HDF5, but rather with
the MPI-IO library. An active area of effort, though.)
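For reference, the cross-compile setup in the first point above usually boils down to a configure invocation of roughly this shape. This is a hypothetical sketch: the compiler wrapper names, the host triple, and any extra cache variables are assumptions that vary by site and driver version, so check your site's documentation rather than copying this verbatim.

```shell
# Hypothetical BlueGene cross-compile sketch; the wrapper name and host
# triple below are assumptions -- consult your site's documentation.
CC=mpixlc_r \
./configure --host=powerpc-bgp-linux \
            --enable-parallel
# Cross compiling typically also needs configure cache variables preset
# for the run-time tests that cannot execute on the login node.
```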
If you or your site runs into problems, it's probably best to post
here. I'll be happy to share what I know.
==rob