Dear all,
when compiling parallel HDF5 I get the following strange error:
make[2]: Entering directory `/usr/bmp/chaste_libs_mvapich/hdf5-1.8.11/src'
LD_LIBRARY_PATH="$LD_LIBRARY_PATH`echo | \
sed -e 's/-L/:/g' -e 's/ //g'`" \
./H5make_libsettings > H5lib_settings.c || \
(test $HDF5_Make_Ignore && echo "*** Error ignored") || \
(rm -f H5lib_settings.c ; exit 1)
/bin/sh: line 4: 78738 Segmentation fault LD_LIBRARY_PATH="$LD_LIBRARY_PATH`echo | sed -e 's/-L/:/g' -e 's/ //g'`" ./H5make_libsettings > H5lib_settings.c
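For readers puzzling over the failing rule: the backquoted `echo | sed` pipeline only builds extra LD_LIBRARY_PATH entries. This assumes, since it is not shown in the log, that the empty `echo` argument is the Makefile's `$(LDFLAGS)`. Each `-L<dir>` flag becomes a `:<dir>` entry. The sketch below reproduces that transformation with a hypothetical helper, `ldflags_to_path`. With empty flags, as in the log above, nothing is appended, so the environment is effectively unchanged and the crash happens inside ./H5make_libsettings itself.

```shell
#!/bin/sh
# ldflags_to_path is a hypothetical helper that mirrors the backquoted
# pipeline in the failing make rule: "-L/a -L/b" becomes ":/a:/b".
ldflags_to_path() {
    # $1 is deliberately unquoted, like $(LDFLAGS) in the Makefile, so the
    # shell splits it into words and echo rejoins them with single spaces.
    echo $1 | sed -e 's/-L/:/g' -e 's/ //g'
}

# Illustrative directories only (assumption, not taken from the thread).
ldflags_to_path "-L/opt/mvapich2/lib -L/opt/zlib/lib"

# With empty flags, as in the log, the result is empty, so
# LD_LIBRARY_PATH is passed through unchanged.
ldflags_to_path ""
```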
There is one previous post on this list mentioning this problem; has a solution come up in the meantime?
Thanks in advance,
tariq
Hi Tariq,
Yes, this has come up previously. First, if you are cross-compiling, you need to set RUNSERIAL and RUNPARALLEL to whatever you use to launch MPI programs on your system.
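Here is a minimal sketch of that setup. It assumes a SLURM site where `srun` launches MPI jobs; the launcher commands and process counts are placeholders, not values from this thread. The two variables are exported before HDF5's configure is run. `configure_parallel_hdf5` is a hypothetical wrapper name.

```shell
#!/bin/sh
# Placeholder launchers (assumption: SLURM's srun). Use whatever starts
# MPI programs on your system, e.g. mpiexec or a batch-system wrapper.
RUNSERIAL="srun -n 1"
RUNPARALLEL="srun -n 6"
export RUNSERIAL RUNPARALLEL

# Hypothetical wrapper: configure parallel HDF5 with the MPI compiler
# wrapper. Run it from the top of the HDF5 source tree; any extra
# arguments are passed on to configure.
configure_parallel_hdf5() {
    CC=mpicc ./configure --enable-parallel "$@"
}

# Usage (not executed here):
#   configure_parallel_hdf5 --prefix="$HOME/hdf5-parallel"
```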
Below is the reply from a user who reported this problem and investigated it with a Linux expert to figure out what was going on. In short, setting the RTLD_DEEPBIND environment variable to 0 resolved his segfault with MVAPICH2.
Thanks,
Mohamad
If you set the environment variable RTLD_DEEPBIND to 0, the
segfault disappears. I'm still trying to understand this better, but it
works by altering the behaviour of glibc when it loads NSS modules. The
symbol resolution order seems to change with it set (for the better in
this case).
I think I've figured out who is "in the wrong" here. This is the result
of a patch SuSE applied to glibc so that NSS libraries are dlopen()'d
with the RTLD_DEEPBIND flag, which puts the symbols inside the library
ahead of the global scope. That produces the behaviour we saw earlier:
the free() called by the dlopen()'d libnss_sss was resolved to the glibc
library that libnss_sss depends on, instead of through the global scope,
where free() had been resolved and bound to mvapich2's libmpi. Memory
obtained from one malloc() implementation and released by the other
presumably corrupts the heap and causes the segfault.
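You can observe this binding directly by asking the dynamic loader to log symbol bindings and filtering for free(). This sketch assumes a glibc-based system, since LD_DEBUG is a glibc dynamic-loader feature whose trace goes to stderr. `show_free_bindings` is a hypothetical helper. Comparing its output with and without RTLD_DEEPBIND=0 should show which library the free() called from libnss_sss is bound to.

```shell
#!/bin/sh
# show_free_bindings is a hypothetical helper. It runs a program with
# glibc's LD_DEBUG=bindings tracing enabled, discards the program's own
# stdout, and keeps only the trace lines reporting where free() was bound.
show_free_bindings() {
    LD_DEBUG=bindings "$@" 2>&1 >/dev/null | grep -F "symbol \`free'"
}

# Usage (assumption: run in the HDF5 src directory on the affected system):
#   show_free_bindings ./H5make_libsettings
#   (export RTLD_DEEPBIND=0; show_free_bindings ./H5make_libsettings)
```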
The environment variable I found works by disabling this behaviour and
restoring the default glibc behaviour, in which symbols in dlopen()'d
libraries are not placed ahead of the global scope.
SuSE has a legitimate reason for its patch, but the workaround for this
issue is to set the RTLD_DEEPBIND environment variable to "0", which
resolves the segfaults.
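Applied to the failing build, the workaround amounts to giving the build an environment with RTLD_DEEPBIND=0 and re-running make. This is a minimal sketch, assuming a SUSE system with the patched glibc described above. `with_deepbind_off` is a hypothetical helper name.

```shell
#!/bin/sh
# with_deepbind_off is a hypothetical helper: it runs a command with
# RTLD_DEEPBIND=0 in its environment, which (per the report above) stops
# SuSE's patched glibc from using RTLD_DEEPBIND when it dlopen()s NSS modules.
with_deepbind_off() {
    env RTLD_DEEPBIND=0 "$@"
}

# Usage (assumption: run from the top of the HDF5 build tree):
#   with_deepbind_off make
# or keep it set for the whole session, including "make check":
#   export RTLD_DEEPBIND=0
```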
From: Hdf-forum [mailto:hdf-forum-bounces@lists.hdfgroup.org] On Behalf Of Baig, Tariq
Sent: Friday, June 12, 2015 8:52 AM
To: hdf-forum@lists.hdfgroup.org
Subject: [Hdf-forum] segfault when building with mvapich2-1.9a2