segfault when building with mvapich2-1.9a2

Dear all,
When compiling parallel HDF5, I get the following strange error:

make[2]: Entering directory `/usr/bmp/chaste_libs_mvapich/hdf5-1.8.11/src'

LD_LIBRARY_PATH="$LD_LIBRARY_PATH`echo | \
                sed -e 's/-L/:/g' -e 's/ //g'`" \
         ./H5make_libsettings > H5lib_settings.c || \
            (test $HDF5_Make_Ignore && echo "*** Error ignored") || \
            (rm -f H5lib_settings.c ; exit 1)
/bin/sh: line 4: 78738 Segmentation fault LD_LIBRARY_PATH="$LD_LIBRARY_PATH`echo | sed -e 's/-L/:/g' -e 's/ //g'`" ./H5make_libsettings > H5lib_settings.c

There is one previous post on this list mentioning this problem; did a solution come up in the meantime?

Thanks in advance,

tariq

Hi Tariq,

Yes, this has come up previously. First, if you are cross-compiling, you need to set RUNSERIAL and RUNPARALLEL to whatever you use to launch MPI programs on your system.
Below is the reply from a user who reported this problem and investigated it with a Linux expert to figure out what was going on. In short, setting the RTLD_DEEPBIND environment variable to 0 resolved his segfault with mvapich.
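For the cross-compiling case, a parallel build could look roughly like the sketch below. The launcher commands (mpiexec -n 1, mpiexec -n 6), the source path and the use of mpicc as the compiler wrapper are assumptions; substitute whatever starts MPI jobs on your target.

```shell
# Sketch for a cross-compiled parallel HDF5 build. The launcher commands
# are example values (assumptions); use your system's MPI job launcher.
export RUNSERIAL="mpiexec -n 1"    # how serial programs are launched
export RUNPARALLEL="mpiexec -n 6"  # how parallel programs are launched

# Hypothetical location of the unpacked sources (taken from the log above).
HDF5_DIR=${HDF5_DIR:-/usr/bmp/chaste_libs_mvapich/hdf5-1.8.11}

if [ -x "$HDF5_DIR/configure" ]; then
  # Run in a subshell so the caller's working directory is unchanged.
  (cd "$HDF5_DIR" && CC=mpicc ./configure --enable-parallel && make)
else
  echo "configure not found under $HDF5_DIR" >&2
fi
```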

Thanks,
Mohamad


If you set the environment variable RTLD_DEEPBIND to 0, the segfault disappears. I'm still trying to understand this better, but it works by altering the behaviour of glibc when loading NSS modules. The symbol resolution order seems to change with it set (for the better in this case).

I think I've figured out who is "in the wrong" here. This is the result of a patch SuSE applied to glibc that dlopen()s NSS libraries with the RTLD_DEEPBIND flag, which puts the symbols inside the library ahead of the global scope. That produces the behaviour we saw earlier: free() called by the dlopen()'d libnss_sss was resolved to the glibc library that libnss_sss depends on, instead of through the global scope, which binds free() to mvapich2's libmpi.
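To see which library free() is actually bound to, glibc's LD_DEBUG facility can trace symbol bindings. This is a sketch that assumes it is run from the build's src directory containing ./H5make_libsettings, and that the helper performs an NSS lookup which loads libnss_sss; the trace file prefix /tmp/h5_bindings is arbitrary.

```shell
# Sketch: trace symbol bindings with glibc's LD_DEBUG while running the
# helper that crashed. PROG and TRACE defaults are assumptions.
PROG=${PROG:-./H5make_libsettings}
TRACE=${TRACE:-/tmp/h5_bindings}

if [ -x "$PROG" ]; then
  # Discard the program's stdout so H5lib_settings.c is not touched;
  # glibc writes the binding trace to $TRACE.<pid> instead of stderr.
  LD_DEBUG=bindings LD_DEBUG_OUTPUT="$TRACE" "$PROG" > /dev/null
  # Show each binding of free(); look for lines mentioning libnss_sss
  # to see whether its free() went to libc or to mvapich2's libmpi.
  grep -h "symbol .free'" "$TRACE".*
fi
```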

The environment variable I found works by disabling this behaviour and restoring the glibc default, where symbols in dlopen()'d libraries aren't put ahead of the global scope.

SuSE has a legitimate reason for their patch, but the solution to this issue is to set the RTLD_DEEPBIND environment variable to "0", which resolves the segfaults.
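To confirm the workaround on the exact step that crashed before resuming the build, something like the sketch below could be used. The source path is taken from the log above and may differ on your machine; the variable name RTLD_DEEPBIND is the one reported in this thread and presumably only matters on the SuSE-patched glibc described above.

```shell
# Sketch: retry the failing build step with the workaround, then resume.
# Hypothetical default path; override SRC_DIR if your tree lives elsewhere.
SRC_DIR=${SRC_DIR:-/usr/bmp/chaste_libs_mvapich/hdf5-1.8.11/src}

if [ -x "$SRC_DIR/H5make_libsettings" ] && cd "$SRC_DIR"; then
  # Re-run the step that segfaulted, with the workaround applied to it only.
  if RTLD_DEEPBIND=0 ./H5make_libsettings > H5lib_settings.c; then
    # It worked: keep the variable set for the rest of the build.
    export RTLD_DEEPBIND=0
    cd .. && make
  else
    # Same cleanup the Makefile rule performs when the step fails.
    rm -f H5lib_settings.c
    echo "H5make_libsettings still fails with RTLD_DEEPBIND=0" >&2
  fi
fi
```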

From: Hdf-forum [mailto:hdf-forum-bounces@lists.hdfgroup.org] On Behalf Of Baig, Tariq
Sent: Friday, June 12, 2015 8:52 AM
To: hdf-forum@lists.hdfgroup.org
Subject: [Hdf-forum] segfault when building with mvapich2-1.9a2
