Error in make check for HDF5


#1

Hi all,

I am getting an error while running make check on HDF5. The error is:

===================================
PHDF5 tests detected 1536 errors


Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.


mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[37185,1],0]
Exit code: 1

Command exited with non-zero status 1
0.77user 0.44system 0:01.50elapsed 81%CPU (0avgtext+0avgdata 185496maxresident)k
0inputs+70640outputs (2628major+56783minor)pagefaults 0swaps
make[4]: *** [Makefile:1577: testphdf5.chkexe_] Error 1
make[4]: Leaving directory ‘/BIG_6_9/opt_shared/hdf5-1.14.0/testpar’
make[3]: *** [Makefile:1710: build-check-p] Error 1
make[3]: Leaving directory ‘/BIG_6_9/opt_shared/hdf5-1.14.0/testpar’
make[2]: *** [Makefile:1558: test] Error 2
make[2]: Leaving directory ‘/BIG_6_9/opt_shared/hdf5-1.14.0/testpar’
make[1]: *** [Makefile:1358: check-am] Error 2
make[1]: Leaving directory ‘/BIG_6_9/opt_shared/hdf5-1.14.0/testpar’
make: *** [Makefile:729: check-recursive] Error 1

Does anyone have any idea what the issue is?

I really appreciate any thoughts,

Best,
Shima


#2

What’s the OS/compiler/MPI version?
Can you capture the whole output (stdout and stderr) and attach? The large number of errors suggests that the tests are not running because of a resource error or missing dependencies.
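
A minimal sketch of one way to capture both streams in a single file. The helper name run_logged and the log file name make-check.log are placeholders made up for this example, not anything HDF5 provides:

```shell
# Hypothetical helper: run a command with stdout and stderr both written
# to a log file, then print the log and return the command's exit status.
run_logged() {
    log_file=$1
    shift
    status=0
    "$@" >"$log_file" 2>&1 || status=$?
    cat "$log_file"
    return "$status"
}

# Example use from the HDF5 build directory (log name is a placeholder):
#   run_logged make-check.log make check
```

Returning the original exit status keeps a failing make check visible to whatever calls the helper.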

G.


#3

mpiexec (OpenRTE) 4.1.4
It seems I cannot attach the file. Here is a link to a drive folder with the whole output:
https://drive.google.com/drive/folders/1jCIBEbMQ0_qkxxxY3WrOuC292MH-PeKk?usp=sharing

Thanks


#4

It looks like things go sour in the atomicity test (around line 12041):

Testing  -- dataset atomic updates (atomicity) 
Testing  -- dataset atomic updates (atomicity) 
Testing  -- dataset atomic updates (atomicity) 
Testing  -- dataset atomic updates (atomicity) 
Testing  -- dataset atomic updates (atomicity) 
Testing  -- dataset atomic updates (atomicity) 
Atomicity Test Failed Process 2: read_buf[1536] is 0, should be 5
Atomicity Test Failed Process 2: read_buf[1537] is 0, should be 5
Atomicity Test Failed Process 2: read_buf[1538] is 0, should be 5
Atomicity Test Failed Process 2: read_buf[1539] is 0, should be 5
Atomicity Test Failed Process 2: read_buf[1540] is 0, should be 5
Atomicity Test Failed Process 2: read_buf[1541] is 0, should be 5
...
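
For anyone who wants to find that spot in their own log, a small sketch along these lines should work. The function names are made up for this example, and the search string is simply the message text shown above:

```shell
# Hypothetical helper: print the line number and text of the first line
# in a log file that reports an atomicity failure.
first_atomicity_failure() {
    grep -n 'Atomicity Test Failed' "$1" | head -n 1
}

# Hypothetical helper: count the atomicity failure lines in a log file.
count_atomicity_failures() {
    grep -c 'Atomicity Test Failed' "$1"
}
```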

What kind of file system are you running this on?

G.


#5

It’s also weird that the script says

All tests were successful.

...

===================================
***PHDF5 tests detected 512 errors***
===================================

That’s maybe our weird sense of humor. :thinking:

G.


#6

These are the file systems on the machine:
[skasaei@lange testpar]$ df -T
Filesystem     Type      1K-blocks     Used  Available Use% Mounted on
devtmpfs       devtmpfs       4096        0       4096   0% /dev
tmpfs          tmpfs      98313056        0   98313056   0% /dev/shm
tmpfs          tmpfs      39325224    33952   39291272   1% /run
/dev/sdc4      xfs       451400192 15097848  436302344   4% /
/dev/sdc2      xfs          957440   313736     643704  33% /boot
/dev/sdc1      vfat          97062     7114      89948   8% /boot/efi
/dev/sdb1      ext4     6974131456       32 6622580016   1% /BIG_2_5
/dev/sda1      ext4     6974131456  2579700 6620000348   1% /BIG_6_9
tmpfs          tmpfs      19662608       56   19662552   1% /run/user/42
tmpfs          tmpfs      19662608       40   19662568   1% /run/user/0
tmpfs          tmpfs      19662608       40   19662568   1% /run/user/1004
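
To narrow this down to the file system that actually holds the test directory, something like the sketch below could be used. The function name fs_type_of is made up, and it assumes a df that accepts both -P and -T, as GNU df does:

```shell
# Hypothetical helper: print the type of the file system that holds the
# given path. -P keeps each entry on one line so the awk field is stable.
fs_type_of() {
    df -PT "$1" | awk 'NR == 2 { print $2 }'
}

# Example use with the test directory from the make output:
#   fs_type_of /BIG_6_9/opt_shared/hdf5-1.14.0/testpar
```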

Any idea how to solve the issue?


#7

make check completes successfully when I disable parallel support at configure time:
./configure --disable-parallel --prefix <…>
then make
and make check, which seems successful.

I would like to install NetCDF and COAWST-Roms and use them in parallel later.
Do you think disabling parallel support here will cause problems when I run them in parallel later on?
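
One way to confirm which kind of build ended up installed is to read the "Parallel HDF5" entry in the libhdf5.settings file that make install copies into the lib directory. This is only a sketch: the function name is made up, and it assumes the settings file contains a line of the form "Parallel HDF5: yes" or "Parallel HDF5: no":

```shell
# Hypothetical helper: print the value of the "Parallel HDF5" entry
# (for example "yes" or "no") from a libhdf5.settings file.
parallel_setting() {
    sed -n 's/^[[:space:]]*Parallel HDF5:[[:space:]]*//p' "$1"
}

# Example use with the install prefix from this thread:
#   parallel_setting /opt/opt_shared/hdf5-1.14.0/lib/libhdf5.settings
```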


#8

Which compiler was used to build mpiexec (OpenRTE) 4.1.4? This may not be the problem, but older GCC versions don’t support atomic operations.


#9

ldd mpiexec
linux-vdso.so.1 (0x00007fffc8b3d000)
libopen-rte.so.40 => /opt/opt_shared/openmpi-4.1.4/lib/libopen-rte.so.40 (0x00007ff42b50a000)
libopen-pal.so.40 => /opt/opt_shared/openmpi-4.1.4/lib/libopen-pal.so.40 (0x00007ff42b401000)
libm.so.6 => /lib64/libm.so.6 (0x00007ff42b31a000)
libz.so.1 => /lib64/libz.so.1 (0x00007ff42b300000)
libc.so.6 => /lib64/libc.so.6 (0x00007ff42b0f7000)
/lib64/ld-linux-x86-64.so.2 (0x00007ff42b5c6000)


#10

Also the gcc version is: gcc (GCC) 11.3.1 20220421 (Red Hat 11.3.1-2)


#11

“make” and “make check” now complete successfully with parallel support disabled in ./configure (echo $? prints 0 after each command, which indicates success),
but I am facing an issue when I run make install:

[skasaei@lange hdf5-1.14.0]$ make install
Making install in src
make[1]: Entering directory ‘/BIG_6_9/opt_shared/hdf5-1.14.0/src’
make[2]: Entering directory ‘/BIG_6_9/opt_shared/hdf5-1.14.0/src’
/usr/bin/mkdir -p ‘/opt/opt_shared/hdf5-1.14.0/lib’
/bin/sh …/libtool --mode=install /usr/bin/install -c libhdf5.la ‘/opt/opt_shared/hdf5-1.14.0/lib’
libtool: install: /usr/bin/install -c .libs/libhdf5.so.310.0.0 /opt/opt_shared/hdf5-1.14.0/lib/libhdf5.so.310.0.0
libtool: install: (cd /opt/opt_shared/hdf5-1.14.0/lib && { ln -s -f libhdf5.so.310.0.0 libhdf5.so.310 || { rm -f libhdf5.so.310 && ln -s libhdf5.so.310.0.0 libhdf5.so.310; }; })
libtool: install: (cd /opt/opt_shared/hdf5-1.14.0/lib && { ln -s -f libhdf5.so.310.0.0 libhdf5.so || { rm -f libhdf5.so && ln -s libhdf5.so.310.0.0 libhdf5.so; }; })
libtool: install: /usr/bin/install -c .libs/libhdf5.lai /opt/opt_shared/hdf5-1.14.0/lib/libhdf5.la
libtool: install: /usr/bin/install -c .libs/libhdf5.a /opt/opt_shared/hdf5-1.14.0/lib/libhdf5.a
libtool: install: chmod 644 /opt/opt_shared/hdf5-1.14.0/lib/libhdf5.a
libtool: install: ranlib /opt/opt_shared/hdf5-1.14.0/lib/libhdf5.a
libtool: finish: PATH="/opt/opt_shared/openmpi-4.1.4/bin:/home/skasaei/.local/bin:/home/skasaei/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin" ldconfig -n /opt/opt_shared/hdf5-1.14.0/lib

Libraries have been installed in:
/opt/opt_shared/hdf5-1.14.0/lib

If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the ‘-LLIBDIR’
flag during linking and do at least one of the following:

  • add LIBDIR to the ‘LD_LIBRARY_PATH’ environment variable
    during execution
  • add LIBDIR to the ‘LD_RUN_PATH’ environment variable
    during linking
  • use the ‘-Wl,-rpath -Wl,LIBDIR’ linker flag
  • have your system administrator add LIBDIR to ‘/etc/ld.so.conf’

See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.

/usr/bin/mkdir -p ‘/opt/opt_shared/hdf5-1.14.0/include’
/usr/bin/install -c -m 644 hdf5.h H5api_adpt.h H5overflow.h H5pubconf.h H5public.h H5version.h H5Apublic.h H5ACpublic.h H5Cpublic.h H5Dpublic.h H5Epubgen.h H5Epublic.h H5ESpublic.h H5Fpublic.h H5FDpublic.h H5FDcore.h H5FDdirect.h H5FDfamily.h H5FDhdfs.h H5FDlog.h H5FDmirror.h H5FDmpi.h H5FDmpio.h H5FDmulti.h H5FDonion.h H5FDros3.h H5FDsec2.h H5FDsplitter.h H5FDstdio.h H5FDsubfiling/H5FDsubfiling.h H5FDsubfiling/H5FDioc.h H5FDwindows.h H5Gpublic.h H5Ipublic.h H5Lpublic.h H5Mpublic.h H5MMpublic.h H5Opublic.h H5Ppublic.h H5PLextern.h ‘/opt/opt_shared/hdf5-1.14.0/include’
/usr/bin/install -c -m 644 H5PLpublic.h H5Rpublic.h H5Spublic.h H5Tpublic.h H5VLconnector.h H5VLconnector_passthru.h H5VLnative.h H5VLpassthru.h H5VLpublic.h H5Zpublic.h H5ESdevelop.h H5FDdevelop.h H5Idevelop.h H5Ldevelop.h H5Tdevelop.h H5TSdevelop.h H5Zdevelop.h ‘/opt/opt_shared/hdf5-1.14.0/include’
/usr/bin/mkdir -p ‘/opt/opt_shared/hdf5-1.14.0/lib’
/usr/bin/install -c -m 644 libhdf5.settings ‘/opt/opt_shared/hdf5-1.14.0/lib’
make[2]: Leaving directory ‘/BIG_6_9/opt_shared/hdf5-1.14.0/src’
make[1]: Leaving directory ‘/BIG_6_9/opt_shared/hdf5-1.14.0/src’
Making install in test
make[1]: Entering directory ‘/BIG_6_9/opt_shared/hdf5-1.14.0/test’
make[2]: Entering directory ‘/BIG_6_9/opt_shared/hdf5-1.14.0/test’
make[2]: Nothing to be done for ‘install-exec-am’.
make[2]: Nothing to be done for ‘install-data-am’.
make[2]: Leaving directory ‘/BIG_6_9/opt_shared/hdf5-1.14.0/test’
make[1]: Leaving directory ‘/BIG_6_9/opt_shared/hdf5-1.14.0/test’
Making install in bin
make[1]: Entering directory ‘/BIG_6_9/opt_shared/hdf5-1.14.0/bin’
make[2]: Entering directory ‘/BIG_6_9/opt_shared/hdf5-1.14.0/bin’
/usr/bin/mkdir -p ‘/opt/opt_shared/hdf5-1.14.0/bin’
/usr/bin/install -c h5redeploy ‘/opt/opt_shared/hdf5-1.14.0/bin’
/usr/bin/install: ‘h5redeploy’ and ‘/opt/opt_shared/hdf5-1.14.0/bin/h5redeploy’ are the same file
make[2]: *** [Makefile:843: install-binSCRIPTS] Error 1
make[2]: Leaving directory ‘/BIG_6_9/opt_shared/hdf5-1.14.0/bin’
make[1]: *** [Makefile:1078: install-am] Error 2
make[1]: Leaving directory ‘/BIG_6_9/opt_shared/hdf5-1.14.0/bin’
make: *** [Makefile:729: install-recursive] Error 1

Any idea what the reason is?
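
The message says the source and destination of h5redeploy are the same file. That would happen if the install prefix /opt/opt_shared/hdf5-1.14.0 and the source tree /BIG_6_9/opt_shared/hdf5-1.14.0 resolve to the same directory, for example through a symlink. This is an assumption, not something the log confirms. A sketch of how to check it, with a made-up function name:

```shell
# Hypothetical helper: succeed if two directory paths resolve to the same
# physical directory (for example, when one is a symlink to the other).
# Returns 2 if either path cannot be entered.
same_directory() {
    dir_a=$(cd "$1" && pwd -P) || return 2
    dir_b=$(cd "$2" && pwd -P) || return 2
    [ "$dir_a" = "$dir_b" ]
}

# Example use with the two paths from the install log:
#   same_directory /opt/opt_shared/hdf5-1.14.0 /BIG_6_9/opt_shared/hdf5-1.14.0 \
#       && echo "prefix and source tree are the same directory"
```

If the two paths do turn out to be the same directory, passing --prefix a location outside the source tree would be the thing to try.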