Building hdf5-1.10.0-patch1 on Blue Gene/Q

I have been unsuccessful in building a parallel version of hdf5-1.10.0-patch1 on a Blue Gene/Q system (rzuseq). I have tried both the yod-configure approach and manually changing every ./conftest to srun -n1 ./conftest. Both approaches configure and build correctly, but I am unable to run testhdf5 or testphdf5. The testhdf5 run gives errors like this:

Linked with hdf5 version 1.10 release 0
Testing -- Configure definitions (config)
Testing -- Encoding/decoding metadata (metadata)
Testing -- Checksum algorithm (checksum)
Testing -- Ternary Search Trees (tst)
Testing -- Memory Heaps (heap)
Testing -- Skip Lists (skiplist)
Testing -- Reference Counted Strings (refstr)
Testing -- Low-Level File I/O (file)
*** UNEXPECTED RETURN from H5Fcreate is -1 at line 187 in tfile.c
HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) thread 0:
  #000: H5F.c line 491 in H5Fcreate(): unable to create file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1168 in H5F_open(): unable to lock the file or initialize file structure
    major: File accessibilty
    minor: Unable to open file
  #002: H5FD.c line 1821 in H5FD_lock(): driver lock request failed
    major: Virtual File Layer
    minor: Can't update object
  #003: H5FDsec2.c line 939 in H5FD_sec2_lock(): unable to flock file, errno = 38, error message = 'Function not implemented'
    major: File accessibilty
    minor: Bad file ID accessed
*** UNEXPECTED RETURN from H5Fclose is -1 at line 198 in tfile.c
HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) thread 0:
  #000: H5F.c line 749 in H5Fclose(): not a file ID
    major: Invalid arguments to routine
    minor: Inappropriate type

This happens on both Lustre and NFS file systems. When I build hdf5-1.8.16 using the same procedure, everything works correctly. I have also used the build_hdf5 in the CGNS distribution, with no change in behavior.
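The bottom frame of the stack (errno = 38, 'Function not implemented') is ENOSYS coming back from flock(), so the failure on both Lustre and NFS suggests the file system path, not HDF5, is refusing the lock. A quick HDF5-independent check is a small C probe like the sketch below. probe_flock() and its result codes are hypothetical names, not part of HDF5. It tries a non-blocking shared and exclusive lock, on the assumption that read-only opens (such as h5dump's) may ask for a shared lock while file creation asks for an exclusive one. On a Blue Gene/Q the probe presumably needs to be launched on a compute node (for example with srun -n1) to exercise the same I/O path as the tests.

```c
#define _GNU_SOURCE /* keep flock()/open() declarations visible under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

/* Outcome of probing one path for BSD flock() support. */
typedef enum {
    FLOCK_SUPPORTED = 0,   /* a non-blocking lock was granted (and then released) */
    FLOCK_NOT_IMPLEMENTED, /* flock() failed with ENOSYS (errno 38 on Linux) */
    FLOCK_OTHER_ERROR,     /* open() or flock() failed for some other reason */
} flock_probe_result;

/* Hypothetical helper: open (creating if needed) `path` and try a
 * non-blocking lock of type LOCK_SH or LOCK_EX, then release it. */
static flock_probe_result probe_flock(const char *path, int lock_type)
{
    assert(path != NULL);
    assert(lock_type == LOCK_SH || lock_type == LOCK_EX);

    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        fprintf(stderr, "open(%s): %s\n", path, strerror(errno));
        return FLOCK_OTHER_ERROR;
    }

    flock_probe_result result = FLOCK_SUPPORTED;
    if (flock(fd, lock_type | LOCK_NB) != 0) {
        int err = errno; /* save before any other call can overwrite it */
        fprintf(stderr, "flock(%s): errno = %d, '%s'\n", path, err, strerror(err));
        result = (err == ENOSYS) ? FLOCK_NOT_IMPLEMENTED : FLOCK_OTHER_ERROR;
    } else {
        flock(fd, LOCK_UN); /* release the lock we just took */
    }
    close(fd);
    return result;
}
```

Called from a small main() with a path on the Lustre or NFS scratch area, a FLOCK_NOT_IMPLEMENTED result would reproduce the error above without HDF5 involved. The probe creates the file if it does not exist, so point it at a scratch location.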

I need hdf5-1.10.0-patch1 or later to investigate the collective metadata changes.

If anyone on the list, or any of the HDF5 developers or support staff, has successfully built on a Blue Gene/Q system, your help would be very much appreciated.
..Greg

--
"A supercomputer is a device for turning compute-bound problems into I/O-bound problems"

Hi Greg,

It looks like you are bumping into the "file locking not implemented" issue. There is a small source patch here that disables file locking:

https://support.hdfgroup.org/HDF5/release/obtainsrc5110.html

Note that file locking was implemented solely to help users get concurrent file-open semantics right, so disabling it causes no actual loss of HDF5 or SWMR functionality.

In the upcoming HDF5 1.10.1, file locking can be disabled via an environment variable, and there is a more informative error message when we detect that file locking is not implemented on a file system.
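The patch and the 1.10.1 environment variable both amount to not treating a missing flock() as fatal. As we understand the 1.10.1 release notes, the library-side switch is HDF5_USE_FILE_LOCKING, set to FALSE before any file is opened. Applications that do their own locking on such file systems can follow the same idea. The sketch below is a hypothetical helper (lock_best_effort() is not an HDF5 function and is not how HDF5 implements this): it skips locking when the caller disables it and downgrades ENOSYS to a warning, while still reporting genuine failures.

```c
#define _GNU_SOURCE /* keep flock()/fileno() declarations visible under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/file.h>

/* Outcome of a best-effort lock attempt. */
typedef enum {
    BEST_EFFORT_LOCKED,  /* flock() succeeded; caller must release it later */
    BEST_EFFORT_SKIPPED, /* locking disabled by caller, or flock() not implemented */
    BEST_EFFORT_FAILED,  /* real failure, e.g. EWOULDBLOCK or EBADF */
} best_effort_result;

/* Hypothetical helper: take a non-blocking exclusive lock on fd unless
 * locking is disabled; treat ENOSYS as "no flock() here, continue unlocked". */
static best_effort_result lock_best_effort(int fd, bool locking_enabled)
{
    if (!locking_enabled)
        return BEST_EFFORT_SKIPPED;
    if (flock(fd, LOCK_EX | LOCK_NB) == 0)
        return BEST_EFFORT_LOCKED;
    if (errno == ENOSYS) {
        fprintf(stderr, "warning: flock() not implemented; continuing without a lock\n");
        return BEST_EFFORT_SKIPPED;
    }
    return BEST_EFFORT_FAILED;
}
```

Only ENOSYS is downgraded. EWOULDBLOCK (another process holds the lock) and EBADF still come back as BEST_EFFORT_FAILED, so the helper does not hide the concurrent-open mistakes that locking is meant to catch.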

Let me know if that doesn't work for you and we can diagnose further.

Cheers,

Dana Robinson
Software Engineer
The HDF Group

From: Hdf-forum [mailto:hdf-forum-bounces@lists.hdfgroup.org] On Behalf Of Sjaardema, Gregory D
Sent: Thursday, December 8, 2016 3:23 PM
To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>
Subject: [Hdf-forum] Building hdf5-1.10.0-patch 1 on Blue Gene Q

A little more information on the previous email:

The parallel tests that fail are: calloc, fltread, and atomicity. All others pass.

h5dump fails to open every existing HDF5 file I have tried.

I am using mpicc, which is powerpc64-bgq-linux-gcc 4.7.2.

..Greg

I think this is working. I am going to run more tests and build the applications that use the library, but I can now h5dump files, and it looks like most or all of the serial and parallel tests are passing.

Thanks for the quick response.
..Greg

From: Hdf-forum <hdf-forum-bounces@lists.hdfgroup.org> on behalf of Dana Robinson <derobins@hdfgroup.org>
Reply-To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>
Date: Thursday, December 8, 2016 at 1:46 PM
To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>
Subject: [EXTERNAL] Re: [Hdf-forum] Building hdf5-1.10.0-patch 1 on Blue Gene Q

I am getting confusing results with the file-locking patch. The parallel tests (check-p) seem to run correctly, but the serial tests fail in the low-level file I/O test (file) with an error that looks related to file locking. I have verified that the patch applied correctly. Here is the test output:

For help use: /usr/workspace/wsrzc/gdsjaar/seacas/TPL/hdf5/hdf5-1.10.0-patch1/test/./testhdf5 -help
Linked with hdf5 version 1.10 release 0
Testing -- Configure definitions (config)
Testing -- Encoding/decoding metadata (metadata)
Testing -- Checksum algorithm (checksum)
Testing -- Ternary Search Trees (tst)
Testing -- Memory Heaps (heap)
Testing -- Skip Lists (skiplist)
Testing -- Reference Counted Strings (refstr)
Testing -- Low-Level File I/O (file)
*** UNEXPECTED RETURN from H5Fcreate is -1 at line 3158 in tfile.c
HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) thread 0:
  #000: H5F.c line 491 in H5Fcreate(): unable to create file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1168 in H5F_open(): unable to lock the file or initialize file structure
    major: File accessibilty
    minor: Unable to open file
*** UNEXPECTED RETURN from H5Dcreate2 is -1 at line 3180 in tfile.c
HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) thread 0:
  #000: H5D.c line 121 in H5Dcreate2(): not a location ID
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: H5Gloc.c line 253 in H5G_loc(): invalid object ID
    major: Invalid arguments to routine
    minor: Bad value
*** UNEXPECTED RETURN from H5Dclose is -1 at line 3183 in tfile.c
HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) thread 0:
  #000: H5D.c line 334 in H5Dclose(): not a dataset
    major: Invalid arguments to routine
    minor: Inappropriate type
*** UNEXPECTED RETURN from H5Dcreate2 is -1 at line 3180 in tfile.c
HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) thread 0:
  #000: H5D.c line 121 in H5Dcreate2(): not a location ID
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: H5Gloc.c line 253 in H5G_loc(): invalid object ID
    major: Invalid arguments to routine
    minor: Bad value
*** UNEXPECTED RETURN from H5Dclose is -1 at line 3183 in tfile.c
HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) thread 0:
  #000: H5D.c line 334 in H5Dclose(): not a dataset
    major: Invalid arguments to routine
    minor: Inappropriate type
*** UNEXPECTED RETURN from H5Dcreate2 is -1 at line 3180 in tfile.c
HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) thread 0:
  #000: H5D.c line 121 in H5Dcreate2(): not a location ID
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: H5Gloc.c line 253 in H5G_loc(): invalid object ID
    major: Invalid arguments to routine
    minor: Bad value
*** UNEXPECTED RETURN from H5Dclose is -1 at line 3183 in tfile.c
HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) thread 0:
  #000: H5D.c line 334 in H5Dclose(): not a dataset
    major: Invalid arguments to routine
    minor: Inappropriate type
*** UNEXPECTED RETURN from H5Dcreate2 is -1 at line 3180 in tfile.c
HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) thread 0:
  #000: H5D.c line 121 in H5Dcreate2(): not a location ID
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: H5Gloc.c line 253 in H5G_loc(): invalid object ID
    major: Invalid arguments to routine
    minor: Bad value
*** UNEXPECTED RETURN from H5Dclose is -1 at line 3183 in tfile.c
HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) thread 0:
  #000: H5D.c line 334 in H5Dclose(): not a dataset
    major: Invalid arguments to routine
    minor: Inappropriate type
*** UNEXPECTED RETURN from H5Dcreate2 is -1 at line 3180 in tfile.c
HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) thread 0:
  #000: H5D.c line 121 in H5Dcreate2(): not a location ID

Hi Greg,

That is a weird error. Did you build from a clean state after applying the patch?

Dana

From: Hdf-forum [mailto:hdf-forum-bounces@lists.hdfgroup.org] On Behalf Of Sjaardema, Gregory D
Sent: Monday, December 12, 2016 11:37 AM
To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>
Subject: Re: [Hdf-forum] [EXTERNAL] Re: Building hdf5-1.10.0-patch 1 on Blue Gene Q

Yes, I've built it a couple of times from scratch: deleting all files, untarring, patching, and building.

..Greg

From: Hdf-forum <hdf-forum-bounces@lists.hdfgroup.org> on behalf of Dana Robinson <derobins@hdfgroup.org>
Reply-To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>
Date: Monday, December 12, 2016 at 9:48 AM
To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>
Subject: Re: [Hdf-forum] [EXTERNAL] Re: Building hdf5-1.10.0-patch 1 on Blue Gene Q

Hi Greg,

That is a weird error. Did you build from a clean state after applying the patch?

Dana

From: Hdf-forum [mailto:hdf-forum-bounces@lists.hdfgroup.org] On Behalf Of Sjaardema, Gregory D
Sent: Monday, December 12, 2016 11:37 AM
To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>
Subject: Re: [Hdf-forum] [EXTERNAL] Re: Building hdf5-1.10.0-patch 1 on Blue Gene Q

I am getting confusing results with the file-locking patch. The parallel tests (check-p) seem to run correctly, but the serial tests fail on the low level file i/o tests (file) with an error that looks related to the file locking. I have verified that the patch applied correctly. Here is the results of the test output:

For help use: /usr/workspace/wsrzc/gdsjaar/seacas/TPL/hdf5/hdf5-1.10.0-patch1/test/./testhdf5 -help
Linked with hdf5 version 1.10 release 0
Testing -- Configure definitions (config)
Testing -- Encoding/decoding metadata (metadata)
Testing -- Checksum algorithm (checksum)
Testing -- Ternary Search Trees (tst)
Testing -- Memory Heaps (heap)
Testing -- Skip Lists (skiplist)
Testing -- Reference Counted Strings (refstr)
Testing -- Low-Level File I/O (file)
*** UNEXPECTED RETURN from H5Fcreate is -1 at line 3158 in tfile.c
HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) thread 0:
  #000: H5F.c line 491 in H5Fcreate(): unable to create file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1168 in H5F_open(): unable to lock the file or initialize file structure
    major: File accessibilty
    minor: Unable to open file
*** UNEXPECTED RETURN from H5Dcreate2 is -1 at line 3180 in tfile.c
HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) thread 0:
  #000: H5D.c line 121 in H5Dcreate2(): not a location ID
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: H5Gloc.c line 253 in H5G_loc(): invalid object ID
    major: Invalid arguments to routine
    minor: Bad value
*** UNEXPECTED RETURN from H5Dclose is -1 at line 3183 in tfile.c
HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) thread 0:
  #000: H5D.c line 334 in H5Dclose(): not a dataset
    major: Invalid arguments to routine
    minor: Inappropriate type
*** UNEXPECTED RETURN from H5Dcreate2 is -1 at line 3180 in tfile.c
HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) thread 0:
  #000: H5D.c line 121 in H5Dcreate2(): not a location ID
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: H5Gloc.c line 253 in H5G_loc(): invalid object ID
    major: Invalid arguments to routine
    minor: Bad value
*** UNEXPECTED RETURN from H5Dclose is -1 at line 3183 in tfile.c
HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) thread 0:
  #000: H5D.c line 334 in H5Dclose(): not a dataset
    major: Invalid arguments to routine
    minor: Inappropriate type
*** UNEXPECTED RETURN from H5Dcreate2 is -1 at line 3180 in tfile.c
HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) thread 0:
  #000: H5D.c line 121 in H5Dcreate2(): not a location ID
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: H5Gloc.c line 253 in H5G_loc(): invalid object ID
    major: Invalid arguments to routine
    minor: Bad value
*** UNEXPECTED RETURN from H5Dclose is -1 at line 3183 in tfile.c
HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) thread 0:
  #000: H5D.c line 334 in H5Dclose(): not a dataset
    major: Invalid arguments to routine
    minor: Inappropriate type
*** UNEXPECTED RETURN from H5Dcreate2 is -1 at line 3180 in tfile.c
HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) thread 0:
  #000: H5D.c line 121 in H5Dcreate2(): not a location ID
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: H5Gloc.c line 253 in H5G_loc(): invalid object ID
    major: Invalid arguments to routine
    minor: Bad value
*** UNEXPECTED RETURN from H5Dclose is -1 at line 3183 in tfile.c
HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) thread 0:
  #000: H5D.c line 334 in H5Dclose(): not a dataset
    major: Invalid arguments to routine
    minor: Inappropriate type
*** UNEXPECTED RETURN from H5Dcreate2 is -1 at line 3180 in tfile.c
HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) thread 0:
  #000: H5D.c line 121 in H5Dcreate2(): not a location ID

--
"A supercomputer is a device for turning compute-bound problems into I/O-bound problems”

From: Hdf-forum <hdf-forum-bounces@lists.hdfgroup.org> on behalf of Dana Robinson <derobins@hdfgroup.org>
Reply-To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>
Date: Thursday, December 8, 2016 at 1:46 PM
To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>
Subject: [EXTERNAL] Re: [Hdf-forum] Building hdf5-1.10.0-patch 1 on Blue Gene Q

Hi Greg,

It looks like you are bumping into the "file locking not implemented" issue. There is a small source patch here that disables file locking:

https://support.hdfgroup.org/HDF5/release/obtainsrc5110.html

Note that file locking was implemented solely to help users get concurrent file opening semantics right, so disabling it causes no actual loss of HDF5 or SWMR functionality.
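The call that fails at the bottom of the trace is flock() inside H5FD_sec2_lock(), which returned errno 38, 'Function not implemented' (ENOSYS on Linux). To check whether a given file system or node type has this limitation independently of HDF5, a small probe along these lines may help. It is a minimal sketch assuming a POSIX-like system that provides <sys/file.h>; the function name flock_supported() and the probe file name flock_probe.tmp are hypothetical and not part of HDF5.

```c
#define _DEFAULT_SOURCE  /* make sure flock() and friends are declared */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical probe: does flock() work on files in directory `dir`?
 * Returns  1 if an exclusive, non-blocking lock succeeds,
 *          0 if flock() fails with ENOSYS ("Function not implemented",
 *            errno 38 on Linux, as in the H5FD_sec2_lock() trace),
 *         -1 on any other failure (e.g. the probe file cannot be created). */
int flock_supported(const char *dir)
{
    char path[4096];
    if (snprintf(path, sizeof path, "%s/flock_probe.tmp", dir) >= (int)sizeof path)
        return -1;                      /* path would be truncated */

    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;                      /* cannot create the probe file */

    int result;
    if (flock(fd, LOCK_EX | LOCK_NB) == 0) {
        flock(fd, LOCK_UN);             /* release the probe lock */
        result = 1;
    } else {
        /* errno is still the value set by the failed flock() here */
        result = (errno == ENOSYS) ? 0 : -1;
    }

    close(fd);
    unlink(path);                       /* remove the probe file */
    return result;
}
```

Called from a tiny driver and launched the same way as the tests (for example with srun -n1 on a compute node), a result of 0 for the Lustre or NFS directory would point at the same missing flock() support that H5Fcreate() tripped over, while 1 would suggest the problem lies elsewhere.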

In the upcoming HDF5 1.10.1, file locking can be disabled via an environment variable, and the library will give a more informative error message when it detects that file locking is not implemented on a file system.
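With 1.10.1 or later, the environment-variable route avoids patching the source. As far as we know the variable is HDF5_USE_FILE_LOCKING and a value of FALSE disables locking; please confirm this against the 1.10.1 release notes. The usual approach is to export it in the job script (export HDF5_USE_FILE_LOCKING=FALSE) before launching with srun, but a program can also set it for itself, as in this minimal sketch. The helper names are hypothetical.

```c
#define _DEFAULT_SOURCE  /* expose setenv() under strict -std= modes */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: turn off HDF5 file locking for this process.
 * Assumes HDF5 1.10.1 or later, which reads HDF5_USE_FILE_LOCKING and
 * skips locking when the value is "FALSE". Call it before the first
 * HDF5 function (e.g. before H5Fcreate) so the library sees the value.
 * Returns 0 on success, -1 if setenv() fails. */
int disable_hdf5_file_locking(void)
{
    /* overwrite = 1 replaces any value inherited from the job environment */
    return setenv("HDF5_USE_FILE_LOCKING", "FALSE", 1);
}

/* Returns 1 if the variable is currently set to "FALSE", 0 otherwise. */
int hdf5_file_locking_disabled(void)
{
    const char *v = getenv("HDF5_USE_FILE_LOCKING");
    return v != NULL && strcmp(v, "FALSE") == 0;
}
```

Setting the variable in the job script is generally preferable, since it also covers the test programs (testhdf5, testphdf5) without modifying them.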

Let me know if that doesn't work for you and we can diagnose further.

Cheers,

Dana Robinson
Software Engineer
The HDF Group

From: Hdf-forum [mailto:hdf-forum-bounces@lists.hdfgroup.org] On Behalf Of Sjaardema, Gregory D
Sent: Thursday, December 8, 2016 3:23 PM
To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>
Subject: [Hdf-forum] Building hdf5-1.10.0-patch 1 on Blue Gene Q

I have been unsuccessful in building a parallel version of hdf5-1.10.0-patch1 on a Blue Gene/Q system (rzuseq). I have tried both the yod-configure approach and manually changing every ./conftest to srun -n1 ./conftest. Although both approaches configure and build correctly, I am unable to run testhdf5 or testphdf5. testhdf5 gives errors of this sort:

Linked with hdf5 version 1.10 release 0
Testing -- Configure definitions (config)
Testing -- Encoding/decoding metadata (metadata)
Testing -- Checksum algorithm (checksum)
Testing -- Ternary Search Trees (tst)
Testing -- Memory Heaps (heap)
Testing -- Skip Lists (skiplist)
Testing -- Reference Counted Strings (refstr)
Testing -- Low-Level File I/O (file)
*** UNEXPECTED RETURN from H5Fcreate is -1 at line 187 in tfile.c
HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) thread 0:
  #000: H5F.c line 491 in H5Fcreate(): unable to create file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1168 in H5F_open(): unable to lock the file or initialize file structure
    major: File accessibilty
    minor: Unable to open file
  #002: H5FD.c line 1821 in H5FD_lock(): driver lock request failed
    major: Virtual File Layer
    minor: Can't update object
  #003: H5FDsec2.c line 939 in H5FD_sec2_lock(): unable to flock file, errno = 38, error message = 'Function not implemented'
    major: File accessibilty
    minor: Bad file ID accessed
*** UNEXPECTED RETURN from H5Fclose is -1 at line 198 in tfile.c
HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) thread 0:
  #000: H5F.c line 749 in H5Fclose(): not a file ID
    major: Invalid arguments to routine
    minor: Inappropriate type

This happens on both Lustre and NFS file systems. When I build hdf5-1.8.16 using the same procedure, everything works correctly. I have also used the build_hdf5 script in the CGNS distribution, with no change in behavior.

I need hdf5-1.10.0-patch1 or later to investigate the collective metadata changes.

If anyone on the list, or any of the HDF5 developers or support staff, has successfully built on a Blue Gene/Q system, your help would be very much appreciated.
..Greg
