help with errors

Hi,
The following piece of Fortran code is producing some cryptic errors that I am
having trouble understanding. The error log is attached.

       CALL h5dcreate_f(newfl_id, &
                        '/aux/planes/foo', &
                        H5T_NATIVE_DOUBLE, &
                        hifstate(n)%flspace,hifstate(n)%dset_id(1),err)

newfl_id is of type INTEGER(HID_T), was returned by h5fcreate_f, and is
still open. The groups /aux and /aux/planes have already been created.
hifstate(n)%flspace has been created by h5screate_simple_f and is of type
INTEGER(HID_T), and hifstate(n)%dset_id(1) is of type INTEGER(HID_T).

Do /aux and /aux/planes need to be open? Could this be the issue? (I am
using an absolute path, so I don't think it is.)

Thanks for any feedback, I am really stuck.
Izaak Beekman

error.txt (68.8 KB)

===================================
(301)244-9367
Princeton University Doctoral Candidate
Mechanical and Aerospace Engineering
ibeekman@princeton.edu

UMD-CP Visiting Graduate Student
Aerospace Engineering
ibeekman@umiacs.umd.edu
ibeekman@umd.edu

Also,
I am using HDF5-1.8.5-p1 on RHEL5 x86_64 with Intel's ifort 11.1 compiler
(Build 20090630, Package ID: l_cprof_p_11.1.046) and mvapich-1.1.0-qlc. I
built HDF5 using this toolchain.

Izaak,

The error log file contains a message saying that the object already exists:

  #002: H5L.c line 1640 in H5L_link_object(): unable to create new link to object
  #000: H5D.c line 170 in H5Dcreate2(): unable to create dataset
  #006: H5L.c line 1675 in H5L_link_cb(): name already exists
  #005: H5Gtraverse.c line 759 in H5G_traverse_real(): traversal operator failed
    major: Symbol table
    major: Dataset
    minor: Callback failed
    major: Symbol table
    major: Symbol table
    minor: Callback failed
    minor: Unable to initialize object
    major: Symbol table
    major: Links
    minor: Object already exists

Elena
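
One way to test this diagnosis is to ask the library whether a link with that name is already present before creating the dataset. The following is only a serial sketch, not Izaak's code: the module name `dset_guard`, the subroutine `create_or_open`, the file name `exists_demo.h5`, and the dataset length are all made up for illustration. It assumes the `h5lexists_f` routine from the HDF5 1.8 Fortran API and that the intermediate groups already exist, as they do in the original question.

```fortran
MODULE dset_guard
  USE hdf5
  IMPLICIT NONE
CONTAINS
  ! Hypothetical helper: create dataset `name` under loc_id unless a link with
  ! that name already exists, in which case open it instead.
  ! `created` reports whether a new dataset was made.
  SUBROUTINE create_or_open(loc_id, name, space_id, dset_id, created, err)
    INTEGER(HID_T),   INTENT(IN)  :: loc_id, space_id
    CHARACTER(LEN=*), INTENT(IN)  :: name
    INTEGER(HID_T),   INTENT(OUT) :: dset_id
    LOGICAL,          INTENT(OUT) :: created
    INTEGER,          INTENT(OUT) :: err
    LOGICAL :: found

    dset_id = -1
    created = .FALSE.
    CALL h5lexists_f(loc_id, name, found, err)
    IF (err < 0) RETURN
    IF (found) THEN
       CALL h5dopen_f(loc_id, name, dset_id, err)
    ELSE
       CALL h5dcreate_f(loc_id, name, H5T_NATIVE_DOUBLE, space_id, dset_id, err)
       created = (err >= 0)
    END IF
  END SUBROUTINE create_or_open
END MODULE dset_guard

PROGRAM exists_demo
  USE hdf5
  USE dset_guard
  IMPLICIT NONE
  INTEGER(HID_T)   :: file_id, aux_id, planes_id, space_id, dset_id
  INTEGER(HSIZE_T) :: dims(1)
  LOGICAL          :: created
  INTEGER          :: err, pass

  dims(1) = 10                                   ! hypothetical dataset length
  CALL h5open_f(err)
  CALL h5fcreate_f('exists_demo.h5', H5F_ACC_TRUNC_F, file_id, err)
  CALL h5gcreate_f(file_id, '/aux', aux_id, err)
  CALL h5gcreate_f(file_id, '/aux/planes', planes_id, err)
  CALL h5screate_simple_f(1, dims, space_id, err)

  ! The second pass reuses the name, which is the situation the log reports.
  DO pass = 1, 2
     CALL create_or_open(file_id, '/aux/planes/foo', space_id, dset_id, created, err)
     PRINT *, 'pass', pass, ' created new dataset: ', created
     IF (err >= 0) CALL h5dclose_f(dset_id, err)
  END DO

  CALL h5sclose_f(space_id, err)
  CALL h5gclose_f(planes_id, err)
  CALL h5gclose_f(aux_id, err)
  CALL h5fclose_f(file_id, err)
  CALL h5close_f(err)
END PROGRAM exists_demo
```

If the check reports that the link exists on what should be the first create call, the name is being created earlier than expected, possibly by another code path. In a parallel program every rank would have to make the same sequence of calls; this sketch does not cover that case.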

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org

Elena,
Thanks for the response. I am certain that this is the first call to
create an object at this location with this name. However, I am using
collective calls to HDF5 routines, and I created the file collectively.
Could this be an MPI-related issue? The code that generates this error is
fairly large, and it's not immediately obvious to me how to write a smaller
program that reproduces it. I commented out the call to h5dcreate_f and
replaced it with another one that creates a dataset, foo, under the root
group; it also fails with the same errors. I'll double-check that the file
has been created and opened correctly (i.e. collectively, etc.) and that I
am not missing anything critical. Any additional thoughts about where I
might have gone wrong are welcome.

Izaak Beekman

So,
It turns out I had sent the wrong error log. The correct error I am
seeing is attached. Again, it is triggered on calls to h5dwrite_f. The error
mentions file locks. HDF5 is built against an MPICH variant with
the Intel compiler. The data is being read from and written to a Lustre
file system. I believe all the tests passed when I installed
HDF5-1.8.5p1, and I have been successfully reading and writing data
(collectively) in parallel until now.
I spoke to our system administrator, and he seems to think that remounting
the Lustre file system with different flags will fix the issue, but both of
us have limited experience.

Has anyone seen these errors before? Is my sysadmin correct in saying that
a simple change to the Lustre mount flags will fix this?

Many thanks again,
Izaak Beekman

error (137 KB)

It's good you found the correct error file.

Your MPI version is a little on the older side. Newer versions have
a somewhat more helpful error message:

FPRINTF(stderr, "File locking failed in ADIOI_Set_lock(fd %X,cmd %s/%X,type %s/%X,whence %X) with return value %X and errno %X.\n"
                  "- If the file system is NFS, you need to use NFS version 3, ensure that the lockd daemon is running on all the machines, and mount the directory with the 'noac' option (no attribute caching).\n"
                  "- If the file system is LUSTRE, ensure that the directory is mounted with the 'flock' option.\n",

I don't have much personal experience with Lustre, so I don't know the
full ramifications of the 'flock' option, but that's what the
Lustre community tells me is needed.

A more helpful error message is just one benefit of upgrading your MPI
implementation. The Lustre driver has seen a host of improvements and
bug fixes. If there's any way you can upgrade to MPICH2-1.4.1 or
MVAPICH2-1.7, you will be a much happier Lustre user.

==rob
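
For anyone wanting to check the mount options from a compute node before rerunning, a rough sketch follows. It rests on assumptions that are not from this thread: that the node exposes mounts in `/proc/mounts` in the usual Linux layout (device, mount point, file system type, options), and that a Lustre mount made with 'flock' lists `flock` among its comma-separated options. The module and function names are invented for the example. The option list is split on commas rather than searched as a substring so that a different option such as `localflock` is not mistaken for `flock`.

```fortran
MODULE lustre_mount_check
  IMPLICIT NONE
CONTAINS

  ! True if the comma-separated option list contains exactly 'flock'.
  PURE FUNCTION has_flock_option(opts) RESULT(found)
    CHARACTER(LEN=*), INTENT(IN) :: opts
    LOGICAL :: found
    INTEGER :: first, last, n
    found = .FALSE.
    n = LEN_TRIM(opts)
    first = 1
    DO WHILE (first <= n)
       last = INDEX(opts(first:n), ',')
       IF (last == 0) THEN
          last = n                      ! final option in the list
       ELSE
          last = first + last - 2       ! character just before the comma
       END IF
       IF (opts(first:last) == 'flock') found = .TRUE.
       first = last + 2                 ! skip past the comma
    END DO
  END FUNCTION has_flock_option

  ! Return the n-th blank-separated field of line ('' if there are fewer).
  ! List-directed READ is avoided because '/' would terminate the input.
  PURE FUNCTION nth_field(line, n) RESULT(field)
    CHARACTER(LEN=*), INTENT(IN) :: line
    INTEGER,          INTENT(IN) :: n
    CHARACTER(LEN=LEN(line)) :: field
    INTEGER :: pos, last, k, length
    field = ''
    length = LEN_TRIM(line)
    pos = 1
    DO k = 1, n
       DO WHILE (pos <= length)         ! skip leading blanks
          IF (line(pos:pos) /= ' ') EXIT
          pos = pos + 1
       END DO
       IF (pos > length) RETURN
       last = INDEX(line(pos:length), ' ')
       IF (last == 0) THEN
          last = length
       ELSE
          last = pos + last - 2
       END IF
       IF (k == n) field = line(pos:last)
       pos = last + 1
    END DO
  END FUNCTION nth_field

END MODULE lustre_mount_check

PROGRAM check_lustre_flock
  USE lustre_mount_check
  IMPLICIT NONE
  INTEGER, PARAMETER  :: mounts_unit = 10
  CHARACTER(LEN=1024) :: line
  INTEGER :: ios

  OPEN(UNIT=mounts_unit, FILE='/proc/mounts', STATUS='OLD', ACTION='READ', IOSTAT=ios)
  IF (ios /= 0) STOP 'cannot read /proc/mounts'
  DO
     READ(mounts_unit, '(A)', IOSTAT=ios) line
     IF (ios /= 0) EXIT
     ! Field 3 is the file system type, field 2 the mount point, field 4 the options.
     IF (TRIM(nth_field(line, 3)) == 'lustre') THEN
        PRINT '(A,A,L2)', TRIM(nth_field(line, 2)), ' has flock option:', &
              has_flock_option(nth_field(line, 4))
     END IF
  END DO
  CLOSE(mounts_unit)
END PROGRAM check_lustre_flock
```

If no Lustre mount is found, the program prints nothing. Whether remounting with 'flock' is appropriate for a given site is still a question for the system administrator.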

--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA