setting the default error handler to exit

Hi,

Right now, if I run an MPI job and an HDF5 error occurs on a subset of
the procs, HDF5 will hang and eat up all of my allotted walltime.
Instead, I'd like the program to abort as soon as one proc reports any
HDF5 error. Do I need to write my own error handler to accomplish
this, or is there a way that the default error handler can handle
errors in this way?

Thanks,
Mark

Hi Mark,

On Aug 31, 2009, at 11:12 AM, Mark Howison wrote:

Hi,

Right now, if I run an MPI job and an HDF5 error occurs on a subset of
the procs, HDF5 will hang and eat up all of my allotted walltime.
Instead, I'd like the program to abort as soon as one proc reports any
HDF5 error. Do I need to write my own error handler to accomplish
this, or is there a way that the default error handler can handle
errors in this way?

  I think you'll have to write your own error handler - the default handler can't do this currently.

  Quincey

Hi Quincey,

Can I do something as simple as this?

herr_t
hdf5_exit_on_error (void *unused)
{
    H5Eprint(stderr);
    exit(EXIT_FAILURE);
}

Will that still print the normal HDF5 error stack and then exit right after?

Thanks
Mark

On Mon, Aug 31, 2009 at 11:18 AM, Quincey Koziol<koziol@hdfgroup.org> wrote:

Hi Mark,

   I think you'll have to write your own error handler - the default handler can't do this currently.

   Quincey

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org

On Aug 31, 2009, at 4:57 PM, Mark Howison wrote:

Hi Quincey,

Can I do something as simple as this?

herr_t
hdf5_exit_on_error (void *unused)
{
   H5Eprint(stderr);
   exit(EXIT_FAILURE);
}

Will that still print the normal HDF5 error stack and then exit right after?

  Yes, I think something like that would be fine.

    Quincey

Oddly, it works on my workstation, but not on Franklin...

On Tue, Sep 1, 2009 at 9:57 AM, Quincey Koziol<koziol@hdfgroup.org> wrote:

   Yes, I think something like that would be fine.

           Quincey

On Sep 1, 2009, at 11:59 AM, Mark Howison wrote:

Oddly, it works on my workstation, but not on Franklin...

  Are you running MPI applications on your workstation? Or, is it serial code?

    Quincey

Yes, MPI. When I test it on my workstation, I generate an error on
proc 1 by calling H5Dopen on bogus hid_t values. This calls my error
handler and exits on proc 1, which causes mpirun to abort all MPI
tasks.

On Franklin, the error is generated in the MPI-POSIX VFD on about 40
of 16K procs. Maybe those procs exit... I don't have any stderr output
that says they do. Even if they are exiting, the aprun environment may
handle that event differently from my workstation.

Mark

On Tue, Sep 1, 2009 at 10:01 AM, Quincey Koziol<koziol@hdfgroup.org> wrote:

   Are you running MPI applications on your workstation? Or, is it serial code?

           Quincey

On Sep 1, 2009, at 12:19 PM, Mark Howison wrote:

Yes, MPI. When I test it on my workstation, I generate an error on
proc 1 by calling H5Dopen on bogus hid_t values. This calls my error
handler and exits on proc 1, which causes mpirun to abort all MPI
tasks.

On Franklin, the error is generated in the MPI-POSIX VFD on about 40
of 16K procs. Maybe those procs exit... I don't have any stderr output
that says they do. Even if they are exiting, the aprun environment may
handle that event differently from my workstation.

  What about calling MPI_Abort() instead of exit()?

  Quincey
