Hi,
Right now, if I run an MPI job and an HDF5 error occurs on a subset of
the procs, HDF5 will hang and eat up all of my allotted walltime.
Instead, I'd like the program to abort as soon as one proc reports any
HDF5 error. Do I need to write my own error handler to accomplish
this, or is there a way that the default error handler can handle
errors in this way?
Thanks,
Mark
Hi Quincey,
Can I do something as simple as this?
herr_t
hdf5_exit_on_error (void *unused)
{
    H5Eprint(stderr);
    exit(EXIT_FAILURE);
}
Will that still print the normal HDF5 error stack and then exit right after?
Thanks
Mark
On Mon, Aug 31, 2009 at 11:18 AM, Quincey Koziol<koziol@hdfgroup.org> wrote:
Hi Mark,
On Aug 31, 2009, at 11:12 AM, Mark Howison wrote:
> Hi,
> Right now, if I run an MPI job and an HDF5 error occurs on a subset of
> the procs, HDF5 will hang and eat up all of my allotted walltime.
> Instead, I'd like the program to abort as soon as one proc reports any
> HDF5 error. Do I need to write my own error handler to accomplish
> this, or is there a way that the default error handler can handle
> errors in this way?
I think you'll have to write your own error handler - the default
handler can't do this currently.
Quincey
_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
Yes, I think something like that would be fine.
Quincey
Oddly, it works on my workstation, but not on Franklin...
Are you running MPI applications on your workstation? Or is it serial code?
Quincey
Yes, MPI. When I test it on my workstation, I generate an error on
proc 1 by calling H5Dopen on bogus hid_t values. This calls my error
handler and exits on proc 1, which causes mpirun to abort all MPI
tasks.
On Franklin, the error is generated in the MPI-POSIX VFD on about 40
of 16K procs. Maybe those procs exit, but I don't have any stderr output
that says they do. Even if they are exiting, the aprun environment may
handle that event differently than mpirun does on my workstation.
Mark
What about calling MPI_Abort() instead of exit()?
Quincey