Bail out on parallel hdf5 write or extend error

Hello,

I am writing an application to stream data from multiple imaging detectors, operating in a synchronised fashion, into a single dataset in one HDF5 file. The dataset is a chunked 3D dataset where 2D (X, Y) images get appended along the 3rd dimension (the dimensions are defined as [Z, Y, X]), so as the 2D frames are received I extend dimension Z.

At this point I need to work out how to deal with errors - if one node for some reason does something wrong, like trying to write outside the dataset dimensions or whatever, I need to be able to close the current file and return to the initial state, ready to create and write to another file. For performance reasons I am using collective I/O, and so I think I need the same number of extend, write, etc. calls on each process - or the H5Fclose call will hang. Is this correct?
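For concreteness, the per-frame append pattern looks roughly like this (a minimal sketch only; the handle, buffer, size and datatype names are illustrative, and each rank would adjust offset/count to select its own part of the frame):

/* Sketch of the per-frame append: every rank makes the same extend call,
 * then writes its own hyperslab with a collective transfer. */
hsize_t new_dims[3] = { current_z + 1, ny, nx };   /* grow along Z by one frame      */
hsize_t offset[3]   = { current_z, 0, 0 };         /* this rank's part of the frame  */
hsize_t count[3]    = { 1, ny, nx };

H5Dset_extent(dset, new_dims);                     /* collective: same size everywhere */

hid_t filespace = H5Dget_space(dset);
H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, NULL, count, NULL);
hid_t memspace = H5Screate_simple(3, count, NULL);

hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);      /* collective raw-data transfer */

H5Dwrite(dset, H5T_NATIVE_UINT16, memspace, filespace, dxpl, frame);

H5Pclose(dxpl);
H5Sclose(memspace);
H5Sclose(filespace);
current_z++;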

Can I use the H5Pset_fclose_degree to set H5F_CLOSE_STRONG safely in the parallel hdf5 case or will that cause a crash/hang/corrupt file?
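For context, the file is created with an MPI-IO file access property list, and setting the close degree there is what I had in mind - a sketch only, with an illustrative file name:

hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
H5Pset_fclose_degree(fapl, H5F_CLOSE_STRONG);      /* is this safe with parallel HDF5? */
hid_t file = H5Fcreate("frames.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
H5Pclose(fapl);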

Cheers,
Ulrik


---------------------------------------------------------------------
Ulrik Kofoed Pedersen
Senior Software Engineer
Diamond Light Source Ltd
Phone: 01235 77 8580


Hi Ulrik,


From: Hdf-forum [mailto:hdf-forum-bounces@lists.hdfgroup.org] On Behalf Of
ulrik.pedersen@diamond.ac.uk
Sent: Wednesday, July 10, 2013 11:01 AM
To: hdf-forum@lists.hdfgroup.org
Subject: [Hdf-forum] Bail out on parallel hdf5 write or extend error

Hello,

I am writing an application to stream data from multiple imaging detectors, operating in a synchronised fashion, into a single dataset in one HDF5 file. The dataset is a chunked 3D dataset where 2D (X, Y) images get appended along the 3rd dimension (the dimensions are defined as [Z, Y, X]), so as the 2D frames are received I extend dimension Z.

At this point I need to work out how to deal with errors - if one node for
some reason does something wrong like trying to write outside the dataset
dimensions or whatever, I need to be able to close the current file and
return to the initial state, ready to create and write to another file.

Hmm, I'm not quite sure I understand what you are trying to achieve here.
When you say that a node does something wrong, like trying to write outside
the dataset dimensions, that implies an erroneous program, which should be
corrected rather than recovered from. From what I understand, you are
attempting to use HDF5 erroneously and then to carry on as if nothing
happened, expecting a certain behaviour. That is not possible.

I might have misunderstood you because I'm not aware of the full details
of your use case here, i.e. why you would write outside the dataset
dimensions.

For performance reasons I am using collective I/O, and so I think I need the
same number of extend, write, etc. calls on each process - or the H5Fclose
call will hang. Is this correct?

If you have requested collective I/O but do not call extend and write
collectively, the hang would most probably not happen in H5Fclose; it will
happen in the extend or write itself, because a collective operation expects
all processes to take part at some point in time. If one process does not
make the call while the other processes are trying to talk to it, your
program will hang.
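For example, a rank that has nothing to contribute in a given step must still take part in the collective calls; it can do so with an empty selection. Roughly (a sketch with illustrative names):

H5Dset_extent(dset, new_dims);             /* same new size on every rank */

hid_t filespace = H5Dget_space(dset);
hid_t memspace  = H5Screate_simple(3, count, NULL);
if (have_frame) {
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, NULL, count, NULL);
} else {
    H5Sselect_none(filespace);             /* participate with zero elements */
    H5Sselect_none(memspace);
}
H5Dwrite(dset, H5T_NATIVE_UINT16, memspace, filespace, coll_dxpl, frame);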

Can I use the H5Pset_fclose_degree to set H5F_CLOSE_STRONG safely in the
parallel hdf5 case or will that cause a crash/hang/corrupt file?

I do not think this is relevant to what you are asking for or require. Sure,
you can set it to H5F_CLOSE_STRONG, but that does not mean you can avoid
calling collective operations on all processes, or use the API in an
erroneous manner.

Thanks,

Mohamad


Hi Mohamad,

What I'm trying to achieve is graceful error handling in case 'something wrong' happens. My parallel HDF5 writer application is the back end of a system which at the front reads data off a detector. The front-end system then passes on the data, along with some metadata describing where the 2D frame sits in the full dataset. So each MPI process sits on a separate server, and each server is connected to one piece of readout electronics (front end) which reads a 2D strip off the full detector/camera.

Because errors just happen for various reasons in complex systems - especially ones in development (and in this case we are talking about several software and hardware subsystems working in supposedly beautiful synchronisation) - the writer must be able to recover even from erroneous use: for example, if the front end sends the writer a bit of data with wrongly configured offsets, causing us to write outside the dataset (wrong offsets are just an example - of course I could add sanity checks for this particular case...).

The bottom line (or question) is really just: is there a way to recover somewhat gracefully if an error has happened on one or more nodes - and what would be the consequences? (corrupt file?)

Perhaps I need to switch off the collective mode? Would that allow me to close the file without having done an equal number of extend/write on each node?

Cheers,
Ulrik


Hi Ulrik,

Hi Mohamad,

What I'm trying to achieve is graceful error handling in case 'something wrong' happens. My parallel HDF5 writer application is the back end of a system which at the front reads data off a detector. The front-end system then passes on the data, along with some metadata describing where the 2D frame sits in the full dataset. So each MPI process sits on a separate server, and each server is connected to one piece of readout electronics (front end) which reads a 2D strip off the full detector/camera.

Because errors just happen for various reasons in complex systems -- especially ones in development (and in this case we are talking about several software and hardware subsystems working in supposedly beautiful synchronisation) -- the writer must be able to recover even from erroneous use: for example, if the front end sends the writer a bit of data with wrongly configured offsets, causing us to write outside the dataset (wrong offsets are just an example -- of course I could add sanity checks for this particular case...).

I still think you are not making a distinction between programming errors and system/hardware errors.
But if you want to recover from the case where one process fails in a collective write call, you will need to add fault tolerance yourself, by checking the return status on every process and communicating it to all the ranks in the communicator.
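Roughly like this (just a sketch; "status" stands for the return value of the last H5Dset_extent/H5Dwrite on each rank, and "file" for the open file handle):

int local_ok  = (status >= 0);         /* did my own HDF5 call succeed?      */
int global_ok = 0;
MPI_Allreduce(&local_ok, &global_ok, 1, MPI_INT, MPI_MIN, MPI_COMM_WORLD);

if (!global_ok) {
    /* Every rank reaches this point, so the collective close is matched. */
    H5Fclose(file);
    /* ...reset local state, ready to create the next file... */
}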

The bottom line (or question) is really just: is there a way to recover somewhat gracefully if an error has happened on one or more nodes -- and what would be the consequences? (corrupt file?)

I do not know how to answer this question. There is a wide range of errors; some can be recovered from, others, well, not so much. On top of that, there is still no fault tolerance in MPI, so recovering from MPI failures is hard.
As for file corruption, again it depends on the error. If a failure prevents the processes from closing the file and flushing the metadata cache, then yes, you could end up with a corrupt file.
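One way to narrow that window is to flush the file periodically; H5Fflush is itself a collective call in parallel HDF5. A sketch, with the frame counter and interval purely illustrative:

if (frame_no % FLUSH_INTERVAL == 0)    /* every N frames                      */
    H5Fflush(file, H5F_SCOPE_GLOBAL);  /* push the metadata cache out to disk */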

Perhaps I need to switch off the collective mode? Would that allow me to close the file without having done an equal number of extend/write on each node?

Switching off collective mode means that dataset access operations (H5Dread/H5Dwrite) can be done independently. Other operations are still required to be collective; see:
http://www.hdfgroup.org/HDF5/doc/RM/CollectiveCalls.html
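For example (a sketch with illustrative names), the raw data transfers can go independent while the extend and close stay collective:

hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_INDEPENDENT);   /* independent is also the default */

H5Dset_extent(dset, new_dims);                   /* still collective on every rank  */
H5Dwrite(dset, H5T_NATIVE_UINT16, memspace, filespace, dxpl, frame);  /* independent */
H5Fclose(file);                                  /* still collective                */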

Thanks,
Mohamad

