Removing group entries on a corrupt HDF5 file?

Hi,

A PyTables user managed to corrupt a file so that it looks like:

$ h5ls -r pcmldata_LG_idxTEST.h5
/PCMLData Group
/PCMLData/PCMLData Dataset {5956303/Inf}
/PCMLData/_i_PCMLData Group
/PCMLData/_i_PCMLData/implog Group
/PCMLData/_i_PCMLData/peptideid **NOT FOUND**
/PCMLData/_i_PCMLData/spectrumid **NOT FOUND**

I suppose the answer is no, but is there some way to remove the
stale group entries (i.e. 'peptideid' and 'spectrumid')?

Thanks in advance,

···

--
Francesc Alted

"One would expect people to feel threatened by the 'giant
brains or machines that think'. In fact, the frightening
computer becomes less frightening if it is used only to
simulate a familiar noncomputer."

-- Edsger W. Dijkstra
   "On the cruelty of really teaching computing science"

----------------------------------------------------------------------
This mailing list is for HDF software users discussion.
To subscribe to this list, send a message to hdf-forum-subscribe@hdfgroup.org.
To unsubscribe, send a message to hdf-forum-unsubscribe@hdfgroup.org.

Dimitris,

On Wednesday 22 April 2009, Dimitris Servis wrote:

> Hi,
>
> probably you did, but out of curiosity: H5Ocopy on
>
> /PCMLData/PCMLData Dataset {5956303/Inf}
> /PCMLData/_i_PCMLData Group
> /PCMLData/_i_PCMLData/implog Group
>
> into a different file doesn't work?

Yes. I've done something similar with the ptrepack tool (similar to the
h5repack that comes with HDF5) with success. I was asking mainly to
know if there is a way that does not need a complete copy of the file.

Thanks anyway,
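
For readers who want to try the copy-based workaround discussed above, here is a minimal sketch. It assumes h5py rather than the ptrepack or H5Ocopy route actually used in this thread, and the function name salvage_copy is made up for illustration. It walks the file, copies every object that can still be opened, and records the entries that cannot. Which exception a damaged entry raises depends on the kind of damage, so the sketch catches broadly.

```python
import h5py


def salvage_copy(src_path, dst_path):
    """Copy every object that can still be opened; return the paths skipped."""
    skipped = []

    def copy_group(src_group, dst_group):
        # Carry group attributes over (PyTables keeps metadata in them).
        for key, value in src_group.attrs.items():
            dst_group.attrs[key] = value
        # Iterating yields link names only; nothing is opened yet.
        for name in src_group:
            try:
                obj = src_group[name]
            except Exception:
                # Assumption: an unreadable entry raises on access.
                skipped.append(f"{src_group.name.rstrip('/')}/{name}")
                continue
            if isinstance(obj, h5py.Group):
                copy_group(obj, dst_group.create_group(name))
            else:
                # Group.copy wraps H5Ocopy for datasets and named types.
                src_group.copy(obj, dst_group, name=name)

    with h5py.File(src_path, "r") as src, h5py.File(dst_path, "w") as dst:
        copy_group(src, dst)
    return skipped
```

Calling salvage_copy("pcmldata_LG_idxTEST.h5", "recovered.h5") would be expected to return the entries it had to skip, but this has not been tried against the damaged file from this thread.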


Hi Francesc,


  Hmm, we don't have any tools that will do this currently. If it's the
application's fault (by writing to the file from two different processes, or
by crashing, etc.), the damage is usually too specific for a general tool to
fix. I can think of some things that might make sense, like removing a
mangled link from a group, etc., but even that is tricky... :-/

  Quincey
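
To make the "removing a mangled link from a group" idea concrete, here is a hedged sketch, again assuming h5py. The helper name try_unlink is hypothetical. It simply asks the library to delete each named link and reports what happened. Nothing in this thread confirms that the library can unlink an entry whose target object is damaged, so treat it as an experiment to run on a backup copy of the file, not as a repair tool.

```python
import h5py


def try_unlink(path, link_paths):
    """Attempt to delete each link in place; return (removed, failed)."""
    removed, failed = [], []
    with h5py.File(path, "r+") as f:
        for link in link_paths:
            try:
                del f[link]  # h5py maps this to H5Ldelete
                removed.append(link)
            except Exception as exc:
                # A damaged entry may fail in ways we cannot predict,
                # so keep the error text for inspection.
                failed.append((link, repr(exc)))
    return removed, failed
```

If an entry ends up in the failed list, copying the readable objects to a new file remains the fallback.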


May I comment on this issue?
It is the very risk of corrupting the HDF5 file (I agree, possibly through
my own fault) that is preventing me from naturally storing multiple time
steps in one HDF5 file. I just can't afford to lose several days of
computations at the end by writing to a full disk or the like. IMO the file
system should be resistant to such crashes, e.g. by using journaling or a
similar type of buffering, so that at worst I would lose just the last
time step. Unfortunately, writes to a full disk etc. render the whole
file unusable.
-- Dominik


--
Dominik Szczerba, PhD.
Biomedical Simulation Group
Computer Vision Lab CH-8092 Zurich
http://www.vision.ee.ethz.ch/~domi


Hi Quincey,


OK. I suspected as much, but wanted to ask just in case. And yes, in this
specific case the problem was most likely two independent processes
writing to the same file at the same time. Fortunately, copying the
entire file is still a solution for cases like this one :-)


Hi Dominik,


  We are in the process of implementing metadata journaling on HDF5 files. I'm expecting it will be in the 1.10.0 release (or whatever we end up calling the next major release :-).

  Until then, you can create separate files and use external links to create a "wrapper" file that points at the timestep datasets in the individual files.

  Quincey
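
The separate-files-plus-wrapper suggestion could look roughly like the sketch below. It assumes h5py, and the file naming scheme, the dataset name "data" and both function names are invented for illustration. Each time step goes into its own small file, and the wrapper holds only external links, so a failed write can damage at most the file being written at that moment.

```python
import os

import h5py


def write_timestep(directory, step, values):
    """Write one time step to its own file and return the file name."""
    filename = f"step_{step:06d}.h5"  # hypothetical naming scheme
    with h5py.File(os.path.join(directory, filename), "w") as f:
        f.create_dataset("data", data=values)
    return filename


def build_wrapper(directory, filenames, wrapper_name="wrapper.h5"):
    """Create a wrapper file whose entries are external links."""
    wrapper_path = os.path.join(directory, wrapper_name)
    with h5py.File(wrapper_path, "w") as w:
        for filename in filenames:
            link_name = os.path.splitext(filename)[0]
            # Relative file names keep the whole set relocatable as long
            # as the wrapper stays in the same directory as the steps.
            w[link_name] = h5py.ExternalLink(filename, "/data")
    return wrapper_path
```

Because the wrapper contains no data of its own, it can be rebuilt from the per-step files at any time if it is ever damaged.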
