HDF lib incompatible with HDF file spec?

Hmm. If I understand you, you have written code that you believe produces an HDF5 file according to the 3.0 file version specification, https://support.hdfgroup.org/HDF5/doc/H5.format.html but nevertheless does NOT use the HDF5 library to do it. Furthermore, where 'extended padding' is concerned, your implementation does business differently than the HDF5 implementation.

You can prove HDF5 tools will *read* the file ok. But, in a read-modify-write scenario, the file is getting corrupted by HDF5 library due to the difference in how the two implementations handle the extended padding -- a feature that you explain is '...not defined at all -- not even recommended'.

Is that about right?

If so, it does indeed sound like a potential issue in the file format specification for HDF5.

Your scenario sounds like a super useful test case...does a wholly independent implementation produce a file the HDF5 library can "handle"?

I wonder if there are settings in the HDF5 library you may need to set (such as alignment or block-size or something) such that read-modify-write will indeed work ok? I wonder if there is some metadata missing from your file that will inform the HDF5 library what specific settings it must use to properly read and write to the file? I wonder if there is some boot-block information you have neglected to include so that the HDF5 library is not aware of all the parameters affecting the file's layout.

The only reason for calling into question many possibilities of your implementation is that the HDF5 file format is fairly complex. I don't think it is easily duplicated without using the library itself. So, I think it's highly likely you may be overlooking some important features of the format necessary for the HDF5 library to fully handle it.

All that said, I commend your courage for attempting it and hope others can chime in with more detailed thoughts on what to do about it.

Mark
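
As an aside on the "boot-block" question above: the sketch below reads the superblock of a file and reports the end-of-file address stored there, so that it can be compared with the actual file size. It is a minimal sketch and not code from this thread. The field offsets are taken from a reading of the superblock layouts (versions 0 to 3) in the format specification and should be checked against it; `find_superblock` and `read_superblock` are hypothetical helper names, and the sample bytes in the demo are synthetic.

```python
import struct

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def find_superblock(data: bytes) -> int:
    """Return the offset of the format signature, or -1 if it is absent.

    The signature may sit at offset 0 or at 512, 1024, 2048, ... (doubling).
    """
    offset = 0
    while offset + len(HDF5_SIGNATURE) <= len(data):
        if data[offset:offset + len(HDF5_SIGNATURE)] == HDF5_SIGNATURE:
            return offset
        offset = 512 if offset == 0 else offset * 2
    return -1


def read_superblock(data: bytes) -> dict:
    """Decode superblock version, size of offsets and end-of-file address."""
    start = find_superblock(data)
    if start < 0:
        raise ValueError("no HDF5 signature found")
    version = data[start + 8]
    if version in (0, 1):
        size_of_offsets = data[start + 13]
        # Assumed layout: fixed-size fields end at byte 24 (v0) or 28 (v1).
        addr_base = start + (24 if version == 0 else 28)
    elif version in (2, 3):
        size_of_offsets = data[start + 9]
        # Assumed layout: addresses start right after the 12-byte prefix.
        addr_base = start + 12
    else:
        raise ValueError(f"unknown superblock version {version}")
    # In all four versions the end-of-file address is the third address field.
    pos = addr_base + 2 * size_of_offsets
    eof_address = int.from_bytes(data[pos:pos + size_of_offsets], "little")
    return {
        "superblock_offset": start,
        "version": version,
        "size_of_offsets": size_of_offsets,
        "eof_address": eof_address,
    }


if __name__ == "__main__":
    undefined = 2**64 - 1                      # all bits set: undefined address
    sample = (
        HDF5_SIGNATURE
        + bytes([0, 0, 0, 0, 0, 8, 8, 0])      # versions, sizes of offsets/lengths
        + struct.pack("<HHI", 4, 16, 0)        # leaf K, internal K, flags
        + struct.pack("<QQQQ", 0, undefined, 2048, undefined)
    )
    print(read_superblock(sample))
```

If the stored end-of-file address were smaller than the real end of the data, a library that allocates new space starting at that address could overwrite existing blocks. Whether that is what happens with sizeoptimized.h5 is not established in this thread.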

"Hdf-forum on behalf of Krug, Markus" wrote:

Dear all,

I just came across an interesting issue.
I implemented the writing of HDF files on an embedded system. The amount of functionality I implemented is significantly less than the HDF lib offers, so it is tailored to my needs. I implemented everything on the basis of the HDF 3.0 file spec. One point of my tailoring was to optimize the file size. Therefore, I write every internal block in the HDF files directly after the previous one, byte by byte, or padded to the address alignment where the HDF file specification requests it. The HDF files generated by HDFview or Matlab have plenty of space in between the internal blocks, sometimes a few hundred bytes. As far as I can tell from the HDF file specification, this ‘extended padding’ is not defined at all – not even recommended.
However, this ‘extended padding’ performed by the HDF lib leads to behavior that I would consider an incompatibility with itself. To demonstrate this I attached two HDF files to this email. The first (sizeoptimized.h5) is generated by my embedded software and is optimized for file size. It contains three compounds, each of which has 2 elements. You should be able to open that file in HDFview or similar tools and read all its contents.
The second file (sizeoptimizedextended.h5) was generated by HDFview by adding a fourth compound after sizeoptimized.h5 was opened in HDFview. You can see that the file is partly corrupted. The reason for this is that HDFview (and therefore the HDF lib, I guess) does not really take care of the position of the internal blocks of a file that it is writing to. It seems to me it has some internal mapping of those blocks. This mapping gets applied even if it collides with, and therefore corrupts, the existing blocks.
If my observation is correct, I think the HDF lib will need a bugfix or the HDF file spec will need a description of how the internal blocks are allowed to be positioned within an HDF file.
I forgot to mention that I tried to take the HDF lib sources and compile them for my system. However, I quit after a couple of days because the way the sources are written is not at all suitable for adapting them to an embedded system that runs a simplified file system and a real-time operating system – and all of it has to fit into a few hundred kilobytes.

Can anyone comment on my observation?

Best Regards
Markus
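
To make the two layouts described above concrete, here is a minimal sketch that places blocks either back to back or padded to an alignment boundary. It is an illustration only, not Markus's embedded code: `align_up` and `pack_blocks` are hypothetical names, and the block sizes are made up.

```python
def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment (alignment >= 1)."""
    if alignment < 1:
        raise ValueError("alignment must be >= 1")
    return (offset + alignment - 1) // alignment * alignment


def pack_blocks(sizes, alignment=1, start=0):
    """Return (offset, size) pairs for blocks laid out one after another.

    alignment=1 packs byte by byte; larger values insert padding so that
    every block starts on a multiple of `alignment`.
    """
    layout = []
    cursor = start
    for size in sizes:
        offset = align_up(cursor, alignment)
        layout.append((offset, size))
        cursor = offset + size
    return layout


if __name__ == "__main__":
    sizes = [96, 40, 273, 24]                  # hypothetical block sizes in bytes
    print("tight :", pack_blocks(sizes, alignment=1))
    print("padded:", pack_blocks(sizes, alignment=8))
```

With tight packing there is no slack between blocks, so any writer that later assumes free space after a block will land on its neighbour.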

Over in Parallel-NetCDF land a few years back, we took, um, a "rather
aggressive interpretation" of the NetCDF spec with respect to alignment
and then opened a bug with Unidata when their tools did not follow the
rules as written.

As Mark observes, it was a productive exercise in keeping both
implementations honest.

==rob

On Tue, 2017-09-05 at 17:21 +0000, Miller, Mark C. wrote:

···

Dear Mark,

completely correct. I wrote some routines that generate HDF files. However, only a small subset of the functionality is used. More or less only compressed compound data types, with a maximum number of 5, will be in the files. Very likely there will be no more than two groups. I follow this paper (http://www.ep.liu.se/ecp/076/050/ecp12076050.pdf) concerning the HDF file layout because I need to write ‘time series’ in my embedded application.

You are right. The HDF file spec is highly complex. Even understanding my reduced set of functionality is taking significantly more time than I had planned. In the meantime I think I understand what I need for my purpose. However, I’m not saying that the files that I can generate so far are 100% correct in the sense of the HDF file spec. But at least HDFview can read them with no problems, so they cannot be that wrong.

Best Regards
Markus


Dear all,

I just want to come back to my question about the incompatibility between the HDF lib and the HDF file spec concerning the actual physical layout of an HDF file. Can anyone confirm my observation that this can lead to corrupt files if they are first generated by a ‘non HDFlib based’ application that complies with the HDF file spec and are then altered in a ‘HDFlib based’ application like HDFview?

Best Regards
Markus

From: Krug, Markus
Sent: Wednesday, 6 September 2017 17:56
To: 'HDF Users Discussion List' <hdf-forum@lists.hdfgroup.org>
Subject: RE: [Hdf-forum] HDF lib incompatible with HDF file spec?

···

Indeed. :) I don’t have time to look into Markus’s file today, but I will take a look tomorrow and see what the best course of action is.

  Quincey

On Sep 5, 2017, at 2:22 PM, Latham, Robert J. <robl@mcs.anl.gov> wrote:

···

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5

Just FYI... if the question is whether Markus' code produces correct HDF5 files, then it might be helpful to look at an independently developed reader. If so, it might be worth looking at libmysofa:

"... The NetCDF and HDF5 libraries, which were intended to handle big data, were not originally designed to be compiled on constrained devices. The German company Symonics GmbH, (together with help from The HDF Group), has reimplemented the HDF5 file format specifications aiming at a light-weight HDF5 reader library called libmysofa. With libmysofa, the size of a SOFA reader can be reduced by a factor of eight. The library is open source and available under the Apache license. It provides reading capabilities to access SOFA files and directly addresses loading HRTFs into the system."

Hope this helps,

-- Dave

From: Hdf-forum <hdf-forum-bounces@lists.hdfgroup.org> on behalf of Quincey Koziol <koziol@lbl.gov>
Sent: Tuesday, September 5, 2017 2:27 PM
To: HDF Users Discussion List
Subject: Re: [Hdf-forum] HDF lib incompatible with HDF file spec?

···

Hi Quincey,

I am looking forward to your results of the file analysis.

Best Regards
Markus


Hi Markus,
  I’ve looked at the files you’ve produced and it seems like the first object is getting corrupted when you add the 4th object. Can you see if that’s the case? Also, have you been using the h5debug tool for looking at your files? (in the tools directory) Or h5check?

  Regards,
    Quincey
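
One way to check which parts of the original file were touched is to compare the two attached files byte by byte over the length of the original. The following is a minimal sketch under that assumption; `changed_ranges` is a hypothetical helper, not part of any HDF5 tool, and it complements rather than replaces h5debug or h5check.

```python
def changed_ranges(original: bytes, modified: bytes):
    """Return [start, end) ranges where `modified` differs from `original`.

    Only the region covered by `original` is compared; bytes appended past
    its end are new data, not overwrites.
    """
    limit = min(len(original), len(modified))
    ranges = []
    start = None
    for i in range(limit):
        if original[i] != modified[i]:
            if start is None:
                start = i              # a run of differing bytes begins here
        elif start is not None:
            ranges.append((start, i))  # the run ended at the previous byte
            start = None
    if start is not None:
        ranges.append((start, limit))
    return ranges
```

Reading sizeoptimized.h5 and sizeoptimizedextended.h5 in binary mode and passing their contents to `changed_ranges` would list the overwritten regions. Not every changed range is corruption: adding an object normally means the library updates some existing metadata in place, so the ranges still have to be matched against the known positions of the blocks.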

On Sep 18, 2017, at 5:03 AM, Krug, Markus <markus.krug@hm.edu> wrote:

···

Hi,

Can I use high-level API function calls (H5TBmake_table(...)) in the parallel version of the HDF5 library?
There are no property list parameters for those function calls...

Regards,
Rafal

Dear all,

is anybody using libmysofa? I think it is beyond my resources to compile the sources and write a test program for the file I generated. Nevertheless, I think the file I attached to my previous email is correct in the sense of the HDF 3.0 file spec. As you can see, it can be read by HDFview without any problem at all.
The problem is more with writing HDF files. HDFview, and I guess most of the applications that use the hdflib, write to certain positions in the file without checking whether the position is already in use. The reason for this is the assumption that the file has been generated with hdflib, so the hdflib writes to positions that it left free in previous file generation or manipulation steps.
My point is that, as far as I can see, there is no specification of the physical address layout in an HDF file. The whole HDF file specification is built on a tree structure where different nodes have different meanings and point to the possible next nodes. The physical position of a node is not specified. However, the hdflib seems to make some assumptions about where to place certain nodes. This very likely leads to corruption of an HDF file if it was initially generated without the hdflib and then gets updated or manipulated by an application that is based on the hdflib. The reason why this hasn't been observed before might be that most people can use the original sources from the HDF Group and stay with the resulting hdflib for generating, reading, writing and manipulating their HDF files. I cannot, because of my limited hardware resources. My plan is to write HDF files on my embedded device and do the data analysis on host computers. This analysis will include adding new data and/or metadata. So I cannot take the risk that the HDF file is corrupt after the analysis of the data.

I kindly ask the HDF community to check whether my observation is true. After that we can discuss whether the misunderstanding was on my side or whether the HDF file spec or implementation needs an update.
Best Regards
Markus


From: Hdf-forum [mailto:hdf-forum-bounces@lists.hdfgroup.org] On Behalf Of David Pearah
Sent: Wednesday, September 6, 2017 04:32
To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>
Subject: Re: [Hdf-forum] HDF lib incompatible with HDF file spec?

Just FYI... if the question is whether Markus' code produces correct HDF5 files, then it might be helpful to look at an independently developed reader. If so, it might be worth looking at libmysofa:


"... The NetCDF and HDF5 libraries, which were intended to handle big data, were not originally designed to be compiled on constrained devices. The German company Symonics GmbH, (together with help from The HDF Group), has reimplemented the HDF5 file format specifications aiming at a light-weight HDF5 reader library called libmysofa. With libmysofa, the size of a SOFA reader can be reduced by a factor of eight. The library is open source and available under the Apache license. It provides reading capabilities to access SOFA files and directly addresses loading HRTFs into the system."

Hope this helps,

-- Dave

________________________________
From: Hdf-forum <hdf-forum-bounces@lists.hdfgroup.org> on behalf of Quincey Koziol <koziol@lbl.gov>
Sent: Tuesday, September 5, 2017 2:27 PM
To: HDF Users Discussion List
Subject: Re: [Hdf-forum] HDF lib incompatible with HDF file spec?

Indeed. :-) I don't have time to look into Markus's file today, but I will take a look tomorrow and see what the best course of action is.

        Quincey

On Sep 5, 2017, at 2:22 PM, Latham, Robert J. <robl@mcs.anl.gov> wrote:

On Tue, 2017-09-05 at 17:21 +0000, Miller, Mark C. wrote:

Hmm. If I understand you, you have written code that you believe
produces an HDF5 file according to the 3.0 file version
specification, https://support.hdfgroup.org/HDF5/doc/H5.format.html
but nevertheless does NOT use the HDF5 library to do it. Furthermore,
where 'extended padding' is concerned, your implementation does
business differently than the HDF5 implementation.

You can prove HDF5 tools will *read* the file ok. But, in a read-
modify-write scenario, the file is getting corrupted by HDF5 library
due to the difference in how the two implementations handle the
extended padding -- a feature that you explain is '...not defined at
all -- not even recommended'.

Is that about right?

If so, it does indeed sound like a potential issue in the file format
specification for HDF5.

Your scenario sounds like a super useful test case...does a wholly
independent implementation produce a file the HDF5 library can
"handle"?

Over in Parallel-NetCDF land a few years back, we took, um, a "rather
aggressive interpretation" of the NetCDF spec with respect to alignment
and then opened a bug with Unidata when their tools did not follow the
rules as written.

As Mark observes, it was a productive exercise in keeping both
implementations honest.

==rob


Hi Dave,

Interesting information. Thanks for sharing.

That said, and not at all to denigrate the work you describe here, I suspect doing a lightweight reader... umm, well, I was about to say would be substantially easier than doing a writer. But, given that any file the HDF5 library is capable of reading could have been produced by any one of a number of VFDs, having a reader capable of reading *any* HDF5 file is probably a fairly complex undertaking as well. I suspect the reader you reference here is probably assuming something like the stdio or sec2 VFD and not something more exotic like the family or split VFD or globus.

This discussion actually points in a direction I was headed when first raising some questions about the impact of the *current* bytes-on-disk file format in terms of HPC relevant performance scenarios as well as SQE necessary to support it.

Mark

Dear Quincey,

Yes, the file gets corrupted if you add the 4th object. However, the problem I’m observing is not related to the number of objects you add to the file or the number already in the file. It is simply because the HDF file spec does not specify the location of the different blocks within the file. The entire spec is a linked list that has its origin in the superblock. The superblock itself is the only block that has rules about its location. So, in my understanding, all software that handles HDF files in any way should first explore the linked-list structure and then identify the locations that are not yet used and can therefore be used for adding content to the HDF file if requested. From what I observe, the HDFlib implementation behaves differently. It has an algorithm for where to locate the different blocks. This algorithm does not consider whether these locations are already occupied. As long as you use the HDFlib implementation, this behavior will not lead to any problems because you are consistent. The problem shows up at the point when you generate HDF files with one tool and modify them afterwards with a tool that is based on HDFlib.

Actually, I’m quite surprised that this behavior hasn’t been observed before. I guess the reason is that not many projects use HDF files in embedded projects (small 16- or 32-bit microcontrollers with significantly less than 1 MByte of program memory, and no or only a small real-time operating system). Additionally, even in applications where computing power and memory are not much of a concern, people use the HDFlib code or binary to save the time it would take to rewrite it. Nevertheless, I’m almost sure I found a ‘hole’ that needs to be fixed, either in the file specification or in the HDFlib implementation.

I did not use h5check or h5debug. Do I need to compile the corresponding code before I can use them? I’m also not sure that will give me new results, because the file I’m generating is accepted by HDFview with no problem at all. Do you think HDFview will accept files that do not follow the HDF standard?

Best Regards
Markus


From: Hdf-forum [mailto:hdf-forum-bounces@lists.hdfgroup.org] On Behalf Of Quincey Koziol
Sent: Monday, September 18, 2017 17:34
To: HDF Users Discussion List <hdf-forum@lists.hdfgroup.org>
Subject: Re: [Hdf-forum] HDF lib incompatible with HDF file spec?

Hi Markus,
            I’ve looked at the files you’ve produced and it seems like the first object is getting corrupted when you add the 4th object. Can you see if that’s the case? Also, have you been using the h5debug tool for looking at your files? (in the tools directory) Or h5check?

            Regards,
                        Quincey

On Sep 18, 2017, at 5:03 AM, Krug, Markus <markus.krug@hm.edu> wrote:

Dear all,

I just want to come back to my question about the incompatibility between the HDFlib and the HDF file spec concerning the actual physical layout of an HDF file. Can anyone confirm my observation that this can lead to corrupt files if they are first generated in a ‘non-HDFlib-based’ application that complies with the HDF file spec and are then altered in an ‘HDFlib-based’ application like HDFview?

Best Regards
Markus
From: Krug, Markus
Sent: Wednesday, September 6, 2017 17:56
To: 'HDF Users Discussion List' <hdf-forum@lists.hdfgroup.org>
Subject: RE: [Hdf-forum] HDF lib incompatible with HDF file spec?

Dear Mark,

Completely correct. I wrote some routines that generate HDF files. However, only a small subset of the functionality is used. More or less only compressed compound data types, with a maximum number of 5, will be in the files, and very likely not more than two groups. I follow this paper (http://www.ep.liu.se/ecp/076/050/ecp12076050.pdf) concerning the HDF file layout because I need to write ‘time series’ in my embedded application.

You are right. The HDF file spec is highly complex. Even my reduced functional set is taking me significantly more time to understand than I had planned. In the meantime, I think I understand what I need for my purpose. However, I’m not saying that the files I can generate so far are 100% correct in the sense of the HDF file spec. But at least HDFview can read them with no problems, so they cannot be that wrong.

Best Regards
Markus


Hi,

No hints?

Besides this H5TBmake_table(...) question, I currently have a problem with a simple H5Gcreate() call...

I'm starting 4 MPI processes.
When I create a file and set it up for parallel access like this:

plist_id = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_fapl_mpio(plist_id, comm, info);
file_id = H5Fcreate(H5FILE_NAME, H5F_ACC_TRUNC, H5P_DEFAULT, plist_id);

and then try to create a group whose name contains the process number, only 1 group is created ("1" - interesting...) in the resulting file.

What am I doing wrong?
Should I pass some property list values to this H5Gcreate function call?
I cannot find an appropriate "create in parallel I/O mode" property for groups in the documentation...

I've tried:
plist_id = H5Pcreate(H5P_GROUP_ACCESS);
H5Pset_all_coll_metadata_ops( plist_id, true );

and passing this property list as the last argument to the H5Gcreate function - no success...

I will be grateful for any help.

Regards,
Rafal


Hi Rafal,

No, the HDF5 High Level APIs are not supported in the parallel version of HDF5.

-Barbara
help@hdfgroup.org


-----Original Message-----
From: Hdf-forum [mailto:hdf-forum-bounces@lists.hdfgroup.org] On Behalf Of Rafal Lichwala
Sent: Monday, September 18, 2017 8:53 AM
To: hdf-forum@lists.hdfgroup.org
Subject: [Hdf-forum] high level API for parallel version of HDF5 library

Hi,

Can I use high level API function calls (H5TBmake_table(...)) in parallel version of the HDF5 library?
There are no property list parameters for that function calls...

Regards,
Rafal


Hi Markus,
  Sounds like there is more to investigate here. :-/ Unfortunately, I’m very time constrained right now and can’t spend more hours in this direction. I spoke with Elena and she’s going to see about some HDF Group staff to look into the issue.

  Quincey


Hi Barbara, Hi All,

Thank you for your answer. That's clear now about the H5TBmake_table() call, but...
H5Gcreate() is not a high-level API, is it?
So why can't I use it in parallel processes?
Maybe I'm just doing something wrong, so could you please provide a short example of how to create a set of groups (each one named after the process number) when running 4 parallel MPI processes? You can limit the example code to the sequence of HDF5 calls only...
My current code works fine for just one process, but when I try it with 2 (or more) parallel processes, the resulting file is corrupted:

plist_id = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_fapl_mpio(plist_id, comm, info);
H5Pset_all_coll_metadata_ops( plist_id, true );
file_id = H5Fcreate(H5FILE_NAME, H5F_ACC_TRUNC, H5P_DEFAULT, plist_id);
H5Pclose(plist_id);
hid_t gr_id = H5Gcreate(file_id, std::to_string(procid).c_str(), H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
H5Gclose(gr_id);
H5Fclose(file_id);

Best regards,
Rafal

W dniu 2017-09-25 o 22:20, Barbara Jones pisze:

···

Hi Rafal,

No, the HDF5 High Level APIs are not supported in the parallel version of HDF5.

-Barbara
help@hdfgroup.org

-----Original Message-----
From: Hdf-forum [mailto:hdf-forum-bounces@lists.hdfgroup.org] On Behalf Of Rafal Lichwala
Sent: Monday, September 18, 2017 8:53 AM
To: hdf-forum@lists.hdfgroup.org
Subject: [Hdf-forum] high level API for parallel version of HDF5 library

Hi,

Can I use high-level API function calls (H5TBmake_table(...)) in the parallel version of the HDF5 library?
There are no property list parameters for those function calls...

Regards,
Rafal

Hi,

The library documentation says that H5Fset_mpio_atomicity() is available since 1.8.9 release, but I cannot find this in the latest 1.10 (built with parallel mode).
How can I solve this?

Regards,
Rafal

Calls that affect the metadata need to be collective so that each process has a consistent view of what the file metadata should be.

https://support.hdfgroup.org/HDF5/doc/RM/CollectiveCalls.html

Something like this (or the attached):

plist_id = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_fapl_mpio(plist_id, comm, info);
H5Pset_all_coll_metadata_ops(plist_id, true);
file_id = H5Fcreate(H5FILE_NAME, H5F_ACC_TRUNC, H5P_DEFAULT, plist_id);
H5Pclose(plist_id);

// Every rank creates every group, so the metadata calls stay collective.
for (int procid = 0; procid < mpi_size; ++procid) {
  hid_t gr_id = H5Gcreate(file_id, std::to_string(procid).c_str(), H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
  H5Gclose(gr_id);
}

H5Fclose(file_id);

h5g_parallel.cpp (6.39 KB)
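
The difference between the failing and the working sequence comes down to which group names each rank passes to H5Gcreate(). A minimal sketch of that difference in plain C++, with no MPI or HDF5 dependency; both helper names are hypothetical and exist only to illustrate the idea:

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not an HDF5 call): the full list of group names that
// EVERY rank would create, in the same order, so that all ranks issue the
// same sequence of metadata calls.
std::vector<std::string> collective_group_names(int mpi_size) {
    std::vector<std::string> names;
    for (int procid = 0; procid < mpi_size; ++procid) {
        names.push_back(std::to_string(procid));
    }
    return names;
}

// Hypothetical helper: the single name a rank creates when it only handles
// its own group, which is the pattern reported to corrupt the file.
std::string own_group_name(int mpi_rank) {
    return std::to_string(mpi_rank);
}
```

With 4 processes, each rank would loop over all four entries of collective_group_names(4) and call H5Gcreate() for each one, instead of creating only own_group_name(rank).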

-----Original Message-----
From: Hdf-forum [mailto:hdf-forum-bounces@lists.hdfgroup.org] On Behalf Of Rafal Lichwala
Sent: Wednesday, September 27, 2017 12:32 AM
To: hdf-forum@lists.hdfgroup.org
Subject: Re: [Hdf-forum] high level API for parallel version of HDF5 library

Hi Barbara, Hi All,

Thank you for your answer. That's clear now about the H5TBmake_table() call, but...
H5Gcreate() is not a high-level API, is it?
So why can't I use it in parallel processes?
Maybe I'm just doing something wrong, so could you please provide a short example of how to create a set of groups (each one named after a process number) when running 4 parallel MPI processes? You can limit the example code to the sequence of HDF5 calls only...
My current code works fine for just one process, but when I try it with 2 (or more) parallel processes the resulting file is corrupted:

plist_id = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_fapl_mpio(plist_id, comm, info);
H5Pset_all_coll_metadata_ops( plist_id, true );
file_id = H5Fcreate(H5FILE_NAME, H5F_ACC_TRUNC, H5P_DEFAULT, plist_id);
H5Pclose(plist_id);
hid_t gr_id = H5Gcreate(file_id, std::to_string(procid).c_str(), H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
H5Gclose(gr_id);
H5Fclose(file_id);

Best regards,
Rafal

Hi,

Thank you for the answer and the example code.
Creating metadata (groups, datasets) is clear now and works fine, but I have one last question: what if I'm running 4 MPI processes but only 3 of them have data to write to a given dataset?
Since the H5Dwrite() call is in collective mode, my program hangs...
How can I solve this?

Regards,
Rafal
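
One commonly described way around this kind of hang, offered here as an assumption rather than as an answer given in the thread: every rank still calls H5Dwrite(), and a rank with nothing to write first empties its selection with H5Sselect_none() on both the memory and the file dataspace. The plain C++ sketch below only plans the per-rank selections for a 1-D dataset; Slab, plan_slabs and writes_data are hypothetical names, and no HDF5 or MPI call is made.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical description of one rank's part of a shared 1-D dataset.
struct Slab {
    std::size_t offset;  // index of the first element this rank writes
    std::size_t count;   // number of elements; 0 means "nothing to write"
};

// Given how many elements each rank holds (assumed to be known on every
// rank, e.g. after an MPI_Allgather), compute each rank's slab. Ranks with
// zero elements still get an entry, because they still have to take part
// in the collective write.
std::vector<Slab> plan_slabs(const std::vector<std::size_t>& counts) {
    std::vector<Slab> slabs;
    slabs.reserve(counts.size());
    std::size_t offset = 0;
    for (std::size_t c : counts) {
        slabs.push_back({offset, c});
        offset += c;
    }
    return slabs;
}

// True if the rank would select a hyperslab; false if it would instead
// select nothing before joining the collective H5Dwrite().
bool writes_data(const Slab& slab) {
    return slab.count > 0;
}
```

For example, with per-rank counts {5, 0, 3, 2}, the second rank gets an empty slab and would join the collective call with an empty selection rather than skipping it.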

On 2017-09-27 at 22:50, Nelson, Jarom wrote:

Calls that affect the metadata need to be collective so that each process has a consistent view of what the file metadata should be.

https://support.hdfgroup.org/HDF5/doc/RM/CollectiveCalls.html

Something like this (or the attached):

plist_id = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_fapl_mpio(plist_id, comm, info);
H5Pset_all_coll_metadata_ops(plist_id, true);
file_id = H5Fcreate(H5FILE_NAME, H5F_ACC_TRUNC, H5P_DEFAULT, plist_id);
H5Pclose(plist_id);

// Every rank creates every group, so the metadata calls stay collective.
for (int procid = 0; procid < mpi_size; ++procid) {
  hid_t gr_id = H5Gcreate(file_id, std::to_string(procid).c_str(), H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
  H5Gclose(gr_id);
}

H5Fclose(file_id);

--

***
Rafał Lichwała
Poznańskie Centrum Superkomputerowo-Sieciowe
ul. Jana Pawła II nr 10
61-139 Poznań
e-mail: syriusz@man.poznan.pl
***

Hi Rafal,

It looks like there may be a small typo in the name of the API that you are specifying ("mpio" vs "mpi").
I found H5Fset_mpi_atomicity (but not H5Fset_mpio_atomicity). Is that what you are looking for?

-Barbara
help@hdfgroup.org

-----Original Message-----
From: Hdf-forum [mailto:hdf-forum-bounces@lists.hdfgroup.org] On Behalf Of Rafal Lichwala
Sent: Wednesday, September 27, 2017 7:57 AM
To: hdf-forum@lists.hdfgroup.org
Subject: [Hdf-forum] H5Fset_mpio_atomicity missing in 1.10 release

Hi,

The library documentation says that H5Fset_mpio_atomicity() is available since 1.8.9 release, but I cannot find this in the latest 1.10 (built with parallel mode).
How can I solve this?

Regards,
Rafal
