Semantic versioning

I would like to suggest to the HDF Group that semantic versioning
should be used for the hdf5 library, starting with the upcoming
release. I have been thinking for years that both the hdf5 project and
the greater hdf5 community would benefit from such a change, but just
this morning discovered a solid and formal definition of semantic
versioning at http://semver.org/ .

Best regards,
Darren


Hi Darren (and all),

Thank you for the pointer to the "Semantic Versioning" document. This is a very good discussion topic. I will try to summarize The HDF Group's position on it.

The HDF Group uses the versioning scheme described here: http://www.hdfgroup.org/HDF5/doc/TechNotes/Version.html.

The reason it differs from the standard you pointed to is that HDF5 deals not only with APIs, but also with file format backward/forward compatibility issues.

During the past 15 years we have learned that, because of our user and application base and their requirements, we cannot introduce incompatible API and file format changes, which makes the Semantic Versioning "MAJOR" version meaningless; it would not be HDF5 anymore. We have also learned that we need to extend the HDF5 file format to keep pace with the demand for performance and innovation in HDF5.

As a result, we have been thinking about dropping the "major version" number completely, since we are not going to create HDF6 :slight_smile: That is, instead of a 1.10.0 release, we would have a 10.0 release. Our commitment is as follows:

In the new versioning scheme, the second number will indicate bug fixes and new features (new APIs) that do not require file format changes (except bug fixes in the file format itself, which are subject to careful consideration). A change in the first number will indicate that new extensions to the file format have been introduced along with new APIs (e.g., new HDF5 capabilities such as new chunk-indexing schemes to allow fast append).
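
To make this concrete: whichever numbering scheme is adopted, an application can query the numbers through the existing H5get_libversion and H5check_version calls and the H5_VERS_* macros from H5public.h. A minimal sketch (nothing in it is specific to either scheme):

    /* version_check.c -- how an application observes the HDF5 version
     * numbers, using only existing public calls. */
    #include <stdio.h>
    #include "hdf5.h"

    int main(void)
    {
        unsigned maj, min, rel;

        /* Ask the library which version it actually is at run time. */
        if (H5get_libversion(&maj, &min, &rel) < 0)
            return 1;
        printf("Linked against HDF5 %u.%u.%u\n", maj, min, rel);

        /* Abort early if the headers used at compile time do not match
         * the shared library loaded at run time. */
        H5check_version(H5_VERS_MAJOR, H5_VERS_MINOR, H5_VERS_RELEASE);

        return 0;
    }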

In terms of the file format, under both the current and the proposed versioning schemes, the new library should always read files created by previous versions of HDF5. Old libraries should always be able to read HDF5 files created by the new library, as long as no new features (i.e., features unknown to the old libraries) that require file format changes were used to create the files.
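
One knob worth mentioning here is the existing "library version bounds" file access property (H5Pset_libver_bounds, available since 1.8), which controls which file format versions the library is allowed to use when writing. A minimal sketch, with an illustrative file name:

    /* libver_bounds.c -- controlling which format features are written. */
    #include "hdf5.h"

    int main(void)
    {
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);

        /* With the lower bound at H5F_LIBVER_EARLIEST (the default), each
         * object is written with the earliest format version that can
         * represent it, so older libraries can still read the file.
         * Raising the lower bound to H5F_LIBVER_LATEST opts in to the
         * newest format features. */
        H5Pset_libver_bounds(fapl, H5F_LIBVER_EARLIEST, H5F_LIBVER_LATEST);

        hid_t file = H5Fcreate("compat.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

        H5Fclose(file);
        H5Pclose(fapl);
        return 0;
    }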

In terms of APIs, as much as we would love to obsolete many APIs, we really cannot do it, because many of our users cannot move their applications to the new APIs. We provide configure flags to compile HDF5 without the obsolete APIs for those users who are able to switch to the latest and greatest APIs; we also provide flags to use particular versions of the APIs.
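
For concreteness, these are the configure options such as --disable-deprecated-symbols and --with-default-api-version, plus the per-application compatibility macros (H5_USE_16_API and friends) and the explicitly versioned function names. A minimal sketch using H5Gcreate, whose signature changed between 1.6 and 1.8 (the file name is illustrative):

    /* api_compat.c -- the existing API compatibility mechanisms.
     * Compiling with -DH5_USE_16_API maps unversioned names such as
     * H5Gcreate onto their 1.6-style signatures; the explicitly
     * versioned names below always mean the same thing. */
    #include "hdf5.h"

    int main(void)
    {
        hid_t file = H5Fcreate("groups.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

        /* 1.8-style call, unambiguous regardless of compatibility macros. */
        hid_t g2 = H5Gcreate2(file, "/new_style", H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
        H5Gclose(g2);

    #ifndef H5_NO_DEPRECATED_SYMBOLS
        /* 1.6-style call; absent if the library was configured with
         * --disable-deprecated-symbols. */
        hid_t g1 = H5Gcreate1(file, "/old_style", 0);
        H5Gclose(g1);
    #endif

        H5Fclose(file);
        return 0;
    }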

One can argue that we shouldn't be making any changes to the HDF5 file format at all. This is a valid argument, and in that case semantic versioning would be more applicable, but the question is: how can we innovate in HDF5 without enhancing the file format?

Is there any problem with the versioning we are using now, or with the new scheme outlined here (other than that it is not compliant with the Semantic Versioning standard)? (An HDF5 shared library versioning standard is coming soon; the draft is under internal review, in case you were thinking along those lines too.)

All,

Our group would really appreciate your thoughts on this issue.

Thank you!

Elena


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Elena Pourmal The HDF Group http://hdfgroup.org
1800 So. Oak St., Suite 203, Champaign IL 61820
217.531.6112
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


Hi Elena,

Thank you for the engaging response. Comments inline...

Hi Darren (and all),

Thank you for the pointer to the "Semantic Versioning" document. This is a very good discussion topic. I will try to summarize The HDF Group's position on it.

The HDF Group uses the versioning scheme described here: http://www.hdfgroup.org/HDF5/doc/TechNotes/Version.html.

The reason it differs from the standard you pointed to is that HDF5 deals not only with APIs, but also with file format backward/forward compatibility issues.

During the past 15 years we have learned that, because of our user and application base and their requirements, we cannot introduce incompatible API and file format changes, which makes the Semantic Versioning "MAJOR" version meaningless; it would not be HDF5 anymore. We have also learned that we need to extend the HDF5 file format to keep pace with the demand for performance and innovation in HDF5.

As a result, we have been thinking about dropping the "major version" number completely, since we are not going to create HDF6 :slight_smile: That is, instead of a 1.10.0 release, we would have a 10.0 release. Our commitment is as follows:

In the new versioning scheme, the second number will indicate bug fixes and new features (new APIs) that do not require file format changes (except bug fixes in the file format itself, which are subject to careful consideration). A change in the first number will indicate that new extensions to the file format have been introduced along with new APIs (e.g., new HDF5 capabilities such as new chunk-indexing schemes to allow fast append).

I think an x.y.z versioning scheme would still be advisable; the semantics would just be slightly different from what is outlined at semver.org:

Extensions to the file format increment x and reset y and z. New API features would naturally be expected with a major version bump. (API changes would also warrant an increment of x.)

New API features without changes to the file format increment y and reset z.

A bug-fix release increments z.

This way, you quickly communicate to users, packagers, etc. that no
new features (and associated potential bugs) were introduced in
10.0.1; it was just a bug fix. 10.1.0 provided new API features, but
did not alter the format itself.
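
For example, a downstream build could gate on the version macros the library already installs; two numbers are needed today, but under the scheme above the guard would collapse to the first number alone (e.g. H5_VERS_MAJOR != 10), since that number alone would track the file format:

    /* requires_hdf5.c -- compile-time and run-time version guards built
     * from the existing H5_VERS_* macros. */
    #include "hdf5.h"

    #if H5_VERS_MAJOR != 1 || H5_VERS_MINOR < 8
    #  error "This application was written against HDF5 1.8 or newer"
    #endif

    int main(void)
    {
        /* Guard against running with a mismatched shared library. */
        H5check_version(H5_VERS_MAJOR, H5_VERS_MINOR, H5_VERS_RELEASE);
        return 0;
    }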

In terms of the file format, under both the current and the proposed versioning schemes, the new library should always read files created by previous versions of HDF5. Old libraries should always be able to read HDF5 files created by the new library, as long as no new features (i.e., features unknown to the old libraries) that require file format changes were used to create the files.

In terms of APIs, as much as we would love to obsolete many APIs, we really cannot do it, because many of our users cannot move their applications to the new APIs. We provide configure flags to compile HDF5 without the obsolete APIs for those users who are able to switch to the latest and greatest APIs; we also provide flags to use particular versions of the APIs.

The HDF Group provides older versions of the library on the website.
If users require an old version of the API, can't they just use the
appropriate version of the library? Perhaps the community would not
object too strongly if it improved documentation and eased the burden
of maintenance, allowing you to invest more resources in new features.
(In which case, APIs could be marked as deprecated in a minor release,
but not removed until a major release.)

One can argue that we shouldn't be making any changes to the HDF5 file format at all. This is a valid argument, and in that case semantic versioning would be more applicable, but the question is: how can we innovate in HDF5 without enhancing the file format?

I wouldn't make such an argument, as long as newer libraries can read
older versions of the format. Does hdf5 have a utility that inspects a
file and reports which nodes would be inaccessible to version x of the
library?

Darren
