Announcing development of the Enterprise Support Edition of HDF5


#1

Dear Friends, Community Members, and Colleagues,

Today, I am writing to all of you to announce the launch of both a Community Edition (CE) and a subscription-based, Enterprise Support Edition (ESE) for HDF5. This model is similar to Red Hat, Lustre, and other open source projects. We are moving down this pathway to address the challenges that continuously face us in achieving sustainability and increasing community involvement.

Since I joined The HDF Group in April 2016, I have made it my mission as CEO to pursue a strategy that enables us not only to survive but to thrive as an organization. I truly believe this new model will take us into the future and keep HDF technologies relevant for another 30 years.

As this is a major announcement and change to the community, I want to personally explain why we’re doing this and how this might impact your use of HDF5:

  • We plan to release CE and ESE during the summer of 2018.
  • Core features of CE and ESE are identical. We are NOT taking anything away.
  • Maintenance (i.e. bug fixes and security patches from The HDF Group) is deployed to ESE first.
  • All work from ESE will be released to CE periodically.
  • Add-on modules that provide additional functionality outside the core library will only be available to ESE subscribers.
  • We will set up a Technical Advisory Committee to influence the future directions of HDF5 CE development.

We believe this new approach will ensure stable code while stimulating community-driven innovation and direct participation. The ESE subscriptions will provide The HDF Group the sustained funding that will allow us to better support our mission and the community that depends on HDF technologies every day.

You can learn more about Community Edition and Enterprise Support Edition on our website. We welcome you to join the conversation here, on our forum. Over the coming weeks, we will be sharing more details of the coming launch to make sure we involve our community each step of the way.

Sincerely,

David Pearah, CEO
The HDF Group


#2

Dear David Pearah,

I have read your statement below and the information page on the web site of the
HDF group. I have no comment on the strategy that is necessary to maintain the
activities of the HDF group specifically and my opinion is the one of an
academic user of the hdf5 library.

  1. I will probably never have the funds to buy a subscription. I do theoretical
    physics and there are few users of HDF5 over here.

  2. I have enjoyed the HDF5 library and one of the selling points, in my opinion,
    was the availability of a single central piece of software, the C HDF5 library,
    on which all users of HDF5 relied.

  3. The lack of support for older revisions of HDF5 in the community version will
    be the most impactful change for me. I sometimes release numerical data with
    HDF5 files with the confidence that they will be readable by fellow scientists
    in the years to come at no cost. This guarantee being taken away will very
    likely play a role in my future decisions concerning scientific file formats.

Honestly, I find it a bit weird to be the first to react (on this forum) to such
an announcement in the HDF5 community. My guess is that many will share the
feeling.

Best regards,

Pierre de Buyl


#3

Dear Pierre,

I am coming from the other side - we are using HDF in a commercial project and so are more likely to subscribe (if it is affordable - where is the price?). However I too find this announcement rather depressing:

  1. Community release drops to every 1-2 years. This relegates HDF to the status of an unsupported legacy library.

  2. Community contribution to community release strongly encouraged. How will that happen with a 2 year release cycle?

  3. HDF5 Support: Current major version only. Not sure what this means? That the older versions will not be available for download, or that they will not be supported? Unclear how they are supported today?

I too understand that the Group needs to be self sufficient, but this is a disturbing change.

rgds

Ewan


#4

I agree that there needs to be some more explanation. My opinions are the following:

Releases: 1-2 years. Every year would be good. 2 years is just way too long since toolkits are now actually accelerating their updates (Clang, GCC, MSVC, Fortran) and seem to be on a yearly cadence. My guess is that The HDF Group wants to focus on the side of the business that will bring in revenue, which is the enterprise side. As a business these are tough decisions, but at some point your business needs to bring in revenue to keep your engineers employed. I think once-a-year updates would be a good compromise.

HDF5 Support: My interpretation is that ONLY the HDF 1.10.x series will be updated going forward. If you are using 1.8.x (Like I am) and you have compile issues then it will be up to the community to solve those issues.

File Compatibility: Aren’t there settings that you can use that will allow the latest release (1.10.x) to generate HDF5 files that are compatible with the 1.8.x releases? If that is true then there is some backward compatibility built in and we, as the community, probably don’t have much to worry about. I too have an open-source project that is built on top of HDF5 as our primary file format. I would hate to lose the ability going forward for newer versions of my software to read older files.

Just my thoughts.
Mike Jackson
BlueQuartz Software & DREAM.3D


#5

Thanks for the feedback, Pierre.

I will probably never have the funds to buy a subscription. I do theoretical physics and there
are few users of HDF5 over here.

As always, I don’t want to discourage anyone from using HDF5 Community Edition, which will always be open source and free. If however you want to use Enterprise Support Edition – for Help Desk support, regular bug fixes, cloud connectors, etc. – we’ll be posting pricing shortly. We’ve worked hard to make the pricing simple and scalable for individual users to large organizations. Stay tuned!

The lack of support for older revisions of HDF5 in the community version will
be the most impactful change for me. I sometimes release numerical data with
HDF5 files with the confidence that they will be readable by fellow scientists
in the years to come at no cost. This guarantee being taken away will very
likely play a role in my future decisions concerning scientific file formats.

I think we have a real opportunity to clarify what we mean by support (we’ll improve the wording on our website):

  • FORMAT: HDF5 is always backward compatible, meaning that every new version of HDF5 will always read the files created by the previous versions of the library. So your data will always be readable for years to come.
  • LIBRARY: What we meant to say is that older versions of the HDF5 library itself (e.g. 1.8, 1.6, etc.) won’t be supported by the HDF Group for the community… for folks that need support, maintenance, and bug fixes to the older libraries, Enterprise Support Edition is an option.

Hope this helps!

– Dave

#6

Hey, Ewan. Thanks for the comments. My feedback below:

we are using HDF in a commercial project and so are more likely to subscribe
(if it is affordable - where is the price?).

Pricing is coming shortly… stay tuned! We’ve worked very hard on a model that scales from a single individual to small businesses selling software to large organizations with supercomputers. :)

Community contribution to community release strongly encouraged. How will
that happen with a 2 year release cycle?

The development branch remains open, and we’ll always continue to accept code contributions… which incidentally takes a lot of work because most of the submissions require substantial rework and regression testing. The change is in the frequency of official, supported releases from the HDF Group.

HDF5 Support: Current major version only. Not sure what this means? That the
older versions will not be available for download, or that they will not be supported?
Unclear how they are supported today?

HDF 1.10.2 is the current major version and the one that is available to the community. Eventually, 1.10.2 will be replaced by its successors, at which time all the work that goes into making binaries, releases, bug fixing, maintenance, etc… will be reserved for Enterprise Support Edition clients.

Hope this helps!


#7

Mike,

Great to hear from you… particularly since you’re a fellow traveler navigating the balance between running a business and driving open source. Everyone should check out: http://www.bluequartz.net/

Releases: 1-2 years. Every year would be good. 2 years is just way too long since toolkits are
now actually accelerating their updates (Clang, GCC, MSVC, Fortran)

It’s definitely a balancing act, and we may in fact have opportunities to accelerate if an Enterprise Support Edition client is helping to fund an immediate public Community Edition release (which would be great!).

HDF5 Support: My interpretation is that ONLY the HDF 1.10.x series will be updated
going forward. If you are using 1.8.x (Like I am) and you have compile issues then it will
be up to the community to solve those issues.

Correct on both counts. If for whatever reason someone needs support / maintenance on either 1.10 or 1.8, that’s offered through Enterprise Support Edition.

File Compatibility: Aren’t there settings that you can use that will allow the latest release
(1.10.x) to generate HDF5 files that are compatible with the 1.8.x releases?
If that is true then there is some backward compatibility built in and we, as the
community, probably don’t have much to worry about.

100% correct: we work hard to ensure that ALL versions of HDF5 library are backward compatible regarding reading of files. That will never change.


#8

Security patches go to ESE first and then to CE “periodically”? What does that mean exactly? Sounds horrible.

Sean


#9

I’m concerned about “Python support” being listed as an Enterprise Support Edition feature. Does this refer to technical support from the HDF group, or removing existing Python interface support from the community edition?


#10

Hey, Stephan… creator of the awesome xarray library!: http://xarray.pydata.org/en/stable/

I’m concerned about “Python support” being listed as an Enterprise Support Edition feature.
Does this refer to 1. technical support from the HDF group, or 2. removing existing Python interface
support from the community edition?

Option #1: all this means is that the Enterprise Support Edition subscription includes end-user assistance not just for HDF5 but also related key tools in the Python stack: h5py, PyTables, pandas, and xarray.

It’s totally up to each developer whether to stick with HDF5 Community Edition or move over to Enterprise Support Edition, but I suspect that for open source libraries – like xarray – folks would likely stick with CE unless there is a compelling reason otherwise.


#11

Hey, Sean.

Security patches go to ESE first and then to CE “periodically”? What does that
mean exactly?

It’s actually 2 things:

  1. Between each major release to Community Edition, there are interim quarterly + hotfix releases focused on maintenance, bugs, and security.
  2. The next major release to Community Edition then rolls up ALL the Enterprise Support Edition code (not just security patches).

So yes, the net effect is that security patches first flow to ESE and then eventually to CE.

Why? Because it takes a ton of work: isolating, duplicating, identifying, prioritizing, fixing, patching, testing, building new release, etc. For those that care about this predictable pace of security-focused work, Enterprise Support Edition provides the desired solution. For those that don’t need that, Community Edition will always be there.


#12

Hi,
I understand that people are concerned. I presume the HDF Group also needs some money to continue development. A reasonable model is that HPC centers, or whoever really needs tight support (because this means money to them), ask their vendor to provide the support, and the vendor in turn may purchase this from the HDF Group.

In the best case, that does not take away any resources from the community version but instead increases the effort of the overall development accelerating it. In the past, it always has been best effort for them, like for most open source developers.

This won’t stop in the future, just as it never stopped when they were awarded a grant from a public body, but that model does not scale any more because the funding structure has changed. A fraction of resources will still be used for free support in the future. If many benefit so much that they can pay, even people who rarely use HDF benefit. In that sense, I’m not worried a bit as long as the code is open, and I hope users ask their vendors for support if they really need it, and that vendors are clever enough to buy it if they can’t compete on price. Alternatively, you can always become a developer or hire someone to work on your behalf on such open codes. The question is, will that be as cost efficient as outsourcing it to the HDF Group?

That is my limited understanding of the rationale, at least. David, feel free to correct me here. I wish you luck in sustaining the development. I also hope it can be well communicated that this will mean a chance for power users while it does not reduce development for the community - it will remain open source.

Regards

Julian


#13

Option #1: all this means is that the Enterprise Support Edition subscription includes end-user assistance not just for HDF5 but also related key tools in the Python stack: h5py, PyTables, pandas, and xarray.

OK, great :). My reading of the word “support” was a little ambiguous since there are two meanings of that term.

I hope that this new initiative helps the long-term sustainability of the HDF group, which will of course benefit all downstream users.

I do have concerns about an intentionally slower release cycle for the community edition. A slow release cycle is not very encouraging for outside contributors to open source software – why would I contribute a bug report or patch when it will take up to 1-2 years to make it into a release for non-enterprise users? It is also quite common (for software in general, not HDF5 in particular) that initial major releases with new features are unusable due to new bugs. Without follow-on bug fix releases, the community edition might not be usable at all.

Fortunately HDF5 is pretty stable at this point, but I do think the stick of no bug/security fixes will encourage the scientific community to switch to alternative file-formats – or maybe even consider forking HDF5. This pace encourages workarounds in downstream libraries that use HDF5 rather than true fixes, which is not great for the ecosystem. No new features from the HDF group would be fine, but not releasing bug/security fixes sends a message to the scientific community that you don’t care about the value they provide.

Most of the responses I’ve seen on Twitter are along the same lines.


Finally, about the work of making a release. Yes, I agree that maintaining software is a ton of work, and largely a thankless task. But in my experience, the pain of making a release is mostly proportional to the number of changes/bug-fixes that go into it. With appropriate automation (e.g., for the build process), the incremental work of issuing a release is pretty minimal. And if you’re not testing that build process on a frequent basis, you are going to have a very painful time when you discover everything in your tool-chain that has broken over the past 1-2 years all at once, rather than incrementally as it arises.


#14

Hi, Julian… thanks for the thoughtful comments.

A reasonable model is that HPC centers or whoever really needs tight support
(because this means money to them) asks their vendor to provide the support,
which in turn may purchase this from the HDF group.

I think that’s a great suggestion: HDF5 is standard software that ships on most HPC systems, largely because it’s considered “critical software” and just needs to run and run well (e.g. no one wants to buy an amazing laptop that can’t run a browser).

But the paradox is that while the HDF Group bears the full cost of making great software and supporting users when it doesn’t work, most HPC system integrators contribute nothing back to the HDF community: code, assisting users, investment, etc. In some cases, the HPC system integrator will actually charge the client for HDF support, even though they obviously can’t support or fix anything in the code.

In my opinion, HPC buyers and users can help by driving the following:

  • Does my HPC system include support for critical software like HDF?
  • If yes, then does my vendor actually work with the HDF Group or is this an empty promise?

This won’t stop in the future, like it did never when they got awarded with a grant
from a public body but that model does not scale any more because funding structure changed.

In general, we don’t get any grants. What we do have are wonderful and amazing clients who pay us for our consulting expertise in 3 key software areas: 1. HPC, 2. Big Data, 3. Metadata. In rare cases, our consulting work actually includes work inside the library, often to extend functionality. But typically, it’s to do work outside the library, which means that all the hard work inside the library has to be funded by us. We really are passionate about the work we do with our clients to support their mission: it’s just challenging to align to our mission of maintaining and extending HDF5.

I wish you luck to sustain the development. I also hope it can be well communicated
that this will mean a chance for power users while it does not reduce development
for the community - it will remain open source.

HDF5 Community Edition will remain free and open source, and it is my sincere belief that folks like you in the community will help advocate for us to get on a sustainable path.

Thanks!
– Dave


#15

I’m also a bit worried about the split between ESE and CE.

So far the discussion was mostly about bug fixing, but I’m wondering how new features will be made available. Probably they will first appear in ESE and only (much) later in CE.

I assume that some features added to the ESE might cause changes in the file format. If such features will only be added later to the CE version, such HDF5 files will not be readable by the CE version for some time. This would be very bad in cases where an organisation using ESE exports HDF5 files to clients using CE.

In particular I’m thinking of the SWMR feature which is being worked on and has quite some impact on the file format. I assume that the full SWMR feature will only appear in ESE and only later in CE. Most likely in the future more such cases will occur.
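To make the concern concrete, here is a minimal sketch of how a recipient could check which file-format revision a file declares before trying to open it. It uses plain Python and no HDF5 library. It assumes the layout described in the HDF5 file format specification: an 8-byte signature at offset 0 (or at 512, 1024, 2048, ... when a user block is present), followed by a one-byte superblock version. The function name `superblock_version` and the `max_offset` cutoff are hypothetical, chosen only for this illustration.

```python
import io

# 8-byte signature that starts every HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def superblock_version(stream, max_offset=1 << 20):
    """Return the superblock version byte, or None if no signature is found.

    `stream` is any seekable binary file object. The search checks offset 0,
    then 512, 1024, 2048, ... up to `max_offset` (an arbitrary cutoff used
    for this sketch, not a value taken from the specification).
    """
    offset = 0
    while offset <= max_offset:
        stream.seek(offset)
        header = stream.read(9)  # signature (8 bytes) + version (1 byte)
        if len(header) < 9:
            return None  # ran past the end of the file
        if header[:8] == HDF5_SIGNATURE:
            return header[8]
        # Candidate offsets double after the first 512-byte step.
        offset = 512 if offset == 0 else offset * 2
    return None
```

It could be called as `superblock_version(open("data.h5", "rb"))`, where `data.h5` is a placeholder file name. The version number only tells you which superblock layout the writer chose. Which library releases can read a given version should be checked against the HDF5 documentation rather than inferred from this sketch.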

I’m also wondering about the cooperation between HDF5 and ADIOS. I understood it will be made possible for the systems to read each other’s file format. Will that also appear only in ESE and later in CE?

Regards,

Ger van Diepen


#16

Hello!

Thank you for bringing up this issue!

The HDF Group is committed to providing full file format compatibility between CE and ESE.

The features that require file format extensions will be released in CE first. All issues related to the file format compatibilities triggered by the addition of the new features will be addressed in CE in a timely manner to assure full compatibility.

Thank you!

Elena


#17

Elena,

Thank you for clarifying that the HDF5 file format will always be compatible between ESE and CE.

However, new features will introduce new bugs, sometimes making a new feature unworkable. Since bug fixes will appear much later in CE, there is still quite a chance that a new feature (with changes in the file format) can be used in ESE, but not in CE. How will the HDF5 group address such issues?

Regards,

Ger


#18

Ger,

Thank you for the good questions!

Elena,

Thank you for clarifying that the HDF5 file format will always be compatible between ESE and CE.

However, new features will introduce new bugs, sometimes making a new feature unworkable. Since bug fixes will appear much later in CE, there is still quite a chance that a new feature (with changes in the file format) can be used in ESE, but not in CE. How will the HDF5 group address such issues?

Before any new feature is released there is an extended period of time when it is tested internally and is available for testing by the community. Our group also publishes design documents (RFC - Requests for Comments) and user documentation during the development cycle to assist with the feature development, testing and adoption.

During 20 years of HDF5 existence we had 5 major releases (HDF5 1.2.0, 1.4.0, 1.6.0, 1.8.0, 1.10.0) and we rarely had issues with the new features per se. Software always has bugs/issues. During all these years we relied on the community to help us with finding the issues in the source and in the documentation. We hope that the community will be even more involved now to make every CE release as stable as possible. We are also committed to resolving any critical bugs such as file format issues, data corruption issues, and backward/forward compatibility issues, since the mission of our company is to provide free and open access to data stored in HDF5.

If any critical issue emerges The HDF Group developers will work with the community on the CE release that addresses the issue. This is our commitment.

All,

Please don’t hesitate to ask questions about CE and ESE. This model is new to us and we definitely didn’t think through all “use cases”. Having an open and frank discussion and setting expectations will help with a smooth transition.

Thanks again for your support!

Elena


#19

As promised in our initial announcement to the community, The HDF Group will continue to provide more details on the coming launch of HDF5 Enterprise Support Edition (ESE) and HDF5 Community Edition (CE). Please join us this Friday, May 18th, from 1:00 PM - 2:00 PM CT as David Pearah (@david.pearah), CEO of The HDF Group, presents an overview of our new, open source CE and ESE models.

Register now for the HDF5 Enterprise Support Edition (ESE) and HDF5 Community Edition (CE) Webinar on May 18, 2018, 1:00 PM CT at:

https://attendee.gotowebinar.com/register/7802714344088511233

After registering, you will receive a confirmation email containing information about joining the webinar.

If you are unable to attend or would like to join the discussion, please feel free to continue the conversation here on the forum. We appreciate your participation in these important discussions about the future of HDF5.

The HDF Group


#20

The HDF Group CEO Dave Pearah presented the new HDF5 Enterprise Support Edition and HDF5 Community Edition in a community webinar on May 18, 2018—that video is attached below. In the interest of efficiency, we clipped the live Q&A session from the webinar and have instead compiled the questions we’ve received on Twitter, on the forum, and during the webinar into one continually evolving FAQ on our site: https://www.hdfgroup.org/faq-enterprise-support-edition/

We would love to have your contribution to the discussion. Please feel free to chime in here on the forum; it’s the best avenue for the community and The HDF Group to respond and keep track of the discussion.