Reference Manual in Doxygen


#1

Please take a look at this page and let us know what you think (in this thread)! We are finishing up H5P and tidying up a few other things.

Kudos to the Eigen project where we found a lot of inspiration.

Don’t get too excited, the survey link isn’t live (yet). :sunglasses:

Please leave any comments in this thread, or create a new, more specific thread.

We would also like to know which of the technical documents you’d like to see in this format. For example, take a look at API Compatibility Macros.

And you can get involved! Keep an eye on the doxygen folder in the doxygen2 branch (eventually develop), or create a PR.

G.


#2

It looks good for a start! Of course, lots of work to get all the info there and up to date.

The left panel contains some references to external links, such as the glossary or user guide. I think it should somehow be indicated that clicking those leaves the Doxygen page.

It would also make sense to have various examples in this Doxygen documentation, because those examples would then link directly to the respective function documentation, and the functions themselves could also list the examples in which they are used.

Just a comment on the prominent paragraph "C++ Developers using HDF5 C-API functions beware:". This text seems to refer to the use of exceptions within C++ code, particularly in callback functions. That seems implicitly clear, but I guess it should be stated more prominently, since the use of exceptions is not all that common (though of course it is in the HDF5 C++ API, even overly common there). It should be rather clear to C++ developers that C function callbacks don’t support propagating C++ exceptions. Another issue may concern C++20 coroutines… I am not sure how that would behave if a callback function yields execution to another routine that also makes HDF5 calls, as this is not exactly the same as multithreading. Something for future investigation; I just mention it on this occasion.


#3

Good point. Of course, we’d hope for those redirects to go away completely at some point. But then, what’s our record on the last transition? :disappointed:

Yes, there are a few mickey mouse examples here and there, but there should also be a more extensive collection, e.g., as part of the cookbook.

Again, excellent point. Let’s work on a better formulation! (At least we are at a point where we have a Doxygen alias for it and changing it in one place will fix it everywhere :smiley:)

Ditto. (I suspect it’ll deadlock because the next HDF5 call won’t be able to acquire the library lock. To be investigated…)

G.


#4

How about a formulation like this RE: C/C++?

Several functions in this C-API take function pointers or callbacks as arguments. Examples include H5Pset_elink_cb(), H5Pset_type_conv_cb(), H5Tconvert(), and H5Ewalk2(). When used in C++ code, the corresponding callbacks may use the full range of C++ types and functions, provided that the callbacks return normally. This is necessary for the HDF5 library to manage its resources and maintain a consistent state. In particular, exceptions raised and not handled inside the callback are not supported, as they might leave the HDF5 library in an inconsistent state. Similarly, C++20 coroutines cannot be used as callbacks, since they do not support plain return statements.


#5

Suggestion:

Several functions in this C-API take function pointers or callbacks as arguments. Examples include H5Pset_elink_cb(), H5Pset_type_conv_cb(), H5Tconvert(), and H5Ewalk2(). Application code must ensure that those callback functions return normally, so as to allow the HDF5 library to manage its resources and maintain a consistent state. For instance, callbacks must not use the C setjmp/longjmp mechanism to leave the callback function. Within the context of C++, any exceptions thrown within the callback function must be caught there, for example with a catch (...) statement. Any exception state can be stored in the user-data argument passed to the callback and rethrown once the calling HDF5 function has returned. Exceptions raised and not handled inside the callback are not supported, as they might leave the HDF5 library in an inconsistent state. Similarly, C++20 coroutines cannot be used as callbacks, since they do not support plain return statements. If a callback function yields execution to another C++20 coroutine that also calls HDF5 functions, this may lead to undefined behavior.
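The catch-store-rethrow pattern described above (catch everything inside the callback, keep the exception in the user data, rethrow only after the C call has returned) might look roughly like this in C++. This is a minimal, self-contained sketch under stated assumptions: `c_style_iterate` is a hypothetical stand-in for a C-API function such as H5Ewalk2() and is not part of HDF5; it only mimics a common C convention where a zero return continues and a negative return stops with failure. `CallbackState`, `sum_positive`, and `sum_or_throw` are likewise illustrative names.

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a C-API function that takes a callback
// (NOT an HDF5 function). It calls op for each value and stops as soon
// as op returns nonzero; a negative value signals failure to the caller.
typedef int (*visit_cb_t)(int value, void *op_data);

static int c_style_iterate(const int *values, int n, visit_cb_t op, void *op_data)
{
    for (int i = 0; i < n; ++i) {
        int ret = op(values[i], op_data);
        if (ret != 0)
            return ret;
    }
    return 0;
}

// User data passed through the C interface as void*.
struct CallbackState {
    long sum = 0;
    std::exception_ptr error; // exception captured inside the callback, if any
};

// noexcept: an escaping exception would call std::terminate instead of
// unwinding through the C frames; the try/catch ensures none escapes.
static int sum_positive(int value, void *op_data) noexcept
{
    auto *state = static_cast<CallbackState *>(op_data);
    try {
        if (value < 0)
            throw std::invalid_argument("negative value: " + std::to_string(value));
        state->sum += value;
        return 0; // continue iterating
    }
    catch (...) {
        state->error = std::current_exception(); // store it, don't let it escape
        return -1;                               // report failure via the C return code
    }
}

// C++ wrapper: the stored exception is rethrown only after the C call returned.
long sum_or_throw(const std::vector<int> &values)
{
    CallbackState state;
    int status = c_style_iterate(values.data(), static_cast<int>(values.size()),
                                 sum_positive, &state);
    if (state.error)
        std::rethrow_exception(state.error);
    if (status < 0)
        throw std::runtime_error("iteration failed");
    return state.sum;
}
```

Calling `sum_or_throw({1, 2, 3})` returns 6, while `sum_or_throw({1, -2, 3})` stops at -2 and rethrows the `std::invalid_argument` in ordinary C++ code, after the (hypothetical) C function has returned normally. With a real HDF5 function, the `CallbackState` pointer would travel through the `void *` user-data argument that the function accepts.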


#6

Thanks. Any suggestions for which Technical Notes & Specs we should migrate first? I put up a few examples:

Let me know!
G.


#7

Both sections look good and clean. It’s an overwhelming amount of information, but I don’t know how to make it more easily digestible. Maybe the specifications could be split into multiple pages so that not everything is on one page; but then, for those who want to delve into the details, it may be good to have it all on one page.


#8

There are a few remaining issues, but the current version is ready for use by early adopters. Unfortunately, getting a cup of coffee while File Access Properties is loading is no longer an option. The underlying GitHub branch is doxygen2.

The next three milestones will be:

  1. Merge doxygen2 to develop
  2. Put some real examples with each API.
  3. Have an RM release w/ 1.12.1

Step 1 shouldn’t be hard, because only comments in public header files are affected.

Step 2 is an interesting challenge. Ideally, we’d write a single program that in due course calls each function, and then just quote the relevant lines as an example with the corresponding API. The underlying example(s) will be tested as part of our daily test suite! For starters, we’ll aim for a dozen non-trivial examples. In the end, we might have to settle for two dozen, but that’d be better than “Example: Coming soon!” all over the place. You don’t believe it, and let’s not pretend we do. If you have a favorite example or a particular call that mystifies you, chime in!

Step 3 is just sweat and tears.

G.


#9

UPDATE: Yesterday, we merged the doxygen2 branch to develop. A preview is available online. Pull requests welcome!

We’re adding typical life cycle examples to the beginning of each RM module. Have a look at H5A and let us know what you think!

Step 2 will be ongoing for a while. Step 3 has an end-of-May-ish deadline.

G.


#10

If the documentation is automatically extracted from the source code, how do you handle documentation updates after the code has been tagged and released?


#11

I think the philosophy is that documentation that’s part of the source code goes the same way as any other source code changes. How difficult this is in practice remains to be seen.

HDF5 1.12.1 is pretty close to develop, so no biggy there. 1.10.x and 1.8.x will be a lot harder because of our idiosyncratic way of API versioning. The biggest problem is the remnants of senseless repetition in the current documentation. The number one rule I’ve learned in this process is DRY (Don’t Repeat Yourself!), and if you do repeat yourself, use Doxygen templates so that things need to be changed in only one place.

On the practical side, I believe there will be release-specific variants (1.12.x, 1.10.x, 1.8.x) hosted online, including a develop preview.

What did you have in mind?
G.


#12

The concept of locking the documentation to a source code commit is flawed, although how serious this flaw is will likely vary. The purpose of documentation is to clearly describe our best understanding of how something works. What gets committed with source code is a description of how an author (often the programmer) thinks something works. When those two diverge, there should ideally be a mechanism in the workflow to present users with the best (i.e., corrected) description. Updating the source code doesn’t work, because those updates will be linked to future releases when they need to apply to past releases.

A similar problem applies to certain metadata in data products.


#13

Documentation typically includes descriptions of concepts and tasks, and reference material.
As far as the reference manual is concerned, to “clearly describe our best understanding of how something works” is not the purpose. It’s more modest, namely to document APIs w/ their parameters, return values, pre- and post-conditions, etc.

There are other parts of non-API documentation such as Metadata Caching in HDF5, which are more loosely affiliated with the code (such as concepts and tasks). I take your point that such (version-)cross-cutting documentation should be managed differently, perhaps in a submodule. OK?

G.


#14

Regardless of how you constrain the scope of a particular component of the documentation, the goal is always to have prose that is clear and correct to the best of your understanding. Otherwise, you are saying that mistakes and omissions in the reference material corresponding to a particular release should go uncorrected.

A typical example is that, after a release, it becomes clear that the reference material does not describe, or does not correctly describe, the return values in a particular corner case that no one considered before. The reference documentation corresponding to all affected releases (past and future) should be updated to reflect that improved understanding.

Reference material should absolutely describe our best understanding of how something works. That doesn’t mean the reference material is going to contain a comprehensive description but it needs to be clear; complete within the scope of its purpose; and, of course, it shouldn’t be wrong. Not all of these things are achieved when a piece of source is tagged for release.


#15

Dan,

Thank you for your comments.

Please note that RM in the develop branch is a superset of RM for all previous releases and current maintenance branches. We plan to provide snapshots of the code and post updated documentation with every snapshot from the develop branch. Therefore, HDF5 users will have access to the updated documentation on a regular basis.

We also plan to merge the documentation to the 1_12 maintenance branch, and it will be available in the HDF5 1.12.1 release planned for May 31, 2021. At this point we don’t have a time estimate for moving the documentation to the 1_10 and 1_8 maintenance branches and corresponding releases. This will not be a straightforward task, since the code has diverged and merges will need some work. We think our efforts should focus on making the rest of our documentation (e.g., User’s Guide, Tech Notes, File Format, etc.) searchable and maintainable.

We hope that moving the RM to Doxygen will make it easier for community members to contribute and make HDF5 documentation better. We welcome and appreciate all contributions, from spelling and grammar fixes to better explanations of the functions, code examples, etc.

Thank you!

Elena


#16

I understand the concern raised by Dan to be about a bug in the documentation for a release, e.g. 1.10.6, when the latest release is already at 1.10.12 (say), and he wants the documentation of the older release to be fixed. I guess that is the same as with bugs in the source code. If each release is a branch, then bugs in the documentation can just be fixed in that branch, affecting only the documentation part there, but not the actual source. Whether it is worth fixing previous releases rather than just the most recent one is another question… more an issue of how version maintenance is done than an issue of having the documentation extracted from comments in the source code. What would be wrong with treating a bug in the documentation the same as a bug in the source code?


#17

This does indeed seem like what @daniel.kahn is saying. OT1H, it seems reasonable that if an organization claims to maintain previous releases (e.g. fix bugs in previous releases) then that should extend to documentation associated with those releases. “bugs” in the documentation for a release should be fixed. OTOH, this does increase, potentially significantly, the effort involved in maintaining previous releases. Worse, with doc content embedded in source code, updates to previously released code that change only docs and not code (e.g. functionality) could create a lot of extra work for consumers who want/need to keep up to date with latest previous releases. I can’t imagine consumers would be happy to discover that in many cases the effort they make in doing this results in nothing in the updated binaries changing. An appropriately chosen version numbering scheme might help with this (e.g. patch release digits divisible by 3 are doc-only updates) but it would have to be a departure from current practice.

I could see a technology solution to this too where THG has tooling to carefully host the docs such that what is seen is a sort of “slice” through all versions of the docs relevant to the user. For logical blocks (API function + its documentation) docs-only bug fixes are immediately reflected through all releases of the code for the associated function.

A big challenge I see with @daniel.kahn’s suggestion/request is what it means for releases prior to the introduction of the docs-like-code workflow for HDF5 development. If THG adopted the practice of maintaining previous release docs, it might make sense to do so only for releases made after the docs-like-code workflow was adopted.


#18

That’s what it is. It’s just a different way of looking at (parts of) the code.

Yes.

I think we’ll find a middle ground between the following:

  1. HDF5 1.x.y releases are maintenance releases. Typically, we don’t fix code issues in 1.8.3. We fix them (as of now) in 1.8.23. I think the same applies to documentation. The release notes should show changes/improvements/fixes of documentation the same way they document other changes.
  2. A lot of this discussion is RM-specific. We need to get it right, but it covers only one part of the documentation. There are other important kinds, and different ways to slice and dice it.

I don’t see any insurmountable technical problems to get all that into Doxygen/GitHub (and maintain it). As others have found, the main challenges are cultural.

G.