HDF5 C++ Webinar Followup - recording and Q&A transcript

Hi all,

This post contains information on the HDF5 C++ Webinar, which took place on January 24th, 2019. It also serves as an introduction to this section of our forum, the C++ User’s Group. Feel free to post in this category about the webinar content or about HDF5 C++ in general.

As a reminder, you can read more about the presentations on our blog.

You can catch the recording of the webinar at https://youtu.be/7A5dPL7zrj0

A transcript of the question and answer session has been posted on our blog at https://www.hdfgroup.org/2019/01/hdf5-c-webinar-followup/

We hope you appreciate these presentations from our community members and welcome your participation in our community discussions. Our team is eager to learn and work directly with you on your HDF5 initiatives and programs. Let us know how we can help by contacting us at info@hdfgroup.org.

Again, thank you to our community member presenters, and to all who could join us.


Good presentation! The C++ wrappers are a hot topic with me, having created my own internal version, closer to the h5cpp “wrapper”; I’m sorry I missed this one, as I fell ill at the time.

I have some responses to the questions and meeting content.

  1. Biggest issue - H5CPP and h5cpp cannot have the same name. Draw lots, anything, just one of the projects has to rename, sooner the better. I already used h5cpp “wrapper” and highfive and I didn’t even know H5CPP was a different thing. Now I have to explain the situation to others and HDF5 is the victim in terms of adoption and maturity perceptions, on top of the injury to engineering projects the confusion causes.

  2. Officially deprecate the internal HDF5 C++98 wrapper API and put it in maintenance mode for commercial customers. Put that in the documentation, and redirect readers to the other libraries. Elena basically said all of these things, but the position hasn’t been made absolutely clear or put somewhere that people visiting the site will pick up on.

  3. I don’t see the need for a public Ntuple alongside either H5CPP or the h5cpp wrapper at this point, though I understand how and why it was created. I’m an h5py / pandas user too, and I can do row- or column-major layouts (struct of arrays or array of structs) with the h5cpp-styled libraries, depending on where I want one or the other. Not a criticism!

  4. I need more time to evaluate H5CPP from Steven Varga. I’ve already been down my own road with the approach it takes with LLVM/Clang (https://github.com/nevion/metapod) and with libraries more in the vein of the h5cpp “wrapper”; however, I didn’t find much on how to build compound types manually, and I want both as first-class citizens. It probably has it, but the front-facing documentation page got in the way and just pointed to h5cpp at every opportunity. Out-of-the-box support for Eigen et al. matrix flavours is definitely noted, but I think the most noteworthy thing, which again I need to audit, is the new locally implemented “packet table”. I’ve got lots of history with the packet table as it exists already (search the ML…)

  5. Between the h5cpp wrapper and H5CPP, I think you both built great libraries. But you don’t do different things; you fill the exact same gaps as far as I can tell. Consolidate again: the split will only fragment and cause harm in the long term. HDF Group, please make a recommendation for a C++ library, even if softly, through blog posts or some such. h5py is the most fully defined “HDF5” library in Python, and even that situation is somewhat more murky than helpful (I don’t believe it got a recommendation at any point), but I’m not sure things will become clear to people in C++ without a recommendation, since the crowd drawn to C++ HDF5 has been considerably smaller than Python’s.

  6. Thread safety for independent threads is the real gift that keeps on giving, whether for performance (reading from multiple locations for higher IO; yes, RAID has overlap with this, but it doesn’t tap out a NAS or a JBOD using one file at a time, and sometimes you want to amortize latency too) or simply so that independent threads don’t block each other. Modern programs have threads doing a lot of work, not just computations, even OpenMP programs. This isn’t so much on the side of the wrappers, though; it is an internal limitation of HDF5 more than anything else. With “unified memory” / SVM I see no reason to involve HDF5 with GPU memory directly, and I see no complications there other than care for performance, but I see lots of reasons to support thread concurrency.


Thank you for the questions!

  1. Please note that h5cpp is a neutral, descriptive name discovered and used independently in two different parts of the world. We did have a friendly discussion about whether to decide this pressing issue by arm wrestling or beer drinking. For now we’re both focusing on delivering a better user experience.

  2. Great question for The HDF Group; I am curious about their response.

  3. Thank you for the feedback. Tuples / hypercubes of POD structs are indeed not the linear algebra way to store data, but they are a valid and native representation for events in various fields: particle colliders, financial markets, real-time bidding, sensor networks on oil rigs, etc. Preserving structure is not an unusual idea in these fields. If your use case is different, please refer to the data primitives supported by major linear algebra systems, or to raw memory pointers.

  4. When H5CPP was presented at Chicago C++ user group meetings, the LLVM compiler dependency was criticized; in response, Linux binary packages are provided so users can try compiler-assisted ‘introspection’ and provide feedback. If you prefer manual work, please read the examples’ generated.h files. You need to specialize h5::register_struct<your_type>(){ } and then register it with the macro H5CPP_REGISTER_STRUCT. Here is an example:

namespace h5 {
    template<> hid_t inline register_struct<sn::example::Record>(){
        hid_t ct_00 = H5Tcreate(H5T_COMPOUND, sizeof(sn::example::Record));
        // describe each member with H5Tinsert; see the HDF5 C API COMPOUND
        // datatype documentation for details, e.g. for a hypothetical field:
        //   H5Tinsert(ct_00, "idx", HOFFSET(sn::example::Record, idx), H5T_NATIVE_ULLONG);
        return ct_00; // <-- note: the returned hid_t will be closed by H5CPP
    }
}
// don't forget to register the structure with the H5CPP templates
H5CPP_REGISTER_STRUCT(sn::example::Record);
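
For completeness, a rough sketch of how a registered type is then used with the template IO calls; the sn::example::Record definition and the file name below are placeholders:

#include <h5cpp/core>
// either the manual specialization above, or the compiler generated generated.h
#include <h5cpp/io>
#include <vector>

int main(){
    std::vector<sn::example::Record> records(100);  // POD struct defined elsewhere
    h5::fd_t fd = h5::create("example.h5", H5F_ACC_TRUNC);
    h5::write(fd, "records", records);              // uses the registered compound type
}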

Thank you for your ‘audit’; let me know how it went. As of now the packet table runs on direct chunk IO and is near bare-hardware speed, and as I develop the library further it will gain extensive filtering support with a multithreading option. The packet table has been reworked from the original 2011 design to accommodate matrices, vectors, and element-wise append. See the ‘examples’ directory for details.
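
A rough sketch of the element-wise append path, in the spirit of the packet table example in the ‘examples’ directory; the dataset name, chunk size, and the incoming() data source are placeholders:

// open the file created above and stream records into an extendable dataset
h5::fd_t fd = h5::open("example.h5", H5F_ACC_RDWR);
h5::pt_t pt = h5::create<sn::example::Record>(fd, "stream",
        h5::max_dims{H5S_UNLIMITED}, h5::chunk{1024});
for( const auto& rec : incoming() )   // incoming() stands in for your data source
    h5::append(pt, rec);
// pt flushes the partial chunk and closes when it goes out of scope (RAII)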

  5. Thank you for sharing your insight and advising a European and a Canadian independent group to work together. HDF5 users are diverse; as a Canadian I embrace diversity by showing the difference.
    The three projects are similar in the sense that all of them tie into the HDF5 C API to some degree. However, there are differences at first glance:
  • seamless POD struct support through compiler-assisted reflection vs something else
  • easy-to-use pythonic experience based on template metaprogramming vs something else
  • header-only library with no dependencies other than the HDF5 C API vs a linked library
  • high-performance IO based on direct chunk IO vs plain old HDF5 C API calls
  6. Threads, processes, MPI-ROMIO, RAID, … all support some level of parallelism. I suggest reading up on C++11 threading primitives and taking relevant courses to further expand on what we already know: to have access from different threads to the same IO device will only make it slower.
    There are cases where threads make a difference; a filtering pipeline is a good example. Dedicate a thread as an IO server and make requests from the other threads (a minimal sketch of this pattern follows below). This has been discussed in the HDF Group’s SWMR approach. A much better option is MPI-IO, which does the same with an added re-ordering of the blocks, so it will do the right thing.
    HDF5 internally is very clean and fast; in my internal study it performs near bare hardware speed, on par with the underlying filesystem. This gutted-out version of HDF5 is/will be the base of H5CPP.
    Direct CUDA DMA from/to disk is a hot topic; it doubles the available bandwidth, which is a significant gain in machine learning. One way to reorder priorities is through grants/donations or a specific contract.
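
A minimal sketch of the IO-server pattern mentioned in point 6, using only C++11 primitives; the io_server name and the idea of queueing arbitrary closures are illustrative, not part of any shipped library, and the queued tasks would wrap whatever HDF5 calls you want serialized onto one thread:

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// a single worker thread owns all HDF5 calls; other threads enqueue work
struct io_server {
    io_server() : worker([this]{ run(); }) {}
    void submit(std::function<void()> task){
        { std::lock_guard<std::mutex> lk(mtx); tasks.push(std::move(task)); }
        cv.notify_one();
    }
    ~io_server(){ // drain the queue, then stop the worker
        { std::lock_guard<std::mutex> lk(mtx); done = true; }
        cv.notify_one();
        worker.join();
    }
private:
    void run(){
        for(;;){
            std::function<void()> task;
            {   std::unique_lock<std::mutex> lk(mtx);
                cv.wait(lk, [this]{ return done || !tasks.empty(); });
                if( tasks.empty() ) return;           // done and drained
                task = std::move(tasks.front()); tasks.pop();
            }
            task();   // the actual h5::write / H5Dwrite call happens here
        }
    }
    std::queue<std::function<void()>> tasks;
    std::mutex mtx;
    std::condition_variable cv;
    bool done = false;
    std::thread worker;   // declared last so the queue exists before it starts
};

The trade-off is one context switch and a queue hop per request; in exchange, exactly one thread ever touches the HDF5 library, so its global lock is never contended.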

All in all, if you find H5CPP interesting and need support making it fit into your existing or new C++ project, you can reach me at steven@vargaconsulting.ca.


RE: “HDF Group recommendations”

[I do work for The HDF Group, but I don’t speak for The HDF Group.]

Pieter Hintjens’ comment on the “Architecture of the 0MQ Community” comes
to mind where he explains that, “A lot of languages have multiple bindings
(…), written by different people over time or taking varying approaches.
We don’t regulate these in any way. There are no ‘official’ bindings.
You vote by using one or the other, contributing to it, or ignoring it.”

I think a ‘recommendation’ in the sense of “HDF Group publicizes an interesting
piece of work in the community” or “We have seen someone doing something similar.
Why don’t you take a look at this?” would be fair.
Beyond that, it’s going out on a limb.

Many good (some might argue: most) things are happening in the HDF5 ecosystem
because The HDF Group is NOT involved.


Now to perform some necromancy.

  1. Please think outside the box about the confusion this causes. I think this single issue is enough to undermine both of your efforts. I’ve seen the confusion it causes people and how they react to this specific issue with both of your projects. Don’t shrug it off; you are doing both teams’ work a disservice and harming your target user bases through confusion, including those once removed, by increasing the likelihood they won’t understand your libraries or get them working. Maybe look at it from the other side: what simpler action is there to increase the chances of project success or of getting future contracts?

  2. I’m not sure how The HDF Group is planning to handle this after these few months; I do hope they’re making progress on officially deprecating it in the documentation. The front-facing state of this page, with its dead link, does not bode well for clarifying things to C++ users.

  3. I don’t think you’ve all carved out spaces strong enough to justify standalone projects, especially the two h5cpp libraries. You do so little differently, and the differences are small and reconcilable. Header-only isn’t a very strong differentiator in this day and age of CMake dynamic package downloads / JIT builds and Meson / better build systems; header-only’s place in the ecosystem is not what it was 5 years ago, though it’s not bad… and really, both libraries are small enough to make consolidation happen. The compiler support you target is your strongest differentiator, but it is also really easy to make work in both; again, keep in mind I already did this exact project, for the same purposes, myself. Your packet table is pretty interesting too; I had forgotten about that one. It seems like it could be considered a side library on top of the shared wrapping both h5cpp libraries do, though. I’m still not seeing the point of the tuples/hypercube over what your own library can offer with packet tables… I used to do these same sorts of real-time sensor logging as well, and it had a stronger point in a world without your library or the use of packet tables. The real-time-ish dataframe concept maps nicely to a struct of arrays where each array is one of your packet tables. Maybe some transparent batching of row-to-columnar memory-order conversion is warranted there, but the packet table probably matters more, and I’d still prefer something closer to a structure of (packet table) arrays.

  4. I haven’t spent user time with H5CPP yet; the full audit is still outstanding. I didn’t know multithreading was in your equation for the direct-chunk packet table… is it fine- or coarse-grained locking? Does it grab the HDF5 global lock? I used to “hack” my way out of HDF5’s limitations by having datasets that were really IO’d out to POSIX files, sometimes mmap’d files, but I always had to do post-processing to merge them in later, meaning more data and more process.

  5. I’m not sure what justification there is for not collaborating and consolidating on software that does the same thing, on the same library, with the same name. The best way to contribute is to help science get done and to help people not get lost in confusion. Neither h5cpp project has a README or FAQ entry on what their differences are or why they won’t work together.

I’ve already torn into a few of your points as differences. I do think you’ve got a couple of micro-libraries/modules on top of the common subset, for sure.

  6. “To have access from different threads to the same IO device will only make it slower.” Nope, and you’ll see this usually just doesn’t pan out in practice with Infiniband, CUDA, or general networking. It is almost never fastest to use only processes, or only one thread doing IO, or to oversubscribe too much. You generally need a few threads (2-3) working on IO to saturate any given IO device for maximum throughput, or maybe just higher utilization; NASes/RAIDs are weird like that (a minimal sketch of what I mean follows below). And you need to deal with different device pathways: who said you can have only one high-performance mount, be it NAS, RAID, tmpfs, or just a PCIe device? Especially if you’re doing things like compression, your device utilization will suffer. See the C10K/C100K problem and webserver design, for instance. Further, your model of dedicating a thread as an IO server is slower; you can see the performance limit this model offers by looking at MPI+OpenMP, as that’s essentially what it is. The software design is also way more restrictive and cumbersome, with more context switches and typically less performance than fully independent threads. And IPC / shm is much less efficient and much harder to do than thread-based communication, unless you hide behind the likes of MPI and kind (which are restrictive while still incurring the performance loss of process IPC overhead).

For MPI-IO, that is a very square peg for a lot of round holes; the flexibility is not great or dynamic, and I generally don’t want to use it if I can at all avoid it. I’d rather use HDF5’s VDS and parallel file writes, which are so much more flexible for program design. Also, what makes you think I’m not familiar with C++ threads, POSIX threads, or processes, or even that the performance we’re talking about is covered in the standard or a classroom course? That’s a pretty undercutting suggestion, and I hope you’ve changed your mind on it.
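
As a concrete illustration of the 2-3 thread claim, here is a minimal sketch that reads disjoint slices of one file from independent threads with POSIX pread; the file name, thread count, and request size are arbitrary:

#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <thread>
#include <vector>

int main(){
    int fd = open("big.dat", O_RDONLY);   // any large file
    if( fd < 0 ) return 1;
    struct stat st;
    fstat(fd, &st);
    const int nthreads = 3;               // 2-3 is usually enough to saturate a device
    const off_t slice = st.st_size / nthreads;

    std::vector<std::thread> pool;
    for( int i = 0; i < nthreads; i++ )
        pool.emplace_back([=]{
            std::vector<char> buf(1 << 20);   // 1 MiB requests
            off_t pos = i * slice;
            const off_t end = (i == nthreads - 1) ? st.st_size : pos + slice;
            // pread carries its own offset, so the threads never contend
            // on a shared file position
            while( pos < end ){
                ssize_t n = pread(fd, buf.data(), buf.size(), pos);
                if( n <= 0 ) break;
                pos += n;                 // hand each chunk to a consumer here
            }
        });
    for( auto& t : pool ) t.join();
    close(fd);
}

With queue depth to spare on the device, the slices overlap in flight and throughput climbs until the device saturates, which is the behaviour I’m describing above.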

Just throwing it out there: I think the DOE or DARPA / Navy is likely interested in making parallel write/read performance higher and easier to use. Existing solutions have problems fully utilizing hardware, or are models that are not simple in software. Try pursuing it; traditional HPC middleware is at an all-time low in efficiency, at a time when one machine has ~4-10x the cores, Infiniband links, and GPUs it had 5 years ago, and with that, more than that factor of work to do. It’s a great problem to work on.

I think the ZMQ parallel was a good one to draw, but I don’t think the multitude of options served people well (I’ve had my turn with a few of them). I would rather have used an official, well-designed library than sift through a bunch of attempts that fall out of style with time. One thing I think is fundamentally wrong there is that too much choice is not a good thing; maybe a little choice is good for some things. See the Android OS’s fragmentation and version support: Google, and maybe Samsung, seem to be the winners there, but so many other efforts bombed out after attracting people, only to have their platform/OS go dark after a year or so. We knew Google was going to stay relevant.

Yeah, the “soft” recommendation could be nice. I think it fosters a sort of peer review / community engagement and agility in showing what works, while also popularizing the specific library and its purpose/capabilities.

That last comment… that’s weird. It’s just so weird to be hands-off when you’re trying to be the standard and the implementation for scientific data archiving/serialization.