Need Help Getting Started with PInvoke

I want to begin by saying that I have never used HDF5 before, so some of my questions may sound trivial.

I am working on converting a project from using a SQL database to using an HDF5 database, but I am having trouble finding up-to-date documentation or examples of syntax.

We are using .NET version 4.5, so I was checking out using HDF.PInvoke (https://github.com/HDFGroup/HDF.PInvoke) since I was told that HDF5DotNet was not supported any more.

I’ve tried to find examples of how to get started using PInvoke, but the only resource I could find was the “Cookbook” on the GitHub page, which doesn’t show me how to create a new HDF5 database from the code behind.

Could anyone help me get started using PInvoke? In particular, I want to know the syntax for the basics: creating a new database file, doing the equivalent of SQL’s add, select, and delete operations on the data, and doing the equivalent of SQL’s Join operation.

Also, can I just install the NuGet package for HDF.PInvoke, v1.10.3, or are there other files I need to add as well to be able to use PInvoke? I am using Visual Studio 2015.

Thank you for your time, I hope to hear from you soon

Before getting lost in (.NET) technical details, you should spend some time getting the hang of HDF5. I’d recommend taking a look at HDFql, PyTables, and h5py. HDF5 may not be the kind of database you have in mind.
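To give a feel for how HDF5 differs from SQL, here is a minimal h5py sketch (Python rather than .NET) of rough equivalents to create, add, select, and delete. The dataset name `samples`, the record layout, and the helper function names are hypothetical and chosen only for illustration. Note that HDF5 has no built-in delete or join: a dataset is an array, so the sketch deletes by rewriting the kept rows, and a join would have to be done in memory after reading.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical record layout standing in for one row of a SQL table.
row_dtype = np.dtype([("id", "i8"), ("time", "f8"), ("value", "f8")])

path = os.path.join(tempfile.mkdtemp(), "results.h5")

# Rough equivalent of CREATE TABLE: an empty, resizable, compressed dataset.
with h5py.File(path, "w") as f:
    f.create_dataset("samples", shape=(0,), maxshape=(None,),
                     dtype=row_dtype, chunks=True, compression="gzip")


def append_rows(path, rows):
    """Rough equivalent of INSERT: grow the dataset and write at the end."""
    if len(rows) == 0:
        return
    with h5py.File(path, "a") as f:
        ds = f["samples"]
        start = ds.shape[0]
        ds.resize((start + len(rows),))
        ds[start:] = rows


def select_rows(path, min_value):
    """Rough equivalent of SELECT ... WHERE value >= ?: read, then filter in memory."""
    with h5py.File(path, "r") as f:
        data = f["samples"][...]
    return data[data["value"] >= min_value]


def delete_rows(path, ids):
    """No native DELETE: keep the other rows, shrink the dataset, rewrite it."""
    with h5py.File(path, "a") as f:
        ds = f["samples"]
        data = ds[...]
        kept = data[~np.isin(data["id"], ids)]
        ds.resize((len(kept),))
        if len(kept) > 0:
            ds[...] = kept
```

The select and delete helpers read the whole dataset into memory for simplicity; for millions of rows you would probably read and filter in slices instead.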

I am working on converting a project from using a SQL database to use a HDF5 database, but I am having trouble finding up to date documentation or examples of syntax.

As with any flexible, powerful tool like HDF5, there are lots of options here. Not surprisingly, they really depend on the use-case, e.g.

  • SQL database with live access to an HDF5 file
  • Complete ETL (e.g. extract and conversion) of the data from SQL DB to 1 or more HDF5 files
  • Ability to have SQL “language” and query capabilities on an HDF5 file, but without the overhead of an actual SQL DB
  • etc.
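As an illustration of the ETL option, a one-off export from SQLite to HDF5 could look roughly like the Python sketch below, using the standard `sqlite3` module and h5py. The function name `export_table` is hypothetical, the table and column names are supplied by the caller, and the sketch assumes the whole table fits in memory and that every column maps cleanly onto the given NumPy record type.

```python
import sqlite3
from contextlib import closing

import h5py
import numpy as np


def export_table(sqlite_path, hdf5_path, table, dtype):
    """Copy every row of one SQLite table into a gzip-compressed HDF5 dataset.

    `dtype` is a NumPy structured dtype whose field names match the column
    names. The table and column names are interpolated into the query, so
    they must come from trusted code, not from user input.
    """
    columns = ", ".join(dtype.names)
    with closing(sqlite3.connect(sqlite_path)) as conn:
        rows = conn.execute(f"SELECT {columns} FROM {table}").fetchall()

    # fetchall() returns a list of tuples, which NumPy converts to records.
    data = np.array(rows, dtype=dtype)

    with h5py.File(hdf5_path, "w") as f:
        f.create_dataset(table, data=data, compression="gzip")
    return len(data)
```

How much smaller the resulting file is than the SQLite original depends on the data and the compression settings, so it is worth measuring on a real sample.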

As Gerd mentioned, PInvoke may or may not have anything to do with your final solution, since all it is is a handy way for .NET programmers to use our C library.

Could you provide more info on why you’re trying to convert, and what’s the end goal? This would help us point you in the right direction. Thanks!

Basically, we just want to store our data in a more compact file type than an SQL database.

The application we are designing is a post-processing application that reads in data from a database file (currently we are using an SQLite database) and uses the data to generate graphs and data grids that it displays to users. A separate application will create the database and write its output to it (which could be millions of tuples). The application we are designing will only need to read the data from the database and organize the results in a way the rest of the application can use.

The problem we have with SQLite is that the size of the database file is way too large (2 GB and up if we have over 1 million data tuples, which is very likely). We would like to save the output into an HDF5 database so we can store the data in a more compact way.

I would prefer to keep the SQL queries we have right now if possible, so the “Ability to have SQL ‘language’ and query capabilities on an HDF5 file, but without the overhead of an actual SQL DB” is what I think is the most accurate approach.

Following Gerd’s recommendation, I took a look at HDFql, which caught my eye as a good possible solution to our problem. I like that we can use SQL syntax to read and organize data from the database. I will give that a try for now.

In the meantime, if anyone knows a better way to go about reading from an HDF5 database in .NET, let me know and I will check it out.

Thank you for all the assistance you have provided so far, and let me know if I can provide you with any more information.

Here’s another reference for .NET: https://ilnumerics.net/hdf5-interface.html