Seeking help to optimize HDF Reads of very wide compound types


I am reaching out to seek help on how to better utilize HDF5 for my job. I’m going to give as much upfront background as possible and am willing to come back with more information if asked.


We use HDF5 to record our simulation data. All data is stored as compound types and written out as an Nx1 array of those compound types. We try to minimize disk usage by using maximum gzip compression. A single HDF5 file we write can range from 25 MB to upwards of 2.5 GB. We write around 100 datasets at a minimum.

Some datasets have up to 460 fields, meaning the compound type representing an entry in that dataset can have up to 460 “fields” that the equivalent C++ struct would have. All fields of a dataset are known a priori, but the number of rows that dataset will get from the simulation is unknown (I believe we resize the datasets often).

(I’m more than happy to share more, or even give an example dump of one of our hdfs, I just don’t know what to share)


Our biggest problem, in my opinion, and the one I’m trying to optimize, is reading the datasets back out. We read from the datasets many, many times, and when profiling our code, 90% of the runtime is spent reading from disk (specifically in h5py). For reading we use the default file driver for the OS (Linux in this case). I tried the Core driver to read the HDF5 file into memory, but it didn’t seem to provide any performance benefit.

For the datasets that contain dozens to hundreds of fields, we typically care about only a very small subset of the fields on any given read, yet we read all entries. This is the biggest problem: we want all rows but only a small subset of the fields.

So how can we optimize this sort of reading with HDF5?


I’ve tried partial I/O on the compound type fields, but in some testing on our largest compound types (460 fields), the difference in read time between reading all entries of 1 field and all entries of all fields is 2 seconds at best, which to me makes partial field reading pointless.
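For reference, here is a minimal sketch of what partial field reading looks like in h5py. The three-field record, the dataset name and the chunk size are made up for illustration, and `Dataset.fields` assumes h5py 3.x. As far as I understand, the fields of each record are stored interleaved inside every chunk, so HDF5 still has to read and decompress whole chunks even when only one field is requested, which would be consistent with the small difference in read time.

```python
import numpy as np
import h5py

# Hypothetical 3-field record standing in for the real 460-field compound type.
record = np.dtype([("time", "f8"), ("x", "f4"), ("flag", "i4")])
rows = np.zeros(1000, dtype=record)
rows["time"] = np.arange(1000)

# In-memory file (core driver, no backing store) so the sketch leaves nothing on disk.
with h5py.File("partial_io_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset(
        "sim", data=rows, chunks=(100,), compression="gzip", compression_opts=9
    )

    # Partial I/O: only the named fields are converted into the result.
    time_only = dset.fields("time")[:]          # plain 1-D float64 array
    subset = dset.fields(["time", "flag"])[:]   # structured array with two fields

    # Full read of every field, for comparison.
    everything = dset[:]
```

Timing `dset.fields("time")[:]` against `dset[:]` on a real file is a quick way to check how much of the cost is chunk decompression rather than field extraction.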

Additionally, I’ve tried breaking the fields of these compound types into separate arrays. That is, for a compound type with 460 fields, I wrote 460 arrays, each in its own dataset named after the field. This, however, hurt our write times to the point that it would in no way be acceptable.
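One thing that may be worth checking with the per-field layout is how often each dataset is resized and written. The sketch below assumes (this is a guess, not something measured) that the slow writes come from touching all 460 datasets on every row, and instead buffers rows in memory and appends them one block at a time. The `ColumnWriter` class, the field list and the block size are all hypothetical.

```python
import numpy as np
import h5py

# Hypothetical field list standing in for the real 460-field record.
FIELDS = {"time": "f8", "x": "f4", "flag": "i4"}
BLOCK = 4096  # rows buffered per flush; also used as the chunk length


class ColumnWriter:
    """Buffers rows in memory and appends them to per-field datasets in blocks."""

    def __init__(self, group, fields, block=BLOCK):
        self.block = block
        self.count = 0  # rows currently held in the buffers
        self.buffers = {name: np.empty(block, dtype=dt) for name, dt in fields.items()}
        self.dsets = {
            name: group.create_dataset(
                name, shape=(0,), maxshape=(None,), dtype=dt,
                chunks=(block,), compression="gzip",
            )
            for name, dt in fields.items()
        }

    def append(self, row):
        # Store one row in the buffers; write to HDF5 only when a block is full.
        for name, value in row.items():
            self.buffers[name][self.count] = value
        self.count += 1
        if self.count == self.block:
            self.flush()

    def flush(self):
        # One resize and one write per dataset per block, not per row.
        if self.count == 0:
            return
        for name, dset in self.dsets.items():
            start = dset.shape[0]
            dset.resize((start + self.count,))
            dset[start:] = self.buffers[name][: self.count]
        self.count = 0


with h5py.File("column_writer_demo.h5", "w", driver="core", backing_store=False) as f:
    writer = ColumnWriter(f.create_group("sim"), FIELDS)
    for i in range(10_000):
        writer.append({"time": i * 0.1, "x": 1.0, "flag": i % 2})
    writer.flush()  # write the final, partially filled block

    n_rows = f["sim/time"].shape[0]
    flags = f["sim/flag"][:]  # reads one column without touching the others
```

With this layout a read of a single field only touches that field's chunks, so the read cost should scale with the number of fields requested rather than the width of the record.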

Unless you read full records, compounds w/ dozens or hundreds of fields tend to be a hindrance. Breaking things out into columns or column groups makes sense (also better compression).

How about H5D[read,write]_multi?

See also the RFC.



I was assuming you are using MPI. Is your code sequential or parallel? G.

Serial, unfortunately; we are restricted to a pre-MPI version.

So all in all at the moment, it seems like we need to severely rethink our current data architecture.

Is there any additional information, such as a dump, that could help here?

Hi Matt,

You might want to do IO in blocks, so-called chunks in HDF5 terminology, and pack related fields into one or more blocks such that you end up with an acceptable fill rate. Efficiently packing and unpacking the blocks is the key to the problem.
Once the blocks are packed, the IO throughput using HDF5 direct chunk read/write will be a function of the fill rate. At this point you are done with HDF5, but not with the problem.

There are various approaches to the packing problem, but the general idea here is to recognise that you can trade storage space for simplicity and latency. This means storing some record fields multiple times can be favourable. In this relaxed model you might want to design your IO system backward, starting from the queries.

Of course this is much easier to write than to do.

You could also consider breaking datasets out by datatype and having two-dimensional datasets that mimic columns. You could use attributes or separate datasets to keep track of the column names. That way you can do hyperslab selections to filter out row and column ranges. As @steven.varga suggested, combined with chunking, this should be pretty fast.
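A rough sketch of that layout in h5py, assuming all float64 fields go into one two-dimensional dataset. The dataset name, the `columns` attribute, the column names and the chunk shape are illustrative choices, not anything prescribed by HDF5.

```python
import numpy as np
import h5py

# Hypothetical names of the float64 fields; other datatypes would get their own 2-D dataset.
float_cols = ["time", "x", "y", "z"]
n_rows = 1000
data = np.arange(n_rows * len(float_cols), dtype="f8").reshape(n_rows, len(float_cols))

with h5py.File("columns_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset(
        "float64_fields",
        data=data,
        maxshape=(None, len(float_cols)),  # rows can still grow during the simulation
        chunks=(256, 1),                   # each chunk holds values from a single column
        compression="gzip",
    )
    dset.attrs["columns"] = float_cols     # keep the column names next to the data

    # Look up a column by name, then select it with a hyperslab.
    names = [c if isinstance(c, str) else c.decode() for c in dset.attrs["columns"]]
    col = names.index("y")
    y = dset[:, col]                # all rows of one column
    window = dset[100:200, 1:3]     # a row range of two adjacent columns
```

With one column per chunk, reading a single column only decompresses that column's chunks. Wider chunks along the second axis would favour queries that usually read several neighbouring columns together.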


To round out the possible options: create a separate HDF5 file for each compound field and have a “main” HDF5 file that includes all the “field” HDF5 files as external links. This should allow the best write throughput if separate processes can write compound field data simultaneously. And with each field’s data in a separate file, you can achieve optimized read operations as well. You may just end up with a lot more HDF5 files than you ever expected.
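As a small illustration of the external-link layout in h5py (file names, the `/values` dataset path and the two sample fields are made up; a temporary directory is used so the sketch is self-contained):

```python
import os
import tempfile

import numpy as np
import h5py

# Hypothetical per-field data; in practice each file could be written by its own process.
fields = {
    "time": np.arange(100, dtype="f8"),
    "flag": np.ones(100, dtype="i4"),
}
workdir = tempfile.mkdtemp()

# One HDF5 file per field, each holding a single dataset.
for name, values in fields.items():
    with h5py.File(os.path.join(workdir, f"{name}.h5"), "w") as f:
        f.create_dataset("values", data=values, chunks=True, compression="gzip")

# The "main" file holds only external links to the per-field files.
main_path = os.path.join(workdir, "main.h5")
with h5py.File(main_path, "w") as main:
    for name in fields:
        # Relative file names are looked up next to the main file.
        main[name] = h5py.ExternalLink(f"{name}.h5", "/values")

# Readers open just the main file; a field file is opened only when its link is followed.
with h5py.File(main_path, "r") as main:
    time = main["time"][:]
    linked_names = sorted(main.keys())
```

Because the links store relative file names, the main file and the field files need to be moved around together.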