Chunking and 1-D Datasets


#1

Hello HDF Forum,

I’ve pored over the HDF user manual and the Advanced chunking page (https://support.hdfgroup.org/HDF5/doc/Advanced/Chunking/), but I haven’t found an answer to a question posed in a 2009 presentation by the HDF Group: “Why shouldn’t I make a chunk with dimension sizes equal to 1?”

Currently, I am dealing with 1-D arrays, and I am writing out to nx1 chunks. My program is working fine, but I came across this question as I was researching something else on chunking.

So:

  • Why is a chunk (or even a dataset) with a dimension of 1 (either nx1 or 1xn) bad?
  • Are there I/O performance hits associated with a dimension of size one?
  • Let’s say I write to a dataset that is n x 2, where n is extendible, with chunks of m x 2. Will this improve performance, or change my data, even if I am only dealing with 1-D arrays?

Thanks!!! Apologies if this question is very elementary,
George


#2

Hi, as far as I know, chunking should match your access patterns. A 1x1 chunk in a 512x512 array is terrible, of course: one chunk per element would kill performance. Also consider that compression works per chunk, so a 1x1 chunk cannot be compressed. So a one-dimensional chunk should be avoided in an n-dimensional dataset, but if your dataset is one-dimensional anyway, then a one-dimensional chunk should be fine. Just make it reasonably large, and try to match your access patterns. For instance, if you usually read or write 512-element hyperslabs, don’t use a chunk size of 511 or 513, because nearly every hyperslab would then straddle two chunks.
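To make the 511/512/513 remark concrete, here is a minimal sketch in plain Python (no HDF5 involved) that counts how many chunks a run of back-to-back 512-element reads would overlap in a 1-D dataset. The helper names `chunks_touched` and `average_chunks_per_read` are made up for illustration, and the model assumes reads start at multiples of 512.

```python
def chunks_touched(start, count, chunk_len):
    """Number of chunks overlapped by elements [start, start + count)."""
    first = start // chunk_len
    last = (start + count - 1) // chunk_len
    return last - first + 1


def average_chunks_per_read(read_len, chunk_len, n_reads=1000):
    """Average chunks overlapped by n_reads back-to-back reads of read_len."""
    total = sum(
        chunks_touched(i * read_len, read_len, chunk_len)
        for i in range(n_reads)
    )
    return total / n_reads


if __name__ == "__main__":
    # Compare a matching chunk length with two slightly mismatched ones.
    for chunk_len in (511, 512, 513):
        print(chunk_len, average_chunks_per_read(512, chunk_len))
```

With a matching chunk length each read maps onto exactly one chunk, while the off-by-one lengths make almost every read overlap two, which is roughly twice the chunk lookups (and decompression work, if a filter is in use).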


#3

To address your original inquiry… I am not aware of anything special regarding setting some of the dimensions of a chunk to unity (i.e., one). I do this in H5Z-ZFP when it is appropriate. It means such a chunk does not span those dimensions of the dataset in which it is embedded. But it should span at least some of the other dimensions, so that the chunk is large enough to amortize the time and space overhead of storing and accessing the data in the dataset.

If you are dealing with 1D arrays, then I think you should deal with them in HDF5 as 1D datasets. However, as the previous respondent indicated, you don’t wanna choose a chunk size that would result in tiny chunks. So, if you have a 1D, extendible dataset of doubles, you might choose a chunk with dimension 512. That would mean the chunk size, in bytes, is 4096 (4K), which matches the default block size on most ext4 file systems.
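The arithmetic behind that suggestion can be sketched in a few lines of plain Python. The helper names `chunk_bytes` and `chunk_len_for` are hypothetical, and the 1 MiB target in the second call is only an example value, not a recommendation from this thread.

```python
import struct

# Size of an IEEE 754 double in bytes (standard size, no padding).
DOUBLE_BYTES = struct.calcsize("<d")


def chunk_bytes(chunk_len, item_bytes=DOUBLE_BYTES):
    """Uncompressed size in bytes of a 1-D chunk holding chunk_len elements."""
    return chunk_len * item_bytes


def chunk_len_for(target_bytes, item_bytes=DOUBLE_BYTES):
    """Largest whole number of elements that fits in target_bytes (min 1)."""
    return max(1, target_bytes // item_bytes)


if __name__ == "__main__":
    print(chunk_bytes(512))             # 512 doubles * 8 bytes = 4096
    print(chunk_len_for(1024 * 1024))   # elements of double in 1 MiB: 131072
```

Going the other way (from a byte budget to a chunk length) is handy when the element type changes, since 512 elements of a 4-byte type would only fill half of a 4K block.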

Does that help?


#4

@werner @miller86

Thanks for your responses! They are both very helpful and clarifying. I’m generally doing as you both have advised already – extending my datasets by a fairly large amount, and I’m ensuring that it matches my access patterns. When I first read the remark about “why I shouldn’t have a 1-D chunk” it gave me pause about my current strategy.

Thanks again!


#5

So, I probably spoke a little too soon, without knowing more about your situation. Let’s say you were dealing with literally thousands of 1D datasets… then it could make sense to handle that as a single 2D dataset where the first dimension represents the “which dataset” dimension. If you only ever expected to be dealing with the same thousands of datasets all the time, then that dimension might best be NOT extendible. OTOH, if you expect to be able to add new 1D datasets over the life of the file, then you’d probably wanna make that first dimension extendible. The other dimension represents the original 1D, extendible dimension you started with. Instead of a slew of 1D dataset objects in the HDF5 file, you would have a single 2D dataset. Which route you take depends on your workflow and on what producer(s) and consumer(s) expect to want to (easily) do with the data.
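If the thousands of 1D series are stacked as rows of one 2D dataset, the chunk shape decides how much data has to be pulled in to read a single series back. The following is a rough sketch in plain Python; `row_read_cost` is a made-up helper, and it assumes 8-byte elements and that whole chunks are read, ignoring compression and the chunk cache.

```python
from math import ceil


def row_read_cost(n_cols, chunk_shape, item_bytes=8):
    """Chunks touched and uncompressed bytes read to fetch one full row.

    Assumes whole chunks are read; compression and caching are ignored.
    """
    chunk_rows, chunk_cols = chunk_shape
    chunks = ceil(n_cols / chunk_cols)
    bytes_read = chunks * chunk_rows * chunk_cols * item_bytes
    return chunks, bytes_read


if __name__ == "__main__":
    n_cols = 100_000  # length of each 1D series (example value)
    for shape in ((1, 512), (16, 512), (1, 8192)):
        print(shape, row_read_cost(n_cols, shape))
```

Under these assumptions a chunk that is one row tall reads only the series asked for, whereas a 16-row chunk drags in 15 neighbouring series as well. That is a case where a chunk dimension of 1 matches the access pattern rather than hurting it; if consumers instead read many series at one time index, the trade-off flips.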


#6

Hello,
a few notes to add to what has already been established. I’ve been working on a high-throughput data pipeline for H5CPP, C++ persistence with an HDF5 datastore.
From a data access perspective, it does matter how memory is accessed, and so does its alignment; however, there is more to it, especially if you go through the HDF5 C API. It is a good idea to swap chunk dimensions and see if there is a difference, or to use a robust optimizer to find the right values on a given architecture. Differential evolution (DE) is one such method, although it is neither the only one nor the fastest.

If you are only interested in the final result and want to focus on the main part of your work, you might want to give the H5CPP high-throughput pipeline a try. The caveat is that not all features are in place yet, but if you are fine with saving datasets either incrementally with h5::append or in one shot with h5::write, then this may be an option for you.

Here is a quick throughput measurement on my Lenovo 230 i7 laptop: 2-3GB/sec for small objects (350MB of data) and 200-400MB/sec for large ones (3.5GB of data). CAPI stands for the HDF5 C API’s built-in pipeline; RAW BINARY and ARMA BINARY are there to put in context what an average good result is; and the H5CPP results are measurements for different scenarios. Expect full completion by end of January…

The test may be verified or challenged by running the following in (h5cpp root)/profile/throughput: make clean && make && make ./tile
Best wishes,
Steven

------------------------ THROUGHPUT TEST ----------------------------
MEMORY SIZE of DATASET:368.64MB
DATASET: 100x720x1280 CHUNK: 1x72x1280

HDF5 1.10.4 CAPI PIPELINE:
9.56586 throughput: 38.5371 MB/s
HDF5 1.10.4 H5CPP HIGH THROUGHPUT PIPELINE:
0.112945 throughput: 3263.89 MB/s
HDF5 1.10.4 H5CPP APPEND: scalar values  directly into chunk buffer
0.72458 throughput: 508.764 MB/s
HDF5 1.10.4 H5CPP APPEND: objects with matching chunk size, data directly written from object memory
0.461195 throughput: 799.315 MB/s
RAW  BINARY 
0.973073 throughput: 378.841 MB/s
ARMA BINARY 
1.42448 throughput: 258.789 MB/s

large objects:

------------------------ THROUGHPUT TEST ----------------------------
MEMORY SIZE of DATASET:3686.4MB
DATASET: 1000x720x1280 CHUNK: 1x72x1280
HDF5 1.10.4 CAPI PIPELINE:
103.828 throughput: 35.5049 MB/s
HDF5 1.10.4 H5CPP HIGH THROUGHPUT PIPELINE:
14.2666 throughput: 258.394 MB/s
HDF5 1.10.4 H5CPP APPEND: scalar values  directly into chunk buffer
12.1014 throughput: 304.626 MB/s
HDF5 1.10.4 H5CPP APPEND: objects with matching chunk size, data directly written from object memory
12.8119 throughput: 287.733 MB/s
RAW  BINARY 
13.2719 throughput: 277.759 MB/s
ARMA BINARY 
12.9693 throughput: 284.24 MB/s
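The figures in the logs above can be cross-checked with a little arithmetic. The sketch below is plain Python and rests on two assumptions that are not stated in the post but are consistent with its numbers: each element is 4 bytes, and the first number on each result line is elapsed time in seconds. The helper names are made up.

```python
def dataset_megabytes(shape, item_bytes):
    """Uncompressed dataset size in MB (1 MB = 1e6 bytes)."""
    n_elements = 1
    for dim in shape:
        n_elements *= dim
    return n_elements * item_bytes / 1e6


def throughput_mb_per_s(megabytes, seconds):
    """Throughput as data size divided by elapsed time."""
    return megabytes / seconds


if __name__ == "__main__":
    # Assumption: 4-byte elements, which reproduces the reported sizes.
    small = dataset_megabytes((100, 720, 1280), item_bytes=4)
    large = dataset_megabytes((1000, 720, 1280), item_bytes=4)
    print(small, large)

    # Assumption: times are seconds, taken from the CAPI and H5CPP lines.
    print(throughput_mb_per_s(small, 9.56586))
    print(throughput_mb_per_s(small, 0.112945))
    print(throughput_mb_per_s(large, 103.828))
```

Under the same 4-byte assumption, a 1x72x1280 chunk holds 368,640 bytes, so the small dataset consists of 1,000 chunks and the large one of 10,000.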