I’m writing relatively large text files to HDF5 in Fortran and would like to chunk and compress them. The way I currently do it requires creating a string datatype whose size equals the string length. Unfortunately, that means that chunking is not really possible: the only chunk has the length of the whole string.
My use case does not involve reading parts of the string. Still, I was wondering whether chunking would be advantageous, for example for the compression filters. I would appreciate suggestions for implementations.
EDIT: A serious limitation of having a single chunk seems to be the 4 GiB limit on chunk size. Once the string length exceeds this limit, writing a chunked dataset fails.
An MWE illustrating the current situation is attached. test.f90 (2.6 KB)
@m.diehl, your MWE works fine for me. It successfully writes a compressed string in a single chunk, and shows file size reduction. My only change was to reduce the demo string length from 2^26 to 2^20, because the larger size crashed my program on my older Mac, for some irrelevant reason.
So I do not understand your question. What problem are you trying to solve by adding chunking? I do not believe that chunking your single long strings would improve compression in the slightest degree. But it would add complexity.
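To make the compression point easy to check, here is a minimal sketch. It uses Python's standard `zlib` module as a stand-in for the HDF5 deflate filter, which compresses each chunk independently. The helper names (`make_text`, `compress_chunked`), the vocabulary, and the sizes are all made up for illustration, and synthetic text is only a rough proxy for real data, so treat the result as indicative rather than conclusive.

```python
import random
import zlib


def make_text(n_bytes, seed=0):
    """Build n_bytes of reproducible pseudo-text from a small vocabulary."""
    rng = random.Random(seed)
    vocab = [b"stress", b"strain", b"grain", b"phase",
             b"lattice", b"tensor", b"homogenization", b"0.125"]
    parts = []
    size = 0
    # Keep adding words until the joined text is at least n_bytes long.
    while size <= n_bytes:
        word = rng.choice(vocab)
        parts.append(word)
        size += len(word) + 1  # word plus one separating space
    return b" ".join(parts)[:n_bytes]


def compress_chunked(data, chunk_bytes, level=6):
    """Compress each chunk on its own, as a per-chunk filter would."""
    return [zlib.compress(data[i:i + chunk_bytes], level)
            for i in range(0, len(data), chunk_bytes)]


def compare(n_bytes=2**21, chunk_bytes=2**18):
    """Return (raw, whole, chunked) sizes in bytes for one test string."""
    text = make_text(n_bytes)
    whole = len(zlib.compress(text, 6))
    chunked = sum(len(c) for c in compress_chunked(text, chunk_bytes))
    return len(text), whole, chunked
```

Calling `compare()` returns the raw size and the two compressed sizes. Because deflate only looks back over a small window, chunks of a few hundred KiB or more should compress about as well as one large chunk, which is consistent with the expectation above.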
@dave.allured thanks for the clarification, good to know that compression is independent of the chunk size.
My question actually originated from an issue I encountered a while ago; all I remembered was that it was related to chunking.
After careful checking, I realized that the actual problem is the chunk size: it must not exceed 4 GiB. So for very large strings, chunking (and therefore checksums and compression, which require a chunked layout) is not possible.
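As a rough illustration of the arithmetic behind this limit, the sketch below checks whether a string of a given length fits into a single chunk and, if not, how it would split into smaller chunks. It assumes the limit is just under 4 GiB (2^32 - 1 bytes) and that the text would be stored as a 1-D dataset of one-byte elements instead of a single string-typed element. That storage layout is only one possible workaround, not what the attached reproducer does, and the function names are hypothetical.

```python
# Assumed limit: an HDF5 chunk must be smaller than 4 GiB.
HDF5_MAX_CHUNK_BYTES = 2**32 - 1


def fits_in_one_chunk(total_bytes):
    """True if a string of total_bytes can be stored as a single chunk."""
    return total_bytes <= HDF5_MAX_CHUNK_BYTES


def chunk_layout(total_bytes, chunk_bytes=2**20):
    """Return (n_chunks, last_chunk_bytes) for a 1-D dataset of bytes.

    total_bytes: length of the text in bytes (must be positive)
    chunk_bytes: chosen chunk size in bytes (illustrative default: 1 MiB)
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    if not 0 < chunk_bytes <= HDF5_MAX_CHUNK_BYTES:
        raise ValueError("chunk_bytes must be positive and below 4 GiB")
    n_chunks = -(-total_bytes // chunk_bytes)  # ceiling division
    last_chunk_bytes = total_bytes - (n_chunks - 1) * chunk_bytes
    return n_chunks, last_chunk_bytes
```

For example, `fits_in_one_chunk(5 * 2**30)` is false for a 5 GiB string, while `chunk_layout(5 * 2**30)` shows how the same data would divide into 1 MiB chunks, each far below the limit.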
I’ve updated the title and edited the original question. A small reproducer is given here: test.f90 (2.7 KB)