I’ve come across performance problems in a case where chunks are much bigger than the default chunk cache size. The default of 1 MB cache per dataset seems extremely small now that even laptops have multiple GB of RAM, and HPC cluster nodes can have hundreds of GB.
I found a thread about this from a couple of years ago. It looks like having the library try to guess a good cache size is not an option, and I’m OK with that: I’d rather have simple, predictable behaviour even if it is wrong in some cases.
However, I’d like to be able to experiment with different cache sizes without having to recompile the software in question. So I’d propose adding an environment variable to be used like this:
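Something along these lines, where the variable name `HDF5_CHUNK_CACHE_SIZE` is just a placeholder I've made up for illustration; the actual name would be whatever is agreed on:

```shell
# Hypothetical variable name -- request a 256 MiB chunk cache
# per dataset for this run only, with no recompilation:
HDF5_CHUNK_CACHE_SIZE=256M ./my_application
```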
This would override the default, but it could itself be overridden if the application called H5Pset_chunk_cache. The suffixes K, M and G would multiply the number by the relevant power of 1024 to give a size in bytes.