Environment variable for chunk cache size?

paramon · July 4, 2018, 8:38am

Hi Thomas!

03.07.2018 18:49, Thomas Kluyver пишет:

I’ve come across performance problems in a case where chunks are much
bigger than the default chunk cache size. The default of 1 MB cache per
dataset seems extremely small now that even laptops have multiple GB of
RAM, and HPC cluster nodes can have hundreds of GB.

I found a thread about this from a couple of years ago
https://forum.hdfgroup.org/t/chuck-cache-size-proposal/3684. It looks
like having the library try to guess a good cache size is not an option,
and I’m OK with that: I’d rather have simple, predictable behaviour even
if it is wrong in some cases.

Please see also:
http://hdf-forum.184993.n3.nabble.com/Global-cache-size-td4027913.html

However, I’d like to be able to experiment with different cache sizes
without having to recompile the software in question. So I’d propose
adding an environment variable to be used like this:

HDF5_CHUNK_CACHE_SIZE=128M |

This would override the default, but it could be overridden if the
application called |H5Pset_cache| or |H5Pset_chunk_cache|. Suffixes K, M
or G would multiply the number by the relevant power of 1024 to get a
size in bytes.

That’s one neat idea! However, I’m pretty sure it will break some
existing applications relying on the defaults…

Best wishes,
Andrey Paramonov

Attention! https://support.hdfgroup.org is the NEW home for documentation from The HDF Group. (Details)

Environment variable for chunk cache size?