Thank you for the information @ajelenak! I’d prefer to use h5py if setting buffer/cache sizes is not possible via h5repack.
Do I understand correctly that rdcc_nbytes and page_buf_size are not inherent properties of the HDF5 file on disk, but rather runtime configuration settings that are specified only when reading the data? Would the following workflow be appropriate:
Set the PAGE size and dataset chunking using h5repack:
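A sketch of what that h5repack step might look like. The file names, dataset path, page size, and chunk shape below are placeholders, not values from this thread:

```shell
# Sketch: repack input.h5 into output.h5 with the PAGE file-space
# strategy, a 16 MiB file page size, and a new chunk shape for a
# hypothetical dataset /data.
#   -S PAGE      file-space management strategy: paged aggregation
#   -G 16777216  file page size in bytes (16 MiB)
#   -l ...       new chunk layout for the named dataset
h5repack -S PAGE -G 16777216 -l /data:CHUNK=1024x1024 input.h5 output.h5
```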
Yes, you are correct. Dataset chunk caches and the page buffer cache are libhdf5 runtime in-memory caches; they have nothing to do with the HDF5 file itself.
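Because these are runtime settings, they are passed when opening the file with h5py rather than baked in by h5repack. A minimal sketch (the file name, dataset name, and cache sizes are made up for illustration; the small file created here just stands in for the repacked one):

```python
import numpy as np
import h5py

# Create a small chunked file to stand in for the h5repack output.
with h5py.File("example.h5", "w") as f:
    f.create_dataset("data", data=np.arange(1_000_000), chunks=(100_000,))

# Reopen for reading with a 64 MiB raw chunk cache. rdcc_nbytes is a
# runtime setting: it changes nothing on disk, only in-memory caching.
# For a file written with the PAGE file-space strategy you could also
# pass page_buf_size=... here to enable the page buffer cache.
with h5py.File("example.h5", "r", rdcc_nbytes=64 * 1024 * 1024) as f:
    data = f["data"][...]

print(data.sum())  # 499999500000
```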
I assume -G 9999 is just a placeholder value for the file page size. Note that you specify the file page size in bytes but new dataset chunks by their shape, so a chunk's size in bytes is the product of the number of dataset elements in the chunk and that dataset's datatype size in bytes.
Typically you want something like 8-16 MiB for a page size so it can hold 4-8 dataset chunks, and the page buffer cache should be large enough to hold all the internal file metadata pages plus at least several data pages. It is best to decide on the new, larger chunk sizes (in bytes) first and then work out the page and cache sizes. One size fits all may not be the best approach; tuning dataset chunk sizes and their caches per dataset may be needed to avoid excessive memory use.
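A worked example of that sizing arithmetic, with a hypothetical chunk shape and float64 data (all numbers here are illustrative, not recommendations for your file):

```python
import math

# Chunk size in bytes = elements per chunk * datatype size in bytes.
chunk_shape = (1024, 512)           # hypothetical chunk shape
dtype_size = 8                      # float64 is 8 bytes per element
chunk_bytes = math.prod(chunk_shape) * dtype_size
print(chunk_bytes // (1024 * 1024))  # 4 -> each chunk is 4 MiB

# Pick a page size that holds roughly 4-8 chunks: 16 MiB holds 4 here.
page_size = 16 * 1024 * 1024
print(page_size // chunk_bytes)      # 4 chunks per page

# Page buffer cache: metadata pages plus several data pages, e.g. 8
# pages -> 128 MiB. Tune per file to keep memory use bounded.
page_buf_size = 8 * page_size
print(page_buf_size // (1024 * 1024))  # 128 MiB
```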