The h5py.File argument page_buf_size is confusing me, and since there’s a lot of enthusiasm for cloud-optimization of HDF5 files, I thought asking for a brief public explainer would be useful.
The docs say the value “must be a power of two value and greater or equal than the file space page size when creating the file”. And yet … if I violate that guidance, page buffering still appears to be enabled.
Here’s the h5stat -S on a test file:
Filename: s3://nasa-cryo-scratch/itcarroll/cloud-PACE/PACE_OCI_L2_AOP/8388608/G4194150056-OB_CLOUD
File space management strategy: H5F_FSPACE_STRATEGY_PAGE
File space page size: 8388608 bytes
Summary of file space information:
File metadata: 320158 bytes
Raw data: 18571705 bytes
Amount/Percent of tracked free space: 14658473 bytes/43.7%
Unaccounted space: 4096 bytes
Total space: 33554432 bytes
Note that the page size is 2**23. When I follow the fsspec debug logs, I see this nicely reflected in the read ranges!
xarray.open_dataset(..., engine="h5netcdf", driver_kwds={"page_buf_size": 2}, open_kwargs={"cache_type": "none"})
DEBUG:fsspec:... read: 0 - 8
DEBUG:fsspec:... read: 0 - 8
DEBUG:fsspec:... read: 0 - 48
DEBUG:fsspec:... read: 48 - 560
DEBUG:fsspec:... read: 0 - 8388608
DEBUG:fsspec:... read: 16777216 - 25165824
DEBUG:fsspec:... read: 0 - 8388608
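As a sanity check on what "nicely reflected" means here, this is a minimal sketch of the arithmetic. The byte ranges are copied from the log lines above. The helper names (is_power_of_two, is_whole_page) are just mine for illustration and not part of any library.

```python
PAGE_SIZE = 8388608  # "File space page size" reported by h5stat -S above

# (start, end) byte ranges copied from the fsspec DEBUG lines above
reads = [
    (0, 8), (0, 8), (0, 48), (48, 560),
    (0, 8388608), (16777216, 25165824), (0, 8388608),
]


def is_power_of_two(n):
    """True if n is a positive power of two (hypothetical helper)."""
    return n > 0 and n & (n - 1) == 0


def is_whole_page(start, end, page_size=PAGE_SIZE):
    """True if the range is exactly one page starting on a page boundary."""
    return start % page_size == 0 and end - start == page_size


# Reads that are exactly one page long, and which page each one fetched
page_reads = [r for r in reads if is_whole_page(*r)]
page_indices = [start // PAGE_SIZE for start, _ in page_reads]

print(is_power_of_two(PAGE_SIZE), PAGE_SIZE == 2**23)
print(page_indices)
```

So the three large reads are whole, page-aligned fetches (pages 0 and 2, with page 0 fetched twice), and the four small reads at the start are all under 1 KiB.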
But did you notice that I’m setting page_buf_size=2? How is the page buffer size actually determined when the value doesn’t follow the guidance? Since I generally don’t know the page size ahead of time, is it good enough to set page_buf_size to 1?
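For comparison, here is a rough sketch of what following the documented guidance would look like if the page size were known up front, for example by scraping it from h5stat -S output. Both function names and the n_pages parameter are hypothetical, my own invention for illustration. This only encodes the rule quoted from the docs (a power of two, at least the file space page size); it says nothing about what HDF5 does internally with a value like 2.

```python
import re


def parse_page_size(h5stat_text):
    """Pull the file space page size (bytes) out of h5stat -S text.

    Hypothetical helper: assumes the line format shown in the output above.
    Returns None if no such line is found.
    """
    match = re.search(r"File space page size:\s*(\d+)\s*bytes", h5stat_text)
    return int(match.group(1)) if match else None


def min_page_buf_size(page_size, n_pages=1):
    """Smallest power of two that holds n_pages pages of page_size bytes.

    n_pages is an assumption of this sketch, not an h5py or HDF5 parameter.
    """
    needed = page_size * n_pages
    size = 1
    while size < needed:
        size *= 2
    return size


sample = """File space management strategy: H5F_FSPACE_STRATEGY_PAGE
File space page size: 8388608 bytes
"""
page_size = parse_page_size(sample)
print(page_size, min_page_buf_size(page_size), min_page_buf_size(page_size, 3))
```

The result could then be passed as page_buf_size in driver_kwds, but that requires a separate step to discover the page size first, which is exactly what I’d like to avoid.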
Version info:
- hdf5 2.1.0
- h5py 3.16.0
- xarray 2026.4.0
- fsspec 2026.4.0
- s3fs 2026.4.0
