Explainer on the effect of the "page_buf_size" value

The h5py.File argument page_buf_size is confusing me, and since there’s a lot of enthusiasm for cloud-optimization of HDF5 files, I thought asking for a brief public explainer would be useful.

The docs say the value “must be a power of two value and greater or equal than the file space page size when creating the file”. And yet … if I violate that guidance, I can see that page buffering is enabled.

Here’s the h5stat -S output for a test file:

Filename: s3://nasa-cryo-scratch/itcarroll/cloud-PACE/PACE_OCI_L2_AOP/8388608/G4194150056-OB_CLOUD
File space management strategy: H5F_FSPACE_STRATEGY_PAGE
File space page size: 8388608 bytes
Summary of file space information:
  File metadata: 320158 bytes
  Raw data: 18571705 bytes
  Amount/Percent of tracked free space: 14658473 bytes/43.7%
  Unaccounted space: 4096 bytes
Total space: 33554432 bytes

Note the page size is 2**23. When I watch the fsspec debug logs, I see this nicely reflected in the read sizes!

xarray.open_dataset(..., engine="h5netcdf", driver_kwds={"page_buf_size": 2}, open_kwargs={"cache_type": "none"})
DEBUG:fsspec:... read: 0 - 8 
DEBUG:fsspec:... read: 0 - 8 
DEBUG:fsspec:... read: 0 - 48 
DEBUG:fsspec:... read: 48 - 560 
DEBUG:fsspec:... read: 0 - 8388608 
DEBUG:fsspec:... read: 16777216 - 25165824 
DEBUG:fsspec:... read: 0 - 8388608
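
To make "nicely reflected" concrete, here is a quick arithmetic check of the three large reads against the page size reported by h5stat. It is only a sketch: the numbers are copied from the output above, the helper name page_index is made up for illustration, and it assumes each "read: a - b" log entry is a half-open byte range [a, b).

```python
# Values copied from the h5stat -S output and the fsspec debug log above.
PAGE_SIZE = 8388608        # "File space page size"
TOTAL_SPACE = 33554432     # "Total space"
LARGE_READS = [(0, 8388608), (16777216, 25165824), (0, 8388608)]


def page_index(start, end, page_size):
    """Return the page number if [start, end) covers exactly one whole page.

    Hypothetical helper: returns None when the range is not page-aligned
    or is not exactly one page long.
    """
    if start % page_size == 0 and end - start == page_size:
        return start // page_size
    return None


# Assumption: each log entry "read: a - b" is the half-open range [a, b).
pages = [page_index(start, end, PAGE_SIZE) for start, end in LARGE_READS]

print(PAGE_SIZE == 2**23)        # True
print(pages)                     # [0, 2, 0]
print(TOTAL_SPACE // PAGE_SIZE)  # 4
```

So each large read is exactly one page (pages 0 and 2 of the 4 pages in the file), while the four small reads at the start are not page-sized.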

But did you notice that I’m setting page_buf_size=2? That is a power of two, but far smaller than the 8388608-byte page size. How is the page buffer sized when the value doesn’t follow the guidance? Since I generally don’t know the page size ahead of time, is it good enough to set page_buf_size to 1?
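
For reference, this is how I read the documented rule, written as plain Python. It is a sketch of my interpretation of the docs sentence, not of what HDF5 or h5py actually does internally, and both function names are made up for illustration.

```python
def follows_guidance(page_buf_size, fs_page_size):
    """Check a value against my reading of the docs: a power of two
    that is greater than or equal to the file space page size."""
    is_power_of_two = page_buf_size > 0 and page_buf_size & (page_buf_size - 1) == 0
    return is_power_of_two and page_buf_size >= fs_page_size


def smallest_valid_page_buf_size(fs_page_size):
    """Smallest power of two that is >= fs_page_size (hypothetical helper)."""
    size = 1
    while size < fs_page_size:
        size *= 2
    return size


FS_PAGE_SIZE = 8388608  # from the h5stat -S output above

print(follows_guidance(2, FS_PAGE_SIZE))           # False
print(follows_guidance(1, FS_PAGE_SIZE))           # False
print(follows_guidance(8388608, FS_PAGE_SIZE))     # True
print(smallest_valid_page_buf_size(FS_PAGE_SIZE))  # 8388608
```

By this reading, both 1 and 2 violate the guidance for this file, which is why the page-sized reads in the log surprise me. Computing a compliant value requires knowing the page size first, and that is exactly what I usually don’t have.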

Version info:

  • hdf5 2.1.0
  • h5py 3.16.0
  • xarray 2026.4.0
  • fsspec 2026.4.0
  • s3fs 2026.4.0