Page Buffer with pHDF5


#1

I would like to instruct HDF5 to read a page-aggregated file using the page buffer. A complete reproducer is included at the end of this post. The file is created like this:

auto fcpl = H5Pcreate(H5P_FILE_CREATE);
H5Pset_file_space_strategy(fcpl, H5F_FSPACE_STRATEGY_PAGE, false, 0);

auto file = H5Fcreate(filename.c_str(), H5F_ACC_TRUNC, fcpl, H5P_DEFAULT);

To open the file with a page buffer configured I use the following snippet:

auto fapl = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_page_buffer_size(fapl, 1024, 0, 0);

auto file = H5Fopen(filename.c_str(), H5F_ACC_RDONLY, fapl);

When building against the serial version of HDF5, everything works as expected. However, if I build against the parallel version of HDF5, without requesting MPI-IO or making any other change to the code, it fails with the following error:

  #006: H5Fint.c line 1950 in H5F_open(): page buffering is disabled for parallel
    major: File accessibility
    minor: Unable to open file

This leads us to the following line in the repository:

When reading “RFC: Page Buffering” I get the impression that this use case should be covered, since the RFC mentions regression testing for the parallel case and also alludes to partial support in the parallel version of HDF5. However, the line of code that raises the error is unconditional.

I don’t want to use any MPI-IO when reading this particular file. However, I’d still like to be able to build against the parallel version of HDF5 because:

  1. I’d like the possibility to read/write other files using MPI-IO.
  2. It would simplify dependency issues.

Is there some way around this issue? Does anyone know the reasons why this “temporary” line is still present in the code base?

Many thanks in advance.


// file: page_buffer.cpp
#include <string>
#include <hdf5.h>

void create_file(const std::string& filename) {
  // Use the PAGE file-space strategy so all allocations are page-aligned.
  auto fcpl = H5Pcreate(H5P_FILE_CREATE);
  H5Pset_file_space_strategy(fcpl, H5F_FSPACE_STRATEGY_PAGE, false, 0);

  auto file = H5Fcreate(filename.c_str(), H5F_ACC_TRUNC, fcpl, H5P_DEFAULT);

  H5Pclose(fcpl);
  H5Fclose(file);
}

void read_file(const std::string& filename) {
  // Request a page buffer; the size is in bytes, and the two trailing zeros
  // leave the minimum metadata/raw-data fractions at their defaults.
  auto fapl = H5Pcreate(H5P_FILE_ACCESS);
  H5Pset_page_buffer_size(fapl, 1024, 0, 0);

  auto file = H5Fopen(filename.c_str(), H5F_ACC_RDONLY, fapl);

  H5Pclose(fapl);
  H5Fclose(file);
}

int main() {
  std::string filename = "page_allocated.h5";

  create_file(filename);
  read_file(filename);

  return 0;
}

# file: CMakeLists.txt
cmake_minimum_required(VERSION 3.19)  # HDF5::HDF5 imported target requires >= 3.19
project(page_buffer)

find_package(HDF5 REQUIRED)

add_executable(page_buffer)
target_sources(page_buffer PRIVATE page_buffer.cpp)
target_link_libraries(page_buffer PUBLIC HDF5::HDF5)

#2

The page buffer was designed with parallel HDF5 in mind but, unfortunately, parallel support was never implemented due to lack of funding.


#3

Thank you for your quick response.


#4

Hi @luc.grosheintz,

It appears that this error check was a bit overly restrictive when added; there should be no issue with using page buffering when a file is opened serially via a parallel-enabled library. The fix should be straightforward, and I plan to have it ready later today.

In the meantime, you could either stick with a serial build, or just remove the error check and rebuild the library, making sure that any file accesses where page buffering is enabled are done serially (or in parallel with a single MPI rank).


#5

Thank you! This is what I was hoping would be the case.


#6

Hi @luc.grosheintz,

In the upcoming release of HDFql (version 2.5.0), we are introducing a new feature called the sliding cursor. It can be seen as a kind of page buffering (for reads). An extensive explanation and example of this new feature can be found in this post, also from this forum.

Hope it helps!


#7

To my understanding, the HDF5 page buffer simply stores aligned parts of the underlying file, called pages, in a buffer, regardless of what those parts contain. For my particular application this is crucial, since a single disk access pulls an entire page worth of, e.g., metadata or tiny datasets into RAM. On subsequent requests for other nearby datasets, if those happen to lie in a page that is already in the buffer, that dataset (or part of it) is read from RAM; similarly for the metadata of such datasets. (Although I think metadata involves one more step because of the metadata cache.)

If I understand the HDFql approach correctly, it provides a buffered iterator (or buffered random access) for one dataset. Does it also enable buffering multiple datasets, or does it only buffer within a single dataset?


#10

Hi @luc.grosheintz,

Yes, your understanding is correct: an HDFql sliding cursor can be seen as a buffered iterator where a new slice/subset of the dataset is read (through an implicit hyperslab selection by HDFql) whenever the iterator falls outside the limits of the slice/subset it currently traverses. This enables out-of-core operations in a simple way, which can be really useful, especially when dealing with huge datasets.

In the upcoming release of HDFql (version 2.5.0), a sliding cursor will only support buffering a single dataset. The good news is that we are also working on another front to enable HDFql to read and (post-)process multiple datasets/attributes potentially held across multiple HDF5 files; see this post for additional details. Once that work is done, it will be a small step to extend the sliding cursor to support buffering multiple datasets as well.

Hope it helps!