[HDF5 parallel chunked compressed I/O in Fortran]

Dear all,

I’m using HDF5 1.10.6 from a Fortran code and would like to write time-series data into an extendable, chunked, and compressed dataset. I set this up via the dataset creation property list propID:

call H5Pcreate_f(H5P_DATASET_CREATE_F, propID, info)
call H5Pset_chunk_f(propID, spaceRank, chunkDims, info)
call H5Pset_deflate_f(propID, 9, info)
...
call H5Dcreate_f(..., dcpl_id=propID)
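
For context, here is a fuller sketch of a creation sequence along these lines (names such as fileID, dataSetName, nFrames, nBeads, and chunkDims are placeholders, and a 2-D double-precision dataset with an unlimited first dimension is assumed so that it can be extended later):

```fortran
! Sketch only -- variable names are placeholders for the actual code
integer(hsize_t) :: dims(2), maxDims(2)
integer(hid_t)   :: spaceID, dataSetID

! Unlimited first dimension so the dataset can be extended frame by frame
dims    = (/ int(nFrames, hsize_t), int(nBeads, hsize_t) /)
maxDims = (/ H5S_UNLIMITED_F, int(nBeads, hsize_t) /)
call H5Screate_simple_f(2, dims, spaceID, info, maxDims)

! Chunking and deflate belong to the dataset *creation* property list
call H5Pcreate_f(H5P_DATASET_CREATE_F, propID, info)
call H5Pset_chunk_f(propID, 2, chunkDims, info)
call H5Pset_deflate_f(propID, 9, info)

! The creation list is attached here, at H5Dcreate_f; every later
! H5Dwrite_f to this dataset then goes through the filter pipeline
call H5Dcreate_f(fileID, dataSetName, H5T_NATIVE_DOUBLE, spaceID, &
                 dataSetID, info, dcpl_id=propID)
```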

I apply these commands once, when I create the dataset. Once the dataset is in place, I append data to it with the following commands:

call H5Pcreate_f(H5P_DATASET_XFER_F, propID, info)
call H5Pset_dxpl_mpio_f(propID, H5FD_MPIO_COLLECTIVE_F, info)
call H5Dwrite_f(..., xfer_prp = propID)

Here, while appending the data, I don’t apply the compression filter, under the assumption that it is already in place from the initial dataset creation. Would you please confirm that my assumption is correct? If not, could you direct me on how to apply the compression filter on the subsequent (appending) writes? My initial experiments show little difference in HDF5 file sizes, which makes me suspect my assumption may be wrong. Thank you and have a good day ahead!


Best wishes,
Maxim

Dear all,

After further investigation I can confirm that the gzip (“deflate”) filter is active on the dataset before I append the data. Please find a short code snippet below showing how I verify this:

  ! Get dataset
  call H5Dopen_f(fileID, dataSetName, dataSetID, info)

  ! @test Retrieve filter information
  call H5Dget_create_plist_f(dataSetID, propID, info)
  print *, "Filter information:"
  print *, "  info:      ", info
  print *, "  propID:    ", propID
  print *, "  dataSetID: ", dataSetID

  ! @test Retrieve no. of filters
  call H5Pget_nfilters_f(propID, nFltrs, info)
  print *, "  info:      ", info
  print *, "  propID:    ", propID
  print *, "  nFltrs:    ", nFltrs

  ! @test Retrieve filter information
  call H5Pget_filter_f(propID, 0, flags, nElems, cdVals, fltrNameLength, fltrName, fltrType, info)
  print *, "  fltrType:  ", fltrType
  print *, "  fltrName:  ", fltrName
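
If it helps, the filter identifier returned by H5Pget_filter_f can also be compared directly against the library constant (a small sketch, using fltrType from the snippet above):

```fortran
! H5Z_FILTER_DEFLATE_F is the HDF5 constant for the gzip/deflate filter
if (fltrType == H5Z_FILTER_DEFLATE_F) then
   print *, "  deflate filter is set on the dataset"
else
   print *, "  unexpected filter id: ", fltrType
end if
```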

I’m starting to wonder whether the deflate compression is simply not very effective on this data. To be on the safe side, below is the full list of routines I call when appending the data. Would you be so kind as to look at it and verify that I’m not missing anything obvious?

! Get dataset
call H5Dopen_f(fileID, dataSetName, dataSetID, info)

! Extend space
call H5Dset_extent_f(dataSetID, spaceDims, info)

! Create 2D array memory space (frame, bead)
call H5Screate_simple_f(memRank, memDims, memID, info)

! Get dataspace
call H5Dget_space_f(dataSetID, spaceID, info)

! Write to extended part of dataset
!   Select hyperslab in dataspace
call H5Sselect_hyperslab_f(spaceID, H5S_SELECT_SET_F, &
                           off, cnt, info)

!   Get array datatype
call H5Topen_f(fileID, arrTypeName, arrTypeID, info)

! Create property list for collective dataset write
call H5Pcreate_f(H5P_DATASET_XFER_F, propID, info)
call H5Pset_dxpl_mpio_f(propID, H5FD_MPIO_COLLECTIVE_F, info)

!   Write data into file collectively
call H5Dwrite_f(dataSetID, arrTypeID, arrBeads, dataSetDims, info, &
                file_space_id = spaceID, mem_space_id = memID,     &
                xfer_prp = propID)

! Close dataspace
call H5Sclose_f(spaceID, info)

! Close memory space
call H5Sclose_f(memID, info)

! Close datatype
call H5Tclose_f(arrTypeID, info)

! Close dataset
call H5Dclose_f(dataSetID, info)

! Close property list
call H5Pclose_f(propID, info)
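
To check whether the filter is actually shrinking the data on disk, it may also help to compare the allocated storage with the logical size of the dataset (a sketch; this would go before the H5Dclose_f call above, while dataSetID is still valid):

```fortran
! Allocated size on disk (i.e. after compression), in bytes
integer(hsize_t) :: storageSize
call H5Dget_storage_size_f(dataSetID, storageSize, info)
print *, "Allocated storage (bytes): ", storageSize
! Comparing this against the uncompressed size, e.g.
! product(spaceDims) * bytes-per-element, gives the effective ratio
```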