Slow writing of a multidimensional array with h5py for Fortran read

lsawade · June 5, 2024, 7:05pm

Hi,

For a project I need both a Python and a Fortran API for reading datasets written with H5PY. For the Python API, I create and array

# Approximate sizes: Nstations=200, Ncomp=3, Ndisp=3, Ncoor=~1000, Ntime=3600
dim = (Nstations, Ncomp, Ndisp, Ncoor, Ntime)
ds.create_dataset("displacement", dim, dtype="f")

Then I populate the stations iteratively like so

for i in stations:
    ds['displacement'][_i, :,:,:,:] = stationdisplacement

This works totally fine within Python for writing and reading, and performance is in fact much better than I expected. But since Fortran is column first I tried to transpose the arrays and write it like so

dim = (Nstations, Ncomp, Ndisp, Ncoor, Ntime)

# Tranposing dimensions and station wise array
if fortran:
    dim = dim [::-1]
    stationsdisplacement = stationdisplacement.transpose((3,2,1,0))

# Creating to full dataset:
ds.create_dataset("displacement", dim, dtype="f")

for i in stations:
    ds['displacement'][:,:,:,:,_i] = stationdisplacement

The problem really is the writing by the last index. Which I assume has to do with how HDF5 traverse the disk when writing (I tested putting the index first).

Is there a way that I can optimize this. Or, is there a way in the Python API to say write in fortran order/ in the Fortran API to say read in C order?

Preferably, I would like to avoid transposing on the Fortran side because it would require to transpose the entire array with all stations.

Thank you!

Cheers,

Lucas

daniel.kahn · June 5, 2024, 7:41pm

You say “the problem is”… but you don’t really describe the problem. What’s wrong?

I think you are going at this the wrong way. Don’t worry about changing the ordering in the file to match the programming language, but worry about the implementation of your algorithm in each language to processes the array elements in the most efficient order. For Python that means the right-most element varies fastest, for Fortran it’s the left-most. Don’t transpose the arrays, but change the ordering of the nested loops as you go from a Python implementation to a Fortran one. The fastest varying dimensions should be the inner loops in your algorithm, this has the effect of making the Fortran code “look backwards” from the Python.

What’s really important is to identify which dimensions will vary fastest and which will vary slowest in your algorithm. Sometimes this is obvious and you can do make this determination, sometimes you can’t predict the order the end user will access the data. If your data are being written by Python the fastest varying dimension should be right-most, as I think you know.

Where this kind of analysis becomes even more important is if you have very large arrays and the applications will only read pieces of them using HDF5 hyperslabs. If the hyperslab consists of the fastest dimensions then you’ll be reading contiguous blocks off number off the disk.

lsawade · June 5, 2024, 8:56pm

Hi Daniel,

Thanks already for answering so quickly. Let me try to clarify some things.

You say “the problem is”… but you don’t really describe the problem. What’s wrong?

The problem is really just that the writing using the last index is incredibly slow. And, as you mention, that has to do with the fastest. The writing having the index last is 60x slower than have the index first.

My specific issue is that I wrote a fairly extensive Fortran API to load files like this and extract from them. So I want to either write files faster with the Fortran index layout so I don’t have to change the Fortran API, or, figure out a different way of reading the C order array into the proper shape in Fortran, so that I don’t have to rewrite every single loop in my software.

I think you are going at this the wrong way.

Definitely.

Don’t worry about changing the ordering in the file to match the programming language, but worry about the implementation of your algorithm in each language […] Don’t transpose the arrays, but change the ordering of the nested loops as you go from a Python implementation to a Fortran one.

I don’t think I understand this. Do I have control over how an array is read/written in Fortran? I implemented a 5D array reader in Fortran which just reads a full 5D array of whatever size. I’m not reading single elements in a loop. Is that what you are suggesting? Attaching relevant Fortran pieces below.

Where this kind of analysis becomes even more important is if you have very large arrays and the applications will only read pieces of them using HDF5 hyperslabs.

For my specific application, I cannot use hyperslabs if I understand the Fortran docs correctly, which is why I’m just reading the full arrays on large memory nodes and work with them later. The reason is that for each analysis, the Ncoor dimension indexing would be heterogeneous (125 indeces, almost randomly picked from [0, ~1000]). I am using hyperslabs in h5py, where I order the indeces in increasing order and then reorder the array after reading from the file for a pretty hefty speedup.

This piece defines all HDF5 fortran abstractions in to
“read_array”.

github.com/lsawade/GF3DF

src/utils/hdf5_utils.F90

e3471b84f


      
            ! close dataset
            call h5dclose_f(dset_id, error)
            if (error /= 0) then
              write(*,'("cannot close hdf5 dataset",/)')
              return
            endif
          
            if (error /= 0) got = .false.
          end subroutine read_real_array_2d
          
          subroutine read_real_array_5d(x, name, id, got, error)
          
            real(kind=8),            intent(out) :: x(:,:,:,:,:)
            character(*),    intent(in)  :: name
            integer(hid_t),  intent(in)  :: id
            logical,         intent(out) :: got
            integer,         intent(out) :: error
          
            integer, parameter :: ndims = 5
            integer(hsize_t)   :: xshape(ndims)
            integer(hid_t)     :: dset_id

This one three things. 1. get dimensions of the array 2. allocate the array 3. read array from the HDF5 file into the allocated array.

github.com/lsawade/GF3DF

src/gf/io.F90

e3471b84f


      
            call throwerror(errorflag, "Error Reading burials")
          
            ! Finally define a midpoint number on the fly.
            GF%midx = GF%ngllx / 2 + 1
            GF%midy = GF%nglly / 2 + 1
            GF%midz = GF%ngllz / 2 + 1
          
          end subroutine load_header
          
          
          module subroutine load_arrays(file_id, GF)
          
            use constants, only: GAUSSALPHA, GAUSSBETA, IMAIN, DEBUG
            use gll_library, only: zwgljd
            use lagrange_poly, only: lagrange_any, lagrange_deriv_GLL
            use utils, only: throwerror
            use hdf5
            use hdf5_utils, only: read_from_hdf5, get_dset_dims
          
            implicit none

Attention! https://support.hdfgroup.org is the NEW home for documentation from The HDF Group. (Details)

Slow writing of a multidimensional array with h5py for Fortran read