Hi everyone,
I hope that I can ask for some hint here since I am new in this topic. I would like to implement the most effective way (collective) to write down a set of ordered data in a MPI Fortran CFD code by means of parallel HDF5. In each MPI process, I have NZ zones, and in each of them there are there are some 3d variables in a structure
VAR(Z)%R(NI(Z),NJ(Z),NK(Z))
where R is a variable (with some other with the same dimension) and NI(Z), NJ(Z),NK(Z) are the number of cells in dimensions 1,2,3.
Thank you in advance.
Have you looked into using CGNS, cgns.org?
It works with Fortran and is tuned to use HDF5 as the backend when doing parallel IO.
I have considered cgns, but I would like to focus on performances. Since cgns relays over hdf5, I think that there is a little bit of overhead, or am I wrong? Thank you.
Sorry, I just saw this.
Yes, any standard will incur an overhead, just as using HDF5 is going to have an overhead compared to raw IO.
Most of the overhead in using standards is associated with the metadata. Still, if you are smart about it, you can minimize this overhead by having only a limited number of the ranks involved in the metadata stage.
In the case of, for example, cgp_field_write_data_f, that is, for the most part, just calling H5Dwrite. I would be surprised if there would be much performance difference between the two APIs. However, the key is the tuning (via H5P APIs) baked into CGNS’s H5Dwrite. So you will need to spend time tuning your implementation (rarely does H5Dwrite with H5P_DEFAULT properties perform optimally in parallel) to mitigate the overhead of using a standard such as CGNS.
Hi. I become aware that, by pure chance, we are already interacting by email (the mail title is the same as the topic here). If you allow, I would report here the conversation in order to help other people and let who is interested to join in the discussion. Of course, thank you again for the (double) support.
By the way, the CGNS call cgp_field_write_data_f is designed to write a concurrent array scattered between MPI processses with reference to offset and count for each process, whereas I need to write in a concurrent (and efficient!) way different and independent arrays in the various processes, as you perhaps know.
I return on this topic. I have developed a code that write properly the data on a process by creating a compound dadatype (compound.t5.f90), albeit with some trickies in memory locations since the use of MPI scrambles memory locations of the structure objects. But it works.
module var_mod
type vt
sequence
double precision,dimension(:,:,:),allocatable :: r,p
endtype vt
type(vt),target,dimension(:),allocatable :: var
endmodule var_mod
PROGRAM main
USE hdf5
USE ISO_C_BINDING
USE MPI
USE var_mod
IMPLICIT NONE
! FILES
CHARACTER(LEN=20) :: H5FILE_NAME
CHARACTER(LEN=*),PARAMETER :: DATASETNAME = "CompoundOfArrays"
CHARACTER(LEN=10) :: varname
INTEGER,PARAMETER :: NB = 4
INTEGER,PARAMETER :: RANK = 3
!----------------------------------------------------------------
INTEGER :: b,i,j,k,me,np,mpierr,bmin,bmint,bmax,bmaxt
INTEGER(hid_t) :: file,dataset,space
INTEGER(hsize_t) :: DIM(1) = (/1/) ! Dataspace dimensions
INTEGER(SIZE_T) :: maxloc,minloc,offset,type_size,type_sized ! Size of the double precision datatype
INTEGER :: hdferr
TYPE(C_PTR) :: min_ptr
INTEGER(hid_t) :: vwrite ! File datatype identifier
INTEGER(hid_t),dimension(nb) :: tidr,tidp ! /* Nested Array Datatype ID */
INTEGER(HSIZE_T),DIMENSION(3) :: tdims
integer,dimension(:),allocatable :: ni,nj,nk
!----------------------------------------------------------------
me = 0
call mpi_init(mpierr)
call mpi_comm_size(mpi_comm_world,np,mpierr)
call mpi_comm_rank(mpi_comm_world,me,mpierr)
allocate(ni(nb))
allocate(nj(nb))
allocate(nk(nb))
allocate(var(nb))
do b = 1,NB
ni(b) = 10*(b/3+1)
nj(b) = 10*(b)
nk(b) = 10*(b/5+1)
allocate(var(b)%r(ni(b),nj(b),nk(b)))
allocate(var(b)%p(ni(b),nj(b),nk(b)))
ENDDO
!
! Initialize FORTRAN interface.
!
CALL h5open_f(hdferr)
!
! Initialize the data
!
do b = 1,nb
DO k = 1,nk(b)
DO j = 1,nj(b)
DO i = 1,ni(b)
var(b)%r(i,j,k) = 10000*me+1000*b+i*100+j*10+k
var(b)%p(i,j,k) = -var(b)%r(i,j,k)
ENDDO
ENDDO
ENDDO
enddo
!
! Create the file.
!
write(H5FILE_NAME,12) me
12 format('mpivars_p',I0,'.h5')
CALL H5Fcreate_f(H5FILE_NAME,H5F_ACC_TRUNC_F,file,hdferr)
CALL h5tget_size_f(H5T_NATIVE_DOUBLE,type_sized,hdferr)
!
! find minimum and maximum address memory
!
minloc = loc(var) ; bmin = 1 ; bmint = 1
maxloc = loc(var) ; bmax = nb ; bmaxt = 3
do b = 1,NB
write(10+me,*) b,loc(var(b)),loc(var(b)%r),loc(var(b)%p),minloc
if(minloc-loc(var(b))>0) then
minloc = loc(var(b))
bmin = b
bmint = 1
endif
if(minloc-loc(var(b)%r)>0) then
minloc = loc(var(b)%r)
bmin = b
bmint = 2
endif
if(minloc-loc(var(b)%p)>0) then
minloc = loc(var(b)%p)
bmin = b
bmint = 3
endif
if(maxloc-loc(var(b))<0) then
maxloc = loc(var(b))
bmax = b
bmaxt = 1
endif
if(maxloc-loc(var(b)%r)<0) then
maxloc = loc(var(b)%r)
bmax = b
bmaxt = 2
endif
if(maxloc-loc(var(b)%p)<0) then
maxloc = loc(var(b)%p)
bmax = b
bmaxt = 3
endif
enddo
! write(*,*) ' lowest memory address:',minloc
! write(*,*) 'highest memory address:',maxloc,bmax
select case(bmint)
case(1)
min_ptr = c_loc(var(bmin))
case(2)
min_ptr = c_loc(var(bmin)%r)
case(3)
min_ptr = c_loc(var(bmin)%p)
endselect
type_size = 0
do i = 1,nb
type_size = type_size + 2*ni(i)*nj(i)*nk(i)*8
enddo
write(*,*) 'min theorical type size:',type_size
type_size = maxloc-minloc+ni(bmax)*nj(bmax)*nk(bmax)*type_sized*2
!
! Create the memory data type.
!
write(*,*) 'actual type size:',type_size
! write(*,*) 'cloc:',LOC(var)
! do i = 1,nb
! write(*,*) 'ni*nj*nk*8:',ni(i)*nj(i)*nk(i)*8
! write(*,*) 'cloc:',i,LOC(var(i))
! write(*,*) 'cloc r:',i,LOC(var(i)%r)
! write(*,*) 'cloc p:',i,LOC(var(i)%p)
! enddo
!
! do i = 1,nb
! if(i>1) write(*,*) 'offset:',i,H5OFFSETOF(C_LOC(var(i-1)),C_LOC(var(i)))
! write(*,*) 'offset r:',i,H5OFFSETOF(C_LOC(var(i)),C_LOC(var(i)%r))
! write(*,*) 'offset p:',i,H5OFFSETOF(C_LOC(var(i)%r),C_LOC(var(i)%p))
! ENDDO
CALL H5Tcreate_f(H5T_COMPOUND_F,type_size,vwrite,hdferr)
do b = 1,nb
tdims = (/ni(b),nj(b),nk(b)/)
CALL h5tarray_create_f(H5T_NATIVE_DOUBLE,RANK,tdims,tidr(b),hdferr)
CALL h5tarray_create_f(H5T_NATIVE_DOUBLE,RANK,tdims,tidp(b),hdferr)
enddo
offset = 0
do b = 1,nb
write(varname,10) b
10 format('var_r_b',I0)
offset = H5OFFSETOF(min_ptr,C_LOC(var(b)%r))
CALL h5tinsert_f(vwrite,varname,offset,tidr(b),hdferr)
if(hdferr>0) write(*,*) b,'insert r:',hdferr
! offset = offset+type_sized*ni(b)*nj(b)*nk(b)
write(varname,11) b
11 format('var_p_b',I0)
offset = H5OFFSETOF(min_ptr,C_LOC(var(b)%p))
CALL h5tinsert_f(vwrite,varname,offset,tidp(b),hdferr)
if(hdferr>0) write(*,*) b,'insert p:',hdferr
! offset = offset+type_sized*ni(b)*nj(b)*nk(b)
enddo
!
! Create the data space.
!
!
CALL H5Screate_simple_f(1,dim,space,hdferr)
!
! Create the dataset.
!
CALL H5Dcreate_f(file,DATASETNAME,vwrite,space,dataset,hdferr)
! CALL H5Tpack_f(vwrite,hdferr)
!
! Write data to the dataset
!
CALL H5Dwrite_f(dataset,vwrite,min_ptr,hdferr)
!
! Release resources
!
CALL H5Tclose_f(vwrite,hdferr)
CALL H5Sclose_f(space,hdferr)
CALL H5Dclose_f(dataset,hdferr)
CALL H5Fclose_f(file,hdferr)
do b = 1,nb
deallocate(var(b)%r,var(b)%p)
ENDDO
deallocate(var)
deallocate(ni,nj,nk)
! end if
call mpi_finalize(mpierr)
ENDPROGRAM main
Then I tryed to extend this approach to the parallel one, using an array of compound datatypes to be written in collective mode by hyperslabs (final_test_hdf5). The code is ok over a single process, but it hangs in an infinite closure loop when two or more processes are used. Could you support me? Thanks in advance.
module var_mod
type vt
sequence
double precision,dimension(:,:,:),allocatable :: r,p
end type vt
type(vt),target,dimension(:),allocatable :: var
end module var_mod
PROGRAM FINALH5TEST
USE HDF5 ! This module contains all necessary modules
USE ISO_C_BINDING
USE MPI_f08
USE var_mod
IMPLICIT NONE
!
! HDF5 definitions
!
character(len=20) :: varname
CHARACTER(LEN=10),PARAMETER :: filename = "final.h5" ! File name
CHARACTER(LEN=10),PARAMETER :: dsetname = "CompArray" ! Dataset name
INTEGER(HID_T) :: file_id ! File identifier
INTEGER(HID_T) :: dset_id,vwrite! Dataset identifier
INTEGER(HID_T) :: filespace ! Dataspace identifier in file
INTEGER(HID_T) :: memspace ! Dataspace identifier in memory
INTEGER(HID_T) :: plist_id ! Property list identifier
INTEGER(hid_t),dimension(:),allocatable :: tidr,tidp ! /* Nested Array Datatype ID */
INTEGER(HSIZE_T),DIMENSION(1) :: dimsf ! Dataset dimensions.
INTEGER(HSIZE_T),DIMENSION(1) :: count
INTEGER(HSIZE_T),DIMENSION(3) :: tdims
INTEGER(HSSIZE_T),DIMENSION(1) :: offset
! INTEGER,ALLOCATABLE :: data(:) ! Data to write
INTEGER :: rank = 1 ! Dataset rank
INTEGER :: rank3 = 3 ! Arrays rank
INTEGER :: hdferr ! hdferr flags
!
! MPI definitions
!
INTEGER :: mpierror ! MPI hdferr flag
INTEGER :: mpi_size,mpi_rank,mpicomm,mpiinfo
type(mpi_comm) :: comm
type(mpi_info) :: info
!
! program definitions
!
integer :: nb,b,bg,i,j,k,bmin,bmint,bmax,bmaxt
integer,dimension(:),allocatable :: ni,nj,nk
TYPE(C_PTR) :: min_ptr
INTEGER(SIZE_T) :: maxloc,minloc,memoff,type_size,type_sized
!----------------------------------------------------------------
comm = MPI_COMM_WORLD
info = MPI_INFO_NULL
mpicomm = comm%mpi_val
mpiinfo = info%mpi_val
CALL MPI_INIT(mpierror)
CALL MPI_COMM_SIZE(comm,mpi_size,mpierror)
CALL MPI_COMM_RANK(comm,mpi_rank,mpierror)
!
! allocate and initialize data structures
dimsf = (/ mpi_size /)
nb = mpi_rank+1
allocate (ni(nb))
allocate (nj(nb))
allocate (nk(nb))
allocate (var(nb))
do b = 1,NB
ni(b) = 1*(b/3+1)
nj(b) = 1*(b)
nk(b) = 1*(b/5+1)
allocate (var(b)%r(ni(b),nj(b),nk(b)))
allocate (var(b)%p(ni(b),nj(b),nk(b)))
END DO
!
! Initialize the data
!
do b = 1,nb
DO k = 1,nk(b)
DO j = 1,nj(b)
DO i = 1,ni(b)
var(b)%r(i,j,k) = 10000*mpi_rank+1000*b+i*100+j*10+k
var(b)%p(i,j,k) = -var(b)%r(i,j,k)
END DO
END DO
END DO
end do
!
! find minimum and maximum address memory
!
minloc = loc(var) ; bmin = 1 ; bmint = 1
maxloc = loc(var) ; bmax = nb ; bmaxt = 3
do b = 1,NB
if(minloc-loc(var(b))>0) then
minloc = loc(var(b))
bmin = b
bmint = 1
endif
if(minloc-loc(var(b)%r)>0) then
minloc = loc(var(b)%r)
bmin = b
bmint = 2
endif
if(minloc-loc(var(b)%p)>0) then
minloc = loc(var(b)%p)
bmin = b
bmint = 3
endif
if(maxloc-loc(var(b))<0) then
maxloc = loc(var(b))
bmax = b
bmaxt = 1
endif
if(maxloc-loc(var(b)%r)<0) then
maxloc = loc(var(b)%r)
bmax = b
bmaxt = 2
endif
if(maxloc-loc(var(b)%p)<0) then
maxloc = loc(var(b)%p)
bmax = b
bmaxt = 3
endif
enddo
select case(bmint)
case(1)
min_ptr = c_loc(var(bmin))
case(2)
min_ptr = c_loc(var(bmin)%r)
case(3)
min_ptr = c_loc(var(bmin)%p)
endselect
!
! Initialize FORTRAN predefined datatypes
!
CALL h5open_f(hdferr)
!
! get double precision size in hdf5 and evaluate data size dimension in memory
!
CALL h5tget_size_f(H5T_NATIVE_DOUBLE,type_sized,hdferr)
type_size = maxloc-minloc+ni(bmax)*nj(bmax)*nk(bmax)*type_sized*2
!
! create compound datatype
!
allocate(tidp(nb))
allocate(tidr(nb))
CALL H5Tcreate_f(H5T_COMPOUND_F,type_size,vwrite,hdferr)
do b = 1,nb
tdims = (/ni(b),nj(b),nk(b)/)
CALL h5tarray_create_f(H5T_NATIVE_DOUBLE,RANK3,tdims,tidr(b),hdferr)
CALL h5tarray_create_f(H5T_NATIVE_DOUBLE,RANK3,tdims,tidp(b),hdferr)
enddo
memoff = 0
do b = 1,nb
bg = b+nb*mpi_rank
write(varname,10) bg
10 format('var_r_b',I0)
memoff = H5OFFSETOF(min_ptr,C_LOC(var(b)%r))
CALL h5tinsert_f(vwrite,varname,memoff,tidr(b),hdferr)
if(hdferr>0) write(*,*) b,'insert r:',hdferr
write(varname,11) bg
11 format('var_p_b',I0)
memoff = H5OFFSETOF(min_ptr,C_LOC(var(b)%p))
CALL h5tinsert_f(vwrite,varname,memoff,tidp(b),hdferr)
if(hdferr>0) write(*,*) b,'insert p:',hdferr
enddo
write(*,*) mpi_rank,'min_ptr,size:',type_size
!
! Setup file access property list with parallel I/O access.
!
CALL h5pcreate_f(H5P_FILE_ACCESS_F,plist_id,hdferr)
CALL h5pset_fapl_mpio_f(plist_id,mpicomm,mpiinfo,hdferr)
!
! Create the file collectively.
!
CALL h5fcreate_f(filename,H5F_ACC_TRUNC_F,file_id,hdferr,access_prp=plist_id)
CALL h5pclose_f(plist_id,hdferr)
!
! Create the data space for the dataset.
!
CALL h5screate_simple_f(rank,dimsf,filespace,hdferr)
!
! Create the dataset with default properties.
!
! CALL h5dcreate_f(file_id,dsetname,H5T_NATIVE_INTEGER,filespace,dset_id,hdferr)
CALL H5Dcreate_f(file_id,dsetname,vwrite,filespace,dset_id,hdferr)
CALL h5sclose_f(filespace,hdferr)
!
! Each process defines dataset in memory and writes it to the hyperslab
! in the file.
!
count(1) = 1
offset(1) = mpi_rank
CALL h5screate_simple_f(rank,count,memspace,hdferr)
!
! Select hyperslab in the file.
!
CALL h5dget_space_f(dset_id,filespace,hdferr)
CALL h5sselect_hyperslab_f(filespace,H5S_SELECT_SET_F,offset,count,hdferr)
!
! Initialize data buffer with trivial data.
!
! ALLOCATE (data(count(1)))
! data = mpi_rank+10
!
! Create property list for collective dataset write
!
CALL h5pcreate_f(H5P_DATASET_XFER_F,plist_id,hdferr)
CALL h5pset_dxpl_mpio_f(plist_id,H5FD_MPIO_COLLECTIVE_F,hdferr)
!
! Write the dataset collectively.
!
! CALL h5dwrite_f(dset_id,H5T_NATIVE_INTEGER,data,dimsf,hdferr, &
CALL h5dwrite_f(dset_id,vwrite,min_ptr,hdferr, &
file_space_id=filespace,mem_space_id=memspace,xfer_prp=plist_id)
!
! Write the dataset independently.
!
! CALL h5dwrite_f(dset_id, H5T_NATIVE_INTEGER, data, dimsfi, hdferr, &
! CALL h5dwrite_f(dset_id, vwrite, min_ptr, hdferr, &
! file_space_id = filespace, mem_space_id = memspace)
!
! Deallocate data buffer.
!
! DEALLOCATE (data)
!
! Close dataspaces.
!
CALL h5sclose_f(filespace,hdferr)
CALL h5sclose_f(memspace,hdferr)
!
! Close the dataset and property list.
!
CALL h5dclose_f(dset_id,hdferr)
CALL h5pclose_f(plist_id,hdferr)
CALL H5Tclose_f(vwrite,hdferr)
!
! Close the file.
!
CALL h5fclose_f(file_id,hdferr)
!
! Close FORTRAN predefined datatypes.
!
CALL h5close_f(hdferr)
!
! deallocates data
!
do b = 1,nb
deallocate (var(b)%r,var(b)%p)
END DO
deallocate (var)
deallocate (ni,nj,nk)
deallocate(tidr,tidp)
!
! terminate MPI environment
!
CALL MPI_FINALIZE(mpierror)
END PROGRAM FINALH5TEST
The issue is each rank is creating a different compound datatype, i.e., in the H5Tcreate_f, the type_size for vwrite is different depending on the rank.
The main problem is the call to H5Dcreate_f because each rank passes a different vwrite datatype definition for the dataset. HDF5 does not know how to resolve the datatype conflict between ranks when it tries to create the dataset. H5Dcreate is a collective operation, meaning all the ranks must be consistent with the parameters they pass.