Hello,
I am designing a new IO module for an existing CFD & MHD code (Pencil Code). We work in Fortran, so the first dimension varies fastest (i.e. is contiguous in memory) in our data arrays.
Our in-memory data representation is a large 4D array of size nx×ny×nz×na, where the first three dimensions are the spatial extent of each processor’s subdomain (nx×ny×nz) and the fourth dimension (na) indexes the physical quantities. Depending on the physical setup, na can be anything from 4 to 20. A setup with npx×npy×npz processors along the three spatial directions then has a global 4D size of ngx×ngy×ngz×na, where ngx = nx×npx, and likewise for ngy and ngz.
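To make the layout concrete, here is a minimal sketch in Python. It is only an illustration, not code from Pencil Code: the function names are hypothetical, indices are 0-based, and ghost cells are ignored.

```python
# Hypothetical illustration of the layout described above (not Pencil Code itself).
# Assumptions: 0-based indices, no ghost cells.

def global_shape(nx, ny, nz, na, npx, npy, npz):
    """Global 4D extent (ngx, ngy, ngz, na) built from the per-processor extent."""
    return (nx * npx, ny * npy, nz * npz, na)


def fortran_offset(ix, iy, iz, ia, nx, ny, nz):
    """Linear element offset of (ix, iy, iz, ia) in a column-major
    nx*ny*nz*na array: x varies fastest, the quantity index slowest."""
    return ix + nx * (iy + ny * (iz + nz * ia))
```

For example, with nx = ny = nz = 32, na = 8 and 4×4×2 processors, `global_shape` gives (128, 128, 64, 8). Neighbouring x indices are one element apart in memory, whereas consecutive physical quantities are nx×ny×nz elements apart.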
We already have one parallel HDF5 IO module, which writes large monolithic files in which each physical quantity is stored as a separate 3D HDF5 dataset (an array). Unfortunately, this is slow for large setups (> 1024 processors), presumably because assembling the data into the file representation requires copying many individual x-aligned lines from the in-memory representation…
I am now thinking about new IO strategies to make our code’s write routines faster. In particular, I am considering collecting the data along the x and y directions and creating npz HDF5 files, one per layer of processors in z. Furthermore, I wonder whether writing each processor’s whole data block in its memory-aligned layout into the HDF5 files would make our output faster. Each of the npz separate files would then contain one 4D array of size ngx×ngy×nz×na, which is a different representation from the one we use now.
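As a rough sketch of what I have in mind (again in Python for illustration only; the function name is hypothetical, processor coordinates and offsets are 0-based, and ghost cells are ignored), each processor would write its whole local block into the file of its z layer at the following position:

```python
# Hypothetical sketch of the proposed per-z-layer file layout.
# Assumptions: 0-based processor coordinates (ipx, ipy, ipz), no ghost cells,
# dimensions listed in Fortran order (x first).

def slab_for_processor(ipx, ipy, ipz, nx, ny, nz, na):
    """Return (file_index, start, count) for one processor's block."""
    file_index = ipz                    # one file per z layer of processors
    start = (ipx * nx, ipy * ny, 0, 0)  # offset inside the ngx*ngy*nz*na array
    count = (nx, ny, nz, na)            # whole local block, all quantities
    return file_index, start, count
```

The `start` and `count` tuples correspond to what would be passed to a hyperslab selection on the file dataspace; the in-memory selection would simply be the whole local array.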
What are your thoughts on this? Would this approach significantly reduce our writing times?
Otherwise, which other IO strategies would you suggest?
Regarding chunking: So far, we do not use chunking, because we also need to store “ghost cells”, which means that the subdomains of the outer (boundary) processors differ in size from those of the inner processors. However, I am also thinking of moving the ghost cells into separate datasets, so that we can start using chunking (and later also compression).
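The size mismatch can be illustrated with a small Python sketch. It assumes that ghost layers are written only on the outer faces of the global domain; the function name and the ghost-layer width of 3 in the example are hypothetical and chosen only for illustration.

```python
# Hypothetical sketch: points written along one direction by each processor.
# Assumption: ghost layers are stored only at the outer domain boundary.

def written_extent(n, ip, nproc, nghost):
    """Extent written by processor ip (0-based) out of nproc along one
    direction, with n interior points per processor."""
    lower = nghost if ip == 0 else 0          # lower outer boundary
    upper = nghost if ip == nproc - 1 else 0  # upper outer boundary
    return lower + n + upper


# Extents for 4 processors with 32 interior points and 3 ghost layers.
extents = [written_extent(32, ip, 4, 3) for ip in range(4)]
```

Here `extents` is [35, 32, 32, 35], so no single chunk size matches every processor’s block. With the ghost cells moved into separate datasets, every processor would write exactly n points per direction, and a chunk could coincide with one processor’s interior block.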
Thank you and best greetings,
Philippe.