Dear community,
We are currently optimizing our GPU-parallel code, which writes its output through the ADIOS2 library, to run on TACC Vista, which uses a VAST filesystem. We are seeing either an extreme performance regression during the HDF5 output or file-accessibility errors during the write.
More details: The main issue is severe performance degradation during ADIOS2 calls, and it varies with our MPI decomposition. Profiling with NVIDIA Nsight Systems (nsys) shows a slowdown of roughly 300x when decomposing along one specific direction. We have attempted various compiler combinations (GNU and NVCC) and tried both the system-provided parallel HDF5 module and a self-built HDF5, but the issue persists.
So far, we have narrowed the issue down to parallel writes by multiple processes into a single HDF5 file on VAST (everything works fine on Lustre). As mentioned, performance also depends heavily on the mesh decomposition: for a 2D test array, performance is acceptable when the MPI decomposition is along one direction, but slows down substantially when it is along the other direction. VAST does not expose striping controls (stripe count or stripe size) the way Lustre does.
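To make the decomposition dependence concrete, here is a small, self-contained sketch (plain Python, no MPI or HDF5 required; the array shape, element size, and rank count are made up for illustration). It computes the byte extents each rank would write into a row-major 2D dataset for the two decompositions. One plausible explanation for the asymmetry is that splitting along the slowest-varying axis gives each rank a single contiguous extent, while splitting along the other axis turns each rank's write into many small strided extents, a pattern that tends to hurt most on filesystems without striping controls:

```python
# Illustrative only: byte ranges of a row-major 2D dataset that each
# MPI rank would write under two decompositions. All sizes hypothetical.

ITEM = 8            # bytes per element (e.g. a double)
NX, NY = 8, 8       # global 2D array, stored row-major
NRANKS = 4

def extents_row_split(rank):
    """Split along axis 0 (rows): each rank owns NX//NRANKS full rows,
    i.e. one contiguous byte range in the file."""
    rows = NX // NRANKS
    start = rank * rows * NY * ITEM
    return [(start, start + rows * NY * ITEM)]

def extents_col_split(rank):
    """Split along axis 1 (columns): each rank owns NY//NRANKS columns
    of every row, i.e. one small extent per row."""
    cols = NY // NRANKS
    out = []
    for row in range(NX):
        start = (row * NY + rank * cols) * ITEM
        out.append((start, start + cols * ITEM))
    return out

row = extents_row_split(0)
col = extents_col_split(0)
print(len(row), row[0])   # -> 1 (0, 128): one 128-byte contiguous write
print(len(col), col[:2])  # -> 8 [(0, 16), (64, 80)]: eight 16-byte strided writes
```

On Lustre, collective buffering and striping can aggregate the strided pattern; without stripe tuning the same pattern may translate into many small I/O operations, which would be consistent with the slowdown we measured in the "slow" direction.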
We have done substantial troubleshooting together with the support team, including setting several environment variables that control file locking, all without success.
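For readers hitting the same symptoms, the standard HDF5 file-locking switch (honored by HDF5 1.10.x and later, and the usual first thing to try on filesystems with unusual locking semantics) looks like this; it did not resolve the problem in our case:

```shell
# Tell HDF5 to skip POSIX file locking entirely.
# Did not help here, but worth trying on non-POSIX-like filesystems.
export HDF5_USE_FILE_LOCKING=FALSE
```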
We would be very interested in hearing about others' experiences with the VAST filesystem, and in any suggestions for mitigating these performance regressions on VAST compared to writing the same data on Lustre.
Thank you very much!
Jens