HDF5 is a "self-describing" format, which means that the HDF5 metadata
stored in a dataset object header allows the HDF5 C library, and any
non-C application built on top of it, to retrieve the raw data
(i.e. the elements of a multidimensional array) in the correct order.
(Let's forget for a second about HDF5, C, Fortran, Python and
Matlab.)
If we have a matrix A(N,M,K), we usually count dimensions from left
to right saying that the first dimension has size N, the second
dimension has size M, the third dimension has size K, and so on.
(Now let's talk about HDF5 but without referring to any language.)
When we describe a matrix using an HDF5 dataspace object, we use the
same convention (i.e. specifying dimensions from left to right): the
first dimension has size N, the second dimension has size M, the
third dimension has size K. (Aside: please notice that this
description is valid for both C and Fortran HDF5 applications, i.e. the
dims array passed to H5Screate_simple in C (h5screate_simple_f in
Fortran) will have the values dims[] = {N,M,K}.)
The question is: how does HDF5 know how to interpret a blob of {N x
M x K x sizeof(datatype)} bytes of dataset raw data stored in the
file? Was A(N,M,K) stored? Or was it A(K,N,M)? Or any other
permutation of (N,M,K)?
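To make the ambiguity concrete, here is a small Python sketch. Python is used only for illustration, and the helper name reshape_row_major is hypothetical; it is not part of any HDF5 API. The same six stored values give different matrices depending on which dimension sizes we assume:

```python
def reshape_row_major(flat, rows, cols):
    """Interpret a flat sequence as a rows x cols matrix, last index fastest."""
    if len(flat) != rows * cols:
        raise ValueError("buffer size does not match the requested shape")
    return [list(flat[r * cols:(r + 1) * cols]) for r in range(rows)]

raw = [0, 1, 2, 3, 4, 5]               # stand-in for the raw data of a dataset

as_2x3 = reshape_row_major(raw, 2, 3)  # [[0, 1, 2], [3, 4, 5]]
as_3x2 = reshape_row_major(raw, 3, 2)  # [[0, 1], [2, 3], [4, 5]]

# Element (1, 0) differs between the two readings of the same values.
print(as_2x3[1][0], as_3x2[1][0])      # prints: 3 2
```

Without the dimension sizes and a rule for which one changes fastest, the blob alone cannot tell us which reading was intended.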
By itself, an HDF5 file has no clue about matrices, their dimensions,
or the languages they were written from. It is the application's
responsibility to interpret the data correctly and to pass the correct
interpretation to the HDF5 C library to store in the file.
As mentioned above, the dimensions of the matrix are described
using an HDF5 dataspace object and are stored in the file. The d
integers P1, ..., Pd, where d is the rank of the matrix, are stored in
the dataspace object header according to the following convention: the
last value, Pd, is the size of the FASTEST changing dimension of the
matrix, i.e. the HDF5 file spec and the HDF5 C library follow the C
storage convention (no wonder, it is a C library :-). Therefore there
is no ambiguity in interpreting the {N x M x K x sizeof(datatype)}
bytes, and the HDF5 file has enough information for any "row-major" or
"column-major" application to interpret the data correctly (including
one that bypasses the HDF5 C library and reads directly from the HDF5
file!).
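As a rough illustration of this convention, the following Python sketch computes where an element sits in the blob when the last dimension changes fastest. It assumes a contiguous storage layout (chunked datasets are laid out differently), the function name row_major_offset is hypothetical, and the sizes and the 8-byte element size are made-up example values:

```python
def row_major_offset(index, dims):
    """Offset (in elements) of `index` when the LAST dimension changes fastest."""
    if len(index) != len(dims):
        raise ValueError("index and dims must have the same rank")
    offset = 0
    for i, size in zip(index, dims):
        if not 0 <= i < size:
            raise IndexError("index out of range")
        offset = offset * size + i
    return offset

dims = (2, 3, 4)   # P1, P2, P3 as stored in the dataspace header (example values)
itemsize = 8       # assumed sizeof(datatype), e.g. a 64-bit float

# One step along the last dimension moves one element in the blob.
print(row_major_offset((0, 0, 1), dims))             # 1
# One step along the first dimension skips P2 * P3 elements.
print(row_major_offset((1, 0, 0), dims))             # 12
# Byte offset of element (1, 2, 3).
print(row_major_offset((1, 2, 3), dims) * itemsize)  # 184
```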
Here is what happens when the HDF5 Fortran library is used:
Suppose we want to write the matrix A(N,M,K) to an HDF5 file. The HDF5
Fortran API describes the dataspace with the first dimension being N,
the second dimension being M, the third dimension being K (as we would
do in C or any other language). But the HDF5 Fortran API also knows
that the fastest changing dimension has size N (i.e. we have
column-major order). Therefore the HDF5 Fortran library instructs the C
library to store the values K,M,N in the dataspace object header
instead of N,M,K, since N is the size of the fastest changing dimension.
So, if a C application reads the matrix A(N,M,K) (i.e. the N x M x K x
sizeof(datatype) blob) written from Fortran, it will read it into
the matrix B(K,M,N) (the C API that requests the sizes of the first,
second and third dimensions will return the values K,M,N stored in the
dataspace header).
If a Fortran application reads the matrix A(N,M,K) written from
Fortran, it will once again read it into B(N,M,K) (the Fortran API
that requests the sizes of the first, second and third dimensions will
flip the array K,M,N stored in the file and return N,M,K).
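The write and read paths described above can be modelled with a toy Python sketch. This is only a model of the behaviour, not the actual library code: the dictionary standing in for the dataspace header and the three function names are hypothetical.

```python
def fortran_write_dims(header, fortran_dims):
    """Model of the Fortran wrapper: reverse the sizes before storing them."""
    header["dims"] = tuple(reversed(fortran_dims))

def c_read_dims(header):
    """Model of the C API: return the sizes exactly as stored."""
    return header["dims"]

def fortran_read_dims(header):
    """Model of the Fortran wrapper: flip the stored sizes back."""
    return tuple(reversed(header["dims"]))

N, M, K = 2, 3, 4
header = {}                            # hypothetical stand-in for the dataspace header
fortran_write_dims(header, (N, M, K))  # the Fortran program declares A(N,M,K)

print(c_read_dims(header))             # (4, 3, 2): C reads into B[K][M][N]
print(fortran_read_dims(header))       # (2, 3, 4): Fortran reads into B(N,M,K)
```

Only the list of sizes is reversed in this model; the raw data is never touched.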
In other words: the HDF5 library stores information about how to
interpret the data. The interpretation follows the C storage
convention: the last dimension specified for the dataspace object is
the fastest changing one. It is the responsibility of the application
(in this case the HDF5 Fortran library) to interpret the order of
dimensions correctly and to pass it to and from the HDF5 C library.
Please notice that there is no need to transpose the data itself: one
only has to pass a correct interpretation of the data to the HDF5 C
library and make sure this is done according to the HDF5 C library
convention (the first value stored in the dataspace header
corresponds to the slowest changing dimension, ..., the last value
stored in the dataspace header corresponds to the fastest changing
dimension).
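A quick way to convince oneself that no transposition is needed is to compare element offsets under the two conventions. The Python sketch below uses two hypothetical helper functions and 0-based indices for both conventions (Fortran indices are 1-based by default, which shifts the indices but not the argument). It checks that A(i,j,k) in a column-major array of sizes (N,M,K) sits at the same position as B[k][j][i] in a row-major array of sizes (K,M,N):

```python
from itertools import product

def column_major_offset(index, dims):
    """Offset when the FIRST dimension changes fastest (Fortran convention)."""
    offset = 0
    for i, size in zip(reversed(index), reversed(dims)):
        offset = offset * size + i
    return offset

def row_major_offset(index, dims):
    """Offset when the LAST dimension changes fastest (C convention)."""
    offset = 0
    for i, size in zip(index, dims):
        offset = offset * size + i
    return offset

N, M, K = 2, 3, 4   # example sizes
mismatches = [
    (i, j, k)
    for i, j, k in product(range(N), range(M), range(K))
    if column_major_offset((i, j, k), (N, M, K))
    != row_major_offset((k, j, i), (K, M, N))
]
print(mismatches)   # []
```

An empty list means every element keeps its position in the blob; only the order in which the dimension sizes are listed changes.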