HDF5 is a "self-describing" format: the HDF5 metadata stored in a
dataset object header allows the HDF5 C library, and any non-C
application built on top of it, to retrieve the raw data (i.e. the
elements of a multidimensional array) in the correct order.

(Let's forget for a moment about HDF5, C, Fortran, Python and Matlab.)

If we have a matrix A(N,M,K), we usually count dimensions from left
to right, saying that the first dimension has size N, the second
dimension has size M, the third dimension has size K, and so on.

(Now let's talk about HDF5 but without referring to any language.)

When we describe a matrix using an HDF5 dataspace object, we use the
same convention (i.e. we specify dimensions from left to right): the
first dimension has size N, the second dimension has size M, the
third dimension has size K. (Aside: please note that this description
is valid for both C and Fortran HDF5 applications, i.e. the dims array
passed to H5Screate_simple in C, or to h5screate_simple_f in Fortran,
holds the values {N,M,K} in both cases.)

The question is: how does HDF5 know how to interpret a blob of
N x M x K x sizeof(datatype) bytes of dataset raw data stored in the
file? Was A(N,M,K) stored? Or was A(K,N,M) stored? Or some other
permutation of (N,M,K)?

An HDF5 file has no clue about matrices, their dimensions, or the
languages they were written from. It is the application's
responsibility to interpret the data correctly and to pass the correct
interpretation to the HDF5 C library, which stores it in the file.

As mentioned above, the dimensions of the matrix are described using
an HDF5 dataspace object and are stored in the file. The d integers
P1, ..., Pd, where d is the rank of the matrix, are stored in the
dataspace object header according to the following convention: the
last value, Pd, is the size of the FASTEST changing dimension of the
matrix, i.e. the HDF5 file spec and the HDF5 C library follow the C
storage convention (no wonder, it is a C library :-). Therefore there
is no ambiguity in interpreting the N x M x K x sizeof(datatype)
bytes, and the HDF5 file has enough information for any "row-major" or
"column-major" application to interpret the data correctly (including
one that bypasses the HDF5 C library and reads directly from the HDF5
file!).
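
The convention above (last stored value = fastest changing dimension)
can be illustrated with a small C sketch that turns an index tuple into
a linear element offset. The function name row_major_offset is a
hypothetical helper chosen for this illustration; it is not part of the
HDF5 API, and it only shows the arithmetic implied by the convention.

```c
#include <assert.h>
#include <stddef.h>

/* Linear element offset of idx[0..rank-1] in an array whose sizes are
 * dims[0..rank-1], assuming the C (row-major) convention: dims[rank-1]
 * is the fastest changing dimension.
 * Hypothetical helper for illustration only, not an HDF5 call. */
size_t row_major_offset(size_t rank, const size_t dims[], const size_t idx[])
{
    size_t offset = 0;

    assert(rank > 0);
    for (size_t d = 0; d < rank; d++) {
        assert(idx[d] < dims[d]);            /* index must be in range */
        offset = offset * dims[d] + idx[d];  /* Horner-style accumulation */
    }
    return offset;
}
```

With dims = {2,3,4}, stepping the last index by one moves the offset by
one element, while stepping the first index by one moves it by 3 x 4 = 12
elements. Multiplying the offset by sizeof(datatype) gives the byte
position inside the raw data blob.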

Here is what happens when the HDF5 Fortran library is used:

Suppose we want to write a matrix A(N,M,K) to an HDF5 file. The HDF5
Fortran API describes the dataspace with the first dimension being N,
the second dimension being M, the third dimension being K (as we would
do in C or any other language). But the HDF5 Fortran API also knows
that the fastest changing dimension has size N (i.e. we have
column-major order). Therefore the HDF5 Fortran library instructs the
C library to store the values K,M,N in the dataspace object header
instead of N,M,K, since N is the size of the fastest changing
dimension.
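
The dimension flip described above amounts to reversing the dims array
before it reaches the C library. The following is a minimal sketch of
that idea only; flip_dims is a hypothetical name, and this is not the
actual source of the HDF5 Fortran wrappers.

```c
#include <assert.h>
#include <stddef.h>

/* Copy in[0..rank-1] into out[0..rank-1] in reverse order, so that a
 * Fortran-style description (first dimension fastest) becomes a C-style
 * one (last dimension fastest), or the other way round.
 * Hypothetical helper for illustration only, not an HDF5 call. */
void flip_dims(size_t rank, const size_t in[], size_t out[])
{
    assert(rank > 0);
    for (size_t d = 0; d < rank; d++)
        out[d] = in[rank - 1 - d];
}
```

For a Fortran array A(N,M,K) the input {N,M,K} becomes {K,M,N}, which
is what ends up in the dataspace header. Applying the same flip to the
stored values gives back {N,M,K}.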

So, if a C application reads the matrix A(N,M,K) written from Fortran
(i.e. the N x M x K x sizeof(datatype) blob), it will read it into a
matrix B(K,M,N): the C API that requests the sizes of the first,
second and third dimensions will return the values K,M,N stored in the
dataspace header.
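
To see why the same bytes can be viewed as A(N,M,K) in Fortran and as
B(K,M,N) in C, one can compare element offsets under the two layouts.
The sketch below uses 0-based indices in both cases to keep the
arithmetic simple, and all function names in it are hypothetical
helpers written for this illustration, not HDF5 calls.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of A(i,j,k) in a column-major (Fortran-style) N x M x K array:
 * the first index i changes fastest. */
size_t col_major_offset(size_t n, size_t m, size_t i, size_t j, size_t k)
{
    return i + n * (j + m * k);
}

/* Offset of B[k][j][i] in a row-major (C-style) K x M x N array:
 * the last index i changes fastest. */
size_t row_major_offset_kmn(size_t m, size_t n, size_t k, size_t j, size_t i)
{
    return (k * m + j) * n + i;
}

/* Returns 1 if every element A(i,j,k) sits at the same offset as
 * B[k][j][i], and 0 otherwise. */
int same_layout(size_t n, size_t m, size_t k_size)
{
    assert(n > 0 && m > 0 && k_size > 0);
    for (size_t k = 0; k < k_size; k++)
        for (size_t j = 0; j < m; j++)
            for (size_t i = 0; i < n; i++)
                if (col_major_offset(n, m, i, j, k) !=
                    row_major_offset_kmn(m, n, k, j, i))
                    return 0;
    return 1;
}
```

Both formulas expand to i + N*j + N*M*k, so same_layout returns 1 for
any positive sizes: the element that Fortran calls A(i,j,k) is the
element that C calls B[k][j][i], and no bytes have to move.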

If a Fortran application reads the matrix A(N,M,K) written from
Fortran, it will read it once again into B(N,M,K): the Fortran API
that requests the sizes of the first, second and third dimensions will
flip the array K,M,N stored in the file and return N,M,K.

In other words: the HDF5 library stores information about how to
interpret the data. The interpretation follows the C storage
convention: the last dimension specified for the dataspace object is
the fastest changing one. It is the responsibility of the application
(in this case the HDF5 Fortran library) to interpret the order of
dimensions correctly and to pass it to and from the HDF5 C library.

Please note that there is no need to transpose the data itself: one
only has to pass a correct interpretation of the data to the HDF5 C
library, and to make sure this is done according to the HDF5 C library
convention (the first value stored in the dataspace header corresponds
to the slowest changing dimension, ..., the last value stored in the
dataspace header corresponds to the fastest changing dimension).