[Pytables-users] Reading Fortran arrays with correct array indexing

Hi,

An HDF5/PyTables user asked whether HDF5 supports meta-information for
keeping Fortran/C ordering in datasets. By reading the docs, it seems
to me that HDF5 doesn't support this yet. Are there plans to support
this feature?

Thanks,

---------- Forwarded message ----------

···

Subject: Re: [Pytables-users] Reading Fortran arrays with correct array
indexing
Date: Saturday 31 May 2008
From: "Milos Ilak" <milosilak@gmail.com>
To: "Francesc Alted" <faltet@pytables.com>

Hi Francesc,

thanks a lot! I didn't know MATLAB used Fortran order too. My Python
code needs to read in files written in both orders, so I just added an
attribute in my Fortran output routine which the Python code looks for
and if it is there, it transposes the data after loading.

I would have thought that the meta-information about the order would be
stored somewhere in the file. Do you know if the future versions of HDF5
will support this? Thanks again,

Milos

On Fri, May 30, 2008 at 8:22 AM, Francesc Alted <faltet@pytables.com> wrote:

On Thursday 29 May 2008, Milos Ilak wrote:
> Hi all,
>
> I apologize if this has been discussed, but I could not find any
> information in the archives. I am creating HDF5 files with 3-D arrays
> in Fortran 90, and I need to read them in both Python and MATLAB.
> While MATLAB recognizes the correct dimensions of the arrays,
> PyTables gets them backwards (i.e. (x,y,z) in Fortran becomes (z,y,x)
> when PyTables reads it). I know that this is due to the fact that the
> order in which Fortran stores arrays is different than that of
> Python, C or MATLAB, and I couldn't determine how exactly MATLAB
> 'knows' that Fortran arrays are being read.

Well, it is easy: because MATLAB writes and reads arrays in *Fortran*
order. So, if you write your arrays with Fortran, then you are not
going to have any problem reading them in the correct order from
MATLAB. However, as PyTables uses a C API to access HDF5 files, and as
C follows a different order for matrices in memory, you will get
inverted dimensions for your Fortran-created files (as is the case
here).
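A minimal NumPy-only sketch of the effect described above. It does not touch HDF5 at all: the byte string stands in for the raw data on disk, and the Fortran-side shape (32, 33, 16) is an assumption, taken as the reverse of the shape PyTables reports in the session quoted below.

```python
import numpy as np

# Assumed Fortran-side shape (x, y, z); the reverse of what PyTables reports.
nx, ny, nz = 32, 33, 16

# An array laid out in column-major (Fortran) order, as a Fortran program
# would hold it in memory.
a_fortran = np.asfortranarray(
    np.arange(nx * ny * nz, dtype=np.float64).reshape(nx, ny, nz)
)

# The raw bytes in memory order (stand-in for the blob stored in the file).
raw = a_fortran.tobytes(order="F")

# A C-ordered reader sees the same bytes with the dimensions reversed.
as_read_from_c = np.frombuffer(raw, dtype=np.float64).reshape(nz, ny, nx)
print(as_read_from_c.shape)  # (16, 33, 32)

# Element (i, j, k) on the Fortran side is element (k, j, i) on the C side.
i, j, k = 3, 5, 7
assert a_fortran[i, j, k] == as_read_from_c[k, j, i]
```

The data itself is untouched; only the order in which the dimensions are listed differs between the two readers.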

> I have tried using the 'isfortran' command in numpy, but I get the
> following error:
> >>> hh5f.root
>
> / (RootGroup) ''
> children := ['eta' (Array), 'u' (Array), 'w' (Array), 'v' (Array),
> 'y' (Array), 'x' (Array), 'z' (Array)]
>
> >>> hh5f.root.v
>
> /v (Array(16L, 33L, 32L)) ''
> atom := Float64Atom(shape=(), dflt=0.0)
> maindim := 0
> flavor := 'numpy'
> byteorder := 'little'
> chunkshape := None
>
> >>> numpy.isfortran(hh5f.root.v)
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/sw/lib/python2.5/site-packages/numpy/core/numeric.py", line 184, in isfortran
>     return a.flags.fnc
> AttributeError: 'Array' object has no attribute 'flags'
>
> It seems like there is perhaps some kind of flag I should add when
> writing in Fortran to indicate that the array is in Fortran order,
> but MATLAB somehow seems to know that anyway. Any advice would be
> greatly appreciated.

You are applying the numpy isfortran() function to a pytables Array and
not a numpy object. The correct call would be:

>>> numpy.isfortran(hh5f.root.v[:])

because the result of reading a pytables Array is a numpy object.

However, this won't tell you anything about the actual order (Fortran or
C) in which the array was written because this meta-information is not
saved anywhere in the file (apparently HDF5 does not support this yet).

So, unless you want to provide this info yourself by using, say, an HDF5
attribute, your best bet is to *deduce* the ordering by knowing that
the file comes from a Fortran or a C program and *transpose* your arrays
manually after reading them (if you need to).
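The attribute-based workaround could look roughly like the sketch below. The attribute name `fortran_order` and the helper `restore_axes` are hypothetical (neither HDF5 nor PyTables reserves such a name), and a plain dict stands in for the attributes of the node that was read.

```python
import numpy as np

ORDER_ATTR = "fortran_order"  # hypothetical attribute name, not an HDF5 convention


def restore_axes(data, attrs):
    """Return `data` with its axes reversed if the writer flagged Fortran order.

    `data` is the array as a C-ordered reader sees it; `attrs` is any
    mapping of attribute names to values (a plain dict in this sketch).
    """
    if attrs.get(ORDER_ATTR, False):
        # .T reverses all axes and returns a view; no data is copied.
        return data.T
    return data


# Stand-in for an array read from a Fortran-written file: shape (z, y, x).
as_read = np.zeros((16, 33, 32))
print(restore_axes(as_read, {ORDER_ATTR: True}).shape)  # (32, 33, 16)
print(restore_axes(as_read, {}).shape)                  # (16, 33, 32)
```

Files written from C simply lack the attribute and are returned unchanged, so one reader can handle both kinds of file.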

Hope this helps,

--
Francesc Alted
Freelance developer
Tel +34-964-282-249

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users

-------------------------------------------------------

--
Francesc Alted
Freelance developer
Tel +34-964-282-249

----------------------------------------------------------------------
This mailing list is for HDF software users discussion.
To subscribe to this list, send a message to hdf-forum-subscribe@hdfgroup.org.
To unsubscribe, send a message to hdf-forum-unsubscribe@hdfgroup.org.

Hi Francesc,

On May 31, 2008, at 4:54 AM, Francesc Alted wrote:
> An HDF5/PyTables user asked whether HDF5 supports meta-information for
> keeping Fortran/C ordering in datasets. By reading the docs, it seems
> to me that HDF5 doesn't support this yet. Are there plans to support
> this feature?

We don't support this currently, and there's not a lot of demand for it (as far as I've heard). Perhaps Elena can add some more information here...

Quincey


Hi Francesc and All,

If you only knew how many times this question has been asked and how many attempts have been made to explain :-) Here is another one. It is a little bit lengthy, please forgive me :-)
But I hope it will shed some light on why HDF5 doesn't support meta-information for Fortran/C ordering in datasets (the short answer is: it actually does, but in a more abstract way).

HDF5 is a "self-describing" format, which means that the HDF5 metadata stored in a dataset object header allows the HDF5 C library, and any non-C application built on top of it, to retrieve raw data (i.e. the elements of a multidimensional array) in the correct order.

(Let's for a second forget about HDF5, C and Fortran, Python and MATLAB :-) )

If we have a matrix A(N,M,K), we usually count dimensions from left to right saying that the first dimension has size N, the second dimension has size M, the third dimension has size K, and so on.

(Now let's talk about HDF5 but without referring to any language.)

When we describe a matrix using an HDF5 dataspace object, we use the same convention (i.e. specifying dimensions from left to right): the first dimension has size N, the second dimension has size M, the third dimension has size K. (Aside: Please notice that this description is valid for both C and Fortran HDF5 applications, i.e. the C and Fortran dims arrays needed by H5Screate_simple (h5screate_simple_f) will have the values dims [] = {N,M,K}).

The question is: how does HDF5 know how to interpret a blob of {N x M x K x sizeof(datatype)} bytes of dataset raw data stored in the file? Was A(N,M,K) stored? Or was it A(K,N,M)? Or any other permutation of (N,M,K)?

An HDF5 file has no clue about matrices and their dimensions, or the languages they were written from. It is the application's responsibility to interpret the data correctly and to pass the correct interpretation to the HDF5 C library to store in the file.

As mentioned above, the dimensions of the matrix are described using an HDF5 dataspace object and are stored in the file. The d integers P1, ..., Pd, where d is the rank of the matrix, are stored in the dataspace object header according to the following convention: the last value, Pd, is the size of the FASTEST changing dimension of the matrix, i.e. the HDF5 file spec and the HDF5 C library follow the C storage convention (no wonder, it is a C library :-). Therefore there is no ambiguity in interpreting the {N x M x K x sizeof(datatype)} bytes, and the HDF5 file has enough information for any "row-major" or "column-major" application to interpret the data correctly (including when bypassing the HDF5 C library and reading directly from the HDF5 file!).
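The "last value is the fastest changing dimension" convention can be written down as an offset formula. The following is a small sketch only; `flat_offset` is a hypothetical helper, and a NumPy array stands in for the raw data blob.

```python
import numpy as np


def flat_offset(index, dims):
    """Element offset in the raw data when the LAST dimension varies fastest."""
    offset = 0
    for i, n in zip(index, dims):
        offset = offset * n + i
    return offset


dims = (4, 3, 2)             # P1, P2, P3 as stored in the dataspace header
blob = np.arange(4 * 3 * 2)  # stand-in for the raw data in the file
c_view = blob.reshape(dims)  # NumPy's default layout follows the same rule

assert c_view[2, 1, 1] == blob[flat_offset((2, 1, 1), dims)]
# NumPy computes the same offset for a C-ordered array.
assert flat_offset((2, 1, 1), dims) == np.ravel_multi_index((2, 1, 1), dims)
```

Given the stored dims and this rule, every element of the blob has exactly one index tuple, which is the sense in which the stored data is unambiguous.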

Here is what happens when the HDF5 Fortran library is used:

Suppose we want to write the matrix A(N,M,K) to the HDF5 file. The HDF5 Fortran API describes the dataspace with the first dimension being N, the second dimension being M, the third dimension being K (as we would do in C and any other language). But the HDF5 Fortran API also knows that the fastest changing dimension has size N (i.e. we have column-major order). Therefore the HDF5 Fortran library instructs the C library to store the values K,M,N in the dataspace object header instead of N,M,K, since N is the size of the fastest changing dimension.

So, if we read the matrix A(N,M,K) (i.e. the N x M x K x sizeof(datatype) blob) written from Fortran with a C application, we will read it into the matrix B(K,M,N) (the C API that requests the sizes of the first, second and third dimensions will return the values K,M,N stored in the dataspace header).

If we read the matrix A(N,M,K) written from Fortran with a Fortran application, we will read it once again into B(N,M,K) (the Fortran API that requests the sizes of the first, second and third dimensions will flip the array K,M,N stored in the file and return N,M,K).

In other words: the HDF5 library stores information about how to interpret the data. The interpretation follows the C storage convention: the last dimension specified for the dataspace object is the fastest changing one. It is the responsibility of the application (in this case the Fortran HDF5 library) to interpret the order of the dimensions correctly and pass it to/from the HDF5 C library.

Please notice that there is no need to transpose the data itself: one only has to pass a correct interpretation of the data to the HDF5 C library and make sure it is done according to the HDF5 C library convention (the first value stored in the dataspace header corresponds to the slowest changing dimension, ..., the last value stored in the dataspace header corresponds to the fastest changing dimension).
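The write and read paths just described can be mimicked in a few lines of NumPy. This is only a model of the behaviour as explained here, not the HDF5 API: `fortran_write`, `c_read` and `fortran_read` are hypothetical names, and a dict stands in for the file.

```python
import numpy as np


def fortran_write(a):
    """Mimic the Fortran wrapper: store the memory blob plus REVERSED dims."""
    a = np.asfortranarray(a)
    return {"dims": a.shape[::-1], "blob": a.ravel(order="F").copy()}


def c_read(stored):
    """C-style reader: use the stored dims as they are (last one fastest)."""
    return stored["blob"].reshape(stored["dims"])


def fortran_read(stored):
    """Fortran-style reader: flip the stored dims back, column-major layout."""
    return stored["blob"].reshape(stored["dims"][::-1], order="F")


N, M, K = 2, 3, 4
a = np.arange(N * M * K).reshape(N, M, K)

stored = fortran_write(a)
print(stored["dims"])              # (4, 3, 2), i.e. K, M, N
print(c_read(stored).shape)        # (4, 3, 2)
print(fortran_read(stored).shape)  # (2, 3, 4)
```

In this model the Fortran reader gets back exactly the array that was written, while the C reader gets the same values with the index order reversed.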
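On the reading side, the same "no need to transpose the data" idea can be illustrated with NumPy, where reversing the axes only changes the description of the buffer. This is a sketch with made-up sizes, not something taken from the HDF5 library.

```python
import numpy as np

# What a C-ordered reader gets for a Fortran-written A(N, M, K): shape (K, M, N).
N, M, K = 2, 3, 4
b = np.arange(K * M * N, dtype=np.float64).reshape(K, M, N)

a = b.T                         # shape (N, M, K); a view, not a copy
print(a.shape)                  # (2, 3, 4)
print(np.shares_memory(a, b))   # True
print(a.flags["F_CONTIGUOUS"])  # True
```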

Please let me know if my explanation made things worse. Frankly speaking I think it did ;-) but I tried...

Elena


Hi Francesc,
The HDF Group decided not to support this within the HDF5 format. We had done so in the original version of HDF ("HDF4"), and found it caused a great deal of confusion. Another option would be to have an attribute with a reserved name describing how the data was ordered. It would be nice to have such a convention, but as Quincey says, there hasn't been sufficient interest to do even that.

Mike

···

At 03:58 PM 5/31/2008, Quincey Koziol wrote:


--
Mike Folk The HDF Group http://hdfgroup.org 217.244.0647
1901 So. First St., Suite C-2, Champaign IL 61820

Elena,

Thanks for your detailed explanation. After reading it, my opinion is
that it is not completely true that HDF5 implements the C/Fortran
ordering meta-information (even in a more abstract way, as you said),
because, as it is now, there is always an ambiguity on how to interpret
the ordering of data on disk.

I think the key point in your exposition can be summarized in the
following sentence:

"""
Therefore HDF5 Fortran library instructs C library to store
K,M,N values in the dataspace object header instead of N,M,K, since N
is the size of the fastest changing dimension.
"""

So, what HDF5 actually ensures is the consistency between the
order of the dimensions in the dataspace and the *fastest changing
dimension* ordering in memory of the user datasets, but not the
absolute *C/Fortran* ordering. This is what leads to the reported
ambiguity in the dimension ordering of the datasets when you try to
read an HDF5 file that was written in Fortran from a C-based program
(or vice versa).

At first sight, given that HDF5 has a C ordering convention, I'd have
preferred that HDF5 itself transposed the *data* to be saved when
someone uses the Fortran wrappers (instead of just "transposing" the
*dimension ordering*), so that the interpretation of both data and
dimensionality ordering would have been completely unambiguous.
However, I guess that you have chosen not to do that in order not to
penalize the performance of Fortran users (transposing the data is
quite a costly operation). In some way, you have sacrificed data
portability between C/Fortran users for the sake of performance, and I
agree that this is a sensible approach for a library that tries to be
as efficient as HDF5.

Having said that, and although HDF5 already does a terrific job in
terms of cross-platform data portability by supporting metadata
for platform-independent data types (including endianness),
failing to support specific metadata about C/Fortran ordering is, IMHO,
a serious design fault in terms of portability. That could easily be
solved by adding the C/Fortran metadata, so that users can easily
identify the original *intended* data ordering and have a chance
to interpret that ordering correctly. That way, they would be able to
choose whether to transpose the *data* at loading time in order to
deal efficiently with that data in memory, or just to add some metainfo
to their data containers (for example, NumPy does support this) stating
that the in-memory ordering is different from the native one for the
reading platform.
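The two choices mentioned above can be sketched with NumPy as follows. The array sizes are arbitrary, and the example assumes the reader already knows that the data was intended to be in Fortran order, which is exactly the information the file does not carry.

```python
import numpy as np

b = np.arange(24, dtype=np.float64).reshape(4, 3, 2)  # as read by a C-based tool

# Option 1: leave the bytes where they are and let the container record
# that its layout is column-major.
view = b.T
print(view.flags["F_CONTIGUOUS"], view.flags["C_CONTIGUOUS"])  # True False

# Option 2: pay for a real transposition so the data is row-major in memory.
copied = np.ascontiguousarray(view)
print(copied.flags["C_CONTIGUOUS"], np.shares_memory(copied, b))  # True False

# Both hold the same values under the same (N, M, K) indexing.
assert np.array_equal(view, copied)
```

Option 1 costs nothing at load time but leaves later row-wise access strided; option 2 copies once and then behaves like any native array.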

Moreover, providing this C/Fortran ordering metadata is completely
backward compatible, so my vote is +1 for HDF5 supporting it in the
future.

Thanks,
  Francesc

On Sunday 01 June 2008, Elena Pourmal wrote:

···

Hi Francesc and All,

If you only knew how many times this question was asked and how many
attempts were done to explain :slight_smile: Here is another one. It is little-
bit lengthy, please forgive :slight_smile:
But I hope it will shed a light on why HDF5 doesn't support meta-
information for Fortran/C ordering in datasets (the short answer is -
it actually does, but in a more abstract way).

HDF5 is a "self-describing" format, which means that HDF5 metadata
stored in a dataset object header allows the HDF5 C library and any
other non-C applications built on top of it, to retrieve a raw data
(i.e. elements of a multidimensional array) in the correct order.

(Let's for a second forget about HDF5, C and Fortran, Python and
Matlab :slight_smile: )

If we have a matrix A(N,M,K), we usually count dimensions from left
to right saying that the first dimension has size N, the second
dimension has size M, the third dimension has size K, and so on.

(Now let's talk about HDF5 but without referring to any language.)

When we describe a matrix using HDF5 datatspace object, we use the
same convention (i.e. specifying dimensions from left to right): the
first dimension has size N, the second dimension has size M, the
third dimension has size K. (Aside: Please notice that this
description is valid for both C and Fortran HDF5 applications, i.e. C
and Fortran dims array needed by H5Screate_simple
(h5screate_simple_f) will have the values dims [] = {N,M,K}).

The question is: how does HDF5 know how to interpret a blob of {N x
M x K x by sizeof(datatype)} bytes of dataset raw data stored in the
file? Was A(N,M,K) stored? Or was it A(K,N,M) stored? Or any other
permutation of (K,N,M)?

HDF5 file has no clue about matrices and their dimensions, and the
languages they were written from. This is application's
responsibility to interpret data correctly and pass the correct
interpretation to the HDF5 C library to store in a file.

As it was mentioned above, dimensions of the matrix are described
using HDF5 dataspace object and are stored in the file. d integers
P1, ..., Pd, where d is a rank of a matrix, are stored in a dataspace
object header according to the following convention: the last value
- Pd is the size of the FASTEST changing dimension of the matrix,
i.e. HDF5 file spec and HDF5 C library follow C storage convention
(no wonder, it is a C library :-). Therefore there is no ambiguity in
interpreting {N x M x K x sizeof(datatype)} bytes, and HDF5 file has
enough information to interpret data correctly by any "row-major" or
"column-major" application (including bypassing HDF5 C library and
reading directly from the HDF5 file!)

Here is what is happening when HDF5 Fortran library is used:

Suppose we want to write A(N,M,K) matrix to the HDF5 file. HDF5
Fortran API describes dataspace with the first dimension being N, the
second dimension being M, the third dimension being K (as we would do
it in C and any other language). But HDF5 Fortran API also knows
that the fastest changing dimension has size N (i.e. we have
column-major order). Therefore HDF5 Fortran library instructs C
library to store K,M,N values in the dataspace object header instead
of N,M,K, since N is the size of the fastest changing dimension.

So, if we read matrix A(N,M,K) ((i.e. N x M x K x sizeof(datatype)
blob) written from Fortran by a C application, we will read it to
the matrix B(K,M,N) ( C API that requests sizes of the first, second
and third dimensions will return values K,M,N stored in the dataspace
header)

If we read matrix A(N,M,K) written from Fortran by Fortran
application, we will read it once again into B(N,M,K) ( Fortran API
that requests sizes of the first, second and third dimension will
flip an array K,M,N stored in the file and return N,M,K)

In other words: the HDF5 library stores information about how to
interpret the data. The interpretation follows the C storage
convention: the last dimension specified for the dataspace object is
the fastest changing one. It is the responsibility of the application
(in this case the HDF5 Fortran library) to interpret the order of
dimensions correctly when passing them to and from the HDF5 C library.

Please notice that there is no need to transpose the data itself: one
only has to pass a correct interpretation of the data to the HDF5 C
library, and make sure it follows the HDF5 C library convention (the
first value stored in the dataspace header corresponds to the slowest
changing dimension, ..., and the last value stored in the dataspace
header corresponds to the fastest changing dimension).

Please let me know if my explanation made things worse. Frankly
speaking I think it did ;-) but I tried.....

Elena


----------------------------------------------------------------------
This mailing list is for HDF software users discussion.
To subscribe to this list, send a message to hdf-forum-subscribe@hdfgroup.org.
To unsubscribe, send a message to hdf-forum-unsubscribe@hdfgroup.org.

--
Francesc Alted
Freelance developer
Tel +34-964-282-249

I think that when writing Fortran arrays into HDF5, the shape
information should be reversed to make the last axis the fastest varying
one and make it appear in C order. In that way you do not need to keep
info around about Fortran or C ordering. When reading back, you can do
the same to make it Fortran again. No transpose of data is needed; only
reversal of axes info.

Also note that numpy nominally supports Fortran ordering, but in
practice it only supports C-ordering.
E.g. when adding 1 to an array in Fortran order, the result is in C
order. Also numpy operations on arrays in Fortran order are twice as
slow as those in C order, because it traverses the array in an
inefficient way (so the cache behaviour is terrible).
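
The layout claim above is easy to check against whatever NumPy version
is installed. A minimal sketch that only prints the flags and assumes
nothing about the outcome, since the behaviour may differ between NumPy
releases (the names `a` and `b` are just for illustration):

```python
import numpy as np

# A small array in Fortran (column-major) order.
a = np.asfortranarray(np.arange(12, dtype=np.float64).reshape(3, 4))

# An elementwise operation; the layout of the result is up to NumPy.
b = a + 1

print("numpy", np.__version__)
print("input  F-contiguous:", a.flags.f_contiguous)
print("result F-contiguous:", b.flags.f_contiguous)
print("result C-contiguous:", b.flags.c_contiguous)
```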

Ger

Mike Folk <mfolk@hdfgroup.org> 06/01/08 12:34 AM >>>

Hi Francesc,
The HDF group decided not to support this within
the HDF5 format. We had done so in the original
version of HDF ("HDF4"), and found it caused a
great deal of confusion. Another option would be
to have an attribute with a reserved name
describing how the data was ordered. It would be
nice to have such a convention, but as Quincey
says, there hasn't been sufficient interest to do even that.
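
A reader-side convention of that kind might look like the sketch below.
Everything in it is an assumption made for illustration: the attribute
name `index_order` is not a reserved or agreed HDF5 name, the helper
`to_writer_indexing` is hypothetical, and a plain dict stands in for the
attributes of a dataset.

```python
import numpy as np

# Hypothetical attribute name; HDF5 defines no such convention.
ORDER_ATTR = "index_order"

def to_writer_indexing(data, attrs):
    """Return `data` indexed the way the writing program indexed it."""
    if attrs.get(ORDER_ATTR) == "fortran":
        # Reverse the axes only; this is a view, the data is not copied.
        return data.T
    return data

# Shape as PyTables reported it for /v earlier in this thread.
as_read = np.zeros((16, 33, 32))

print(to_writer_indexing(as_read, {ORDER_ATTR: "fortran"}).shape)
print(to_writer_indexing(as_read, {}).shape)
```

A file without the attribute is returned unchanged, so files written
from C keep working with the same reader.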

Mike

At 03:58 PM 5/31/2008, Quincey Koziol wrote:

Hi Francesc,

On May 31, 2008, at 4:54 AM, Francesc Alted wrote:
> Hi,
>
> An HDF5/PyTables user asked whether HDF5 supports meta-information
> for keeping Fortran/C ordering in datasets. By reading the docs, it
> seems to me that HDF5 doesn't support this yet. Are there plans to
> support this feature?

We don't support this currently, and there's not a lot of demand for
this currently (as far as I've heard). Perhaps Elena can add some
more information here...

        Quincey

--
Mike Folk The HDF Group http://hdfgroup.org 217.244.0647
1901 So. First St., Suite C-2, Champaign IL 61820
