HD5dump

Pradeep_Jha1 · February 27, 2013, 12:32pm

Hello,

I am trying to convert an unformatted data file created by Fortran into the
*.h5 format. To do this, I am using this program
<http://www.hdfgroup.org/ftp/HDF5/examples/introductory/F90/h5_rdwt.f90>,
provided on the website, as the base.

My data has size Nx by Ny by Nz, so I changed the "dims" declaration in the
code to something like this:

-------------------------------------------
**** code ****
INTEGER :: Nx, Ny, Nz
INTEGER(HSIZE_T), DIMENSION(3) :: dims
**** code ****
dims(1) = Nx
dims(2) = Ny
dims(3) = Nz
**** code ****
--------------------------------------------

This code works perfectly fine. But when I run "h5dump -H" on
the output file, the output is:

HDF5 "output_file" {
GROUP "/" {
   DATASET "variable" {
      DATATYPE H5T_IEEE_F32LE
      DATASPACE SIMPLE { ( Nz,Ny,Nx ) / ( Nz,Ny,Nx) }
   }
}
}

Can you please explain whether this is how the order of Nz, Ny and Nx should
be? Does this represent a cube of size Nx by Ny by Nz? Or does dims(1) actually
represent the z dimension, so that I should assign it the value Nz?

Thank you,
Pradeep

brtnfld · February 27, 2013, 3:25pm

The HDF5 file uses C storage conventions, which is why the matrix is transposed.

See http://www.hdfgroup.org/HDF5/doc/UG/UG_frame12Dataspaces.html

section: 7.3.2.5. C versus Fortran Dataspaces

Pradeep_Jha1 · February 28, 2013, 3:01am

Thanks for the response. So from what I understand, the HDF5 Fortran wrapper
automatically transposes the matrix to store it in the C storage
convention. So setting "dims(1) = Nx" and "dims(3) = Nz" is correct; the HDF5
Fortran wrapper just transposes the data inside the program before
storing it in the final h5 format.

But this confuses me about something.

I convert my original unformatted data written by Fortran to h5 format so
that I can visualize it with ParaView. Does that mean that the data I
visualize in ParaView from the h5 file will be a transposed version of what
I originally intended to visualize?

And if that is true, is there a simple way to make sure that HDF5
stores untransposed data? Will I need to pass transposed data to the HDF5
Fortran wrapper to ensure this?

Thank you again for the response,
Pradeep

Pradeep_Jha1 · March 4, 2013, 2:42am

Hello,

Can someone please clarify my doubt? I would really appreciate it.

Thanks,
Pradeep

brtnfld · March 5, 2013, 5:37am

On 2013-02-27 21:01, Pradeep Jha wrote:

Thanks for the response. So from what I understand, the HDF5 Fortran
wrapper automatically transposes the matrix to store it in the C
storage convention. So setting "dims(1) = Nx" and "dims(3) = Nz" is
correct; the HDF5 Fortran wrapper just transposes the data inside the
program before storing it in the final h5 format.

But this confuses me about something.

I convert my original unformatted data written by Fortran to h5
format so that I can visualize it with ParaView. Does that mean that
the data I visualize in ParaView from the h5 file will be a
transposed version of what I originally intended to visualize?

Yes, if you are writing a multidimensional array using the Fortran APIs and you want to read the data back into an array with the same dimensions, then you need to use the Fortran APIs. If you use a C program to read the HDF5 data back, then you need to account for the transposition in the C program, or handle it when writing from Fortran.

Here is an explanation from the archives:

HDF5 is a "self-describing" format, which means that HDF5 metadata
stored in a dataset object header allows the HDF5 C library, and any
non-C applications built on top of it, to retrieve raw data
(i.e. elements of a multidimensional array) in the correct order.

(Let's for a second forget about HDF5, C, Fortran, Python and
Matlab.)

If we have a matrix A(N,M,K), we usually count dimensions from left
to right saying that the first dimension has size N, the second
dimension has size M, the third dimension has size K, and so on.

(Now let's talk about HDF5 but without referring to any language.)

When we describe a matrix using an HDF5 dataspace object, we use the
same convention (i.e. specifying dimensions from left to right): the
first dimension has size N, the second dimension has size M, the
third dimension has size K. (Aside: please notice that this
description is valid for both C and Fortran HDF5 applications, i.e.
the C and Fortran dims arrays needed by H5Screate_simple
(h5screate_simple_f) will have the values dims = {N,M,K}.)

The question is: how does HDF5 know how to interpret a blob of {N x
M x K x sizeof(datatype)} bytes of dataset raw data stored in the
file? Was A(N,M,K) stored? Or was it A(K,N,M)? Or any other
permutation of (N,M,K)?

The HDF5 file has no clue about matrices and their dimensions, or the
languages they were written from. It is the application's
responsibility to interpret the data correctly and pass the correct
interpretation to the HDF5 C library to store in the file.

As mentioned above, the dimensions of the matrix are described
using an HDF5 dataspace object and are stored in the file. The d
integers P1, ..., Pd, where d is the rank of the matrix, are stored in
the dataspace object header according to the following convention: the
last value, Pd, is the size of the FASTEST changing dimension of the
matrix, i.e. the HDF5 file spec and the HDF5 C library follow the C
storage convention (no wonder, it is a C library :-). Therefore there
is no ambiguity in interpreting the {N x M x K x sizeof(datatype)}
bytes, and the HDF5 file has enough information for any "row-major" or
"column-major" application to interpret the data correctly (including
one that bypasses the HDF5 C library and reads directly from the HDF5
file!).
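
The "last value is the fastest changing dimension" convention can be sketched in a few lines of plain Python. This only illustrates the arithmetic; it is not HDF5 code, and the function names and the sizes (2, 3, 4) are made up for the example.

```python
def row_major_strides(dims):
    """Element strides when the LAST dimension changes fastest (C order)."""
    strides = [1] * len(dims)
    for axis in range(len(dims) - 2, -1, -1):
        strides[axis] = strides[axis + 1] * dims[axis + 1]
    return tuple(strides)


def row_major_offset(index, dims):
    """Linear element offset of a 0-based index under the C convention."""
    return sum(i * s for i, s in zip(index, row_major_strides(dims)))


# Hypothetical dataspace header values P1, P2, P3.
dims = (2, 3, 4)
print(row_major_strides(dims))            # (12, 4, 1)
print(row_major_offset((0, 0, 1), dims))  # 1: the last index moves one element
print(row_major_offset((1, 0, 0), dims))  # 12: the first index skips a 3x4 slab
```

Given only the stored sizes, any reader can compute where each element lives in the raw blob, which is why the header values alone remove the ambiguity.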

Here is what happens when the HDF5 Fortran library is used:

Suppose we want to write an A(N,M,K) matrix to the HDF5 file. The HDF5
Fortran API describes the dataspace with the first dimension being N,
the second dimension being M, the third dimension being K (as we would
do in C or any other language). But the HDF5 Fortran API also knows
that the fastest changing dimension has size N (i.e. we have
column-major order). Therefore the HDF5 Fortran library instructs the
C library to store the values K,M,N in the dataspace object header
instead of N,M,K, since N is the size of the fastest changing dimension.
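
A small Python model of that bookkeeping, applied to the sizes from the original question. The two helper functions are hypothetical stand-ins for what the Fortran wrapper is described as doing, not the library's actual code, and the values 64, 32, 16 are invented.

```python
def file_dims_from_fortran(fortran_dims):
    """Reverse Fortran-order dims so the fastest-changing size comes last."""
    return tuple(reversed(fortran_dims))


def fortran_dims_from_file(file_dims):
    """Undo the reversal when a Fortran program asks for the dims."""
    return tuple(reversed(file_dims))


# Hypothetical sizes standing in for Nx, Ny, Nz from the question.
nx, ny, nz = 64, 32, 16
stored = file_dims_from_fortran((nx, ny, nz))
print(stored)                          # (16, 32, 64), i.e. (Nz, Ny, Nx) as h5dump shows
print(fortran_dims_from_file(stored))  # (64, 32, 16), i.e. (Nx, Ny, Nz)
```

So setting dims(1) = Nx, dims(2) = Ny, dims(3) = Nz in Fortran and seeing ( Nz,Ny,Nx ) in the h5dump header are consistent with each other.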

So, if a C application reads the matrix A(N,M,K) (i.e. the N x M x K x
sizeof(datatype) blob) written from Fortran, it will read it into
the matrix B(K,M,N). (A C API call that requests the sizes of the
first, second and third dimensions will return the values K,M,N stored
in the dataspace header.)

If a Fortran application reads the matrix A(N,M,K) written from
Fortran, it will read it once again into B(N,M,K). (A Fortran API call
that requests the sizes of the first, second and third dimensions will
flip the array K,M,N stored in the file and return N,M,K.)

In other words: the HDF5 library stores information about how to
interpret the data. The interpretation follows the C storage
convention: the last dimension specified for the dataspace object is
the fastest changing one. It is the responsibility of the application
(in this case the Fortran HDF5 library) to interpret the order of
dimensions correctly and pass it to and from the HDF5 C library.

Please notice that there is no need to transpose the data itself: one
only has to pass a correct interpretation of the data to the HDF5 C
library and make sure it is done according to the HDF5 C library
convention (the first value stored in the dataspace header
corresponds to the slowest changing dimension, ..., the last value
stored in the dataspace header corresponds to the fastest changing
dimension).
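
To see that only the index order changes and the data never moves, here is a plain-Python sketch. It assumes a flat list standing in for the raw dataset bytes, with invented sizes N=4, M=3, K=2; both function names are hypothetical and nothing here calls HDF5.

```python
def fortran_buffer(n, m, k):
    """Flat buffer in Fortran (column-major) order; each element holds its
    own 1-based Fortran index (i, j, k) so it is easy to recognise."""
    return [(i, j, kk)
            for kk in range(1, k + 1)      # slowest in Fortran
            for j in range(1, m + 1)
            for i in range(1, n + 1)]      # fastest in Fortran


def read_c_style(buffer, file_dims, index):
    """Look up a 0-based C index using the dims as stored in the file."""
    d0, d1, d2 = file_dims
    a, b, c = index
    return buffer[(a * d1 + b) * d2 + c]


n, m, k = 4, 3, 2                      # hypothetical sizes
buf = fortran_buffer(n, m, k)          # the buffer is never reordered
file_dims = (k, m, n)                  # what the dataspace header holds

# C element B[1][2][3] is Fortran element A(4, 3, 2): same data, reversed indices.
print(read_c_style(buf, file_dims, (1, 2, 3)))  # (4, 3, 2)
```

A C-convention reader that treats the last axis of (Nz, Ny, Nx) as x therefore sees the same x, y, z layout the Fortran program wrote.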

Pradeep_Jha1 · March 5, 2013, 11:19am

Thank you for the response.
