binary to hdf, using h5import tool

I have a binary file containing scanned data from radar. I would like to
convert it into HDF format. However, I don't know where to start. After
going through the tutorials and other related information on the HDF
website, I found that the h5import tool can convert binary data into HDF5
format. Can anyone tell me if I am proceeding in the right direction? Or
am I required to write code in Fortran or C to convert the binary data
into HDF format?

Jam, India.

···

--
The time will come when diligent research over long periods will bring to
light things which now lie hidden. A single lifetime, even though entirely
devoted to the sky, would not be enough for the investigation of so vast a
subject...And so this knowledge will be unfolded only through long
successive ages. There will come a time when our descendants will be amazed
that we did not know things that are so plain to them...Many discoveries are
reserved for ages still to come, when memory of us will have been effaced.
Our universe is a sorry little affair unless it has in it something for
every age to investigate...Nature does not reveal her mysteries once and for
all.

Hello,

I have a binary file containing scanned data from radar. I would like to
convert it into HDF format. However, I don't know where to start. After
going through the tutorials and other related information on the HDF
website, I found that the h5import tool can convert binary data into HDF5
format.

I don't know the h5import tool very well. If you only want to convert
a few files, it might be a good solution.

Can anyone tell me if I am proceeding in the right direction? Or am I
required to write code in Fortran or C to convert the binary data into
HDF format?

If you think you'll be converting many files in the future, I would
recommend you invest some time writing a Python+PyTables or MATLAB
script to convert your data.
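To make Albert's suggestion concrete, here is a minimal stdlib-only Python sketch of the reading half of such a script. The file name, the little-endian float32 layout, and the 2 x 3 shape are hypothetical assumptions, and the final HDF5 write (via PyTables or h5py) is only indicated in a comment.

```python
import struct

# Hypothetical layout: little-endian float32 samples, 2 rows x 3 columns,
# stored row by row. Adjust these to match the real radar file format.
NROWS, NCOLS = 2, 3
samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Write a sample binary file so the sketch is self-contained.
with open("radar.bin", "wb") as f:
    f.write(struct.pack("<%df" % len(samples), *samples))

# Read it back: the equivalent of MATLAB's fopen/fread/fclose.
with open("radar.bin", "rb") as f:
    flat = list(struct.unpack("<%df" % (NROWS * NCOLS), f.read()))

# Regroup the flat samples into rows.
rows = [flat[r * NCOLS:(r + 1) * NCOLS] for r in range(NROWS)]
print(rows)  # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

# From here, a PyTables or h5py call (not run here; requires the library)
# would store the array in HDF5, e.g. with h5py:
#   h5py.File("radar.h5", "w").create_dataset("/radar/scan1", data=rows)
```

The same structure (open, unpack according to a known layout, write one dataset per scan) scales to a loop over many input files.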

Regards,

Albert

···

On Tue, Jun 3, 2008 at 8:39 AM, Srinivasa Ramanujam <sramanujam.k@gmail.com> wrote:

----------------------------------------------------------------------
This mailing list is for HDF software users discussion.
To subscribe to this list, send a message to hdf-forum-subscribe@hdfgroup.org.
To unsubscribe, send a message to hdf-forum-unsubscribe@hdfgroup.org.

Hi Albert,
Thank you for your quick response.

If you think you'll be converting many files in the future

As of now, I am interested in converting only one sample binary file.
However, the eventual task is to convert a large volume of binary data
into HDF format.

I would recommend you invest some time writing a Python+PyTables or MATLAB
script to convert your data.

Coding in MATLAB is new to me, as I have only been using Compaq Visual
Fortran and Lahey Fortran, and Python is totally unfamiliar to me. So I
will make an attempt to code using MATLAB, and will engage in further
discussion in case I face any difficulty.

Here are some random notes on using MATLAB. I've been using MATLAB
with HDF5 for a few months and I'm mostly loving it.

You want to look at the hdf5write function first. Here's a little
trick if you want to truncate your output file and then write multiple
datasets to it:

hdfout = 'yourfile.h5';
hdf5write(hdfout,'/',[]); % truncates the output file
hdf5write(hdfout,'/group/dataset1',x1,'WriteMode','append');
hdf5write(hdfout,'/group/dataset2',x2,'WriteMode','append');

You can also get at the raw HDF5 API using MATLAB, but I've managed to
do most things using their easy-to-use hdf5read and hdf5write
functions.

To read your binary data, you want to look at fopen and fread. Also
remember to close the file handle with fclose.

Type help <functionname> to get some quick help.

Finally, watch out for C vs Fortran ordering issues (as discussed on
this list in the last few days). To store your data in C order using
MATLAB, you should transpose your matrices as you write them to the
HDF file.
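To see what is at stake with the ordering, here is a small stdlib-only Python illustration (the 2 x 3 matrix is a hypothetical example): the same matrix serializes to a different flat sequence in C (row-major) order than in Fortran/MATLAB (column-major) order, which is why the transpose is needed.

```python
# A 2x3 matrix as nested lists (hypothetical example values).
m = [[1, 2, 3],
     [4, 5, 6]]
nrows, ncols = 2, 3

# C (row-major) order: rows are contiguous.
c_order = [m[r][c] for r in range(nrows) for c in range(ncols)]

# Fortran (column-major) order: columns are contiguous,
# which is how MATLAB lays out its arrays in memory.
f_order = [m[r][c] for c in range(ncols) for r in range(nrows)]

print(c_order)  # [1, 2, 3, 4, 5, 6]
print(f_order)  # [1, 4, 2, 5, 3, 6]

# Transposing first makes a column-major traversal produce
# the row-major sequence a C-ordered reader expects.
t = [[m[r][c] for r in range(nrows)] for c in range(ncols)]  # 3x2 transpose
f_of_t = [t[r][c] for c in range(nrows) for r in range(ncols)]
assert f_of_t == c_order
```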

Sometimes you might also find it useful to "normalize" vectors you
read from the HDF file.

x = hdf5read(hdfin,'/group/dataset1');
x = x(:); % x is now always a column vector

Regards,

Albert

···

On Tue, Jun 3, 2008 at 8:56 AM, Srinivasa Ramanujam <sramanujam.k@gmail.com> wrote:

Use fread.

···

On Tue, Jun 3, 2008 at 11:24 AM, Srinivasa Ramanujam <sramanujam.k@gmail.com> wrote:

Hi,

As I have already mentioned in one of my previous mails, my input is in
raw binary format, which is not a recognized data format for MATLAB to
import. Is there an alternative?

Jam, India.

Hello, Jam,

I have a binary file containing scanned data from radar. I would like to convert it into HDF format. However, I don't know where to start. After going through the tutorials and other related information on the HDF website, I found that the h5import tool can convert binary data into HDF5 format. Can anyone tell me if I am proceeding in the right direction? Or am I required to write code in Fortran or C to convert the binary data into HDF format?

Jam, India.

Yes, h5import reads binary data and converts it to HDF5.

Here's an example of its use.

Save your binary data to a file. Here's an example in C that saves a matrix of 2 rows and 3 columns:

{
        FILE *sp;
        char bin8[2][3] = {{1, 2, 3}, {4, 5, 6}};
        int nrow = 2, ncol = 3, i, j;

#ifdef WIN32
        sp = fopen("bin8w.bin", "wb");
#else
        sp = fopen("bin8w.bin", "w");
#endif
        for (i = 0; i < nrow; i++)
        {
            for (j = 0; j < ncol; j++)
            {
                char c = bin8[i][j];
                if ( fwrite( &c, sizeof(char), 1, sp) != 1 )
                    printf("error writing file\n");
            }
        }
        fclose(sp);

    }
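For comparison, here is a stdlib-only Python sketch that writes the same 2 x 3 byte matrix to bin8w.bin. The file name and values simply mirror the C example above; adapt both to your real data.

```python
import struct

# The same 2x3 matrix of 8-bit integers as in the C example.
bin8 = [[1, 2, 3],
        [4, 5, 6]]

# Write the values row by row as single signed bytes ("b" format),
# matching what the C fwrite loop produces.
with open("bin8w.bin", "wb") as f:
    for row in bin8:
        for v in row:
            f.write(struct.pack("b", v))

# The resulting 6-byte file is what the configuration below
# (INPUT-CLASS IN, INPUT-SIZE 8, RANK 2, DIMENSION-SIZES 2 3)
# tells h5import to interpret as a 2x3 dataset.
with open("bin8w.bin", "rb") as f:
    print(f.read())  # b'\x01\x02\x03\x04\x05\x06'
```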

In the HDF5 distribution, go to /tools/h5import.

h5import uses a configuration file that tells it basic things about the binary file to import.

In your favorite text editor, enter this for the configuration file:

INPUT-CLASS IN
INPUT-SIZE 8
RANK 2
DIMENSION-SIZES 2 3

These are h5import "keywords". This tells h5import that we are importing 8-bit integers, with dimensions of 2 x 3.

Save this to a file, for example named bin8w.conf.

Now run:

./h5import bin8w.bin -c bin8w.conf -o bin8w.h5

where bin8w.bin is the name of the binary data file, bin8w.conf is the configuration file, and bin8w.h5 is the output HDF5 file that I attached to this email.

To see the HDF5 file, run, in /tools/h5dump:

./h5dump bin8w.h5

Or, if you prefer a graphical interface, you can use HDF Explorer to see its contents.

If you run

./h5import -h

you get a rather lengthy usage message where the other configuration file keywords are explained.

The usage is

Name:

        /home/pvn/kagiso/build_hdf5/tools/h5import/.libs/lt-h5import

          TOOL NAME:
           /home/pvn/kagiso/build_hdf5/tools/h5import/.libs/lt-h5import
           SYNTAX:
           /home/pvn/kagiso/build_hdf5/tools/h5import/.libs/lt-h5import -h[elp], OR
           /home/pvn/kagiso/build_hdf5/tools/h5import/.libs/lt-h5import <infile> -c[onfig] <configfile> [<infile> -c[config] <configfile>...] -o[utfile] <outfile>

           PURPOSE:
                To convert data stored in one or more ASCII or binary files
                into one or more datasets (in accordance with the
                user-specified type and storage properties) in an existing
                or new HDF5 file.

           DESCRIPTION:
                The primary objective of the utility is to convert floating
                point or integer data stored in ASCII text or binary form
                into a data-set according to the type and storage properties
                specified by the user. The utility can also accept ASCII
                text files and store the contents in a compact form as an
                array of one-dimensional strings.

                The input data to be written as a data-set can be provided
                to the utility in one of the following forms:
                1. ASCII text file with numeric data (floating point or
                integer data).
                2. Binary file with native floating point data (32-bit or
                64-bit)
                3. Binary file with native integer (signed or unsigned)
                data (8-bit or 16-bit or 32-bit or 64-bit).
                4. ASCII text file containing strings (text data).
            
                Every input file is associated with a configuration file
                also provided as an input to the utility. (See Section
                "CONFIGURATION FILE" to know how it is to be organized).
                The class, size and dimensions of the input data is
                specified in this configuration file. A point to note is
                that the floating point data in the ASCII text file may be
                organized in the fixed floating form (for example 323.56)
                or in a scientific notation (for example 3.23E+02). A
                different input-class specification is to be used for both
                forms.

                The utility extracts the input data from the input file
                according to the specified parameters and saves it into
                an H5 dataset.

                The user can specify output type and storage properties in
                the configuration file. The user is required to specify the
                path of the dataset. If the groups in the path leading to
                the data-set do not exist, the groups will be created by the
                utility. If no group is specified, the dataset will be
                created under the root group.

                In addition to the name, the user is also required to
                provide the class and size of output data to be written to
                the dataset and may optionally specify the output-architecture,
                and the output-byte-order. If output-architecture is not
                specified the default is NATIVE. Output-byte-orders are fixed
                for some architectures and may be specified only if output-
                architecture is IEEE, UNIX or STD.

                Also, layout and other storage properties such as
                compression, external storage and extendible data-sets may be
                optionally specified. The layout and storage properties
                denote how raw data is to be organized on the disk. If these
                options are not specified the default is Contiguous layout
                and storage.

                The dataset can be organized in any of the following ways:
                1. Contiguous.
                2. Chunked.
                3. External Storage File (has to be contiguous)
                4. Extendible data sets (has to be chunked)
                5. Compressed. (has to be chunked)
                6. Compressed & Extendible (has to be chunked)

                If the user wants to store raw data in a non-HDF file then
                the external storage file option is to be used and the name
                of the file is to be specified.

                If the user wants the dimensions of the data-set to be
                unlimited, the extendible data set option can be chosen.

                The user may also specify the type of compression and the
                level to which the data set must be compressed by setting
                the compressed option.

           SYNOPSIS:
          h5import -h[elp], OR
          h5import <infile> -c[onfig] <configfile> [<infile> -c[config] <confile2>...] -o[utfile] <outfile>

           -h[elp]:
                   Prints this summary of usage, and exits.

           <infile(s)>:
                   Name of the Input file(s), containing a
                        single n-dimensional floating point or integer array
                        in either ASCII text, native floating point(32-bit
                        or 64-bit) or native integer(8-bit or 16-bit or
                        32-bit or 64-bit). Data to be specified in the order
                        of fastest changing dimensions first.

                -c[config] <configfile>:
                        Every input file should be associated with a
                        configuration file and this is done by the -c option.
                        <configfile> is the name of the configuration file.
                        (See Section "CONFIGURATION FILE")

           -o[utfile] <outfile>:
                   Name of the HDF5 output file. Data from one or more
                        input files are stored as one or more data sets in
                        <outfile>. The output file may be an existing file or
                        it may be new, in which case it will be created.

           CONFIGURATION FILE:
                The configuration file is an ASCII text file and must be
                organized as "CONFIG-KEYWORD VALUE" pairs, one pair on each
                line.

           The configuration file may have the following keywords each
           followed by an acceptable value.

                Required KEYWORDS:
                        PATH
                        INPUT-CLASS
                        INPUT-SIZE
                        RANK
                        DIMENSION-SIZES
                        OUTPUT-CLASS
                        OUTPUT-SIZE

                Optional KEYWORDS:
                        OUTPUT-ARCHITECTURE
                        OUTPUT-BYTE-ORDER
                        CHUNKED-DIMENSION-SIZES
                        COMPRESSION-TYPE
                        COMPRESSION-PARAM
                        EXTERNAL-STORAGE
                        MAXIMUM-DIMENSIONS

                Values for keywords:
                        PATH:
                                Strings separated by spaces to represent
                                the path of the data-set. If the groups in
                                the path do not exist, they will be created.
                                For example,
                                        PATH grp1/grp2/dataset1
                                        PATH: keyword
                                        grp1: group under the root. If
                                              non-existent will be created.
                                        grp2: group under grp1. If
                                              non-existent will be created
                                              under grp1.
                                        dataset1: the name of the data-set
                                                  to be created.

                       INPUT-CLASS:
                                String denoting the type of input data.
                                ("TEXTIN", "TEXTFP", "TEXTFPE", "FP", "IN",
                                "STR", "TEXTUIN", "UIN").
                                INPUT-CLASS "TEXTIN" denotes an ASCII text
                                file with signed integer data in ASCII form,
                                INPUT-CLASS "TEXTUIN" denotes an ASCII text
                                file with unsigned integer data in ASCII form,
                                "TEXTFP" denotes an ASCII text file containing
                                floating point data in the fixed notation
                                (325.34),
                                "TEXTFPE" denotes an ASCII text file containing
                                floating point data in the scientific notation
                                (3.2534E+02),
                                "FP" denotes a floating point binary file,
                                "IN" denotes a signed integer binary file,
                                "UIN" denotes an unsigned integer binary file,
                                 & "STR" denotes an ASCII text file the
                                contents of which should be stored as an 1-D
                                array of strings.
                                If INPUT-CLASS is "STR", then RANK,
                                DIMENSION-SIZES, OUTPUT-CLASS, OUTPUT-SIZE,
                                OUTPUT-ARCHITECTURE and OUTPUT-BYTE-ORDER
                                will be ignored.

                        INPUT-SIZE:
                                Integer denoting the size of the input data
                                (8, 16, 32, 64).

                                For floating point,
                                INPUT-SIZE can be 32 or 64.
                                For integers (signed and unsigned)
                                INPUT-SIZE can be 8, 16, 32 or 64.

                        RANK:
                                Integer denoting the number of dimensions.

                        DIMENSION-SIZES:
                                Integers separated by spaces to denote the
                                dimension sizes for the no. of dimensions
                                determined by rank.

                        OUTPUT-CLASS:
                                String denoting data type of the dataset to
                                be written ("IN","FP", "UIN")

                        OUTPUT-SIZE:
                                Integer denoting the size of the data in the
                                output dataset to be written.
                                If OUTPUT-CLASS is "FP", OUTPUT-SIZE can be
                                32 or 64.
                                If OUTPUT-CLASS is "IN" or "UIN", OUTPUT-SIZE
                                can be 8, 16, 32 or 64.

                        OUTPUT-ARCHITECTURE:
                                STRING denoting the type of output
                                architecture. Can accept the following values
                                STD
                                IEEE
                                INTEL
                                CRAY
                                MIPS
                                ALPHA
                                NATIVE (default)
                                UNIX

                        OUTPUT-BYTE-ORDER:
                                String denoting the output-byte-order. Ignored
                                if the OUTPUT-ARCHITECTURE is not specified or
                                if it is IEEE, UNIX or STD. Can accept the
                                following values.
                                BE (default)
                                LE

                        CHUNKED-DIMENSION-SIZES:
                                Integers separated by spaces to denote the
                                dimension sizes of the chunk for the no. of
                                dimensions determined by rank. Required field
                                to denote that the dataset will be stored with
                                chunked storage. If this field is absent the
                                dataset will be stored with contiguous storage.

                        COMPRESSION-TYPE:
                                String denoting the type of compression to be
                                used with the chunked storage. Requires the
                                CHUNKED-DIMENSION-SIZES to be specified. The only
                                currently supported compression method is GZIP.
                                Will accept the following value
                                GZIP

                        COMPRESSION-PARAM:
                                Integer used to denote compression level and
                                this option is to be always specified when
                                the COMPRESSION-TYPE option is specified. The
                                values are applicable only to GZIP
                                compression.
                                Value 1-9: The level of Compression.
                                        1 will result in the fastest
                                        compression while 9 will result in
                                        the best compression ratio. The default
                                        level of compression is 6.

                        EXTERNAL-STORAGE:
                                String to denote the name of the non-HDF5 file
                                to store data to. Cannot be used if CHUNKED-
                                DIMENSIONS or COMPRESSION-TYPE or EXTENDIBLE-
                                DATASET is specified.
                                Value <external-filename>: the name of the
                                external file as a string to be used.

                        MAXIMUM-DIMENSIONS:
                                Integers separated by spaces to denote the
                                maximum dimension sizes of all the
                                dimensions determined by rank. Requires the
                                CHUNKED-DIMENSION-SIZES to be specified. A value of
                                -1 for any dimension implies UNLIMITED
                                DIMENSION size for that particular dimension.

           EXAMPLES:
                1. Configuration File may look like:

                        PATH work h5 pkamat First-set
                        INPUT-CLASS TEXTFP
                        RANK 3
                        DIMENSION-SIZES 5 2 4
                        OUTPUT-CLASS FP
                        OUTPUT-SIZE 64
                        OUTPUT-ARCHITECTURE IEEE
                        OUTPUT-BYTE-ORDER LE
                        CHUNKED-DIMENSION-SIZES 2 2 2

                The above configuration will accept a floating point array
                (5 x 2 x 4) in an ASCII file with the rank and dimension sizes
                specified and will save it in a chunked data-set (of pattern
                2 X 2 X 2) of 64-bit floating point in the little-endian order
                and IEEE architecture. The dataset will be stored at
                "/work/h5/pkamat/First-set"

                2. Another configuration could be:

                        PATH Second-set
                        INPUT-CLASS IN
                        RANK 5
                        DIMENSION-SIZES 6 3 5 2 4
                        OUTPUT-CLASS IN
                        OUTPUT-SIZE 32
                        CHUNKED-DIMENSION-SIZES 2 2 2 2 2
                        EXTENDIBLE-DATASET 1 3
                        COMPRESSION-TYPE GZIP
                        COMPRESSION-PARAM 7

                The above configuration will accept an integer array
                (6 X 3 X 5 x 2 x 4) in a binary file with the rank and
                dimension sizes specified and will save it in a chunked data-set
                (of pattern 2 X 2 X 2 X 2 X 2) of 32-bit integer in
                native format (as output-architecture is not specified). The
                first and the third dimension will be defined as unlimited. The
                data-set will be compressed using GZIP and a compression level
                of 7.
                The dataset will be stored at "/Second-set"

bin8w.h5 (2.01 KB)

···

At 01:39 AM 6/3/2008, Srinivasa Ramanujam wrote:

--------------------------------------------------------------
Pedro Vicente (T) 217.265-0311
pvn@hdfgroup.org
The HDF Group. 1901 S. First. Champaign, IL 61820