Hello, Jam
I have a binary file containing scanned data from a radar, and I would like to convert it to HDF format. However, I don't know where to start. After going through the tutorials and other related information on the HDF website, I found that the h5import tool can convert binary data into HDF5 format. Can anyone tell me if I am proceeding in the right direction? Or do I need to write code in Fortran or C to convert the binary data into HDF format?
Jam, India.
Yes, h5import reads binary data and converts it to HDF5.
Here's an example of its use.
First, save your binary data to a file. Here's an example in C that writes a matrix of 2 rows and 3 columns, one byte per element:
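#include <stdio.h>

int main(void)
{
    /* 2 x 3 matrix of 8-bit integers */
    char bin8[2][3] = {{1,2,3},{4,5,6}};
    int nrow = 2, ncol = 3, i, j;
    FILE *sp;

    /* "wb" = binary mode; needed on Windows, harmless elsewhere */
    sp = fopen("bin8w.bin", "wb");
    if (sp == NULL)
    {
        printf("error opening file\n");
        return 1;
    }
    /* write the elements one byte at a time, row by row */
    for (i = 0; i < nrow; i++)
    {
        for (j = 0; j < ncol; j++)
        {
            char c = bin8[i][j];
            if ( fwrite( &c, sizeof(char), 1, sp) != 1 )
                printf("error writing file\n");
        }
    }
    fclose(sp);
    return 0;
}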
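If your radar samples are wider than one byte, the same idea carries over. Here is a minimal sketch for 32-bit floats. Note that write_float_matrix is a hypothetical helper name (it is not part of HDF5), and the sketch assumes the data are already in memory, stored row by row. For such a file the configuration file would use INPUT-CLASS FP and INPUT-SIZE 32, as described in the h5import usage message.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not part of HDF5): writes an nrow x ncol matrix of
   floats, stored row by row, to a raw binary file with no header.
   Returns 0 on success, -1 on failure. */
int write_float_matrix(const char *path, const float *data,
                       size_t nrow, size_t ncol)
{
    FILE *fp = fopen(path, "wb");   /* binary mode on every platform */
    size_t n = nrow * ncol;
    size_t written;

    if (fp == NULL)
        return -1;

    /* one fwrite call for the whole array */
    written = fwrite(data, sizeof(float), n, fp);

    if (fclose(fp) != 0 || written != n)
        return -1;
    return 0;
}
```

Called on a 2 x 3 matrix, this should produce a 24-byte file on a platform where float is 32 bits.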
In the HDF5 distribution, go to /tools/h5import.
h5import uses a configuration file that tells it basic things about the binary file to import.
In your favorite text editor, enter this for the configuration file:
INPUT-CLASS IN
INPUT-SIZE 8
RANK 2
DIMENSION-SIZES 2 3
These are h5import "keywords". They tell h5import that we are importing 8-bit integers with dimensions 2 x 3.
Save this to a file, for example one named bin8w.conf.
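Before running h5import, it can help to check that the binary file holds exactly the number of bytes that INPUT-SIZE and DIMENSION-SIZES imply. Here is a minimal sketch of such a check. The names expected_bytes and file_size are hypothetical helpers, and the check is only a suggestion, not something h5import requires. It assumes a raw file with no header.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: number of bytes implied by INPUT-SIZE (in bits)
   and DIMENSION-SIZES, for a raw file with no header. */
long expected_bytes(int input_size_bits, const size_t *dims, int rank)
{
    long n = input_size_bits / 8;
    int i;
    for (i = 0; i < rank; i++)
        n *= (long)dims[i];
    return n;
}

/* Hypothetical helper: size of a file in bytes, or -1 if it cannot be read. */
long file_size(const char *path)
{
    FILE *fp = fopen(path, "rb");
    long size;
    if (fp == NULL)
        return -1;
    if (fseek(fp, 0L, SEEK_END) != 0) {
        fclose(fp);
        return -1;
    }
    size = ftell(fp);
    fclose(fp);
    return size;
}
```

For the example here, expected_bytes(8, dims, 2) with dims = {2, 3} gives 6, which should equal file_size("bin8w.bin"). A mismatch may point to a header in the file or to a wrong INPUT-SIZE.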
Now do
./h5import bin8w.bin -c bin8w.conf -o bin8w.h5
Here, bin8w.bin is the name of the binary data file, bin8w.conf is the configuration file, and bin8w.h5 is the output HDF5 file, which I attached to this email.
To see the contents of the HDF5 file, go to /tools/h5dump and run
./h5dump bin8w.h5
Or, if you prefer a graphical interface, you can use HDF Explorer to see its contents:
http://www.space-research.org/
If you run
./h5import -h
You get a rather lengthy usage message in which the other configuration file keywords are explained.
The usage message is:
Name:
/home/pvn/kagiso/build_hdf5/tools/h5import/.libs/lt-h5import
TOOL NAME:
/home/pvn/kagiso/build_hdf5/tools/h5import/.libs/lt-h5import
SYNTAX:
/home/pvn/kagiso/build_hdf5/tools/h5import/.libs/lt-h5import -h[elp], OR
/home/pvn/kagiso/build_hdf5/tools/h5import/.libs/lt-h5import <infile> -c[onfig] <configfile> [<infile> -c[config] <configfile>...] -o[utfile] <outfile>
PURPOSE:
To convert data stored in one or more ASCII or binary files
into one or more datasets (in accordance with the
user-specified type and storage properties) in an existing
or new HDF5 file.
DESCRIPTION:
The primary objective of the utility is to convert floating
point or integer data stored in ASCII text or binary form
into a data-set according to the type and storage properties
specified by the user. The utility can also accept ASCII
text files and store the contents in a compact form as an
array of one-dimensional strings.
The input data to be written as a data-set can be provided
to the utility in one of the following forms:
1. ASCII text file with numeric data (floating point or
integer data).
2. Binary file with native floating point data (32-bit or
64-bit)
3. Binary file with native integer (signed or unsigned)
data (8-bit or 16-bit or 32-bit or 64-bit).
4. ASCII text file containing strings (text data).
Every input file is associated with a configuration file
also provided as an input to the utility. (See Section
"CONFIGURATION FILE" to know how it is to be organized).
The class, size and dimensions of the input data is
specified in this configuration file. A point to note is
that the floating point data in the ASCII text file may be
organized in the fixed floating form (for example 323.56)
or in a scientific notation (for example 3.23E+02). A
different input-class specification is to be used for both
forms.
The utility extracts the input data from the input file
according to the specified parameters and saves it into
an H5 dataset.
The user can specify output type and storage properties in
the configuration file. The user is required to specify the
path of the dataset. If the groups in the path leading to
the data-set do not exist, the groups will be created by the
utility. If no group is specified, the dataset will be
created under the root group.
In addition to the name, the user is also required to
provide the class and size of output data to be written to
the dataset and may optionally specify the output-architecture,
and the output-byte-order. If output-architecture is not
specified the default is NATIVE. Output-byte-orders are fixed
for some architectures and may be specified only if output-
architecture is IEEE, UNIX or STD.
Also, layout and other storage properties such as
compression, external storage and extendible data-sets may be
optionally specified. The layout and storage properties
denote how raw data is to be organized on the disk. If these
options are not specified the default is Contiguous layout
and storage.
The dataset can be organized in any of the following ways:
1. Contiguous.
2. Chunked.
3. External Storage File (has to be contiguous)
4. Extendible data sets (has to be chunked)
5. Compressed. (has to be chunked)
6. Compressed & Extendible (has to be chunked)
If the user wants to store raw data in a non-HDF file then
the external storage file option is to be used and the name
of the file is to be specified.
If the user wants the dimensions of the data-set to be
unlimited, the extendible data set option can be chosen.
The user may also specify the type of compression and the
level to which the data set must be compressed by setting
the compressed option.
SYNOPSIS:
h5import -h[elp], OR
h5import <infile> -c[onfig] <configfile> [<infile> -c[config] <confile2>...] -o[utfile] <outfile>
-h[elp]:
Prints this summary of usage, and exits.
<infile(s)>:
Name of the Input file(s), containing a
single n-dimensional floating point or integer array
in either ASCII text, native floating point(32-bit
or 64-bit) or native integer(8-bit or 16-bit or
32-bit or 64-bit). Data to be specified in the order
of fastest changing dimensions first.
-c[config] <configfile>:
Every input file should be associated with a
configuration file and this is done by the -c option.
<configfile> is the name of the configuration file.
(See Section "CONFIGURATION FILE")
-o[utfile] <outfile>:
Name of the HDF5 output file. Data from one or more
input files are stored as one or more data sets in
<outfile>. The output file may be an existing file or
it may be new, in which case it will be created.
CONFIGURATION FILE:
The configuration file is an ASCII text file and must be
organized as "CONFIG-KEYWORD VALUE" pairs, one pair on each
line.
The configuration file may have the following keywords each
followed by an acceptable value.
Required KEYWORDS:
PATH
INPUT-CLASS
INPUT-SIZE
RANK
DIMENSION-SIZES
OUTPUT-CLASS
OUTPUT-SIZE
Optional KEYWORDS:
OUTPUT-ARCHITECTURE
OUTPUT-BYTE-ORDER
CHUNKED-DIMENSION-SIZES
COMPRESSION-TYPE
COMPRESSION-PARAM
EXTERNAL-STORAGE
MAXIMUM-DIMENSIONS
Values for keywords:
PATH:
Strings separated by spaces to represent
the path of the data-set. If the groups in
the path do not exist, they will be created.
For example,
PATH grp1/grp2/dataset1
PATH: keyword
grp1: group under the root. If
non-existent will be created.
grp2: group under grp1. If
non-existent will be created
under grp1.
dataset1: the name of the data-set
to be created.
INPUT-CLASS:
String denoting the type of input data.
("TEXTIN", "TEXTFP", "TEXTFPE", "FP", "IN",
"STR", "TEXTUIN", "UIN").
INPUT-CLASS "TEXTIN" denotes an ASCII text
file with signed integer data in ASCII form,
INPUT-CLASS "TEXTUIN" denotes an ASCII text
file with unsigned integer data in ASCII form,
"TEXTFP" denotes an ASCII text file containing
floating point data in the fixed notation
(325.34),
"TEXTFPE" denotes an ASCII text file containing
floating point data in the scientific notation
(3.2534E+02),
"FP" denotes a floating point binary file,
"IN" denotes a signed integer binary file,
"UIN" denotes an unsigned integer binary file,
& "STR" denotes an ASCII text file the
contents of which should be stored as an 1-D
array of strings.
If INPUT-CLASS is "STR", then RANK,
DIMENSION-SIZES, OUTPUT-CLASS, OUTPUT-SIZE,
OUTPUT-ARCHITECTURE and OUTPUT-BYTE-ORDER
will be ignored.
INPUT-SIZE:
Integer denoting the size of the input data
(8, 16, 32, 64).
For floating point,
INPUT-SIZE can be 32 or 64.
For integers (signed and unsigned)
INPUT-SIZE can be 8, 16, 32 or 64.
RANK:
Integer denoting the number of dimensions.
DIMENSION-SIZES:
Integers separated by spaces to denote the
dimension sizes for the no. of dimensions
determined by rank.
OUTPUT-CLASS:
String denoting data type of the dataset to
be written ("IN","FP", "UIN")
OUTPUT-SIZE:
Integer denoting the size of the data in the
output dataset to be written.
If OUTPUT-CLASS is "FP", OUTPUT-SIZE can be
32 or 64.
If OUTPUT-CLASS is "IN" or "UIN", OUTPUT-SIZE
can be 8, 16, 32 or 64.
OUTPUT-ARCHITECTURE:
STRING denoting the type of output
architecture. Can accept the following values
STD
IEEE
INTEL
CRAY
MIPS
ALPHA
NATIVE (default)
UNIX
OUTPUT-BYTE-ORDER:
String denoting the output-byte-order. Ignored
if the OUTPUT-ARCHITECTURE is not specified or
if it is not IEEE, UNIX or STD. Can accept the
following values.
BE (default)
LE
CHUNKED-DIMENSION-SIZES:
Integers separated by spaces to denote the
dimension sizes of the chunk for the no. of
dimensions determined by rank. Required field
to denote that the dataset will be stored with
chunked storage. If this field is absent the
dataset will be stored with contiguous storage.
COMPRESSION-TYPE:
String denoting the type of compression to be
used with the chunked storage. Requires the
CHUNKED-DIMENSION-SIZES to be specified. The only
currently supported compression method is GZIP.
Will accept the following value
GZIP
COMPRESSION-PARAM:
Integer used to denote compression level and
this option is to be always specified when
the COMPRESSION-TYPE option is specified. The
values are applicable only to GZIP
compression.
Value 1-9: The level of Compression.
1 will result in the fastest
compression while 9 will result in
the best compression ratio. The default
level of compression is 6.
EXTERNAL-STORAGE:
String to denote the name of the non-HDF5 file
to store data to. Cannot be used if CHUNKED-
DIMENSIONS or COMPRESSION-TYPE or EXTENDIBLE-
DATASET is specified.
Value <external-filename>: the name of the
external file as a string to be used.
MAXIMUM-DIMENSIONS:
Integers separated by spaces to denote the
maximum dimension sizes of all the
dimensions determined by rank. Requires the
CHUNKED-DIMENSION-SIZES to be specified. A value of
-1 for any dimension implies UNLIMITED
DIMENSION size for that particular dimension.
EXAMPLES:
1. Configuration File may look like:
PATH work h5 pkamat First-set
INPUT-CLASS TEXTFP
RANK 3
DIMENSION-SIZES 5 2 4
OUTPUT-CLASS FP
OUTPUT-SIZE 64
OUTPUT-ARCHITECTURE IEEE
OUTPUT-BYTE-ORDER LE
CHUNKED-DIMENSION-SIZES 2 2 2
The above configuration will accept a floating point array
(5 x 2 x 4) in an ASCII file with the rank and dimension sizes
specified and will save it in a chunked data-set (of pattern
2 X 2 X 2) of 64-bit floating point in the little-endian order
and IEEE architecture. The dataset will be stored at
"/work/h5/pkamat/First-set"
2. Another configuration could be:
PATH Second-set
INPUT-CLASS IN
RANK 5
DIMENSION-SIZES 6 3 5 2 4
OUTPUT-CLASS IN
OUTPUT-SIZE 32
CHUNKED-DIMENSION-SIZES 2 2 2 2 2
EXTENDIBLE-DATASET 1 3
COMPRESSION-TYPE GZIP
COMPRESSION-PARAM 7
The above configuration will accept an integer array
(6 X 3 X 5 x 2 x 4) in a binary file with the rank and
dimension sizes specified and will save it in a chunked data-set
(of pattern 2 X 2 X 2 X 2 X 2) of 32-bit integers in
native format (as output-architecture is not specified). The
first and the third dimension will be defined as unlimited. The
data-set will be compressed using GZIP and a compression level
of 7.
The dataset will be stored at "/Second-set"
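The usage message describes binary input as native integer or floating point data. If the radar file was written on a machine with the opposite byte order, one option is to swap the bytes before importing. Here is a minimal sketch for 16-bit samples. The names swap16 and swap16_buffer are hypothetical helpers, and whether your file needs this at all is an assumption to check against the instrument's documentation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reverse the two bytes of one 16-bit value. */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Hypothetical helper: swap every 16-bit sample in a buffer, in place. */
void swap16_buffer(uint16_t *buf, size_t count)
{
    size_t i;
    for (i = 0; i < count; i++)
        buf[i] = swap16(buf[i]);
}
```

The swapped buffer can then be written out with fwrite and imported with INPUT-CLASS IN or UIN and INPUT-SIZE 16.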
bin8w.h5 (2.01 KB)
At 01:39 AM 6/3/2008, Srinivasa Ramanujam wrote:
--------------------------------------------------------------
Pedro Vicente (T) 217.265-0311
pvn@hdfgroup.org
The HDF Group. 1901 S. First. Champaign, IL 61820