INPUT-BYTE-ORDER has no effect.

chinoune.mehdi · February 10, 2020, 7:43pm

I tried to convert a big-endian binary file to an HDF5 file with h5import, using these configuration parameters:

PATH h5
INPUT-BYTE-ORDER BE
INPUT-CLASS IN
INPUT-SIZE 16
RANK 2
DIMENSION-SIZES 3601 3601
OUTPUT-CLASS IN
OUTPUT-SIZE 16

First: INPUT-BYTE-ORDER is documented as a required keyword, which is not correct, so please fix your help messages.
Second: It has no effect, as it doesn’t convert big-endian to little-endian.

gnwiii · February 11, 2020, 5:09pm

If you omit “OUTPUT-BYTE-ORDER LE” you should get the default (BE) output. Since it is necessary to have the input byte order, failure to supply this information should generate an error. It could be useful to mention the HDF5 version you have.

chinoune.mehdi · February 11, 2020, 7:27pm

Well
1- h5import doesn’t generate any error if I omit both INPUT-BYTE-ORDER and OUTPUT-BYTE-ORDER. So neither of them is required.
2- If I omit “OUTPUT-BYTE-ORDER LE” I get a little-endian output file.
h5import N34E000.hgt -c conf.txt -o N34E000.h5
h5dump -H N34E000.h5
HDF5 "N34E000.h5" {
GROUP "/" {
   DATASET "h5" {
      DATATYPE  H5T_STD_I16LE
      DATASPACE  SIMPLE { ( 3601, 3601 ) / ( 3601, 3601 ) }
   }
}
}

3- I can confirm with h5dump that it reads data as little-endian.
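
To illustrate the symptom reported here, the following is a minimal Python sketch (standard library only) of what happens when big-endian 16-bit integers are decoded as little-endian. The sample values are made up for illustration and are not taken from N34E000.hgt.

```python
import struct

# Three sample 16-bit signed values, written in big-endian byte order.
# These values are hypothetical; they only serve to show the effect.
raw = struct.pack(">3h", 1, 2, 300)

# Decoding with the byte order the data was written in gives the right values.
as_big = struct.unpack(">3h", raw)

# Decoding the same bytes as little-endian swaps the two bytes of each value.
as_little = struct.unpack("<3h", raw)

print("decoded as big-endian:   ", as_big)
print("decoded as little-endian:", as_little)
```

If the output dataset is typed H5T_STD_I16LE but the bytes were copied unchanged from a big-endian file, a reader would see values like the second line rather than the first.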

gnwiii · February 11, 2020, 11:18pm

For hdf5-1.10.5-5.fc31, “man h5import” does not mention “INPUT-BYTE-ORDER”, so it assumes the native byte order. You can use “dd conv=swab …” to convert the BE file to LE:

$ od -x int16_be.raw
0000000 0000 0100 0200 0300 0400 0500 0600 0700
0000020 0800 0900 0a00
0000026

$ dd conv=swab if=int16_be.raw of=int16_le.raw

$ od -x int16_le.raw
0000000 0000 0001 0002 0003 0004 0005 0006 0007
0000020 0008 0009 000a
0000026

$ cat be.conf
PATH indgen11
INPUT-CLASS IN
INPUT-SIZE 16
RANK 1
DIMENSION-SIZES 11
OUTPUT-BYTE-ORDER BE
OUTPUT-CLASS IN
OUTPUT-SIZE 16

$ h5import int16.raw -c be.conf -outfile indgen11.h5

$ h5dump indgen11.h5
HDF5 "indgen11.h5" {
GROUP "/" {
   DATASET "indgen11" {
      DATATYPE  H5T_STD_I16BE
      DATASPACE  SIMPLE { ( 11 ) / ( 11 ) }
      DATA {
      (0): 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
      }
   }
}
}
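
For anyone without dd at hand, the pairwise byte swap used in this workaround can be sketched in Python with the standard library only. The name swab16 is hypothetical, and rejecting odd-length input is a choice made for this sketch, not a statement about how dd behaves.

```python
import struct


def swab16(data: bytes) -> bytes:
    """Swap each pair of adjacent bytes (hypothetical helper for 16-bit data)."""
    if len(data) % 2:
        raise ValueError("expected an even number of bytes for 16-bit data")
    out = bytearray(len(data))
    out[0::2] = data[1::2]  # second byte of each pair moves to the front
    out[1::2] = data[0::2]  # first byte of each pair moves to the back
    return bytes(out)


# Eleven values 0..10 stored big-endian, mirroring the int16_be.raw example.
big_endian = struct.pack(">11h", *range(11))
little_endian = swab16(big_endian)

# After the swap, the bytes decode correctly as little-endian.
print(struct.unpack("<11h", little_endian))
```

To apply this to a file, read it in binary mode, pass the bytes through swab16, and write the result to a new file before running h5import on it.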

chinoune.mehdi · February 12, 2020, 2:38pm

h5import -h

Name:

h5import

  TOOL NAME:
   h5import
   SYNTAX:
   h5import -h[elp], OR
   h5import <infile> -c[onfig] <configfile> [<infile> -c[config] <configfile>...]
            -o[utfile] <outfile>

   PURPOSE:
   To convert data stored in one or more ASCII or binary files
  into one or more datasets (in accordance with the 
  user-specified type and storage properties) in an existing 
  or new HDF5 file.

   DESCRIPTION:
  The primary objective of the utility is to convert floating
  point or integer data stored in ASCII text or binary form 
  into a data-set according to the type and storage properties
  specified by the user. The utility can also accept ASCII
  text files and store the contents in a compact form as an
  array of one-dimensional strings.

  The input data to be written as a data-set can be provided
  to the utility in one of the following forms:
  1. ASCII text file with numeric data (floating point or 
  integer data). 
  2. Binary file with native floating point data (32-bit or 
  64-bit) 
  3. Binary file with native integer (signed or unsigned)
  data (8-bit or 16-bit or 32-bit or 64-bit). 
  4. ASCII text file containing strings (text data).
    
  Every input file is associated with a configuration file 
  also provided as an input to the utility. (See Section 
  "CONFIGURATION FILE" to know how it is to be organized).
  The class, size and dimensions of the input data is 
  specified in this configuration file. A point to note is
  that the floating point data in the ASCII text file may be
  organized in the fixed floating form (for example 323.56)
  or in a scientific notation (for example 3.23E+02). A 
  different input-class specification is to be used for both
  forms.

  The utility extracts the input data from the input file 
  according to the specified parameters and saves it into 
  an H5 dataset. 

  The user can specify output type and storage properties in 
  the configuration file. The user is required to specify the 
  path of the dataset. If the groups in the path leading to 
  the data-set do not exist, the groups will be created by the
  utility. If no group is specified, the dataset will be
  created under the root group.

  In addition to the name, the user is also required to 
  provide the class and size of output data to be written to 
  the dataset and may optionally specify the output-architecture,
  and the output-byte-order. If output-architecture is not 
  specified the default is NATIVE. Output-byte-orders are fixed
  for some architectures and may be specified only if output-
  architecture is IEEE, UNIX or STD.

   Also, layout and other storage properties such as 
  compression, external storage and extendible data-sets may be
  optionally specified.  The layout and storage properties 
  denote how raw data is to be organized on the disk. If these 
  options are not specified the default is Contiguous layout 
  and storage.

  The dataset can be organized in any of the following ways:
  1. Contiguous.
  2. Chunked.
  3. External Storage File    (has to be contiguous)
  4. Extendible data sets     (has to be chunked)
  5. Compressed.        (has to be chunked)
  6. Compressed & Extendible  (has to be chunked)

  If the user wants to store raw data in a non-HDF file then 
  the external storage file option is to be used and the name 
  of the file is to be specified. 

  If the user wants the dimensions of the data-set to be
  unlimited, the extendible data set option can be chosen. 

  The user may also specify the type of compression and the 
  level to which the data set must be compresses by setting 
  the compressed option.

   SYNOPSIS:
  h5import -h[elp], OR
  h5import <infile> -c[onfig] <configfile>
           [<infile> -c[config] <confile2>...] -o[utfile] <outfile>

   -h[elp]:
           Prints this summary of usage, and exits.

   <infile(s)>:
           Name of the Input file(s), containing a 
    single n-dimensional floating point or integer array 
    in either ASCII text, native floating point(32-bit 
    or 64-bit) or native integer(8-bit or 16-bit or 
    32-bit or 64-bit). Data to be specified in the order
    of fastest changing dimensions first.

  -c[config] <configfile>:
    Every input file should be associated with a 
    configuration file and this is done by the -c option.
    <configfile> is the name of the configuration file.
    (See Section "CONFIGURATION FILE")

   -o[utfile] <outfile>:
           Name of the HDF5 output file. Data from one or more 
    input files are stored as one or more data sets in 
    <outfile>. The output file may be an existing file or 
    it maybe new in which case it will be created.


   CONFIGURATION FILE:
  The configuration file is an ASCII text file and must be 
  the ddl formatted file (without data values) produced by h5dump 
  when used with the options '-o outfilename -b' of a single dataset (-d) 
  OR organized as "CONFIG-KEYWORD VALUE" pairs, one pair on each 
  line.

   The configuration file may have the following keywords each 
   followed by an acceptable value.

  Required KEYWORDS:
    PATH
    INPUT-CLASS
    INPUT-SIZE
    INPUT-BYTE-ORDER
    RANK
    DIMENSION-SIZES
    OUTPUT-CLASS
    OUTPUT-SIZE

  Optional KEYWORDS:
    OUTPUT-ARCHITECTURE
    OUTPUT-BYTE-ORDER
    CHUNKED-DIMENSION-SIZES
    COMPRESSION-TYPE
    COMPRESSION-PARAM
    EXTERNAL-STORAGE
    MAXIMUM-DIMENSIONS


    Values for keywords:
    PATH:
      Strings separated by spaces to represent
      the path of the data-set. If the groups in
      the path do not exist, they will be created. 
      For example,
        PATH grp1/grp2/dataset1
        PATH: keyword
        grp1: group under the root. If
              non-existent will be created.
        grp2: group under grp1. If 
              non-existent will be created 
              under grp1.
        dataset1: the name of the data-set 
            to be created.

               INPUT-CLASS:
      String denoting the type of input data.
      ("TEXTIN", "TEXTFP", "FP", "IN", 
      "STR", "TEXTUIN", "UIN"). 
      INPUT-CLASS "TEXTIN" denotes an ASCII text 
      file with signed integer data in ASCII form,
      INPUT-CLASS "TEXTUIN" denotes an ASCII text 
      file with unsigned integer data in ASCII form,
      "TEXTFP" denotes an ASCII text file containing
      floating point data in the fixed notation
      (325.34),
      "FP" denotes a floating point binary file,
      "IN" denotes a signed integer binary file,
      "UIN" denotes an unsigned integer binary file,
       & "STR" denotes an ASCII text file the 
      contents of which should be stored as an 1-D 
      array of strings.
      If INPUT-CLASS is "STR", then RANK, 
      DIMENSION-SIZES, OUTPUT-CLASS, OUTPUT-SIZE, 
      OUTPUT-ARCHITECTURE and OUTPUT-BYTE-ORDER 
      will be ignored.


    INPUT-SIZE:
      Integer denoting the size of the input data 
      (8, 16, 32, 64). 

      For floating point,
      INPUT-SIZE can be 32 or 64.
      For integers (signed and unsigned)
      INPUT-SIZE can be 8, 16, 32 or 64.

    RANK:
      Integer denoting the number of dimensions.

    DIMENSION-SIZES:
            Integers separated by spaces to denote the 
      dimension sizes for the no. of dimensions 
      determined by rank.

    OUTPUT-CLASS:
      String dentoting data type of the dataset to 
      be written ("IN","FP", "UIN")

    OUTPUT-SIZE:
      Integer denoting the size of the data in the 
      output dataset to be written.
      If OUTPUT-CLASS is "FP", OUTPUT-SIZE can be 
      32 or 64.
      If OUTPUT-CLASS is "IN" or "UIN", OUTPUT-SIZE
      can be 8, 16, 32 or 64.

    OUTPUT-ARCHITECTURE:
      STRING denoting the type of output 
      architecture. Can accept the following values
      STD
      IEEE
      INTEL
      CRAY
      MIPS
      ALPHA
      NATIVE (default)
      UNIX

    OUTPUT-BYTE-ORDER:
      String denoting the output-byte-order. Ignored
      if the OUTPUT-ARCHITECTURE is not specified or
      if it is IEEE, UNIX or STD. Can accept the 
      following values.
      BE (default)
      LE

    CHUNKED-DIMENSION-SIZES:
      Integers separated by spaces to denote the 
      dimension sizes of the chunk for the no. of 
      dimensions determined by rank. Required field
      to denote that the dataset will be stored with
      chunked storage. If this field is absent the
      dataset will be stored with contiguous storage.

    COMPRESSION-TYPE:
      String denoting the type of compression to be
      used with the chunked storage. Requires the
      CHUNKED-DIMENSION-SIZES to be specified. The only 
      currently supported compression method is GZIP. 
      Will accept the following value
      GZIP

    COMPRESSION-PARAM:
      Integer used to denote compression level and 
      this option is to be always specified when 
      the COMPRESSION-TYPE option is specified. The
      values are applicable only to GZIP 
      compression.
      Value 1-9: The level of Compression. 
        1 will result in the fastest 
        compression while 9 will result in 
        the best compression ratio. The default
        level of compression is 6.

    EXTERNAL-STORAGE:
      String to denote the name of the non-HDF5 file 
      to store data to. Cannot be used if CHUNKED-
      DIMENSIONS or COMPRESSION-TYPE or EXTENDIBLE-
      DATASET is specified.
      Value <external-filename>: the name of the 
      external file as a string to be used.

    MAXIMUM-DIMENSIONS:
      Integers separated by spaces to denote the 
      maximum dimension sizes of all the 
      dimensions determined by rank. Requires the
      CHUNKED-DIMENSION-SIZES to be specified. A value of 
      -1 for any dimension implies UNLIMITED 
      DIMENSION size for that particular dimension.

   EXAMPLES:
  1. Configuration File may look like:

    PATH work h5 pkamat First-set
    INPUT-CLASS TEXTFP
    RANK 3
    DIMENSION-SIZES 5 2 4
    OUTPUT-CLASS FP
    OUTPUT-SIZE 64
    OUTPUT-ARCHITECTURE IEEE
    OUTPUT-BYTE-ORDER LE
      CHUNKED-DIMENSION-SIZES 2 2 2 

  The above configuration will accept a floating point array 
  (5 x 2 x 4)  in an ASCII file with the rank and dimension sizes 
  specified and will save it in a chunked data-set (of pattern 
  2 X 2 X 2) of 64-bit floating point in the little-endian order 
  and IEEE architecture. The dataset will be stored at
  "/work/h5/pkamat/First-set"

  2. Another configuration could be:

    PATH Second-set
    INPUT-CLASS IN  
    RANK 5
    DIMENSION-SIZES 6 3 5 2 4
    OUTPUT-CLASS IN
    OUTPUT-SIZE 32
      CHUNKED-DIMENSION-SIZES 2 2 2 2 2
    EXTENDIBLE-DATASET 1 3 
    COMPRESSION-TYPE GZIP
    COMPRESSION-PARAM 7


  The above configuration will accept an integer array 
  (6 X 3 X 5 x 2 x 4)  in a binary file with the rank and 
  dimension sizes specified and will save it in a chunked data-set
  (of pattern 2 X 2 X 2 X 2 X 2) of 32-bit floating point in 
  native format (as output-architecture is not specified). The 
  first and the third dimension will be defined as unlimited. The 
  data-set will be compressed using GZIP and a compression level 
  of 7.
  The dataset will be stored at "/Second-set"
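
Since h5import reportedly does not complain when a keyword from the "Required KEYWORDS" list is missing, a small pre-check can make the discrepancy visible. The Python sketch below handles only the "CONFIG-KEYWORD VALUE" form of the configuration file, not the h5dump DDL form. The function names parse_config and missing_required are hypothetical, and the REQUIRED tuple is copied from the help text above, which may not match what the tool actually enforces.

```python
# Keywords listed under "Required KEYWORDS" in the help output above.
REQUIRED = (
    "PATH",
    "INPUT-CLASS",
    "INPUT-SIZE",
    "INPUT-BYTE-ORDER",
    "RANK",
    "DIMENSION-SIZES",
    "OUTPUT-CLASS",
    "OUTPUT-SIZE",
)


def parse_config(text):
    """Parse 'KEYWORD VALUE ...' lines into a dict of keyword -> list of values."""
    config = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        config[parts[0].upper()] = parts[1:]
    return config


def missing_required(config):
    """Return the documented required keywords that the config does not set."""
    return [key for key in REQUIRED if key not in config]


# The configuration from the first post in this thread.
first_post = """PATH h5
INPUT-BYTE-ORDER BE
INPUT-CLASS IN
INPUT-SIZE 16
RANK 2
DIMENSION-SIZES 3601 3601
OUTPUT-CLASS IN
OUTPUT-SIZE 16
"""

# The be.conf file from the dd workaround, which omits INPUT-BYTE-ORDER.
be_conf = """PATH indgen11
INPUT-CLASS IN
INPUT-SIZE 16
RANK 1
DIMENSION-SIZES 11
OUTPUT-BYTE-ORDER BE
OUTPUT-CLASS IN
OUTPUT-SIZE 16
"""

print(missing_required(parse_config(first_post)))
print(missing_required(parse_config(be_conf)))
```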

gnwiii · February 12, 2020, 11:49pm

Quoting chinoune.mehdi, February 12:

  h5import -h

  [...] The input data to be written as a data-set can be provided to the utility in one of the following forms:
  1. ASCII text file with numeric data (floating point or integer data).
  2. Binary file with **native** floating point data (32-bit or 64-bit)
  3. Binary file with **native** integer (signed or unsigned) data (8-bit or 16-bit or 32-bit or 64-bit).
  4. ASCII text file containing strings (text data).

For the Fedora 31 package, “h5import -h” and “man h5import” differ (see below).


   [...] <infile(s)>: Name of the Input file(s), containing a single n-dimensional floating point or integer array in either ASCII text, **native** floating point(32-bit or 64-bit) or **native** integer(8-bit or 16-bit or 32-bit or 64-bit). Data to be specified in the order of fastest changing dimensions first.
   [...] The configuration file may have the following keywords each followed by an acceptable value.
   Required KEYWORDS: PATH INPUT-CLASS INPUT-SIZE INPUT-BYTE-ORDER

The INPUT-BYTE-ORDER keyword conflicts with the “native” binary input requirement. “h5import -h” in the hdf5-1.10.x packages mentions this keyword, but it isn’t mentioned in hdf5-1.8.1[89] or in the Fedora man page.

Looking at the source for hdf5-1.8.19, h5import.h has:

char keytable[NUM_KEYS][30] = {
    "PATH",
    "INPUT-CLASS",
    "INPUT-SIZE",
    "RANK",
    "DIMENSION-SIZES",
    "OUTPUT-CLASS",
    "OUTPUT-SIZE",
    "OUTPUT-ARCHITECTURE",
    "OUTPUT-BYTE-ORDER",
    "CHUNKED-DIMENSION-SIZES",
    "COMPRESSION-TYPE",
    "COMPRESSION-PARAM",
    "EXTERNAL-STORAGE",
    "MAXIMUM-DIMENSIONS"
};

[...]
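
The mismatch can be made explicit with a small Python sketch that compares the keyword names printed by “h5import -h” (as pasted earlier in this thread) against the hdf5-1.8.19 keytable quoted above. Both lists are transcribed by hand from this thread, so treat them as illustrative rather than authoritative for any other release.

```python
# Keywords from the "Required" and "Optional" lists in the pasted help output.
help_keywords = {
    "PATH", "INPUT-CLASS", "INPUT-SIZE", "INPUT-BYTE-ORDER", "RANK",
    "DIMENSION-SIZES", "OUTPUT-CLASS", "OUTPUT-SIZE",
    "OUTPUT-ARCHITECTURE", "OUTPUT-BYTE-ORDER", "CHUNKED-DIMENSION-SIZES",
    "COMPRESSION-TYPE", "COMPRESSION-PARAM", "EXTERNAL-STORAGE",
    "MAXIMUM-DIMENSIONS",
}

# Entries of keytable in h5import.h from hdf5-1.8.19, as quoted above.
keytable_1_8_19 = {
    "PATH", "INPUT-CLASS", "INPUT-SIZE", "RANK", "DIMENSION-SIZES",
    "OUTPUT-CLASS", "OUTPUT-SIZE", "OUTPUT-ARCHITECTURE",
    "OUTPUT-BYTE-ORDER", "CHUNKED-DIMENSION-SIZES", "COMPRESSION-TYPE",
    "COMPRESSION-PARAM", "EXTERNAL-STORAGE", "MAXIMUM-DIMENSIONS",
}

# Keywords the help mentions that the 1.8.19 keytable does not contain.
only_in_help = sorted(help_keywords - keytable_1_8_19)
# Keywords in the 1.8.19 keytable that the help does not mention.
only_in_keytable = sorted(keytable_1_8_19 - help_keywords)

print("only in help output:", only_in_help)
print("only in 1.8.19 keytable:", only_in_keytable)
```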

I need to look at the hdf5-1.10.x source to see what it does with INPUT-BYTE-ORDER.
