Dear James, Chris, and all HDF users,
This is an RFC (Request For Comments) regarding a new feature in h5dump.
As you may know, h5dump has an option that displays several properties from the dataset creation property list. If you specify this option on the command line
   -p, --properties     Print dataset filters, storage layout and fill value
h5dump prints the dataset's filters, storage layout, and fill value. For example,
   ./h5dump -H -p -d deflate tfilters.h5
produces the following output:
HDF5 "tfilters.h5" {
DATASET "deflate" {
   DATATYPE H5T_STD_I32LE
   DATASPACE SIMPLE { ( 20, 10 ) / ( 20, 10 ) }
   STORAGE_LAYOUT {
      CHUNKED ( 10, 5 )
      SIZE 385
   }
   FILTERS {
      COMPRESSION DEFLATE { LEVEL 9 }
   }
   FILLVALUE {
      FILL_TIME H5D_FILL_TIME_IFSET
      VALUE 0
   }
   ALLOCATION_TIME {
      H5D_ALLOC_TIME_INCR
   }
}
}
There was a request to display a "compression ratio" when compression filters are present.
The values to compare are:
A = the theoretical maximum size of the dataset, obtained by multiplying the number of elements in the dataset by the size in bytes of each element. For example, for a dataset of 25 elements of a 4-byte integer type, this size is 100 bytes.
B = the size returned by the HDF5 function H5Dget_storage_size, which reports the amount of storage required for a dataset. If the dataset has compression filters, this number is typically smaller than A.
http://www.hdfgroup.org/HDF5/doc/RM/RM_H5D.html#Dataset-GetStorageSize
Note: for the moment, we assume that all chunks are written. For cases where this is not true, a new function, H5Dget_chunk_info, is being developed that will provide a better measure.
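To make the definition of A concrete, here is a minimal sketch of the arithmetic in Python. It is only an illustration and not h5dump code; the function name uncompressed_size is hypothetical, and the 800-byte figure assumes the 20 x 10 dataset of 4-byte integers from the output above.

```python
from math import prod

def uncompressed_size(dims, element_size):
    """Return A: number of elements times the size in bytes of one element.

    Hypothetical helper for illustration only; it is not part of HDF5.
    """
    return prod(dims) * element_size

# 25 elements of a 4-byte integer type, as in the definition of A
print(uncompressed_size((25,), 4))     # 100

# the 20 x 10 H5T_STD_I32LE dataset "deflate"
print(uncompressed_size((20, 10), 4))  # 800
```

B is not computed this way; it is whatever H5Dget_storage_size reports for the dataset (385 bytes in the example).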
In our view, this "compression ratio" could be expressed in two ways:
1) a simple ratio, e.g., B/A
2) a percentage, e.g., (A-B)/A
So we are asking whether you have a preferred way to express this: either one of the formulations above, or any other that you would like to suggest.
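As a rough comparison of the two formulations, the sketch below evaluates both for the "deflate" dataset, taking A = 800 (20 x 10 elements of 4 bytes) and B = 385. The function names are hypothetical. The percentage is computed here as the space saved relative to A, that is (A-B)/A, which is the form that reproduces the 51.9% figure used for this dataset in this message.

```python
def simple_ratio(a, b):
    """Formulation 1: stored size as a fraction of the uncompressed size."""
    return b / a

def percent_saved(a, b):
    """Formulation 2: space saved, as a percentage of the uncompressed size.

    Assumption: the percentage is taken relative to A.
    """
    return 100.0 * (a - b) / a

A = 20 * 10 * 4  # uncompressed size in bytes
B = 385          # SIZE reported by h5dump for the dataset

print(f"{simple_ratio(A, B):.3f}")    # 0.481
print(f"{percent_saved(A, B):.1f}%")  # 51.9%
```

With the first form, smaller numbers mean better compression; with the second, larger numbers do.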
We propose to print this value after the SIZE information. For example, in the case above:
      SIZE 385 (51.9%COMPRESSION)
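A minimal sketch of how such a line could be assembled follows. It is written in Python purely for illustration; h5dump itself is a C tool, and the function name size_line and the has_compression flag are assumptions rather than anything in the h5dump sources. The annotation is added only when a compression filter is present, as the request describes.

```python
def size_line(storage_size, uncompressed_size, has_compression):
    """Build the SIZE line, adding the percentage only for compressed data.

    Hypothetical helper: storage_size plays the role of B and
    uncompressed_size the role of A.
    """
    if not has_compression or uncompressed_size <= 0:
        return f"SIZE {storage_size}"
    saved = 100.0 * (uncompressed_size - storage_size) / uncompressed_size
    return f"SIZE {storage_size} ({saved:.1f}%COMPRESSION)"

print(size_line(385, 800, True))   # SIZE 385 (51.9%COMPRESSION)
print(size_line(800, 800, False))  # SIZE 800
```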
The final output would look like this:
HDF5 "tfilters.h5" {
DATASET "deflate" {
   DATATYPE H5T_STD_I32LE
   DATASPACE SIMPLE { ( 20, 10 ) / ( 20, 10 ) }
   STORAGE_LAYOUT {
      CHUNKED ( 10, 5 )
      SIZE 385 (51.9%COMPRESSION)
   }
   FILTERS {
      COMPRESSION DEFLATE { LEVEL 9 }
   }
   FILLVALUE {
      FILL_TIME H5D_FILL_TIME_IFSET
      VALUE 0
   }
   ALLOCATION_TIME {
      H5D_ALLOC_TIME_INCR
   }
}
}
Here's the complete h5dump usage for your reference
usage: h5dump [OPTIONS] file
  OPTIONS
     -h, --help           Print a usage message and exit
     -n, --contents       Print a list of the file contents and exit
     -B, --superblock     Print the content of the super block
     -H, --header         Print the header only; no data is displayed
     -A, --onlyattr       Print the header and value of attributes
     -i, --object-ids     Print the object ids
     -r, --string         Print 1-byte integer datasets as ASCII
     -e, --escape         Escape non printing characters
     -V, --version        Print version number and exit
     -a P, --attribute=P  Print the specified attribute
     -d P, --dataset=P    Print the specified dataset
     -y, --noindex        Do not print array indices with the data
     -p, --properties     Print dataset filters, storage layout and fill value
     -f D, --filedriver=D Specify which driver to open the file with
     -g P, --group=P      Print the specified group and all members
     -l P, --soft-link=P  Print the value(s) of the specified soft link
     -o F, --output=F     Output raw data into file F
     -b B, --binary=B     Binary file output, of form B
     -t P, --datatype=P   Print the specified named datatype
     -w N, --width=N      Set the number of columns of output
     -q Q, --sort_by=Q    Sort groups and attributes by index Q
     -z Z, --sort_order=Z Sort groups and attributes by order Z
     -x, --xml            Output in XML using Schema
     -u, --use-dtd        Output in XML using DTD
     -D U, --xml-dtd=U    Use the DTD or schema at U
     -X S, --xml-ns=S     (XML Schema) Use qualified names in the XML
                          ":": no namespace, default: "hdf5:"
     E.g., to dump a file called `-f', use h5dump -- -f
 Subsetting is available by using the following options with a dataset
 attribute. Subsetting is done by selecting a hyperslab from the data.
 Thus, the options mirror those for performing a hyperslab selection.
 The START and COUNT parameters are mandatory if you do subsetting.
 The STRIDE and BLOCK parameters are optional and will default to 1 in
 each dimension.
     -s L, --start=L      Offset of start of subsetting selection
     -S L, --stride=L     Hyperslab stride
     -c L, --count=L      Number of blocks to include in selection
     -k L, --block=L      Size of block in hyperslab
  D - is the file driver to use in opening the file. Acceptable values
        are "sec2", "family", "split", "multi", "direct", and "stream". Without
        the file driver flag, the file will be opened with each driver in
        turn and in the order specified above until one driver succeeds
        in opening the file.
  F - is a filename.
  P - is the full path from the root group to the object.
  N - is an integer greater than 1.
  L - is a list of integers the number of which are equal to the
        number of dimensions in the dataspace being queried
  U - is a URI reference (as defined in [IETF RFC 2396],
        updated by [IETF RFC 2732])
  B - is the form of binary output: MEMORY for a memory type, FILE for the
        file type, LE or BE for pre-existing little or big endian types.
        Must be used with -o (output file) and it is recommended that
        -d (dataset) is used
  Q - is the sort index type. It can be "creation_order" or "name" (default)
  Z - is the sort order type. It can be "descending" or "ascending" (default)
  Examples:
  1) Attribute foo of the group /bar_none in file quux.h5
        h5dump -a /bar_none/foo quux.h5
  2) Selecting a subset from dataset /foo in file quux.h5
        h5dump -d /foo -s "0,1" -S "1,1" -c "2,3" -k "2,2" quux.h5
  3) Saving dataset 'dset' in file quux.h5 to binary file 'out.bin'
     using a little-endian type
        h5dump -d /dset -b LE -o out.bin quux.h5
--------------------------------------------------------------
Pedro Vicente (T) 217.265-0311
pvn@hdfgroup.org
The HDF Group. 1901 S. First. Champaign, IL 61820
----------------------------------------------------------------------
This mailing list is for HDF software users discussion.