Improve or provide an alternative to the --string option for h5dump/h5ls

h5ls and h5dump have a --string option. I expected it to print a dataset of type string (or a dataset of 1-byte integers) as a single, contiguous string. Instead, it prints the dataset as a comma-separated sequence of quoted characters, as in case_2 below.

/case_2                  Dataset {22/22}
Location:  1:1696
Links:     1
Storage:   22 logical bytes, 22 allocated bytes, 100.00% utilization
Type:      1-byte null-terminated ASCII string
Data:
    (0) "t", "h", "i", "s", " ", "i", "s", " ", "m", "y", " ", "{", "}", " ", "s", "t", "r", "i", "n", "g", "!", ""
/case_3                  Dataset {SCALAR}
Location:  1:4192
Links:     1
Storage:   30 logical bytes, 30 allocated bytes, 100.00% utilization
Type:      30-byte space-padded ASCII string
Data:
    (0) "this is my {} string!        \000"

So --string doesn’t really do anything stringwise, and I don’t think it’s the best name for this option. What it does is interpret the values in the dataset as characters (in this case ASCII characters; I imagine one might like wide characters or UTF-8 characters too). The point is, all --string is doing is changing the interpretation of the integral values; it otherwise prints each one independently, not at all like a string.

It seems there are a couple of desirable features here…

  • One is to control the interpretation of the individual values in the dataset
  • Another is to control whether those values are catenated together into a longer, unseparated sequence or “string”
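To make the distinction between these two controls concrete, here is a minimal sketch in Python. It is not how h5ls or h5dump are implemented; the function name `render` and its parameters `as_chars` and `joined` are hypothetical, chosen only to illustrate that interpretation and concatenation can be switched independently.

```python
def render(values, as_chars=True, joined=True):
    """Format 1-byte integer values roughly the way a dump tool might.

    as_chars: interpret each value as an ASCII character rather than a number.
    joined:   concatenate the elements into one quoted string rather than
              printing them comma separated.
    (Both parameter names are hypothetical, for illustration only.)
    """
    if not as_chars:
        # Plain numeric output, one value per element.
        return ", ".join(str(v) for v in values)
    # Printable ASCII is shown as-is; everything else as a 3-digit octal escape.
    chars = [chr(v) if 32 <= v < 127 else "\\%03o" % v for v in values]
    if joined:
        return '"' + "".join(chars) + '"'
    return ", ".join('"' + c + '"' for c in chars)


data = list(b"my {} string!")
print(render(data, as_chars=False))               # numbers, comma separated
print(render(data, as_chars=True, joined=False))  # quoted characters, comma separated
print(render(data, as_chars=True, joined=True))   # one contiguous quoted string
```

The second call corresponds to the behavior shown for case_2 above, and the third to what I expected --string to produce.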

I would like to see h5ls/h5dump offer separate controls for both. In fact, I think defaulting to UTF-8 interpretation and stringifying for 8-bit integral data would be a good idea, so that the user would have to take action to prevent it.

Mark,

Something is wrong here. We will need to look into the issue.

The string option should display a string and not an array of characters, and I know it worked in the past. Also, you have a fixed-length string; it is usually displayed as a string and not as an array of bytes interpreted as characters.

What happens when you don’t specify the string option?

Which version of HDF5 are you using?

I’ve just filed a bug report, HDFFV-10932, so we can keep track of the issue.

Thank you!

Elena

I have a vague recollection this worked as desired in the past too. But, I am getting old enough to know to not trust what I think I remember :wink:

If I do not specify the --string option, I get the same behavior: it interprets individual values as ASCII and displays each value, comma separated with quotes.

I am using version 1.10.4 for these tests.

Our collective old memory is not failing yet!

It is a bug!

Thank you!

Elena

@miller86 Hi Mark,
I made up a dataset, containing an array of 1-byte ASCII characters:
't', 'h', 'i', 's', ' ', 'i', 's', ' ', 'm', 'y', ' ', 's', 't', 'r', 'i', 'n', 'g', '\0'
and dumped it using 1.10.4 h5dump without and with --string as below:

DATASET "Test" {
DATATYPE H5T_STD_I8LE
DATASPACE SIMPLE { ( 18 ) / ( 18 ) }
DATA {
(0): 116, 104, 105, 115, 32, 105, 115, 32, 109, 121, 32, 115, 116, 114,
(14): 105, 110, 103, 0
}

DATASET "Test" {
DATATYPE H5T_STD_I8LE
DATASPACE SIMPLE { ( 18 ) / ( 18 ) }
DATA {
"this is my string\000"
}

Obviously, I didn’t recreate the same data as what you had, so I’m checking to see if you could send us your file or the program you used to create that file. Thanks!
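As a quick sanity check that the two dumps above describe the same 18 bytes, the integer values from the first dump can be decoded by hand. This is only a sketch in plain Python; it does not read the HDF5 file, and the variable names are arbitrary.

```python
# The 18 values printed by h5dump without --string.
values = [116, 104, 105, 115, 32, 105, 115, 32, 109, 121, 32, 115, 116, 114,
          105, 110, 103, 0]

# Interpret each 1-byte value as ASCII and join them into one string.
text = bytes(values).decode("ascii")

# Show the trailing NUL the way the --string dump does, as an octal escape.
dumped = '"' + text.replace("\x00", "\\000") + '"'

print(len(values))  # 18
print(dumped)       # "this is my string\000"
```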