h5ls and h5dump have a --string option. I expected it to print a dataset of type string (or a dataset of 1-byte integers) as a single, contiguous string. Instead, it prints the dataset as a comma-separated sequence of quoted characters, as in case_2 below:
/case_2 Dataset {22/22}
Location: 1:1696
Links: 1
Storage: 22 logical bytes, 22 allocated bytes, 100.00% utilization
Type: 1-byte null-terminated ASCII string
Data:
(0) "t", "h", "i", "s", " ", "i", "s", " ", "m", "y", " ", "{", "}", " ", "s", "t", "r", "i", "n", "g", "!", ""
/case_3 Dataset {SCALAR}
Location: 1:4192
Links: 1
Storage: 30 logical bytes, 30 allocated bytes, 100.00% utilization
Type: 30-byte space-padded ASCII string
Data:
(0) "this is my {} string! \000"
So --string doesn't really do anything string-wise, and I don't think it's the best name for this option. What it does is interpret the values in the dataset as characters (in this case ASCII characters; I imagine one might like wide characters or UTF-8 characters too). The point is that all --string does is change the interpretation of the integral values. It still prints each value independently, which is not at all like a string.
It seems there are a couple of desirable features here:
- One is to control the interpretation of the individual values in the dataset
- Another is to control whether those values are concatenated into a longer, unseparated sequence or "string"
I would like to see h5ls/h5dump offer separate controls for both. In fact, I think a good default for 8-bit integral data would be UTF-8 interpretation plus stringifying, so that the user would have to take action to prevent it.
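To make the distinction concrete, here is a rough Python sketch of the two controls as independent knobs. It does not use HDF5 at all, and the names render, interpret and join are made up for illustration; they are not existing h5ls or h5dump options.

```python
def render(values, interpret="utf-8", join=True):
    """Format 1-byte integer values under two independent (hypothetical) controls.

    interpret: "int" prints the raw numbers; any codec name such as "ascii"
               or "utf-8" decodes the bytes as characters.
    join:      when True, decoded characters are concatenated into one string;
               when False, each value is printed on its own.
    """
    data = bytes(values)
    if interpret == "int":
        # No character interpretation: plain numbers, one per element.
        return ", ".join(str(b) for b in data)
    if join:
        # Decode the whole buffer at once and drop trailing NUL terminators.
        return '"' + data.rstrip(b"\x00").decode(interpret) + '"'
    # Decode each byte alone; a NUL is shown as an empty string, which is
    # how the case_2 output above displays the terminator.
    return ", ".join(
        '"' + bytes([b]).decode(interpret, errors="replace").replace("\x00", "") + '"'
        for b in data
    )


# The 22 bytes of case_2: 21 characters plus a NUL terminator.
case_2 = list(b"this is my {} string!\x00")

print(render(case_2, interpret="ascii", join=False))  # per-element, like --string today
print(render(case_2))                                  # the default proposed above
```

With join=False the sketch reproduces the per-element output shown for case_2; with the defaults it prints the data as one quoted string, which is what I expected --string to do.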