Wide character string types in HDF5

Hi all,

Some of my users have been asking about storing UTF-16 or UTF-32
fixed-length strings in HDF5. Are there currently any plans to
support wide character datatypes? Note that this is slightly different
from UTF-8 support, which results in variable-length data. NumPy, for
example, has a Unicode string datatype consisting of a fixed number of
UTF-32 code points.
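
To make the distinction concrete, here is a minimal sketch using only the Python standard library. The helper `to_fixed_utf32` is a hypothetical name introduced for illustration, and the NUL-padded little-endian layout is an assumption chosen to resemble a fixed-length field; it is not an HDF5 or NumPy API.

```python
def to_fixed_utf32(text: str, n_code_points: int) -> bytes:
    """Encode text into a field of exactly n_code_points UTF-32 units.

    Hypothetical helper: little-endian, no BOM, padded with NUL code points.
    """
    if len(text) > n_code_points:
        raise ValueError("text does not fit in the fixed-length field")
    return text.encode("utf-32-le").ljust(4 * n_code_points, b"\x00")


words = ["alpha", "\u03b2\u03b3", "\u03b4"]  # "alpha", "βγ", "δ"

# Every fixed-length UTF-32 field has the same size: 4 bytes per code point.
utf32_sizes = [len(to_fixed_utf32(w, 5)) for w in words]

# The UTF-8 size depends on which characters each string contains.
utf8_sizes = [len(w.encode("utf-8")) for w in words]

print(utf32_sizes)  # [20, 20, 20]
print(utf8_sizes)   # [5, 4, 2]
```

The fixed-size fields are what make such strings fit naturally into a regular array of elements, whereas the UTF-8 byte count varies from string to string.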

Thanks!
Andrew

Hello,

I would also like to ask the HDF5 developers to consider supporting
UTF-32. One benefit of UTF-32 is that it is a fixed-width encoding:
indexing a code point is a constant-time operation, as opposed to the
sequential scan that variable-width encodings require. The scientific
Python community, which is large and growing, is in the process of
migrating from Python 2 to Python 3. All strings in Python 3 are
Unicode, and as Andrew mentioned, NumPy (Python's array package)
addresses the need for storing fixed-length Unicode strings in the
most general way: a Unicode string datatype consisting of a fixed
length of UTF-32 code points. But there doesn't appear to be a way to
store this datatype in HDF5. Would you please consider adding support
for this datatype in a future version of HDF5?
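
The indexing argument can be sketched in a few lines of standard-library Python. The function names below are hypothetical and chosen for illustration only, and the UTF-8 walker assumes well-formed input; this is a sketch of the idea, not how any particular library implements it.

```python
import struct


def code_point_utf32(buf: bytes, i: int) -> str:
    """Return code point i from UTF-32-LE bytes by jumping to offset 4 * i."""
    (value,) = struct.unpack_from("<I", buf, 4 * i)
    return chr(value)


def _utf8_width(lead_byte: int) -> int:
    """Number of bytes in a UTF-8 sequence, judged from its lead byte."""
    if lead_byte < 0x80:
        return 1
    if lead_byte < 0xE0:
        return 2
    if lead_byte < 0xF0:
        return 3
    return 4


def code_point_utf8(buf: bytes, i: int) -> str:
    """Return code point i from valid UTF-8 bytes by walking from the start."""
    pos = 0
    for _ in range(i):  # must step over every earlier character
        pos += _utf8_width(buf[pos])
    return buf[pos:pos + _utf8_width(buf[pos])].decode("utf-8")


# Characters that take 1, 2, 3, 4, and 1 bytes respectively in UTF-8.
text = "a\u03b2\u20ac\U0001d11ez"
as_utf32 = text.encode("utf-32-le")
as_utf8 = text.encode("utf-8")

for i, expected in enumerate(text):
    assert code_point_utf32(as_utf32, i) == expected
    assert code_point_utf8(as_utf8, i) == expected
```

Both functions return the same characters, but only the UTF-32 version can compute the byte offset directly from the index.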

Thank you,
Darren

(In reply to the message from Andrew Collette <andrew.collette@gmail.com>, Mon, Oct 10, 2011 at 9:29 PM.)

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org

Please excuse me for "bumping"; I was just wondering whether anyone at
the HDF Group saw this request. I probably should not have sent it
right before the holidays.

Best regards,
Darren

(In reply to the message from Darren Dale <dsdale24@gmail.com>, Tue, Dec 20, 2011 at 9:49 AM.)

Hi Darren and Andrew, and All,

Happy New Year!

The request is in our issues database.

Unfortunately, we don't have the resources to work on it right now. We understand that HDF5 interoperability with Python is very important to our users and will try to address the issue, but it may take some time. We will be more than happy to accept a patch if one becomes available.

Elena

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Elena Pourmal The HDF Group http://hdfgroup.org
1800 So. Oak St., Suite 203, Champaign IL 61820
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(In reply to the message from Darren Dale, Jan 3, 2012, at 9:03 AM.)