Non-English characters in HDF5 file names


#1

HDF5 doesn’t support non-English characters (e.g., Japanese characters) in the file name (.h5).

That is, the H5F.Create, Open, and Close functions fail when the file name contains non-English characters. I know it supports UTF-8 characters in group, dataset, and attribute names and values, but it does not support them in the file name.

Is there any other way to make this work? I need the file name to be in Japanese characters.


#2

For what it’s worth: We at the HDF Group have long considered whether full Unicode / UTF-8 support (including filenames) should be put on the roadmap, but it’s generally been the classic challenge of:

  1. Fairly significant effort
  2. Historically low community interest

It would be interesting to hear from other folks in the HDF community to see if there’s strong interest in adding Unicode support throughout the library. Is this something that the international HDF users would be open to crowdfunding?

Cheers!

– Dave


#3

Hi David!

19.07.2018 3:16, david.pearah wrote:

For what it’s worth: We at the HDF Group have long considered whether
full Unicode / UTF-8 support (including filenames) should be put on the
roadmap, but it’s generally been the classic challenge of:

  1. Fairly significant effort
  2. Historically low community interest

It would be interesting to hear from other folks in the HDF community
to see if there’s strong interest in adding Unicode support throughout
the library. Is this something that the international HDF users would be
open to crowdfunding?

I’m not sure about monetary funding, but hopefully a public issue tracker
will soon provide a good place for technical discussions and pull requests!
At the very least, current solutions and workarounds could be collected
there for anyone’s reference.

For now, some pretty valuable information is available at


(please also follow the links)

Best wishes,
Andrey Paramonov


#4

I believe that it will work if you pass a path to H5Fcreate that uses system text encoding (also called “system locale”, “language for non-Unicode programs”, “current code page”), and if all of the characters you are using can be represented in system text encoding. If your path is originally in UTF-8 and you are running on a Japanese system, then you would need to convert from UTF-8 to Shift-JIS using UTF-16 as an intermediary.

To do this, first use MultiByteToWideChar to convert from UTF-8 (CP_UTF8) to UTF-16. Then use WideCharToMultiByte to convert from UTF-16 to the current code page (CP_ACP). If the characters in the original path can be represented in the current code page (i.e., system locale), this will succeed and give you a path that should work with H5Fcreate.
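
The two-step conversion above can be sketched portably in Python. This is only an illustration of the idea, not the Win32 calls themselves: Python's built-in codecs stand in for MultiByteToWideChar and WideCharToMultiByte, and a Python str plays the role of the UTF-16 intermediary. The function names and the default code page "cp932" (Microsoft's Shift-JIS variant) are assumptions made for the example. On Windows, Python's "mbcs" codec corresponds to the current ANSI code page, so passing code_page="mbcs" there would be the closer match to CP_ACP.

```python
def to_code_page(utf8_path: bytes, code_page: str = "cp932") -> bytes:
    """Re-encode a UTF-8 path into a legacy code page (hypothetical helper).

    Raises UnicodeEncodeError if a character cannot be represented.
    """
    # Step 1: UTF-8 bytes -> Unicode text (the role UTF-16 plays in Win32).
    text = utf8_path.decode("utf-8")
    # Step 2: Unicode text -> code page bytes. Strict mode refuses to
    # silently replace characters the code page cannot represent.
    return text.encode(code_page, errors="strict")


def is_representable(utf8_path: bytes, code_page: str = "cp932") -> bool:
    """Return True if every character of the path exists in the code page."""
    try:
        to_code_page(utf8_path, code_page)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    japanese = "データ.h5".encode("utf-8")
    korean = "한글.h5".encode("utf-8")
    print(is_representable(japanese))          # Japanese text fits cp932
    print(is_representable(korean))            # Hangul does not fit cp932
    print(is_representable(korean, "cp949"))   # but it fits a Korean code page
```

The strict check matters because a path with characters outside the system code page cannot be made to work this way; it is better to detect that up front than to pass a lossy path to H5Fcreate.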


#5

We recently ran into this issue, specifically with Unicode characters in the filename. We work exclusively on Windows-based systems, so this probably doesn’t apply to other environments. To work around the issue, I pass the Windows short file path (8.3 format) to H5Fopen. You can get the short name of an already existing path using GetShortPathName in kernel32.dll. Because the path must already exist, a newly created file’s name cannot contain Unicode characters; only the directories leading to it can. This didn’t matter for us, since the filename is always the same and only the path is chosen by the user.
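
For reference, here is a minimal sketch of that lookup from Python using ctypes, assuming a Windows system. The helper name short_path is hypothetical; GetShortPathNameW is the wide-character variant of the kernel32 function mentioned above. Note that a volume may have 8.3 name generation disabled, in which case the call may simply return the long path, so this is not guaranteed to produce an ASCII-only result.

```python
import ctypes
import sys


def short_path(long_path: str) -> str:
    """Return the 8.3 short form of an existing path (hypothetical helper)."""
    if sys.platform != "win32":
        raise OSError("GetShortPathNameW is only available on Windows")

    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    get_short = kernel32.GetShortPathNameW
    get_short.argtypes = [wintypes.LPCWSTR, wintypes.LPWSTR, wintypes.DWORD]
    get_short.restype = wintypes.DWORD

    # First call with no buffer: returns the required size in characters,
    # including the terminating null, or 0 on failure (e.g. path missing).
    size = get_short(long_path, None, 0)
    if size == 0:
        raise ctypes.WinError(ctypes.get_last_error())

    # Second call fills the buffer with the short path.
    buf = ctypes.create_unicode_buffer(size)
    if get_short(long_path, buf, size) == 0:
        raise ctypes.WinError(ctypes.get_last_error())
    return buf.value
```

The returned string can then be handed to whatever opens the HDF5 file. The path has to exist before the call, which matches the limitation described above.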