Non-English characters in HDF5 file names

HDF5 doesn’t support non-English characters (e.g., Japanese characters) in the file name (.h5).

This means the H5F.Create, Open, and Close functions fail when the file name contains non-English characters. I know it supports UTF-8 characters in group, dataset, and attribute names and values, but not in the file name.

Is there any other way to work around this? I need the file name to be in Japanese characters.

For what it’s worth: We at the HDF Group have long considered whether full Unicode / UTF-8 support (including filenames) should be put on the roadmap, but it’s generally been the classic challenge of:

  1. Fairly significant effort
  2. Historically low community interest

It would be interesting to hear from other folks in the HDF community to see if there’s strong interest in adding Unicode support throughout the library. Is this something that the international HDF users would be open to crowdfunding?

Cheers!

– Dave

Hi David!

19.07.2018 3:16, david.pearah wrote:

For what it’s worth: We at the HDF Group have long-considered whether
full Unicode / UTF-8 support (including filenames) should be put on the
roadmap, but it’s generally been the classic challenge of:

  1. Fairly significant effort
  2. Historically low community interest

It would be interesting to hear from other folks in the HDF community
to see if there’s strong interest in adding Unicode support throughout
the library. Is this something that the international HDF users would be
open to crowdfunding?

I’m not sure about monetary funding, but hopefully a public issue tracker
will soon provide a good place for technical discussions and pull requests.
At the least, current solutions and workarounds could be summarized and
collected there for anyone’s reference.

For now, some pretty valuable information is available at


(please also follow the links)

Best wishes,
Andrey Paramonov

I believe that it will work if you pass a path to H5Fcreate that uses system text encoding (also called “system locale”, “language for non-Unicode programs”, “current code page”), and if all of the characters you are using can be represented in system text encoding. If your path is originally in UTF-8 and you are running on a Japanese system, then you would need to convert from UTF-8 to Shift-JIS using UTF-16 as an intermediary.

To do this, first use MultiByteToWideChar to convert from UTF-8 (CP_UTF8) to UTF-16. Then use WideCharToMultiByte to convert from UTF-16 to the current code page (CP_ACP). If the characters in the original path can be represented in the current code page (i.e., system locale), this will succeed and give you a path that should work with H5Fcreate.
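
As a rough, platform-neutral illustration of the two-step conversion described above, the Python sketch below decodes UTF-8 bytes to Unicode text (the step MultiByteToWideChar performs with CP_UTF8) and then re-encodes that text in a code page (the step WideCharToMultiByte performs with CP_ACP). The helper name `to_code_page` and the hard-coded `cp932` (Python’s name for the Japanese Windows code page) are assumptions for illustration only; on a real system the target would be whatever the active code page is. Note also that Python raises an error for unrepresentable characters, whereas the Win32 call may silently substitute a default character unless told otherwise.

```python
from typing import Optional


def to_code_page(path_utf8: bytes, code_page: str = "cp932") -> Optional[bytes]:
    """Re-encode a UTF-8 path in a legacy code page.

    Returns None if some character cannot be represented in that code page.
    `to_code_page` and the `cp932` default are hypothetical choices for
    this sketch, not part of HDF5 or the Win32 API.
    """
    # Step 1: UTF-8 bytes -> Unicode text
    # (Win32 analogue: MultiByteToWideChar with CP_UTF8)
    text = path_utf8.decode("utf-8")
    try:
        # Step 2: Unicode text -> code page bytes
        # (Win32 analogue: WideCharToMultiByte with CP_ACP)
        return text.encode(code_page)
    except UnicodeEncodeError:
        # At least one character has no mapping in the target code page
        return None


japanese = "データ.h5".encode("utf-8")
korean = "한글.h5".encode("utf-8")

# Katakana is representable in cp932, so a byte string comes back
print(to_code_page(japanese) is not None)
# Hangul is not part of cp932, so the helper reports failure
print(to_code_page(korean) is None)
```

The None return makes the failure case explicit: a path whose characters fall outside the system code page cannot be passed this way, and a different workaround is needed.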

We recently ran into this issue, specifically with Unicode characters in the filename. We work exclusively on Windows-based systems, so this probably doesn’t apply to other environments. My workaround is to pass the Windows short file path (8.3 format) to H5open. You can get the short name of an already existing path using GetShortPathName in kernel32.dll. Because the path must already exist, this does not help when creating a new file whose name itself contains Unicode characters; only the directory portion of the path can contain them. This didn’t matter for us, since the filename is always the same and only the directory is chosen by the user.
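
A minimal sketch of the short-path lookup described above, written in Python with ctypes. The wrapper name `to_short_path` is hypothetical; the only real API used is the wide-character variant GetShortPathNameW from kernel32. On non-Windows platforms the function simply returns its input, since the workaround is Windows-specific. This assumes 8.3 name generation is enabled on the volume; where it is disabled, the call may just return the long path.

```python
import ctypes
import os
import sys
import tempfile


def to_short_path(long_path: str) -> str:
    """Return the 8.3 short form of an existing path on Windows.

    Hypothetical helper for illustration. On other platforms the path
    is returned unchanged.
    """
    if sys.platform != "win32":
        return long_path

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    get_short = kernel32.GetShortPathNameW
    # DWORD GetShortPathNameW(LPCWSTR long, LPWSTR short, DWORD cchBuffer)
    get_short.argtypes = [ctypes.c_wchar_p, ctypes.c_wchar_p, ctypes.c_uint32]
    get_short.restype = ctypes.c_uint32

    # Calling with no buffer returns the required size, including the
    # terminating null character
    needed = get_short(long_path, None, 0)
    if needed == 0:
        raise ctypes.WinError(ctypes.get_last_error())

    buf = ctypes.create_unicode_buffer(needed)
    if get_short(long_path, buf, needed) == 0:
        raise ctypes.WinError(ctypes.get_last_error())
    return buf.value


# The directory must already exist before it can be shortened
user_dir = tempfile.mkdtemp()
# Append the fixed, ASCII-only file name to the shortened directory
h5_path = os.path.join(to_short_path(user_dir), "results.h5")
print(h5_path)
```

The file name `results.h5` is a placeholder. The resulting string would then be handed to the HDF5 file-open or file-create call in place of the original path.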

@david.pearah,

Could you please split this issue into two parts, (1) file names for open/create and (2) internal storage of file names, and give some priority to open/create? It seems that this part should be easy, and it would alleviate the bulk of the problem for international Windows users.

There have been several patch contributions over the years, most recently this one from Christian Seiler which is well considered. Can something like this be included in the next release?

UTF-8 file name compatibility for HDF5 is impacting other communities, such as test failures when building netCDF inside a user account named with non-English characters.

Thanks for your consideration.

Dave,

Unfortunately, those two issues cannot be cleanly separated.

An HDF5 file stores the names of other files used by, for example, external links, VDS, and the split VFD, to name a few. Those files also have to be opened with the same underlying functions. We need a comprehensive solution and not just a patch for H5Fopen on Windows.

We understand the pain and will revisit the topic of a patch internally, but the right thing will be for all of us to work together to fix the issue once and for all.

All,

I am wondering… How many people on the FORUM are willing to contribute their time and knowledge, and work with The HDF Group developers on this feature?

Thank you!

Elena

I’d like to see this fixed.
It often comes up with new users of our software. We moved to HDF5 from our proprietary file format, and the change made this an issue, prompting lots of questions such as “why can’t I save files to my ‘Users’ directory anymore? I used to be able to.” Fixing this seems important for increasing adoption of HDF5, especially in enterprise settings.

Thumbs up for the previous comments.
I would also appreciate seeing this fixed: it is quite tricky to understand why, in some cases, HDF5 files aren’t saved. It is particularly annoying in French because of accents and special characters such as “ç”.

Thank you for your consideration.

I would be thrilled to fix this issue. The problem is that it’s a huge amount of effort with no obvious funding source. Everyone wants this problem to be fixed but nobody wants it fixed so badly that they are willing to pay for an engineer to spend the better part of a year fixing it properly. A lot of people seem to think that we just need to tweak the “open file” code, but that isn’t true. So much stuff in the library is affected by Unicode file names on Windows and doing a hasty job will risk dramatically increasing our technical debt and bug count.

For anyone who comes across this old thread, we’ve added enough support to get Unicode working, as well as code pages (common in Japan, from what I understand). This is difficult for us to test as we don’t have a bunch of international Windows VMs set up, so please report failures to us so we can fix things.

This is not comprehensive support, but should at least allow people to open files.