H5Fcreate and H5Fopen with Chinese filename (wide char)


#1

Hi,

On Windows, I want to create an HDF5 file whose name contains Chinese characters.

The filename is a wide string (wchar_t *wcfilename).

H5Fcreate and H5Fopen seem to accept only ANSI characters.

How do you do this?

Thank you


_________________________________________________________________
Windows Live: Keep your friends up to date with what you do online.
http://www.microsoft.com/middleeast/windows/windowslive/see-it-in-action/social-network-basics.aspx?ocid=PID23461::T:WLMTAGL:ON:WL:en-xm:SI_SB_1:092010


Wide-string equivalent of H5Fcreate or H5Fopen?
#2

This is a recently exposed problem: currently there is no Unicode support in the library. We plan to correct this once we identify all the locations that need to handle it properly.

Allen


#3

In an emergency, you could try "modified UTF-8". That is UTF-8 in which the null character is encoded as two non-zero bytes, so the string never contains an embedded zero byte. See the Wikipedia article "UTF-8".
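For reference, modified UTF-8 (the variant described in Java's java.io.DataInput documentation) differs from standard UTF-8 chiefly in writing U+0000 as the two bytes C0 80 instead of a single zero byte. The sketch below encodes one BMP code point that way; mutf8_encode_bmp is a hypothetical helper, not an HDF5 function, and it deliberately skips characters above U+FFFF. Since filenames never contain U+0000, the bytes it produces for characters such as 漢 are identical to standard UTF-8.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one BMP code point (U+0000..U+FFFF)
 * as modified UTF-8. Writes up to 3 bytes to out and returns the
 * count, or 0 for code points above U+FFFF (modified UTF-8 stores
 * those as two 3-byte surrogates; not handled in this sketch). */
size_t mutf8_encode_bmp(uint32_t cp, unsigned char out[3])
{
    if (cp == 0) {
        /* The defining difference from standard UTF-8: U+0000 is
         * written as the two-byte form C0 80, so the encoded string
         * never contains a 0x00 byte and stays a valid C string. */
        out[0] = 0xC0;
        out[1] = 0x80;
        return 2;
    }
    if (cp < 0x80) {                      /* ASCII: one byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    return 0;                             /* outside the BMP: not handled */
}
```

For example, U+6F22 (漢) encodes to E6 BC A2, exactly as in standard UTF-8, while U+0000 becomes C0 80.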


(In reply to Allen D Byrne <byrn@hdfgroup.org>, Thu, Oct 8, 2009 at 6:52 AM.)


_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org


#4

Hi,

Ten years later, has this bug been fixed or not?

Thanks


#5

AFAIK, on Windows you have to call MultiByteToWideChar / WideCharToMultiByte, or you could check out third-party projects that convert UTF-8 to wide characters and back.
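On Windows the conversion is normally done with MultiByteToWideChar(CP_UTF8, ...), and WideCharToMultiByte(CP_UTF8, ...) for the way back. To show what the UTF-8 to UTF-16 direction actually involves, here is a portable C sketch; utf8_to_utf16 is a hypothetical helper, uint16_t stands in for the 16-bit Windows wchar_t, and on Windows the Win32 call would normally be preferred.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert a NUL-terminated UTF-8 string into a
 * NUL-terminated UTF-16 buffer of cap units. Returns the number of
 * UTF-16 units written (not counting the NUL), or -1 if the input is
 * malformed or dst is too small. */
long utf8_to_utf16(const char *src, uint16_t *dst, size_t cap)
{
    /* Smallest code point allowed for each sequence length, used to
     * reject overlong encodings (e.g. C0 80 for U+0000). */
    static const uint32_t min_cp[4] = {0x0, 0x80, 0x800, 0x10000};
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    while (*s) {
        uint32_t cp;
        int extra;                                  /* continuation bytes */
        if (s[0] < 0x80)                { cp = s[0];        extra = 0; }
        else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; extra = 1; }
        else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; extra = 2; }
        else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; extra = 3; }
        else return -1;                             /* invalid lead byte */

        for (int i = 1; i <= extra; i++) {
            /* A NUL here also fails this test, so a truncated
             * sequence never makes us read past the terminator. */
            if ((s[i] & 0xC0) != 0x80) return -1;
            cp = (cp << 6) | (s[i] & 0x3Fu);
        }
        s += extra + 1;

        if (cp < min_cp[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;                  /* overlong, out of range, or surrogate */

        if (cp >= 0x10000) {                        /* needs a surrogate pair */
            if (n + 2 >= cap) return -1;            /* keep room for the NUL */
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            if (n + 1 >= cap) return -1;
            dst[n++] = (uint16_t)cp;
        }
    }
    if (n >= cap) return -1;
    dst[n] = 0;
    return (long)n;
}
```

Decoding the UTF-8 bytes of 漢字.h5 yields the five UTF-16 units 0x6F22 0x5B57 '.' 'h' '5', which is the form a wide-character call such as _wopen expects.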

H5CPP is an alternative header-only library introduced last year at the Chicago C++ Users Group. It is written in modern C++ and layered directly on top of the HDF5 C API. The attached example demonstrates UTF-8; it is also uploaded to the project's GitHub page under examples/utf.
To use H5CPP, copy the MIT-licensed header files and link against the HDF5 library and its dependencies. Popular linear algebra libraries are supported, as is LLVM-based compiler-assisted reflection for arbitrarily deep POD structs. See the examples for details.

best wishes: steven


#6

Hi,
Thanks for your reply.

But that does not answer my question ;)

On Windows, try to create an .h5 file whose name contains Chinese characters.
Try this example, which creates or opens a file named 漢字.h5:

h5::open("漢字.h5", H5F_ACC_RDWR)

A file created on NTFS or FAT32 does not get the name 漢字.h5; it ends up with a garbled name instead.
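A plausible explanation for the garbled name, under two assumptions not stated in the post: the string literal is stored as UTF-8, and the process's ANSI code page is Windows-1252 (on a Chinese-locale system it would be code page 936 and the garbage would differ). The narrow file API interprets each byte of the name in the ANSI code page, so the six UTF-8 bytes of 漢字 turn into six unrelated characters. decode_as_cp1252 is a hypothetical helper that reproduces that reinterpretation.

```c
#include <stddef.h>
#include <stdint.h>

/* Windows-1252 assigns 0x80..0x9F to typographic characters; the
 * five unassigned slots are mapped to U+FFFD here. Every other byte
 * maps to the Unicode code point with the same numeric value. */
static const uint16_t cp1252_high[32] = {
    0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD,
    0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178
};

/* Hypothetical helper: reinterpret the bytes of a narrow string as
 * Windows-1252, one code point per byte (assumed ANSI code page).
 * Writes at most cap code points and returns how many were written. */
size_t decode_as_cp1252(const char *bytes, uint16_t *out, size_t cap)
{
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)bytes;
         *p && n < cap; p++, n++)
        out[n] = (*p >= 0x80 && *p <= 0x9F) ? cp1252_high[*p - 0x80] : *p;
    return n;
}
```

Under those assumptions, 漢字.h5 would be read as U+00E6 (æ), U+00BC (¼), U+00A2 (¢), U+00E5 (å), U+00AD (soft hyphen) and U+2014 (em dash) followed by .h5, a typical example of the kind of garbled name described above.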

H5Fopen and H5Fcreate should support wide strings as filenames…

Can you confirm this issue?

see also

Thanks


#7

Unfortunately I can't confirm this issue for you; I do not speak for The HDF Group. I can confirm that UTF-8 works fine: my short example demonstrated UTF-8 with various international glyphs, and I have added an updated example for file-open. So we can agree that there is UTF support in the HDF5 C API, but not necessarily in the form you want to see/use.

In my previous post you will find pointers to the Windows-specific conversion functions and to a third-party library that can help. If you work with modern C++, it is straightforward to implement automatic conversion templates between UTF-8 and UTF-32/UTF-16.
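For the way back, and since the thread is about the C API, here is a C sketch rather than the C++ templates mentioned above. It converts UTF-16 (for example a Windows wchar_t filename) to UTF-8. utf16_to_utf8 is a hypothetical helper; on Windows, WideCharToMultiByte(CP_UTF8, ...) does the same job.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert a NUL-terminated UTF-16 string into a
 * NUL-terminated UTF-8 buffer of cap bytes. Returns the number of
 * bytes written (not counting the NUL), or -1 on an unpaired
 * surrogate or if dst is too small. */
long utf16_to_utf8(const uint16_t *src, char *dst, size_t cap)
{
    unsigned char *out = (unsigned char *)dst;
    size_t n = 0;

    while (*src) {
        uint32_t cp = *src++;
        if (cp >= 0xD800 && cp <= 0xDBFF) {         /* high surrogate */
            if (*src < 0xDC00 || *src > 0xDFFF)     /* needs a low one next */
                return -1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(*src++ - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return -1;                              /* stray low surrogate */
        }

        unsigned char buf[4];
        size_t len;
        if (cp < 0x80) {
            buf[0] = (unsigned char)cp;
            len = 1;
        } else if (cp < 0x800) {
            buf[0] = (unsigned char)(0xC0 | (cp >> 6));
            buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
            len = 2;
        } else if (cp < 0x10000) {
            buf[0] = (unsigned char)(0xE0 | (cp >> 12));
            buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
            len = 3;
        } else {
            buf[0] = (unsigned char)(0xF0 | (cp >> 18));
            buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
            len = 4;
        }
        if (n + len >= cap) return -1;              /* keep room for the NUL */
        for (size_t i = 0; i < len; i++)
            out[n++] = buf[i];
    }
    if (n >= cap) return -1;
    out[n] = '\0';
    return (long)n;
}
```

Paired with a UTF-8 to UTF-16 decoder, this gives the round trip between a Windows wide filename and the UTF-8 form the rest of the code can pass around.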

In C you have to do more work: here is a link to an expert article; read the comments as well.

The only reason to avoid conversion within your library is when you handle data at massive scale, which is rarely the case for filenames. Then again, I may be mistaken.

On H5CPP: you are correct, H5CPP does not yet support Windows wide character sets. This will change soon, as I recently got a Windows machine. If you are an H5CPP user, let me know and I will add the required shim; otherwise it will be added around May, along with the rest of the Windows support: the LLVM reflection tool, an installer, etc.

hope it helps: steven


#8

hid_t H5Fopen(const char *name, unsigned flags, hid_t fapl_id) needs a wchar_t * version:
hid_t H5FopenW(const wchar_t *name, unsigned flags, hid_t fapl_id)

H5Fopen calls H5F_open, which ultimately reaches _open through the HDopen macro in H5win32defs.h:
https://support.hdfgroup.org/ftp//HDF5/prev-releases/hdf5-1.8/hdf5-1.8.21/src/unpacked/src/H5win32defs.h

#define HDopen(S,F,M) _open(S,F|_O_BINARY,M)

It would be necessary to add:
#define HDopenW(S,F,M) _wopen(S,F|_O_BINARY,M)

see
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/open-wopen?view=vs-2017


#9

Good catch! As far as I can see, this macro is called in several places, which means this tiny change would fan out into a complex code change to provide wide-character support alongside UTF-8. Here are the first-level C API changes:

static H5FD_t *H5FD__core_open(const char *name, unsigned flags, hid_t fapl_id, haddr_t maxaddr);
static H5FD_t *H5FD_direct_open(const char *name, unsigned flags, hid_t fapl_id, haddr_t maxaddr);
static herr_t H5D__efl_read(const H5O_efl_t *efl, const H5D_t *dset, haddr_t addr, size_t size, uint8_t *buf);
static herr_t H5D__efl_write(const H5O_efl_t *efl, const H5D_t *dset, haddr_t addr, size_t size, const uint8_t *buf);
static H5FD_t *H5FD_sec2_open(const char *name, unsigned flags, hid_t fapl_id, haddr_t maxaddr);
static H5FD_t *H5FD_log_open(const char *name, unsigned flags, hid_t fapl_id, haddr_t maxaddr);

All of these would have to take wide characters, or alternatively an additional flag indicating that the passed argument is a wide string. Let's see if anyone takes on this non-trivial modification.

Or have I made a mistake in my estimate, and on a given system UTF-8 and UTF-16 are mutually exclusive? If so, would conditionally redefining the macro be a viable and correct solution?


#10

In addition: according to this MS Windows reading, only NTFS uses UTF-16, and it doesn't care much about the content, so there seem to be two cases:

  1. FAT12/16/32 with OEM character tables, which invites the implementer on a great adventure.

  2. NTFS accepts any sequence of 16-bit units (two bytes at a time). If this is the only case, then all input file names can be safely converted from UTF-8 to UTF-16 and passed to the wide-character _wopen. This cooperative strategy should let other UTF-16-compliant Windows software read the filenames back from NTFS as they were meant to be.

The advantage of this proposal is simplicity: only the #define needs to be conditionally expanded. The downside is that it ignores the case where NTFS files are accessed from a non-Windows host. The Linux NTFS kernel module has an additional nls=utf8 mount option for reading Unicode filenames, but a quick read didn't tell me what happens to UTF-16 characters.

With this approach, the client side sees only UTF-8, which works on non-Windows hosts; on Windows the names become UTF-16, with undefined behaviour when accessing files on FAT12/16/32 volumes. (<-- not sure if this statement is true)
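A minimal sketch of this first proposal, assuming the library receives UTF-8 names: on Windows, HDopen is redirected to a hypothetical utf8_wopen helper that converts the name with MultiByteToWideChar(CP_UTF8, ...) and calls _wopen; elsewhere it stays a plain open. This is an illustration only, not the actual definition in H5win32defs.h.

```c
#include <fcntl.h>

#ifdef _WIN32
#include <io.h>
#include <stdlib.h>
#include <windows.h>

/* Hypothetical helper: treat name as UTF-8, convert it to UTF-16 and
 * open it with the wide-character CRT call. Returns -1 on failure. */
static int utf8_wopen(const char *name, int flags, int mode)
{
    /* With cbMultiByte = -1 the returned length includes the NUL. */
    int wlen = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                   name, -1, NULL, 0);
    if (wlen <= 0)
        return -1;                       /* not valid UTF-8 */
    wchar_t *wname = malloc((size_t)wlen * sizeof *wname);
    if (wname == NULL)
        return -1;
    MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, name, -1, wname, wlen);
    int fd = _wopen(wname, flags, mode);
    free(wname);
    return fd;
}
#define HDopen(S,F,M) utf8_wopen(S, (F)|_O_BINARY, M)
#else
#include <unistd.h>
/* POSIX: names are plain bytes to the kernel, so UTF-8 passes through. */
#define HDopen(S,F,M) open(S,F,M)
#endif
```

A caller would then pass UTF-8 names to H5Fcreate/H5Fopen on every platform. The MB_ERR_INVALID_CHARS flag makes non-UTF-8 input fail instead of being silently altered; a real patch might instead fall back to the narrow _open so that existing ANSI names keep working.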

The alternative to the previous proposal is to add exported H5Fcreate_w and H5Fopen_w calls, as suggested. In their implementation, HDopen would be redefined along the lines of #define HDopen(S,F,M) _wopen(S,F|_O_BINARY,M), and just before the exit point the change would be cleaned up with #undef, rolling back to whatever was defined before. This strategy provides a method and delegates its use to clients. The downside is adding more calls to the already extensive HDF5 C API.

I wonder how others see this. Would it work? Is it sane enough to make it happen? Are there alternatives I didn't think of?

best: steven