Wide-string equivalent of H5Fcreate or H5Fopen?


#1

Hi,

The HDF5 C/C++ API does not seem to handle wide strings for H5Fcreate or H5Fopen.

It is an important feature for Windows users …

Can you confirm this?

Thanks

See also this 10-year-old report


#2

See also this issue about matio and HDF5


#3

Hello!

I confirm that HDF5 still cannot handle wide strings for H5Fcreate/H5Fopen on Windows. We hope to address this particular issue by providing a Windows file driver, which is in the works now.
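
For context on why a dedicated driver helps: the wide-character file calls on Windows (for example `_wfopen` or `CreateFileW`) take UTF-16, so a driver that accepts a UTF-8 `char *` name has to convert it first. Below is a minimal, portable sketch of that conversion in C. The function name `utf8_to_utf16` is hypothetical, and nothing in this thread says the planned driver works this way; on Windows itself one would more likely call `MultiByteToWideChar` with `CP_UTF8`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode NUL-terminated UTF-8 into UTF-16 code units.
 * Returns the number of code units written (excluding the terminator),
 * or -1 if the input is not well-formed UTF-8 or dst is too small. */
long utf8_to_utf16(const char *src, uint16_t *dst, size_t dst_cap)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    while (*s) {
        uint32_t cp, min;
        int extra; /* number of continuation bytes expected */

        if (*s < 0x80)                { cp = *s;        extra = 0; min = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; min = 0x80; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; min = 0x800; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; min = 0x10000; }
        else return -1; /* stray continuation byte or invalid lead byte */

        for (s++; extra > 0; extra--, s++) {
            if ((*s & 0xC0) != 0x80) return -1; /* also catches a premature NUL */
            cp = (cp << 6) | (*s & 0x3F);
        }
        /* Reject overlong forms, out-of-range values and encoded surrogates. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;

        if (cp < 0x10000) {
            if (n + 1 >= dst_cap) return -1; /* keep room for the terminator */
            dst[n++] = (uint16_t)cp;
        } else {
            if (n + 2 >= dst_cap) return -1;
            cp -= 0x10000; /* split into a surrogate pair */
            dst[n++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    if (n >= dst_cap) return -1;
    dst[n] = 0;
    return (long)n;
}
```

A name stored in a legacy ANSI code page (for instance a lone `0xE9` byte for "é") is rejected by this function, which is one reason why mixing encodings in file names is hard to handle silently.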

In the past we received several patches but we couldn’t accept them.

Thank you!

Elena


#4

Hi,

Do you have a roadmap for an official patch?

This problem was already reported 10 years ago …

Thanks


#5

Hi Elena

I am not sure which patches you are referring to. I have been using my patch in production for years, in thousands of user installations, without a problem, and I would happily share it. It’s not even a patch - I hook your functions.

Besides, HDF5 versions after 1.8 cannot support UTF-8, because you are violating your own design by using the path name in the core library.

Kind regards

Dimitris


#6

Hi, Dimitris,

Do you have a public git repository with the patch you have been applying for years?
It would be interesting to have a maintained fork until HDF5 officially supports Unicode file names.

Thanks


#7

Hi Dimitris,

I am not sure what you mean by “violating your own design…”.

Versions 1.6 and 1.8 have features such as external dataset storage, the multi-file driver, and external links that require file names to be stored.

The patches I referred to were, for example, the patches that mixed Win32 and POSIX calls; they also didn’t work with the features mentioned above.

Here is an outline of what it would take to make the HDF5 library “UTF-8”-ed (recall that only raw data, attribute names, and link names can have UTF-8 encoding now):

  1. Enhance the file format to store the string encoding along with the string (maybe we can avoid this?)
  2. Implement changes in the library to handle UTF-8 strings.
  3. Enable UTF-8 encoding for the following features:
  • VFDs (split/multi/family/log)
  • VDS
  • External links
  • Names of the fields in compound datatype
  • Names of enums
  • File mounting
  • External storage for datasets

It is a pretty big job, but we hope to get to it at some point. We would be more than happy to work with community members to implement this change.

Thank you!

Elena


#8

Hi Nelson

No, I don’t…

Kind regards

Dimitris


#9

Hi Elena!

Here is the outline of what it would take to make HDF5 library to be
“UTF-8”-ed (recall that only raw data, attribute and link names can have
UTF-8 encoding now):

  1. Enhance file format to store string encoding along with the string
    (may be we can avoid it ?)

Given the brilliant property that UTF-8 is ASCII-compatible, the
question is: could all remaining ASCII strings simply start to be treated
as UTF-8 in 1.12? Or equivalently: which strings are currently allowed to be
“extended ASCII”/ANSI, not just plain ASCII?

I believe at least

  • Names of the fields in compound datatype
  • Names of enums

and maybe

  • External storage for datasets

are currently ASCII-only. Is that right?
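
The ASCII-compatibility argument can be illustrated with a small C sketch: a string made only of 7-bit bytes is, byte for byte, already valid UTF-8, whereas an “extended ASCII”/ANSI string with bytes of 0x80 and above usually is not. The names `name_kind` and `classify_name` are hypothetical; this only shows the byte-level property and makes no claim about which HDF5 strings can actually be reinterpreted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of a stored name by its byte content. */
typedef enum { NAME_ASCII, NAME_UTF8, NAME_OTHER } name_kind;

/* NAME_ASCII: only 7-bit bytes (so also valid UTF-8 as-is).
 * NAME_UTF8:  well-formed UTF-8 with at least one multi-byte sequence.
 * NAME_OTHER: anything else, e.g. a legacy ANSI code page. */
name_kind classify_name(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    name_kind kind = NAME_ASCII;

    assert(s != NULL);
    while (*p) {
        unsigned long cp, min;
        int extra; /* continuation bytes expected after the lead byte */

        if (*p < 0x80) { p++; continue; } /* plain ASCII byte */
        if ((*p & 0xE0) == 0xC0)      { cp = *p & 0x1F; extra = 1; min = 0x80; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; min = 0x800; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; min = 0x10000; }
        else return NAME_OTHER;

        for (p++; extra > 0; extra--, p++) {
            if ((*p & 0xC0) != 0x80) return NAME_OTHER;
            cp = (cp << 6) | (*p & 0x3F);
        }
        /* Overlong encodings, surrogates and values past U+10FFFF are invalid. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return NAME_OTHER;
        kind = NAME_UTF8;
    }
    return kind;
}
```

Under this sketch, only names classified as `NAME_OTHER` would be at risk if existing ASCII-tagged strings were reinterpreted as UTF-8.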

Best wishes,
Andrey Paramonov


#10

Hi Elena

I have already filed a bug with Barbara on this. Quoting my mail:

The problem is that your interpretation of H5_build_extpath and H5F_build_actual_name fails on Windows because you are using the wrong system functions to do the job. I quote your documentation:

Filenames

Since file access is a system issue, filenames do not fall within the scope of HDF5’s UTF-8 capabilities; filenames are encoded at the system level.

Linux and Mac OS systems normally handle UTF-8 encoded filenames correctly while Windows systems generally do not.

https://support.hdfgroup.org/HDF5//doc/Advanced/UsingUnicode/index.html

So please provide an extension point for this on the VFL side. In many cases paths can be much more complex, depending on whichever VFL I choose to implement for a specific backend. It can be very simple:

    /* Formulate the absolute path for later search of target file for external links */
    file->extpath = lf->get_path(name);
    if(!file->extpath)
        if(H5_build_extpath(name, &file->extpath) < 0)
            HGOTO_ERROR(H5E_FILE, H5E_CANTINIT, NULL, "unable to build extpath")

    /* Formulate the actual file name, after following symlinks, etc. */
    file->actual_name = lf->get_actual_name(name);
    if(!file->actual_name)
        if(H5F_build_actual_name(file, a_plist, name, &file->actual_name) < 0)
            HGOTO_ERROR(H5E_FILE, H5E_CANTINIT, NULL, "unable to build actual name")

Here get_path and get_actual_name are simple hooks on the file driver that return the appropriate paths and names, or fall back to your current implementation. It would be a huge improvement for all of us if you could do that. Currently we are unable to upgrade to any newer HDF5 because of this (or I need to patch your code).
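
The hook-with-fallback idea can be sketched in isolation as follows. This is a simplified stand-in under stated assumptions, not HDF5 code: `demo_driver`, `default_build_path`, `object_store_get_path` and `resolve_path` are hypothetical names, and the real driver class and `H5_build_extpath` have different signatures.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified driver class: the hook is optional. */
typedef struct demo_driver {
    char *(*get_path)(const char *name); /* may be NULL */
} demo_driver;

/* Stand-in for the library's default path logic (the role played by
 * H5_build_extpath in the snippet above); here it just adds a prefix. */
static char *default_build_path(const char *name)
{
    static const char prefix[] = "/cwd/";
    char *out = malloc(sizeof prefix + strlen(name)); /* sizeof counts the NUL */
    if (out) { strcpy(out, prefix); strcat(out, name); }
    return out;
}

/* Example hook for a backend that is not a local file system. */
static char *object_store_get_path(const char *name)
{
    static const char scheme[] = "store://";
    char *out = malloc(sizeof scheme + strlen(name));
    if (out) { strcpy(out, scheme); strcat(out, name); }
    return out;
}

/* Ask the driver first; fall back to the default if it has no hook
 * or the hook declines by returning NULL. Caller frees the result. */
char *resolve_path(const demo_driver *drv, const char *name)
{
    char *path = NULL;

    assert(drv != NULL && name != NULL);
    if (drv->get_path)
        path = drv->get_path(name);
    if (!path)
        path = default_build_path(name);
    return path;
}
```

With a driver whose `get_path` is NULL, `resolve_path` behaves like the default implementation; a driver that sets the hook, such as one using `object_store_get_path`, keeps all path handling inside the driver.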

and later:

Please note: resolving the name at such a high API level, instead of delegating it to the driver stack, goes against the fundamental premise that the library does not deal with system issues. Until now I could comment out these lines and everything worked; now these paths are used to identify groups and datasets!

In a nutshell: you’re using system-level functions relating to file paths outside the VFL driver, which is by definition wrong.

Kind regards

Dimitris

On Tue, 12 Mar 2019 at 9:54 AM, Andrey Paramonov noreply@forum.hdfgroup.org wrote: