Hello!
Using the C library on a POSIX platform, when I call H5Lunpack_elink_val(), what is the character set of the filename returned in the filename argument?
Thanks!
Because file access is handled by the operating system, filenames fall outside the scope of HDF5's UTF-8 capabilities; their encoding is determined at the system level.
Linux and macOS systems normally handle UTF-8 encoded filenames correctly, while Windows systems generally do not.
Reference: Using UTF-8 Encoding in HDF5 Applications | The HDF Group Support Site
Thanks for your reply! So whoever creates an HDF5 file with external links would have to document which encoding was used for the filenames in those links. The encoding is not part of the HDF5 standard (e.g., the standard doesn't say that all filenames in external links are encoded as UTF-8), and it is not stored in the HDF5 file itself. The only option I can see is for the creator of the file to state the encoding of external-link filenames somewhere, e.g., in documentation. I doubt many, if any, do this.
Hi, @jlmuir!
I think you can run
$ h5dump --xml <your file>
and see what’s stored and what’s not.
Here's example output from a file created by h5serv/test/integ/setupdata.py:
<?xml version="1.0" encoding="UTF-8"?>
<hdf5:HDF5-File xmlns:hdf5="http://hdfgroup.org/HDF5/XML/schema/HDF5-File.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://hdfgroup.org/HDF5/XML/schema/HDF5-File http://www.hdfgroup.org/HDF5/XML/schema/HDF5-File.xsd">
<hdf5:RootGroup OBJ-XID="xid_96" H5Path="/">
<hdf5:ExternalLink LinkName="external_link1" OBJ-XID="xid_18446744073709551614" H5SourcePath="/external_link1" TargetFilename="tall.h5" TargetPath="g1/g1.1" Parents="xid_96" H5ParentPaths="/" />
<hdf5:ExternalLink LinkName="external_link2" OBJ-XID="xid_18446744073709551613" H5SourcePath="/external_link2" TargetFilename="tall" TargetPath="g1/g1.1" Parents="xid_96" H5ParentPaths="/" />
<hdf5:ExternalLink LinkName="external_link3" OBJ-XID="xid_18446744073709551612" H5SourcePath="/external_link3" TargetFilename="tall.test.hdfgroup.org" TargetPath="g1/g1.1" Parents="xid_96" H5ParentPaths="/" />
<hdf5:ExternalLink LinkName="external_link4" OBJ-XID="xid_18446744073709551611" H5SourcePath="/external_link4" TargetFilename="/content/tall.h5" TargetPath="g1/g1.1" Parents="xid_96" H5ParentPaths="/" />
<hdf5:ExternalLink LinkName="external_link5" OBJ-XID="xid_18446744073709551610" H5SourcePath="/external_link5" TargetFilename="tall.subdir.test.hdfgroup.org" TargetPath="g1/g1.1" Parents="xid_96" H5ParentPaths="/" />
<hdf5:ExternalLink LinkName="external_link6" OBJ-XID="xid_18446744073709551609" H5SourcePath="/external_link6" TargetFilename="tall.subdir" TargetPath="g1/g1.1" Parents="xid_96" H5ParentPaths="/" />
<hdf5:ExternalLink LinkName="external_link7" OBJ-XID="xid_18446744073709551608" H5SourcePath="/external_link7" TargetFilename="subdir/tall.h5" TargetPath="g1/g1.1" Parents="xid_96" H5ParentPaths="/" />
<hdf5:Group Name="g1" OBJ-XID="xid_800" H5Path="/g1" Parents="xid_96" H5ParentPaths="/" >
<hdf5:Group Name="g1.1" OBJ-XID="xid_1832" H5Path="/g1/g1.1" Parents="xid_800" H5ParentPaths="/g1" >
</hdf5:Group>
</hdf5:Group>
<hdf5:SoftLink LinkName="soft_link" OBJ-XID="xid_800" H5SourcePath="/soft_link" TargetPath="g1" TargetObj="xid_800" Parents="xid_96" H5ParentPaths="/" />
</hdf5:RootGroup>
</hdf5:HDF5-File>
Well, are you saying that I should look at the encoding attribute of the XML declaration? If so, I don't think I can trust that. You said upthread that the encoding of the filename in an external link is neither part of the spec nor something that can be specified in the HDF5 file. Because of that, h5dump has no way to know the encoding of the filename in an external link. My guess is that it just assumes UTF-8. Based on what you said, though, that's only an assumption, and it could be wrong. For example, I could create an HDF5 file with an external link whose filename uses an encoding other than UTF-8. If I'm understanding things correctly, I bet h5dump would still incorrectly assume that the encoding is UTF-8.
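Because the file records no encoding for the external-link filename, a reader can really only test the bytes it gets back. Those bytes arrive as the `filename` pointer from H5Lunpack_elink_val(). One possible check is sketched below. It assumes the goal is just to detect filenames that are *not* well-formed UTF-8 (per RFC 3629), and `is_valid_utf8` is a hypothetical helper, not part of the HDF5 API. A passing result does not prove the writer intended UTF-8. Plain ASCII is also valid in many other encodings, and some Latin-1 byte sequences happen to form valid UTF-8.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of HDF5): returns true if the
 * NUL-terminated byte string s is well-formed UTF-8 per RFC 3629.
 * Rejects stray continuation bytes, truncated sequences, overlong
 * encodings, UTF-16 surrogates, and code points above U+10FFFF. */
static bool is_valid_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    while (*p) {
        unsigned char c = p[0];
        size_t len;
        unsigned long cp;

        if (c < 0x80) {               /* ASCII byte */
            p++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            len = 2; cp = c & 0x1F;
        } else if ((c & 0xF0) == 0xE0) {
            len = 3; cp = c & 0x0F;
        } else if ((c & 0xF8) == 0xF0) {
            len = 4; cp = c & 0x07;
        } else {
            return false;             /* continuation byte or invalid lead byte */
        }

        /* Each following byte must look like 10xxxxxx; hitting the NUL
         * terminator here fails this test, so we never read past the end. */
        for (size_t i = 1; i < len; i++) {
            if ((p[i] & 0xC0) != 0x80)
                return false;
            cp = (cp << 6) | (p[i] & 0x3F);
        }

        /* Overlong forms: the code point would fit in fewer bytes. */
        if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
            (len == 4 && cp < 0x10000))
            return false;

        /* Out of Unicode range, or a UTF-16 surrogate half. */
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;

        p += len;
    }
    return true;
}
```

For instance, the name "café.h5" written as UTF-8 ("caf\xC3\xA9.h5") passes. The same name written as Latin-1 ("caf\xE9.h5") fails, because 0xE9 starts a three-byte UTF-8 sequence that the following "." does not continue.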
Hi, @jlmuir, you may want to search (Ctrl+F) for "UTF-8" in the following document:
HDF5 File Format Specification Version 3.0
I hope the above document can answer your question.
I haven't tested anything related to filename character sets and external links.
So, I'm very curious what your test HDF5 file will look like in XML.
Regards,