What is the character set of an external link filename?

Hello!

Using the C library on a POSIX platform, when I call H5Lunpack_elink_val(), what is the character set of the filename returned in the filename argument?

Thanks!

Filenames

Since file access is a system issue, filenames do not fall within the scope of HDF5’s UTF-8 capabilities; filenames are encoded at the system level.

Linux and Mac OS systems normally handle UTF-8 encoded filenames correctly while Windows systems generally do not.

Reference: Using UTF-8 Encoding in HDF5 Applications | The HDF Group Support Site
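
To make "encoded at the system level" concrete: on a POSIX system a filename is a sequence of bytes, and the same visible name can be stored as different bytes depending on the encoding in use. Below is a minimal sketch in plain C (no HDF5 calls) that prints the bytes of a name as hex so they can be inspected. The helper name filename_to_hex is hypothetical and not part of any library.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of HDF5): writes the bytes of the
 * NUL-terminated string `name` into `out` as space-separated uppercase
 * hex pairs. Returns 0 on success, -1 if `out` is too small. */
int filename_to_hex(const char *name, char *out, size_t out_size)
{
    size_t len = strlen(name);
    /* Each byte needs "XX " (3 chars); the final space becomes the NUL. */
    size_t needed = (len == 0) ? 1 : len * 3;

    if (out_size < needed)
        return -1;

    out[0] = '\0';
    for (size_t i = 0; i < len; i++) {
        /* Cast to unsigned char so bytes >= 0x80 are not sign-extended. */
        snprintf(out + i * 3, 4, "%02X%s",
                 (unsigned char)name[i], (i + 1 < len) ? " " : "");
    }
    return 0;
}
```

For example, the name é.h5 written as "\xC3\xA9.h5" (UTF-8) gives C3 A9 2E 68 35, while "\xE9.h5" (ISO-8859-1) gives E9 2E 68 35. Nothing in the bytes themselves says which encoding was intended.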

Thanks for your reply! So, really, whoever creates an HDF5 file with external links would have to document which encoding was used for the filenames in those links. The encoding is not part of the HDF5 standard (e.g., the standard doesn’t say that all filenames in external links are encoded as UTF-8), and it is not stored in the HDF5 file itself, so the only option I can see is for the file’s creator to state the encoding somewhere else (e.g., in documentation). I doubt many, if any, do this.

Hi, @jlmuir!

I think you can run

$ h5dump --xml <your file>

and see what’s stored and what’s not.

Here’s an example output from a file created by h5serv/test/integ/setupdata.py

<?xml version="1.0" encoding="UTF-8"?>
<hdf5:HDF5-File xmlns:hdf5="http://hdfgroup.org/HDF5/XML/schema/HDF5-File.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://hdfgroup.org/HDF5/XML/schema/HDF5-File http://www.hdfgroup.org/HDF5/XML/schema/HDF5-File.xsd">
<hdf5:RootGroup OBJ-XID="xid_96" H5Path="/">
   <hdf5:ExternalLink LinkName="external_link1" OBJ-XID="xid_18446744073709551614" H5SourcePath="/external_link1" TargetFilename="tall.h5"  TargetPath="g1/g1.1"  Parents="xid_96" H5ParentPaths="/" />
   <hdf5:ExternalLink LinkName="external_link2" OBJ-XID="xid_18446744073709551613" H5SourcePath="/external_link2" TargetFilename="tall"  TargetPath="g1/g1.1"  Parents="xid_96" H5ParentPaths="/" />
   <hdf5:ExternalLink LinkName="external_link3" OBJ-XID="xid_18446744073709551612" H5SourcePath="/external_link3" TargetFilename="tall.test.hdfgroup.org"  TargetPath="g1/g1.1"  Parents="xid_96" H5ParentPaths="/" />
   <hdf5:ExternalLink LinkName="external_link4" OBJ-XID="xid_18446744073709551611" H5SourcePath="/external_link4" TargetFilename="/content/tall.h5"  TargetPath="g1/g1.1"  Parents="xid_96" H5ParentPaths="/" />
   <hdf5:ExternalLink LinkName="external_link5" OBJ-XID="xid_18446744073709551610" H5SourcePath="/external_link5" TargetFilename="tall.subdir.test.hdfgroup.org"  TargetPath="g1/g1.1"  Parents="xid_96" H5ParentPaths="/" />
   <hdf5:ExternalLink LinkName="external_link6" OBJ-XID="xid_18446744073709551609" H5SourcePath="/external_link6" TargetFilename="tall.subdir"  TargetPath="g1/g1.1"  Parents="xid_96" H5ParentPaths="/" />
   <hdf5:ExternalLink LinkName="external_link7" OBJ-XID="xid_18446744073709551608" H5SourcePath="/external_link7" TargetFilename="subdir/tall.h5"  TargetPath="g1/g1.1"  Parents="xid_96" H5ParentPaths="/" />
   <hdf5:Group Name="g1" OBJ-XID="xid_800" H5Path="/g1" Parents="xid_96" H5ParentPaths="/" >
      <hdf5:Group Name="g1.1" OBJ-XID="xid_1832" H5Path="/g1/g1.1" Parents="xid_800" H5ParentPaths="/g1" >
      </hdf5:Group>
   </hdf5:Group>
   <hdf5:SoftLink LinkName="soft_link" OBJ-XID="xid_800" H5SourcePath="/soft_link" TargetPath="g1" TargetObj="xid_800" Parents="xid_96" H5ParentPaths="/" />
</hdf5:RootGroup>
</hdf5:HDF5-File>

Well, are you saying that I should look at the encoding attribute of the XML declaration? If so, I don’t think I can trust that. You said upthread that the encoding of the filename in an external link is neither part of the spec nor something that can be recorded in the HDF5 file. Because of that, there’s no way that h5dump can know the encoding of the filename in an external link. My guess is that it just assumes UTF-8, but based on what you said, that’s only an assumption, and it could be wrong. For example, I could create an HDF5 file with an external link whose filename is encoded in something other than UTF-8, and if I’m understanding things correctly, I bet h5dump would incorrectly treat it as UTF-8.
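
One partial check a reader of the file can do, sketched here in plain C with no HDF5 calls: test whether the filename bytes are at least well-formed UTF-8. This is only a sketch under stated assumptions. The function name is_well_formed_utf8 is hypothetical, and the caller is assumed to pass the bytes obtained from the filename argument of H5Lunpack_elink_val() together with their length. A failure shows the name is not UTF-8. A success does not prove it is, since pure ASCII names, for instance, are valid in many encodings.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of HDF5): returns true if the `len` bytes
 * at `s` form well-formed UTF-8. Rejects stray or missing continuation
 * bytes, overlong forms, surrogates, and code points above U+10FFFF. */
bool is_well_formed_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char b = s[i];
        size_t n;           /* number of continuation bytes expected */
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point allowed for this length */

        if (b < 0x80) {     /* single-byte (ASCII) character */
            i++;
            continue;
        } else if ((b & 0xE0) == 0xC0) {
            n = 1; cp = b & 0x1F; min = 0x80;
        } else if ((b & 0xF0) == 0xE0) {
            n = 2; cp = b & 0x0F; min = 0x800;
        } else if ((b & 0xF8) == 0xF0) {
            n = 3; cp = b & 0x07; min = 0x10000;
        } else {
            return false;   /* stray continuation byte or invalid lead byte */
        }

        if (len - i - 1 < n)
            return false;   /* sequence is truncated */

        for (size_t k = 1; k <= n; k++) {
            unsigned char c = s[i + k];
            if ((c & 0xC0) != 0x80)
                return false;   /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;   /* overlong, out of range, or surrogate */

        i += n + 1;
    }
    return true;
}
```

With this sketch, the bytes "\xC3\xA9.h5" (é.h5 in UTF-8) pass, while "\xE9.h5" (é.h5 in ISO-8859-1) fails, because 0xE9 announces a three-byte sequence that never follows.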

Hi, @jlmuir, you may want to search (Ctrl+F) “UTF-8” in the following document:

HDF5 File Format Specification Version 3.0

I hope the above document can answer your question.

I haven’t tested anything related to filename character sets and external links, so I’m very curious what your test HDF5 file will look like in XML.

Regards,