Creating a dataset through an external link does not allow HDFView to see the resulting dataset from the root file

That title isn’t the greatest, but here’s the general sense of it.

I have a bunch of HDF5 files all linked together through a root file. We’ll call the root file Root.h5 and the others File1.h5, File2.h5, … (The reasons for splitting it up basically come down to some specific use case stuff).

What I do is create Root.h5 and (e.g.) File1.h5. The code to do so is the same for both, as shown below.

h5file = H5Fcreate(path.c_str(), H5F_ACC_EXCL, H5P_DEFAULT, H5P_DEFAULT);

I then create an external link from Root.h5:["/some/path/here"] to the root node of File1.h5. The idea being that now I can open Root.h5 and interact with it as if it were one monolithic file even though the data actually lives in another file.

hid_t lcpl_id = H5Pcreate(H5P_LINK_CREATE);
H5Pset_create_intermediate_group(lcpl_id, 1);
status = H5Lcreate_external(relative_output_sub_path.c_str(), "/", root_h5file, exported_group_path.c_str(), lcpl_id, H5P_DEFAULT);
status = H5Lcreate_external(relative_output_sub_proc_path.c_str(), "/", root_h5file, proc_group_path.c_str(), lcpl_id, H5P_DEFAULT);

A little bit later, I use the Root.h5 file handle to write my dataset to a subgroup of “here”, setting the flag to create intermediate groups in the process, i.e. Root.h5:["/some/path/here/subgroup/data"]

hid_t lcpl_id = H5Pcreate(H5P_LINK_CREATE);
H5Pset_create_intermediate_group(lcpl_id, 1);

herr_t status;
hid_t dcpl_id = H5Pcreate(H5P_DATASET_CREATE);
// Set compression, chunking, etc
// Create the dataspace
hid_t dataspace = H5Screate_simple(dims.size(), dims.data(), NULL);
// "file_id" is Root.h5, and path is "/some/path/here/subgroup/data"
target_dset = H5Dcreate(file_id, path.c_str(), H5T_IEEE_F64LE, dataspace, lcpl_id, dcpl_id, H5P_DEFAULT);
// Now write the data
status = H5Dwrite(target_dset , H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL, H5P_DEFAULT, &data[0]);
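To make the effect of H5Pset_create_intermediate_group concrete, here is a minimal sketch, using only the C++ standard library, of which group paths have to exist before the final name in a path can be linked. The helper name intermediate_groups is hypothetical and not part of the HDF5 API; it only mirrors the path-walking idea and says nothing about how the library implements it.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not an HDF5 call): list every group path that must
// exist before the last component of `path` can be linked.
std::vector<std::string> intermediate_groups(const std::string& path)
{
    std::vector<std::string> groups;
    std::size_t pos = 0;
    while (true) {
        // Skip a leading or repeated '/' so empty components are ignored
        while (pos < path.size() && path[pos] == '/') ++pos;
        std::size_t next = path.find('/', pos);
        if (next == std::string::npos) break; // what remains is the final name
        groups.push_back(path.substr(0, next));
        pos = next;
    }
    return groups;
}
```

For "/some/path/here/subgroup/data" this yields "/some", "/some/path", "/some/path/here" and "/some/path/here/subgroup". In the setup described here, "/some/path/here" is the external link itself, so "subgroup" is the only group that still has to be created when the dataset is written, and it ends up in File1.h5.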

Now then - this all succeeds just fine. The data is written to the correct file, and if I open that subfile in HDFView the data is there. What’s weird, however, is that if I open the root file in HDFView and traverse that same link, I can only see up to “subgroup” - “data” is invisible when viewed this way, even though it’s actually there (“subgroup” appears to have no children). If I go back and just write “/some/path/here/data” (i.e. to the root of File1.h5 rather than a subgroup), then the data is visible when traversing through Root.h5.

The data is accessible as expected as well. Opening Root.h5 in some other program (in my case, Julia) and requesting “/some/path/here/subgroup/data” returns the correct information. Additionally, modifications made in HDFView (such as adding attributes) to either the root element or “subgroup” propagate both ways: changes made by opening “File1.h5” directly are reflected in Root.h5, and changes made to “subgroup” through Root.h5 are reflected in File1.h5. Likewise, data written to “/some/path/here/subgroup/otherdata” after opening Root.h5 in Julia is accessible as expected.

I’m guessing I must have some sort of error in how I’m setting my creation properties, but reading through the documentation I can’t see anything I might’ve forgotten to set, especially since creating data through Julia works fine. Am I missing something here, or could this be a bug?

Sounds like a bug in HDFView. Give us an MWE and we’ll get on the case! G.

So, after some more messing about it seems like this may be more complicated than I initially thought.

I did some debugging of my own. The issue seems to arise when creating intermediate groups from the root file, i.e. the groups created by

hid_t lcpl_id = H5Pcreate(H5P_LINK_CREATE);
H5Pset_create_intermediate_group(lcpl_id, 1);

If I write the data to the root group of the external file, i.e.

Root.h5:["/some/path/here/data"]

This works fine. However, if I later create data in a subgroup where that subgroup is created automatically, i.e.

Root.h5:["/some/path/here/subgroup_that_doesnt_exist/data"]

the data within subgroup_that_doesnt_exist is not visible within HDFView though it is accessible via Julia. Interestingly, further subgroups are visible in HDFView, though the data within them remains not viewable. Additionally, attributes set on any element of the secondary file aren’t visible and also have an apparent tendency to go missing, though I’m not really sure why.

I’ve confirmed the subgroup issue occurs when creating data using either C or Julia where subgroup(s) are automatically created, but I can create data manually using HDFView that remain visible.

I will post an MWE in my next post.

Huh. Well - now that I’ve posted that, I’m unable to recreate the issue in an MWE - at least off the bat. I’m having to write to the files in chunks, so I wonder if I’m messing something up in how I’m doing that or something…

Well, if I do get an MWE that shows the issue, I’ll be sure to post back!

@gheber - I finally got an MWE to demonstrate this issue. It seems to have to do with whether or not a dataset is written at the root of an external link. If no dataset is written there, the link misbehaves. It’s worth mentioning that, in my experience, this seems to cause some actual issues with the HDF5 file (losing attributes at random, etc.), so I’m not sure whether this is limited to HDFView. I may well be doing something wrong here.

The MWE is below:

#include <iostream>
#include <string>
#include <vector>
#include "hdf5.h"

int main()
{
	/****************************************************/
	/* This creates two HDF5 files - one called root.h5 */
	/* and the other called subfile.h5. An external     */
	/* link is created from root.h5 to subfile.h5 and   */
	/* data is written to subgroups of subfile.h5 by    */
	/* traversing that link.                            */
	/* If the section below is left commented out       */
	/* accessing root.h5 in HDFView will show that      */
	/* there is only one group in the file and it is    */
	/* empty. However, opening subfile.h5 will show the */
	/* data is present as expected. Uncommenting the    */
	/* section below acts to write a dataset to the     */
	/* root of the subfile through the link, which      */
	/* allows the root file to display as expected in   */
	/* HDFView.                                         */
	/****************************************************/
	// Relevant file paths
	std::string root_file_path = "./root.h5";
	std::string sub_file_path = "./subfile.h5";
	// Location where the external link will be attached in the root file
	std::string sub_group_path = "/path/to/external";

	// The names and paths of various datasets that we will use to create data by traversing the link
	std::string first_dset_name = "root_dset";
	std::string first_group_name = "sub_group/other/groups/here";
	std::string subgroup_dset_name = "subgroup_dset";
	std::string attr_name;

	// File handles, etc
	hid_t root_h5file, sub_h5file;
	hid_t a_space = H5Screate(H5S_SCALAR);
	herr_t status;

	// Create both the root file and the sub file
	std::cout << "Creating root and sub hdf5 files" << std::endl;
	root_h5file = H5Fcreate(root_file_path.c_str(), H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
	sub_h5file = H5Fcreate(sub_file_path.c_str(), H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

	if ((root_h5file < 0) || (sub_h5file < 0)) {
		std::cout << "Failed to create one of the HDF5 files!" << std::endl;
		return -1;
	}
	// We don't need sub_h5file any more since all we needed was to just create the file. The rest of our access will be by traversing the link in root_h5file
	H5Fclose(sub_h5file);

	// Now attach an external (soft) link
	hid_t lcpl_id = H5Pcreate(H5P_LINK_CREATE);
	H5Pset_create_intermediate_group(lcpl_id, 1);
	status = H5Lcreate_external(sub_file_path.c_str(), "/", root_h5file, sub_group_path.c_str(), lcpl_id, H5P_DEFAULT);

	// Now let's add some data to the subfile via the root file
	// This is just some random data to write to the file
	std::vector<float> root_data({ 1.0, 2.0, 3.0, 4.0 });
	std::vector<float> sub_data({ 5.0, 6.0, 7.0, 8.0 });
	hsize_t dset_dims[2] = { 2, 2 };
	hid_t dset_space = H5Screate_simple(2, dset_dims, NULL);

	/***********************************************************************/
	/* UNCOMMENT THESE LINES TO HAVE THE DATA IN THE SUBFILE BE ACCESSIBLE */
	/* LEAVE THESE LINES COMMENTED TO HAVE DATASETS BE INACCESSIBLE        */
	/***********************************************************************/
	//std::cout << "Writing root dataset" << std::endl;
	//hid_t first_dset = H5Dcreate(root_h5file, (sub_group_path + "/" + first_dset_name).c_str(), H5T_IEEE_F64LE, dset_space, lcpl_id, H5P_DEFAULT, H5P_DEFAULT);
	//status = H5Dwrite(first_dset, H5T_NATIVE_FLOAT, H5S_ALL, dset_space, H5P_DEFAULT, &root_data[0]);
	
	// This tests writing data to a dataset located several subgroups down from the root of the file where none of the subgroups exist yet (and so should be created by lcpl_id)
	// Additionally, in that case, any groups created after the first are not visible from HDFView
	// The issue persists if writing to a group only one level down, I.E "/subgroup/data"
	std::cout << "Writing subdataset" << std::endl;
	hid_t sub_dset = H5Dcreate(root_h5file, (sub_group_path + "/" + first_group_name + "/" + subgroup_dset_name).c_str(), H5T_IEEE_F64LE, dset_space, lcpl_id, H5P_DEFAULT, H5P_DEFAULT);
	status = H5Dwrite(sub_dset, H5T_NATIVE_FLOAT, H5S_ALL, dset_space, H5P_DEFAULT, &sub_data[0]);

	// This is testing writing some attributes to the various datasets and groups we've created.
	std::cout << "Writing attributes to root group and subgroup" << std::endl;
	int test_attr[1] = { 1 };

	/* Uncomment these to create attributes on the dataset that is created on the root group.     */
	/* Should be commented out if the above section is also commented as this dataset won't exist */
	//attr_name = "root_dset_attr";
	//hid_t attrib = H5Acreate(first_dset, attr_name.c_str(), H5T_NATIVE_INT, a_space, H5P_DEFAULT, H5P_DEFAULT);
	//status = H5Awrite(attrib, H5T_NATIVE_INT, &test_attr[0]);

	// Create an attribute on the link from the root file to the subfile (So either on the group that makes up the link, or the subfile's root group depending on how you look at it)
	hid_t sgrp = H5Gopen(root_h5file, sub_group_path.c_str(), H5P_DEFAULT);
	attr_name = "root_attr";
	hid_t attrib = H5Acreate(sgrp, attr_name.c_str(), H5T_NATIVE_INT, a_space, H5P_DEFAULT, H5P_DEFAULT);
	status = H5Awrite(attrib, H5T_NATIVE_INT, &test_attr[0]);

	// Now write one to the dataset that is located several groups down from the root of the subfile
	attrib = H5Acreate(sub_dset, attr_name.c_str(), H5T_NATIVE_INT, a_space, H5P_DEFAULT, H5P_DEFAULT);
	status = H5Awrite(attrib, H5T_NATIVE_INT, &test_attr[0]);

	std::cout << "Closing file handles" << std::endl;
	//H5Dclose(first_dset); // Uncomment this if you uncomment the above lines too
	H5Pclose(lcpl_id);
	H5Fclose(root_h5file);
	H5Dclose(sub_dset);
	H5Sclose(a_space);
	H5Aclose(attrib);
	H5Gclose(sgrp);
	H5Sclose(dset_space);
	std::cout << "Done." << std::endl;
	return 0;
	
}

Let me know if I can provide other input, or if you can see what’s wrong here!
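One thing the MWE above does not do is inspect the values it stores in status. HDF5 C functions report failure through a negative return value, for identifiers and status codes alike, so a small wrapper can make a silent failure loud. The following is a minimal sketch and not part of the original program: require_ok is a hypothetical helper name, and a plain long long stands in for herr_t and hid_t so that the snippet compiles without the HDF5 headers.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper (not an HDF5 call). HDF5 C functions return a negative
// value on failure, for identifiers (hid_t) and status codes (herr_t) alike.
// `long long` is used here as a stand-in wide enough for either type.
inline long long require_ok(long long result, const std::string& what)
{
    if (result < 0) {
        throw std::runtime_error("HDF5 call failed: " + what);
    }
    return result; // pass the id or status through so calls can be wrapped inline
}
```

With the real headers, each call in the MWE could be wrapped in place, for example require_ok(H5Lcreate_external(...), "H5Lcreate_external") with the arguments left exactly as they are.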

I haven’t tried HDFView, but h5dump and h5ls are both happy, and I can see what’s been created. You don’t properly close the attrib handles in your program, but otherwise, I don’t see anything wrong with your MWE. With all the trimmings, h5dump shows this:

gerd@guix ~/scratch/run$ h5dump  root.h5 
HDF5 "root.h5" {
GROUP "/" {
   GROUP "path" {
      GROUP "to" {
         EXTERNAL_LINK "external" {
            TARGETFILE "./subfile.h5"
            TARGETPATH "/"
               GROUP "/" {
                  ATTRIBUTE "root_attr" {
                     DATATYPE  H5T_STD_I32LE
                     DATASPACE  SCALAR
                     DATA {
                     (0): 1
                     }
                  }
                  DATASET "root_dset" {
                     DATATYPE  H5T_IEEE_F64LE
                     DATASPACE  SIMPLE { ( 2, 2 ) / ( 2, 2 ) }
                     DATA {
                     (0,0): 1, 2,
                     (1,0): 3, 4
                     }
                     ATTRIBUTE "root_dset_attr" {
                        DATATYPE  H5T_STD_I32LE
                        DATASPACE  SCALAR
                        DATA {
                        (0): 1
                        }
                     }
                  }
                  GROUP "sub_group" {
                     GROUP "other" {
                        GROUP "groups" {
                           GROUP "here" {
                              DATASET "subgroup_dset" {
                                 DATATYPE  H5T_IEEE_F64LE
                                 DATASPACE  SIMPLE { ( 2, 2 ) / ( 2, 2 ) }
                                 DATA {
                                 (0,0): 5, 6,
                                 (1,0): 7, 8
                                 }
                                 ATTRIBUTE "root_attr" {
                                    DATATYPE  H5T_STD_I32LE
                                    DATASPACE  SCALAR
                                    DATA {
                                    (0): 1
                                    }
                                 }
                              }
                           }
                        }
                     }
                  }
               }
         }
      }
   }
}
}

and

gerd@guix ~/scratch/run$ h5dump subfile.h5 
HDF5 "subfile.h5" {
GROUP "/" {
   ATTRIBUTE "root_attr" {
      DATATYPE  H5T_STD_I32LE
      DATASPACE  SCALAR
      DATA {
      (0): 1
      }
   }
   DATASET "root_dset" {
      DATATYPE  H5T_IEEE_F64LE
      DATASPACE  SIMPLE { ( 2, 2 ) / ( 2, 2 ) }
      DATA {
      (0,0): 1, 2,
      (1,0): 3, 4
      }
      ATTRIBUTE "root_dset_attr" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SCALAR
         DATA {
         (0): 1
         }
      }
   }
   GROUP "sub_group" {
      GROUP "other" {
         GROUP "groups" {
            GROUP "here" {
               DATASET "subgroup_dset" {
                  DATATYPE  H5T_IEEE_F64LE
                  DATASPACE  SIMPLE { ( 2, 2 ) / ( 2, 2 ) }
                  DATA {
                  (0,0): 5, 6,
                  (1,0): 7, 8
                  }
                  ATTRIBUTE "root_attr" {
                     DATATYPE  H5T_STD_I32LE
                     DATASPACE  SCALAR
                     DATA {
                     (0): 1
                     }
                  }
               }
            }
         }
      }
   }
}
}

I believe both are in order.

Which version of HDFView are you using and what’s the OS?
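On the unclosed attribute handles: in the MWE, attrib is assigned twice but closed only once, so the first attribute identifier is never released. One way to avoid that class of mistake in C++ is a small scope guard that calls the matching close function when it goes out of scope or is reassigned. This is only a sketch: ScopedHandle is a hypothetical name, and Handle is a stand-in alias for hid_t so the snippet builds without HDF5. With the real library the closer would be H5Aclose, H5Dclose and so on, assuming hid_t matches the stand-in type on your platform.

```cpp
#include <cstdint>

using Handle = std::int64_t;     // stand-in for hid_t (assumption for this sketch)
using Closer = int (*)(Handle);  // same shape as H5Aclose, H5Dclose, H5Gclose, ...

// Hypothetical RAII wrapper: owns one identifier and closes it exactly once.
class ScopedHandle {
public:
    ScopedHandle(Handle id, Closer closer) : id_(id), closer_(closer) {}
    ~ScopedHandle() { reset(); }

    // Non-copyable so two wrappers never close the same identifier
    ScopedHandle(const ScopedHandle&) = delete;
    ScopedHandle& operator=(const ScopedHandle&) = delete;

    // Movable: ownership transfers and the source is left empty
    ScopedHandle(ScopedHandle&& other) noexcept
        : id_(other.id_), closer_(other.closer_) { other.id_ = -1; }
    ScopedHandle& operator=(ScopedHandle&& other) noexcept {
        if (this != &other) {
            reset();                // close whatever we held before
            id_ = other.id_;
            closer_ = other.closer_;
            other.id_ = -1;
        }
        return *this;
    }

    Handle get() const { return id_; }

private:
    void reset() {
        if (id_ >= 0 && closer_) closer_(id_);  // negative ids are invalid, skip them
        id_ = -1;
    }
    Handle id_;
    Closer closer_;
};
```

Reassigning such a wrapper closes the previously held identifier first, which is exactly the step the plain hid_t variable in the MWE skips.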

I am using HDF5 V1.12.0, and HDFView 3.1.1 on Windows 10 (Not my preference but where I’m stuck for the moment :P)

I have confirmed that h5dump also gives the expected output:

C:\<snip>\HDF5-1.12.0-win64\bin>h5dump.exe ..\..\root.h5
HDF5 "..\..\root.h5" {
GROUP "/" {
   GROUP "path" {
      GROUP "to" {
         EXTERNAL_LINK "external" {
            TARGETFILE "./subfile.h5"
            TARGETPATH "/"
               GROUP "/" {
                  ATTRIBUTE "root_attr" {
                     DATATYPE  H5T_STD_I32LE
                     DATASPACE  SCALAR
                     DATA {
                     (0): 1
                     }
                  }
                  GROUP "sub_group" {
                     GROUP "other" {
                        GROUP "groups" {
                           GROUP "here" {
                              DATASET "subgroup_dset" {
                                 DATATYPE  H5T_IEEE_F64LE
                                 DATASPACE  SIMPLE { ( 2, 2 ) / ( 2, 2 ) }
                                 DATA {
                                 (0,0): 5, 6,
                                 (1,0): 7, 8
                                 }
                                 ATTRIBUTE "root_attr" {
                                    DATATYPE  H5T_STD_I32LE
                                    DATASPACE  SCALAR
                                    DATA {
                                    (0): 1
                                    }
                                 }
                              }
                           }
                        }
                     }
                  }
               }
         }
      }
   }
}
}

C:\<snip>\HDF5-1.12.0-win64\bin>h5dump.exe ..\..\subfile.h5
HDF5 "..\..\subfile.h5" {
GROUP "/" {
   ATTRIBUTE "root_attr" {
      DATATYPE  H5T_STD_I32LE
      DATASPACE  SCALAR
      DATA {
      (0): 1
      }
   }
   GROUP "sub_group" {
      GROUP "other" {
         GROUP "groups" {
            GROUP "here" {
               DATASET "subgroup_dset" {
                  DATATYPE  H5T_IEEE_F64LE
                  DATASPACE  SIMPLE { ( 2, 2 ) / ( 2, 2 ) }
                  DATA {
                  (0,0): 5, 6,
                  (1,0): 7, 8
                  }
                  ATTRIBUTE "root_attr" {
                     DATATYPE  H5T_STD_I32LE
                     DATASPACE  SCALAR
                     DATA {
                     (0): 1
                     }
                  }
               }
            }
         }
      }
   }
}
}

HDFView, however, gives incorrect output:

image

If I uncomment the lines in my MWE (not changing anything else), it does give the correct output:

image

Would you mind trying HDFView 3.1.2? I don’t see anything in the release notes that seems related to this issue (except an obscure reference “Fixed a bug with changing the working directory.” ???), but let’s try this before it’s time for plan B.

G.

The issue persists with HDFView 3.1.2.

Without root dataset:
image

With root dataset:
image

Version info:
image

I just wanted to follow up on this. Has there been any progress, or is there a bug ticket that I could follow?

Keep an eye on https://jira.hdfgroup.org/browse/HDFVIEW-267
Feel free to add comments or correct my description.

Thanks, G.

It appears that I do not have permission to view that bug? I get an error saying it’s either been deleted or I don’t have permission to see it, and searching using the dashboard doesn’t turn it up either.

There was a permissions issue. Would you please try that link again? G.

Yep, I can see it now, and I can confirm that your description of the issue looks accurate. Thank you! I’ll keep an eye on that bug for further updates.