Proposal to change return value behavior of H5Fclose


#1

This is just an informal survey of HDF5 users regarding a request in changing default behavior of serial H5Fclose() to H5F_CLOSE_SEMI. As you may or may not know, by default H5Fclose() can return non-negative exit status (indicating success), yet still leave the file on disk actually open. This happens if the caller has (accidentally) left any objects in the file open. I would like to see this default behavior change and instead have H5Fclose() return negative exit status (indicating error), under these conditions. Either way, the file on disk is still left open. The difference is whether the caller gets any indication that something may have gone wrong with closing the file.

My understanding is that this could negatively impact libraries that use HDF5 as middleware (e.g. CGNS, netCDF, Silo, etc.) and where multiple different layers of software may have called H5Fopen() on the same file system path. Each call returns a unique hid_t HDF5 file handle and a later H5Fclose() by default, currently returns non-negative (success) because it has merely freed the associated hid_t resource and has not necessarily actually closed the file. However, I worry this use case is far less common than the one described above, where caller accidentally leaves open objects in the file but believes its been successfully closed because H5Fclose() returned non-negative (success) status.


#2

For me that is the widely used behavior to “extract” HDF5 id’s from various objects in a file, and to H5Fclose() it as soon as it has been iterated over initially. Then the application only works with the HDF5 id’s, regardless to which file they belong, and it can always retrieve data based on HDF5 id’s, basically forgetting about the “concept of a file” entirely, since those id’s can stem from one or many files, and it does not matter at all. So in my case, the ID’s are intentionally left open.

However, I wouldn’t have much of a problem with setting the H5Pset_fclose_degree() explicitly to H5F_CLOSE_WEAK . The question remains whether such habits should be encouraged to the HDF5 users… I consider it a feature, not a bug, that HDF5 id’s remain open after file closure. It allows for abstraction of objects beyond files.


#3

In my interpretation a successful close means all resources are freed in reverse topological order of all dependencies if any, restoring the state to the one before open was called. Any other behaviour should be consider a soft error indicating that the close operation didn’t succeed. The strength of this interpretation is considering the context the operator executed in: clean state -> success, rest is error.

An alternative interpretation of the close operation signalling error is if something went wrong during the execution of the subroutine. In this interpretation the call succeeded, there was no error during the runtime of the routine. The weakness of this interpretation is it doesn’t consider the state of the context it operates within and leaving allocated resources behind is not an error.

I am in favour of the first interpretation – supporting @miller86 request but I also understand the possibility of negative impact. From H5CPP library perspective the impact is minimal, if any.


#4

The idea that I mentioned to Mark earlier in the week is along the lines of Werner’s comment: Calling H5Fclose on a file ID when there are open objects in the file should succeed, as that file ID has successfully been closed. Access to the file may continue through other file or object IDs, since they have not been closed (yet). The itself will be closed when the ref count on it, from all open IDs, drops to zero. This is the behavior that best supports possible simultaneous access to a file from the application and from “third-party” libraries, like netCDF, etc.


#5

"The itself will be closed…” => “The file itself will be closed…”


#6

I see multiple problems with this intepretation. First, the equivalent argument is not made for H5Fcreate/H5Fopen. HDF5 library interface relies upon many metaphors to previous file API technologies (links, symbolic and hard, directories or groups with paths delimited by / chars, mounting and unmounting files within files, etc.) Extending that metaphor, wouldn’t we expect H5Fopen/H5Fclose to behave similarly to open/close or fopen/fclose? The problem(s) come in when the library allows (even encourages) callers to open a file multiple times. At that beginning of interacting with a file, the notion of successfully obtaining a file id is bound with the second notion of actually having an open file. There the interface does nothing to suggest those two things are somehow independent steps. Only in the close operation is this possibility suggested. Finally, while I do find earlier remarks about what the current behavior makes possible in the way of abstraction interesting, those sound to me like only curiosities which are not actually based any real/actual uses of the library by typical users. I think the interface should make the common things easy and the uncommon or hard things possible. The current behavior IMHO makes the uncommon case center stage leaving the majority of users to easily trip over this odd behavior. I’ve had the problem many times myself and I frequently encounter other new users have the same issue. I also recall that for many years and versions I would frequently see H5Fclose report some kind of an error about encountering an inf. loop trying to close all open objects with H5F_CLOSE_STRONG which suggests it was a very brittle bit of code.

An additional option/part of this change could be to by default disallow the multiple-open case (which also think is the more common use case) and callers needing that behavior would have to exlicitly set a property to enable it. Yes, existing code relying on current default would break. But, it could do so in a very obvious way alerting them of the change in behavior and the need to adjust their code.


#7

I prefer to think about HDF5 as a container for objects, rather than a traditional, sequential file - and the current behavior encourages that kind of thinking. Like when traversing a file system with a file explorer, I don’t need to keep the file explorer open for a parent directory when entering a subfolder; the parent directory is just there as means to access the subfolder and files, same as the HDF5 file merely needs to be opened to access the objects in there. But with all the features such as linking and mounting external files, various VFD and VOL’s, the notion of an “physical file” sounds like an outdated concept of the 1970’s anyway and thinking in terms of reference-counted objects, pretty common in modern C++, is more up to date. I like it that HDF5 encourages this way of thinking about its objects even on the C level, where admittedly it is unusual. I agree it is annoying in C to keep track of all resources, and to equip each construction with destruction manually. So it is convenient that the HDF5 library offers a mode to just close all resources implicitly automatically as well; but also C coders “should” free their resources after allocating them.


#8

@miller86, I am confused by the documentation for H5Fclose() and H5Pset_fclose_degree(). Precisely what is the current default setting of the file close degree property, fc_degree, that you propose to change? Is it H5F_CLOSE_WEAK, H5F_CLOSE_DEFAULT, or something else? I do not see this as clearly spelled out anywhere in the current on-line documentation.


#9

@dave.allured…to be honest, I think your question hightlites another serious issue with the current state of affairs. Not only is the default behavior bizarre…the documentation about it is not sufficiently organized to ensure most readers will be alerted.

To answer your question…the current default behavior is H5F_CLOSE_WEAK. So, the return value of H5Fclose() can indicate success yet the file on disk has not actually been closed. This will happen when any object in the file is left open, a fairly common mistake (IMHO it is a mistake. However, @werner description of desired behavior, above, suggests its one way to abstract away from the notion of a file.


#10

@miller86, thanks, that answers my question. Consider that a side issue, and also a request to improve the documentation for this one detail, namely the default behavior of the file close degree property.

I favor your proposed change to H5F_CLOSE_SEMI as the default. I think that is the most conservative strategy. This goes to protecting the naive user from obscure bugs at file closing time, as well as to exposing hidden problems in sophisticated code.

I do not see a serious problem for existing code and wrapper libraries. Expert usage can set H5F_CLOSE_WEAK at file open time, when the previous behavior is needed. However, the change to H5F_CLOSE_SEMI could cause a backward behavioral incompatibility for some existing code. A warning in the release notes would be in order.


#11

As another point, we also found this to be an issue years ago and wrote our own wrapper to close files.

inline herr_t closeFile(hid_t& fileID)
{
H5SUPPORT_MUTEX_LOCK()

herr_t err = 1;
if(fileID < 0) // fileID isn’t open
{
return 1;
}

// Get the number of open identifiers of all types
// except files
ssize_t numOpen = H5Fget_obj_count(fileID, H5F_OBJ_DATASET | H5F_OBJ_GROUP | H5F_OBJ_DATATYPE | H5F_OBJ_ATTR | H5F_OBJ_LOCAL);
if(numOpen > 0)
{
std::cout << “WARNING: Some IDs weren’t closed. Closing them.” << std::endl;
std::vector<hid_t> attributeIDs(numOpen, 0);
H5Fget_obj_ids(fileID, H5F_OBJ_DATASET | H5F_OBJ_GROUP | H5F_OBJ_DATATYPE | H5F_OBJ_ATTR, numOpen, attributeIDs.data());
std::array<char, 1024> name;
for(const auto& id : attributeIDs)
{
name.fill(0);
ssize_t charsRead = H5Iget_name(id, name.data(), name.size());
if(charsRead < 0)
{
std::cout << “Error Trying to get the name of an hdf object that was not closed. This is probably pretty bad. " << FILE << “(” << LINE << “)” << std::endl;
return -1;
}
std::cout << “H5 Object left open. Id=” << id << " Name=’” << name.data() << “’” << std::endl;
H5Utilities::closeHDF5Object(id);
}
}

err = H5Fclose(fileID);
if(err < 0)
{
std::cout << "Error Closing HDF5 File. " << err << std::endl;
}
fileID = -1;
return err;
}

Not the cleanest or most elegant code in the world but does protect us from the issue.


#12

Clever code. Thanks for sharing. The VisIt project did something vaguely similar. We have 130+ database plugins (shared libs), many of which interact with HDF5 files. But, coders of those plugins invariably run into this issue. The trouble is, one poor behaving database plugin in VisIt can ruin it for every other plugin so we adopted an approach that uses the C Preprocessor to override their calls to H5Fopen and forces H5F_CLOSE_SEMI behavior if it hadn’t been coded that way.


#13

I would caution against this idea. If you look at file access within C and C++ if you “fclose()” or “stream.close()” then access to that file is not allowed (or would throw errors). Since HDF5 since the beginning has been a File Format where we use HDF5 like a high level way to write files I would not deviate from the norm of these often used concepts. Just my 2 cents.


#14

Thanks for your comments @mike.jackson. But, I honestly cannot determine from your response if you are in favor of the proposed change (which would mean that HDF5 behaves more like the C/C++ examples you cite) or are you against this change?


#15

A change that acts more like C/C++ standard IO functions. I guess I was not really clear.