Hello,
We have written a library on top of the HDF5 library for storing large models. Recently, we implemented support for Read-Only S3 (ROS3) in this library by leveraging the ROS3 feature of the HDF5 library. So far this works well. However, since ROS3 has its limitations, a natural next step is to switch to HSDS, which allows us to write data directly to S3. This requires the VOL-REST connector, so the first step is to make our library compatible with it.
Looking at the GitHub repo, I found the VOL-REST user guide: vol-rest/docs/users_guide.pdf at master · HDFGroup/vol-rest (github.com). Is this guide still accurate regarding the supported and unsupported API functions? If so, do you have any advice on how to replace the functions H5Fis_accessible() and H5Dset_extent() with something that is compatible with the VOL-REST connector?
Best regards,
Jan-Willem
The users’ guide is out of date at the moment. Everything it says is supported is still supported, but new functionality has been added. H5Fis_accessible is now supported and can be used directly.
H5Dset_extent is still unsupported in the REST VOL, and in ROS3 to my knowledge. Until it’s implemented, one workaround may be to create datasets with larger initial sizes, depending on your specific use case.
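To make the over-allocation workaround concrete, here is a minimal sketch in plain C of the bookkeeping it implies: the dataset is created once with a fixed capacity, and the application tracks how much of it is actually in use. The `prealloc_1d` type and `prealloc_append` function are hypothetical names for illustration only; they are not part of HDF5 or the REST VOL, and where the "used" count is persisted (for example in an attribute) is left to the application.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bookkeeping for a 1-D dataset created larger than needed. */
typedef struct {
    uint64_t capacity; /* number of elements the dataset was created with */
    uint64_t used;     /* number of elements that hold real data */
} prealloc_1d;

/* Reserve `count` elements at the end of the used region.
 * On success, stores the element offset to write at in *offset_out
 * and returns 0. Returns -1 if the request does not fit, in which
 * case the dataset would have needed H5Dset_extent to grow. */
int prealloc_append(prealloc_1d *d, uint64_t count, uint64_t *offset_out)
{
    assert(d->used <= d->capacity);
    if (count > d->capacity - d->used) {
        return -1;
    }
    *offset_out = d->used;
    d->used += count;
    return 0;
}
```

The returned offset would be used as the start of the hyperslab selection for the write; readers then need to respect `used` rather than the dataset's full extent.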
Great that H5Fis_accessible() is now supported. Most likely, this has saved a lot of time thinking about and implementing a workaround for it. Thanks for your suggestion about the workaround for H5Dset_extent().
Actually, there are 4 more functions we use that are listed as unsupported in the users’ guide. They are less important, but it would be nice if they were supported by now. Is it difficult to find the list of supported and/or unsupported API functions in the VOL-REST source? If not, can you give me a hint of where I can find this list?
It’s easy to see if an API call is unsupported in the source code by searching for the API function name. For example, here’s the source code that indicates H5Dflush is currently unsupported:
/* H5Dflush */
case H5VL_DATASET_FLUSH:
FUNC_GOTO_ERROR(H5E_DATASET, H5E_UNSUPPORTED, FAIL, "H5Dflush is unsupported");
break;
All of the unsupported API calls follow this approximate pattern.
Hello,
@mlarson, thanks for your help. With this I was able to make a short list of functions we use in our library that are unsupported by VOL-REST:
H5Dset_extent
H5Ocopy
H5Lmove
H5Dget_storage_size
I noticed that mattjala (GitHub) created a GitHub issue for H5Dset_extent. Thanks for putting it on the list. Would it be possible to also implement the other listed functions, or do some of them make no sense in the context of a REST API? By the way, the listed functions are ordered by priority.
Furthermore, I found that it is not so straightforward to include VOL-REST in a CMake project using, for example, find_package(). I will take a closer look at the current VOL-REST CMake configuration and try to improve it. My plan is to upstream these improvements to you.
H5Dset_extent makes sense and is planned for implementation. Its only limitation (on HSDS’s side) is that it will not be able to decrease the size of a dataset.
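Given that grow-only limitation, a caller may want to check a requested extent before attempting a resize. The following is a small sketch in plain C; `extent_only_grows` is a hypothetical helper, not an HDF5 or REST VOL function, and it assumes the current dimensions have already been obtained (for example via H5Sget_simple_extent_dims).

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the requested extent is greater than or equal to the
 * current extent in every dimension (so no dimension shrinks),
 * and 0 otherwise. */
int extent_only_grows(const uint64_t *current, const uint64_t *requested, int rank)
{
    assert(rank >= 0);
    for (int i = 0; i < rank; i++) {
        if (requested[i] < current[i]) {
            return 0;
        }
    }
    return 1;
}
```

A library could call this before H5Dset_extent and report a clear error of its own when a shrink is requested, rather than relying on the failure coming back from the server.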
H5Ocopy and H5Lmove should make sense, and I’ll add issues for them.
H5Dget_storage_size may be more complicated to implement. Do you have a use case for this function that isn’t captured by using H5Sget_simple_extent_npoints() and H5Tget_size() to figure out the amount of memory needed to work with a dataset?
There is a known CMake issue where the build will fail if the HDF5 library isn’t prebuilt. There’s currently a PR to fix this, which makes the build_vol_cmake.sh script usable. I haven’t tried including the VOL as a package in other projects before, so I can’t speak to the issues you may encounter there. Thank you for contributing!
Thanks for putting those functions on the list.
My library also does not support decreasing the size of a dataset. This is actually very tricky and not trivial at all. Often it means one needs to move data to a different position in the dataset, which may result in all kinds of performance issues.
Support for H5Dget_storage_size() is more of a nice-to-have. Our use case is that our users would like to know the size of the dataset on disk. Without compression it is more or less the same as H5Sget_simple_extent_npoints() in combination with H5Tget_size(), but with compression it can be quite different. From those numbers one can compute a compression factor, which is of interest to our users as well as to our HPC engineers. Speaking of compression, I assume HSDS supports compressed HDF5 datasets.
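As a rough illustration of that calculation, here is a sketch in plain C. The helper names are hypothetical; the three inputs stand for the values that H5Sget_simple_extent_npoints(), H5Tget_size() and H5Dget_storage_size() would return, and a storage size of 0 is treated as "unknown or nothing allocated".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Uncompressed (logical) size in bytes: element count times element size. */
uint64_t logical_size_bytes(uint64_t npoints, size_t type_size)
{
    return npoints * (uint64_t)type_size;
}

/* Compression factor = logical size / stored size.
 * Returns 0.0 when storage_size is 0, since no meaningful
 * factor can be computed in that case. */
double compression_factor(uint64_t npoints, size_t type_size, uint64_t storage_size)
{
    assert(type_size > 0);
    if (storage_size == 0) {
        return 0.0;
    }
    return (double)logical_size_bytes(npoints, type_size) / (double)storage_size;
}
```

For example, 1000 elements of 8 bytes stored in 2000 bytes would give a factor of 4. Without compression the factor is expected to be close to 1, as noted above.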
Regarding the CMake issues I encountered: no worries, I will figure it out and upstream the improvements. I have done a similar thing for the H5Z-ZFP filter and, with the help of you, the HDF Group, it now has a full-fledged CMake build system, including tests using CTest.
I’ve added issues to HSDS to support H5Dset_extent and H5Ocopy:
We’ll try to get these into the next release.
I think it should be possible to add support for H5Dget_storage_size and H5Lmove in the REST VOL without any HSDS changes.
@jreadey Thanks for adding these items to the HSDS GitHub issue list.
I do have a small question. The issue mentions reducing the dataset shape extent. May I assume that increasing the extent will also be supported? Or is this already possible?
Increasing the size is supported in HSDS. Not sure if this is piped through in the REST VOL. Matt - can you verify?
Yes, the REST VOL’s implementation of H5Dset_extent supports increasing the size.