select_construct_projection in the VOL plugin

Hello,
I am developing a VOL plugin for the Parallax LSM KV store (CARV-ICS-FORTH/parallax on GitHub), a persistent key-value store that is embeddable and optimized for fast storage.

In dataset write/read operations, the memory dataspace and the file dataspace may have different ranks as long as the corresponding selections contain the same number of elements.

HDF5's default implementation of the write-to-dataset operation uses the private function H5S_select_construct_projection to create an equivalent dataspace that matches the file dataspace's rank, and then performs the operation. My questions are: 1) is there an equivalent function in the public API that does this transformation, and 2) if not, what should the projection logic be?
Thanks,
Giorgos

Hi Giorgos,

Sorry, I’m not familiar enough with the library internals to answer your question (hopefully one of the library developers can address it), but I am interested to hear that you are developing a key-value store VOL.

You might not be aware that we have a fairly complete schema for an HDF5 KV store. See: https://github.com/HDFGroup/hsds/blob/master/docs/design/obj_store_schema/obj_store_schema_v2.md. This is the schema used by HSDS to store data on S3 and other object storage systems.

Would it be possible to use the same schema with Parallax? I suspect you can reuse much of the REST VOL code (the JSON in the HTTP requests that the REST VOL sends is similar to what gets stored in the schema).

Anyway, let me know if you have any questions about this approach!

Are you trying to implement chunked dataset I/O or contiguous? For contiguous datasets there’s no need to transform the dataspaces; you can just iterate over the memory and file selections using H5Ssel_iter_get_seq_list() to get the offsets and lengths you need.
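To make the contiguous case concrete, here is a minimal sketch in plain C of walking two offset/length lists in lockstep. It does not call HDF5; the types seq_t and copy_op_t and the function pair_sequences() are hypothetical names for illustration, and the sketch assumes the (off, len) pairs were already obtained for the file and memory selections, for example from H5Ssel_iter_get_seq_list(). Because it only looks at linearized runs, the ranks of the two dataspaces never enter the picture.

```c
#include <assert.h>
#include <stddef.h>

/* One contiguous run of a selection: starting offset and length.
 * Hypothetical type; in a real plugin these pairs would come from the
 * off[] and len[] arrays filled by H5Ssel_iter_get_seq_list(). */
typedef struct {
    size_t off;
    size_t len;
} seq_t;

/* One transfer that is contiguous on both the file and memory side. */
typedef struct {
    size_t file_off;
    size_t mem_off;
    size_t len;
} copy_op_t;

/* Walk both sequence lists in selection order and split them into runs
 * that are contiguous on both sides. Returns the number of operations
 * written to ops, or (size_t)-1 if ops is too small or the two lists
 * do not cover the same number of elements. */
size_t pair_sequences(const seq_t *file, size_t nfile,
                      const seq_t *mem, size_t nmem,
                      copy_op_t *ops, size_t max_ops)
{
    size_t fi = 0, mi = 0;       /* current sequence on each side */
    size_t fused = 0, mused = 0; /* amount already consumed from it */
    size_t nops = 0;

    assert(ops != NULL || max_ops == 0);

    while (fi < nfile && mi < nmem) {
        size_t fleft = file[fi].len - fused;
        size_t mleft = mem[mi].len - mused;
        size_t n = fleft < mleft ? fleft : mleft;

        if (n > 0) {
            if (nops == max_ops)
                return (size_t)-1;
            ops[nops].file_off = file[fi].off + fused;
            ops[nops].mem_off = mem[mi].off + mused;
            ops[nops].len = n;
            nops++;
        }
        fused += n;
        mused += n;
        if (fused == file[fi].len) { fi++; fused = 0; }
        if (mused == mem[mi].len) { mi++; mused = 0; }
    }

    /* Ignore trailing empty sequences, then require both lists to end together. */
    while (fi < nfile && file[fi].len == 0) fi++;
    while (mi < nmem && mem[mi].len == 0) mi++;
    if (fi != nfile || mi != nmem)
        return (size_t)-1;
    return nops;
}
```

For example, a 2x4 block selected in a 10-wide 2-D file dataspace gives the file runs {0,4} and {10,4}, while a fully selected 8-element 1-D memory buffer gives the single run {0,8}. Pairing them yields two operations: file offset 0 with memory offset 0, and file offset 10 with memory offset 4, each of length 4.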

Chunked I/O is a somewhat difficult problem since there’s currently no one-stop-shop API function to handle it. The best resource right now is probably to look at how the DAOS VOL handles it in the function H5_daos_get_selected_chunk_info() in daos_vol_dset.c: https://github.com/HDFGroup/vol-daos/blob/master/src/daos_vol_dset.c

This function uses two paths: “shapesame” and the general case. As you noticed, there’s no public version of H5S_select_construct_projection(), which the internal library uses in the shapesame case. The DAOS VOL therefore only takes the shapesame path if the ranks are the same. In that case it can construct the chunk memory space using H5Sselect_copy() (on the file chunk space) and H5Sselect_adjust() (to account for differences in selection offset between memory and file). It checks whether the shapes are the same using H5Sselect_shape_same().

For the general case, the DAOS VOL uses H5Sselect_project_intersection() with the file space, memory space, and a selection of the entire chunk in the file space. This function handles the general case of the transformation you may have been thinking of. The internal library instead iterates element by element, but you probably don’t want to do that.
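As a rough illustration of what "projecting the intersection" means, here is a plain C sketch restricted to a 1-D file dataset, where a chunk is simply a contiguous range of element offsets. It does not call HDF5, and run_t and project_chunk() are hypothetical names. The sketch assumes both selections are already available as (offset, length) runs in selection order, and it returns the memory runs that pair with the file elements falling inside one chunk. It works run by run rather than element by element. For multi-dimensional chunks, H5Sselect_project_intersection() is the function to rely on.

```c
#include <assert.h>
#include <stddef.h>

/* One contiguous run of selected elements (hypothetical type). */
typedef struct {
    size_t off;
    size_t len;
} run_t;

/* For the file elements that fall inside the chunk [chunk_lo, chunk_hi),
 * write the matching memory runs to out. Adjacent memory runs are merged.
 * Returns the number of runs written, or (size_t)-1 if out is too small. */
size_t project_chunk(const run_t *file, size_t nfile,
                     const run_t *mem, size_t nmem,
                     size_t chunk_lo, size_t chunk_hi,
                     run_t *out, size_t max_out)
{
    size_t fi = 0, mi = 0;       /* current run on each side */
    size_t fused = 0, mused = 0; /* elements already consumed from it */
    size_t nout = 0;

    assert(out != NULL || max_out == 0);

    while (fi < nfile && mi < nmem) {
        size_t fleft = file[fi].len - fused;
        size_t mleft = mem[mi].len - mused;
        size_t n = fleft < mleft ? fleft : mleft;

        /* This paired piece covers file elements [fstart, fstart + n). */
        size_t fstart = file[fi].off + fused;
        size_t lo = fstart > chunk_lo ? fstart : chunk_lo;
        size_t hi = fstart + n < chunk_hi ? fstart + n : chunk_hi;

        if (lo < hi) {
            /* Shift the memory start by the part clipped off the front. */
            size_t mstart = mem[mi].off + mused + (lo - fstart);

            if (nout > 0 && out[nout - 1].off + out[nout - 1].len == mstart) {
                out[nout - 1].len += hi - lo;
            } else {
                if (nout == max_out)
                    return (size_t)-1;
                out[nout].off = mstart;
                out[nout].len = hi - lo;
                nout++;
            }
        }

        fused += n;
        mused += n;
        if (fused == file[fi].len) { fi++; fused = 0; }
        if (mused == mem[mi].len) { mi++; mused = 0; }
    }
    return nout;
}
```

As an example, take a file selection with runs {2,6} and {12,4} and a memory selection of the first two columns of a 5x4 buffer, that is, runs {0,2}, {4,2}, {8,2}, {12,2}, {16,2}. For the chunk covering file offsets 5 through 9, the selected file elements are 5, 6 and 7, and the matching memory runs are {5,1} and {8,2}.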
