On Aug 23, 2012, at 4:23 AM, Biddiscombe, John A. wrote:
The new H5Dgather looks good, but I am concerned about one issue: the assumption of a common datatype between buffers. I am accustomed to ignoring datatype issues when working with HDF5; I simply get on with float/double etc. and can happily read double into float or vice versa.
One upcoming project will use BlueGene and requires interoperability with an attached x86 cluster. Here we will have big/little endian issues, and it’d be nice if some of the HDF5 internals that handle this could be leveraged.
Is there any way we can use the gather, scatter and iteration routines that you have proposed in the last few messages in conjunction with datatype changes such as float/double, long/int or big/little endian conversion?
From: firstname.lastname@example.org [mailto:email@example.com] On Behalf Of Quincey Koziol
Sent: 22 August 2012 20:27
Cc: HDF5 Virtual Object Layer (VOL) Discussions; firstname.lastname@example.org
Subject: Re: [H5vol] [Hdf5lib] RFC for new Dataspace routines
Mohamad and I kicked around another pair of routines that should meet your goals for the iterative scatter/gather routines, and I've tried to describe them below. Let me know what you think.
herr_t H5Dgather(hid_t src_space_id, const void *src_buf, hid_t type, size_t dst_buf_size, void *dst_buf, H5D_gather_func_t op, void *op_data);
typedef herr_t (*H5D_gather_func_t)(const void *dst_buf, size_t dst_buf_bytes_used, void *op_data);
The H5Dgather routine would gather [at most] dst_buf_size bytes from the source buffer, according to the selection in the source dataspace and [common] datatype, into the destination buffer, then call the application's op callback (passing op_data through), giving the application a chance to "drain" the destination buffer. If more than dst_buf_size bytes' worth of data is available in the source selection, H5Dgather would call the application's callback routine repeatedly, once per filled buffer.
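To make the "gather, then drain" loop described above concrete, here is a minimal sketch in plain C. It does not use HDF5 at all: gather_sketch, strided_sel_t, drain_state_t and drain_to_array are hypothetical names, and a simple 1-D strided selection stands in for the dataspace selection. Only the shape of the callback follows the proposed H5D_gather_func_t.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the proposed H5D_gather_func_t. */
typedef int (*gather_func_t)(const void *dst_buf, size_t dst_buf_bytes_used,
                             void *op_data);

/* Hypothetical stand-in for a selection: `count` elements, starting at
 * element `start`, taking every `stride`-th element. */
typedef struct {
    size_t start;
    size_t stride;
    size_t count;
} strided_sel_t;

/* Copy the selected elements into dst_buf. Whenever dst_buf cannot hold
 * another element, hand it to `op` to be drained, then start refilling. */
static int gather_sketch(const strided_sel_t *sel, const void *src_buf,
                         size_t elem_size, size_t dst_buf_size, void *dst_buf,
                         gather_func_t op, void *op_data)
{
    assert(elem_size > 0 && dst_buf_size >= elem_size);

    const unsigned char *src = src_buf;
    unsigned char *dst = dst_buf;
    size_t used = 0;

    for (size_t i = 0; i < sel->count; i++) {
        size_t elem = sel->start + i * sel->stride;
        memcpy(dst + used, src + elem * elem_size, elem_size);
        used += elem_size;

        if (used + elem_size > dst_buf_size) {   /* no room for one more */
            int ret = op(dst_buf, used, op_data);
            if (ret < 0)
                return ret;
            used = 0;
        }
    }
    /* Drain whatever is left in a partially filled buffer. */
    return used > 0 ? op(dst_buf, used, op_data) : 0;
}

/* Example drain callback (an assumption for illustration): append each
 * drained buffer to a fixed-size array and count the calls. */
typedef struct {
    int    values[16];
    size_t n;
    int    calls;
} drain_state_t;

static int drain_to_array(const void *dst_buf, size_t bytes_used, void *op_data)
{
    drain_state_t *st = op_data;
    size_t n = bytes_used / sizeof(int);

    if (st->n + n > 16)
        return -1;                               /* would overflow */
    memcpy(st->values + st->n, dst_buf, bytes_used);
    st->n += n;
    st->calls++;
    return 0;
}
```

With a two-element destination buffer and five selected elements, the callback in this sketch runs three times: twice with a full buffer and once with the remaining element.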
herr_t H5Dscatter(size_t src_buf_size, void *src_buf, hid_t type, hid_t dst_space_id, void *dst_buf, H5D_scatter_func_t op, void *op_data);
typedef herr_t (*H5D_scatter_func_t)(void *src_buf, size_t *src_buf_bytes_used, void *op_data);
Similar to the H5Dgather routine, the H5Dscatter routine would call the application's op callback (passing op_data through) to fill up the source buffer with data (returning the number of bytes used in the source buffer through the src_buf_bytes_used parameter), and scatter those values into the destination buffer, according to the destination selection and the [common] datatype. Repeated calls to the application callback will be made if more than src_buf_size bytes' worth of data is needed to fill the destination selection.
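The scatter direction can be sketched the same way. Again this is plain C with no HDF5 calls, under the same assumptions: scatter_sketch, strided_sel_t, fill_state_t and fill_from_array are hypothetical names, a 1-D strided selection stands in for the destination dataspace, and only the callback shape mirrors the proposed H5D_scatter_func_t. Treating an empty or oversized fill as an error is a choice made for this sketch, not something the proposal specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the proposed H5D_scatter_func_t. */
typedef int (*scatter_func_t)(void *src_buf, size_t *src_buf_bytes_used,
                              void *op_data);

/* Hypothetical stand-in for a selection: `count` elements, starting at
 * element `start`, taking every `stride`-th element. */
typedef struct {
    size_t start;
    size_t stride;
    size_t count;
} strided_sel_t;

/* Ask `op` to refill src_buf until every selected destination element
 * has been written. */
static int scatter_sketch(size_t src_buf_size, void *src_buf, size_t elem_size,
                          const strided_sel_t *sel, void *dst_buf,
                          scatter_func_t op, void *op_data)
{
    assert(elem_size > 0 && src_buf_size >= elem_size);

    const unsigned char *src = src_buf;
    unsigned char *dst = dst_buf;
    size_t placed = 0;

    while (placed < sel->count) {
        size_t used = 0;
        int ret = op(src_buf, &used, op_data);
        if (ret < 0)
            return ret;
        /* Sketch policy: require at least one whole element per call. */
        if (used == 0 || used > src_buf_size || used % elem_size != 0)
            return -1;

        size_t n = used / elem_size;
        if (n > sel->count - placed)
            return -1;                 /* more data than the selection holds */

        for (size_t i = 0; i < n; i++, placed++) {
            size_t elem = sel->start + placed * sel->stride;
            memcpy(dst + elem * elem_size, src + i * elem_size, elem_size);
        }
    }
    return 0;
}

/* Example fill callback (an assumption for illustration): feed an
 * in-memory array through the staging buffer, chunk_elems at a time. */
typedef struct {
    const int *data;
    size_t     total;
    size_t     next;
    size_t     chunk_elems;
} fill_state_t;

static int fill_from_array(void *src_buf, size_t *bytes_used, void *op_data)
{
    fill_state_t *st = op_data;
    size_t n = st->total - st->next;

    if (n > st->chunk_elems)
        n = st->chunk_elems;
    memcpy(src_buf, st->data + st->next, n * sizeof(int));
    st->next += n;
    *bytes_used = n * sizeof(int);
    return 0;
}
```

Note that the callback is not told how large src_buf is; as in the proposed signature, the application is expected to know, since it supplied the buffer. Here that knowledge lives in fill_state_t.chunk_elems.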
On Aug 22, 2012, at 8:56 AM, Biddiscombe, John A. wrote:
herr_t H5Dtransfer (hid_t src_space_id, const void *src_buf, H5T_t type, hid_t dst_space_id, /*out*/void *dst_buf);
I’m not certain that this will be of any use on its own. If a selection is gigabytes and the VOL layer only has a small buffer available, then it needs to be able to make these transfers in pieces, performing puts/sends or whatever as appropriate. The function would be more useful if it were re-entrant and took a buffer size, so that the selection could be copied by the first N, then the next N, then the next N, until exhausted.
herr_t H5Sselect_iterate (hid_t dataspace_id, H5S_select_iterator_t op, void *op_data);
This function would be great and should be higher priority than the first, because you can do the first using this second one and don’t have the problem of the limited buffer size. The user can iterate over the dataspace, copy as much or as little as desired on each entry to the callback function, and maintain their own bookkeeping of where they left off. If the selection has huge contiguous chunks, the user callback can break these into pieces and perform the appropriate copies as substeps. If the selection is very sparse, then an internal buffer can be filled and acted upon as the iterations progress.
the callback function
herr_t (*H5S_select_iterate_t)(hsize_t *offset_coords, hsize_t length, void * op_data);
could be improved by adding a user void * pointer, so that when you call iterate you pass a pointer to the function and also a pointer to a data structure of the user’s choice, which is passed to the callback as a user parameter. This way we can track intermediate transfer objects (for example, if we have only partially transferred data or are filling an internal buffer), and in the case of multiple threads acting on these iterations, we can make sure each thread has its own data pointer and avoid static/global objects, which would not be safe.
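As a rough illustration of the iterate-with-user-pointer idea, the sketch below walks the contiguous runs of a 1-D selection and lets the callback split each run into bounded pieces, keeping all bookkeeping in a caller-owned struct rather than in globals. It is plain C with no HDF5 calls; hsize_sketch_t, hyperslab_1d_t, select_iterate_sketch, transfer_state_t and send_in_pieces are all hypothetical names, and the actual put/send is left as a comment.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long long hsize_sketch_t;   /* stand-in for hsize_t */

/* Callback shape following the one discussed above. */
typedef int (*select_iterate_func_t)(const hsize_sketch_t *offset_coords,
                                     hsize_sketch_t length, void *op_data);

/* Hypothetical 1-D selection: `nblocks` contiguous runs of `block`
 * elements, with run starts `stride` elements apart. */
typedef struct {
    hsize_sketch_t start;
    hsize_sketch_t stride;
    hsize_sketch_t block;
    hsize_sketch_t nblocks;
} hyperslab_1d_t;

/* Call `op` once per contiguous run in the selection. */
static int select_iterate_sketch(const hyperslab_1d_t *sel,
                                 select_iterate_func_t op, void *op_data)
{
    assert(sel->block > 0 && sel->stride >= sel->block);

    for (hsize_sketch_t b = 0; b < sel->nblocks; b++) {
        hsize_sketch_t offset = sel->start + b * sel->stride;
        int ret = op(&offset, sel->block, op_data);
        if (ret < 0)
            return ret;
    }
    return 0;
}

/* Per-caller (for example, per-thread) bookkeeping, passed as op_data. */
typedef struct {
    hsize_sketch_t max_piece;    /* largest transfer allowed, in elements */
    hsize_sketch_t elems_sent;   /* running total of elements handled */
    hsize_sketch_t last_offset;  /* start of the most recent piece */
    unsigned       pieces_sent;  /* number of pieces handled so far */
} transfer_state_t;

/* Break one contiguous run into pieces of at most max_piece elements. */
static int send_in_pieces(const hsize_sketch_t *offset_coords,
                          hsize_sketch_t length, void *op_data)
{
    transfer_state_t *st = op_data;
    hsize_sketch_t done = 0;

    if (st->max_piece == 0)
        return -1;

    while (done < length) {
        hsize_sketch_t piece = length - done;
        if (piece > st->max_piece)
            piece = st->max_piece;

        /* A real callback would put/send `piece` elements starting at
         * element offset_coords[0] + done here. */
        st->last_offset = offset_coords[0] + done;
        st->elems_sent += piece;
        st->pieces_sent++;
        done += piece;
    }
    return 0;
}
```

Because each caller passes its own transfer_state_t, two threads iterating over different selections in this sketch would not share any mutable state through the callback.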
I just wrote this off the top of my head, so criticism welcome.
From: email@example.com [mailto:firstname.lastname@example.org] On Behalf Of Mohamad Chaarawi
Sent: 21 August 2012 23:45
To: HDF5 Virtual Object Layer (VOL) Discussions; email@example.com; firstname.lastname@example.org
Subject: [H5vol] RFC for new Dataspace routines
Please find attached an RFC that describes a couple of dataspace routines that we plan to add to the HDF5 API in the near future. If you have the time, please give it a read and feel free to send us comments.
We would like to hear from you if you see that you could benefit from those routines but would like to change something or would like us to consider adding other routines. It would be great, in either case, if you could include your use case.