h5sort?

Hi HDF5!

The HDF5 tool set is extremely useful
(http://hdf.ncsa.uiuc.edu/products/hdf5_tools/#h5dist). Recently, I've
found the need to sort the elements of one-dimensional datasets. I am
currently using sort on an ASCII file and then importing the sorted contents
into HDF5 files. I'd like to know what you think about developing an h5sort
tool.

Has anyone implemented a sorting algorithm on top of the HDF5 library? I'd
bet HDF5's highly optimized memory management and slicing capabilities could
sort datasets faster than sorting raw text files. An h5sort could exploit type
information, whereas ASCII sorting is inherently typeless.
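
To illustrate the point about type information, here is a minimal sketch in plain Python (no HDF5 involved; the sample values are made up). It compares a plain text sort of the ASCII representation with a sort on the typed values. GNU sort does offer -n for numeric ordering, but it still has to parse the text of every line, whereas a typed dataset already carries that information.

```python
# Made-up sample values standing in for a one-dimensional integer dataset.
values = [10, 9, 2, 100, -1]

# Text sort: compares the ASCII representations character by character.
as_text = sorted(str(v) for v in values)

# Typed sort: compares the values as integers.
as_ints = sorted(values)

print(as_text)  # ['-1', '10', '100', '2', '9']
print(as_ints)  # [-1, 2, 9, 10, 100]
```

The two orderings differ because "100" sorts before "2" as text, which is the sort of surprise a type-aware tool would avoid.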

GNU's sort utility can sort files larger than available memory by using
temporary disk space. What's the best way to deal with this in HDF5?
Memory usage depends on the sorting algorithm, whether the algorithm sorts
in place or not, and on the ordering of the input data. It seems that,
provided with a hint on how sorted the data is, we could choose to use
quicksort, mergesort or even smoothsort (O(n log n) worst case, O(n) for
nearly sorted input; see
http://en.wikibooks.org/wiki/Algorithm_implementation/Sorting/Smoothsort).
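
One possible approach to the larger-than-memory case is an external merge sort, which is roughly what GNU sort does with its temporary files. The sketch below is only an illustration in plain Python under stated assumptions: external_sort and read_chunk are hypothetical names, the data is assumed to be float64, and read_chunk(start, stop) stands in for whatever slice read an HDF5 binding would provide. It is not an existing HDF5 tool or API. Each chunk is sorted in memory with Python's built-in sort (Timsort, which is adaptive on partly ordered input), written to a temporary run file, and the runs are then merged lazily.

```python
import heapq
import os
import struct
import tempfile

ITEM = struct.Struct("<d")  # assumption: one little-endian float64 per element


def _read_run(path):
    """Yield the values stored in a temporary run file, one at a time."""
    with open(path, "rb") as f:
        while True:
            raw = f.read(ITEM.size)
            if not raw:
                break
            yield ITEM.unpack(raw)[0]


def external_sort(read_chunk, n_items, chunk_size):
    """Yield n_items values in ascending order.

    Only about chunk_size elements are held in memory at once.
    read_chunk(start, stop) is a hypothetical callback returning the
    elements in [start, stop); with HDF5 it would be a slice read.
    """
    run_paths = []
    try:
        # Phase 1: sort each chunk in memory and spill it to a run file.
        for start in range(0, n_items, chunk_size):
            stop = min(start + chunk_size, n_items)
            chunk = sorted(read_chunk(start, stop))
            fd, path = tempfile.mkstemp(suffix=".run")
            with os.fdopen(fd, "wb") as f:
                for v in chunk:
                    f.write(ITEM.pack(v))
            run_paths.append(path)
        # Phase 2: k-way merge of the sorted runs, streamed lazily.
        yield from heapq.merge(*(_read_run(p) for p in run_paths))
    finally:
        for p in run_paths:
            os.remove(p)


# Made-up data; a list slice stands in for a dataset slice read.
data = [5.0, 1.5, 9.0, -2.0, 3.25, 0.0, 7.5]
result = list(external_sort(lambda a, b: data[a:b], len(data), chunk_size=3))
print(result)
```

In a real tool the merged output would presumably be written back in blocks to a destination dataset rather than collected into a list, and the chunk size would be chosen from the available memory.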

-Igor

Hello Igor,

I'm glad you find the tool set useful. I'm not aware of anyone who has implemented a sorting algorithm on top of the HDF5 library, but it's possible that someone has. I hope readers of the forum will let us know if they are familiar with such capabilities.

The HDF Group would be interested in developing an h5sort tool, but would need funding to do so. If you're interested, feel free to contact me to discuss further.

Finally, I encourage people to use the URL for The HDF Group, rather than going through the redirection of the NCSA website that you posted, as we can't guarantee that link will be valid forever. The preferred URL for the tools is http://hdfgroup.org/products/hdf5_tools/#h5dist

-Ruth

(In reply to Igor Sylvester's message of Jul 14, 2008, 10:24 PM, shown in full above.)

------------------------------------------------------------
Ruth Aydt
The HDF Group
1901 South First Street, Suite C-2
Champaign, IL 61820

aydt@hdfgroup.org
(217)265-7837 (office) (217)333-9049 (fax)
------------------------------------------------------------