Help Needed For Finding Minimum/Maximum

EXI-Manoharan_Dhilip · October 4, 2014, 12:27am

All,

I need help finding the minimum/maximum of the values stored in a dataset.
I currently do it by reading the entire dataset and scanning for the minimum/maximum, which is really slow when there are many data values (say, more than 2,000,000).
Is there any way to find them using the H5 Java API?
I appreciate your help with this.

Thanks,
Dhilipan M
Boeing FTCS
Desk:2066626488
Mob:2066694758

miller86 · October 4, 2014, 1:54am

A few thoughts come to mind. But, I don't know if they'll be useful.

First, I assume you are doing this for a read-mostly (or maybe read-only) scenario. That is, the datasets are already in the file and now you want to find mins/maxs.

If I/O is your bottleneck (i.e., the cause of the slowness), then I can't imagine anything going faster than a single H5Dread (C interface) and a scan of all the data values in one fell swoop. And I cannot imagine the H5 Java API would do any better.
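To make the "one read, one scan" idea concrete, here is a minimal sketch in plain Java. It assumes the whole dataset has already been read into a double[] by a single read call; the array literal in main is only a stand-in for that buffer, and the class and method names (MinMaxScan, minMax) are made up for illustration.

```java
final class MinMaxScan {
    /**
     * Returns {min, max} of a non-empty array in a single pass.
     * NaN values are not handled specially in this sketch.
     */
    static double[] minMax(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("empty dataset");
        }
        double min = values[0];
        double max = values[0];
        for (int i = 1; i < values.length; i++) {
            double v = values[i];
            if (v < min) min = v;
            if (v > max) max = v;
        }
        return new double[] {min, max};
    }

    public static void main(String[] args) {
        // Stand-in for the buffer that one whole-dataset read would fill.
        double[] values = {3.5, -1.0, 7.25, 0.0};
        double[] mm = minMax(values);
        System.out.println("min=" + mm[0] + " max=" + mm[1]);
    }
}
```

The scan itself is a single linear pass, so if this is still slow, the time is more likely going into the read than into the loop.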

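Is the dataset compressed and/or chunked in the file? If not, maybe you can adjust the data producer to ensure that it is. When I/O is the bottleneck, compressed data can often be read faster because fewer bytes come off disk, though decompression costs some CPU time.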

If I/O is NOT the bottleneck, then it's just the compute time spent finding the min/max. This would be a very simple operation to multi-thread (or GPU-ize), though. Is that an option?
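As a rough illustration of the multi-threaded option, the sketch below uses a parallel stream from the Java standard library to split the scan across cores. It assumes the values are already in memory as a double[]; the class name ParallelMinMax is hypothetical, and whether this beats a plain loop for a couple of million values would need measuring.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;

final class ParallelMinMax {
    /** Computes min and max (among other statistics) using the common fork/join pool. */
    static DoubleSummaryStatistics stats(double[] values) {
        return Arrays.stream(values).parallel().summaryStatistics();
    }

    public static void main(String[] args) {
        // Synthetic data standing in for values already read from the dataset.
        double[] values = new double[2_000_000];
        for (int i = 0; i < values.length; i++) {
            values[i] = Math.sin(i * 0.001) * 100.0;
        }
        DoubleSummaryStatistics s = stats(values);
        System.out.println("min=" + s.getMin() + " max=" + s.getMax());
    }
}
```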

If you have control over how the data is initially written, why not compute the min/max when the dataset is written, using a filter much like the checksum filter? It can scan all values on write, compute the min/max, and then store them as metadata with the dataset. Then finding them later during a read is of course trivial. And it would avoid doing the min/max scan repeatedly for different readers, etc.
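A sketch of the compute-on-write idea follows, with two loud assumptions: it tracks the running min/max in application code on the writer side rather than in an actual HDF5 filter, and it uses a plain Map as a stand-in for metadata stored with the dataset (for example, HDF5 attributes). The class name MinMaxTrackingWriter and the keys "min" and "max" are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MinMaxTrackingWriter {
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;
    private long count = 0;

    /** Call once per block of values as it is written. */
    void write(double[] block) {
        for (double v : block) {
            if (v < min) min = v;
            if (v > max) max = v;
        }
        count += block.length;
        // A real writer would pass the block to the file-writing call here.
    }

    /** Values to store alongside the dataset once writing is finished. */
    Map<String, Double> metadata() {
        if (count == 0) {
            throw new IllegalStateException("nothing written");
        }
        Map<String, Double> attrs = new LinkedHashMap<>();
        attrs.put("min", min);
        attrs.put("max", max);
        return attrs;
    }

    public static void main(String[] args) {
        MinMaxTrackingWriter writer = new MinMaxTrackingWriter();
        writer.write(new double[] {3.5, -1.0});
        writer.write(new double[] {7.25, 0.0});
        System.out.println(writer.metadata());
    }
}
```

Readers would then fetch the two stored values instead of scanning the data, which only stays correct as long as every write to the dataset goes through the tracking path.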

Don't know if any of that might be useful but maybe it triggers some ideas for better strategies.

Mark

