paging approaches

Hm, no. I just used HDF5 as it is, since the application needs to be
cross-platform (Linux, MacOS, Windows). Does the HDF5 API already support
using such hints? It seemed OK to have two levels of caching in my
case (application-internal and HDF5/OS-internal), as the entire
system was supposed to stay at 30% overall RAM usage anyway to leave space
for other applications running on the same machine. Those direct I/O
parameters could probably help to optimize this somewhat and make more
efficient use of the available memory, but so far there has been no urgent
need for it. Might be worth considering for future improvements, though.

  Werner

···

On Tue, 07 Dec 2010 21:59:41 +0100, Mark Miller <miller86@llnl.gov> wrote:

On Tue, 2010-12-07 at 12:53 -0800, Werner Benger wrote:

I've been facing similar issues, like scanning through 500GB HDF5 files
with 32GB available RAM. That works ok with HDF5, but I've implemented my
own memory management strategy to tell it which parts to keep in memory
and which parts to unload (of course, using random access to the
datasets). Turned out that the "remove the least-used-object" strategy is
not necessarily the best one (as the OS would follow with pages), but some
classification on similarity of objects that are kept or to be discarded
from memory seems much more efficient.
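
To make the idea above concrete, here is a minimal sketch of a cache that picks its eviction victim by object class first and by recency only as a tie-breaker. This is not the strategy actually used in the application described above; the class name `ClassAwareCache`, the `class_priority` mapping, and the byte budget are all assumptions made for illustration.

```python
from collections import OrderedDict


class ClassAwareCache:
    """Byte-budgeted cache that evicts by object class first, then by recency."""

    def __init__(self, budget_bytes, class_priority):
        self.budget = budget_bytes
        # Hypothetical mapping: class name -> priority; lower is evicted first.
        self.class_priority = class_priority
        # key -> (class name, size, data), ordered from least to most recently used
        self.items = OrderedDict()
        self.used = 0

    def get(self, key):
        entry = self.items.get(key)
        if entry is None:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return entry[2]

    def put(self, key, cls, data):
        if key in self.items:
            self.used -= self.items.pop(key)[1]
        size = len(data)
        self.items[key] = (cls, size, data)
        self.used += size
        self._evict(protect=key)

    def _evict(self, protect):
        while self.used > self.budget:
            candidates = [k for k in self.items if k != protect]
            if not candidates:
                break  # only the newly inserted item is left
            # min() returns the first minimal element, and iteration order is
            # least recently used first, so ties within a class fall back to LRU.
            victim = min(
                candidates,
                key=lambda k: self.class_priority.get(self.items[k][0], 0),
            )
            self.used -= self.items.pop(victim)[1]
```

With a budget of 10 bytes and priorities `{"scratch": 0, "index": 1}`, inserting a 4-byte "index" object followed by two 4-byte "scratch" objects evicts the older scratch object, whereas a plain least-recently-used policy would have dropped the index object.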

Out of curiosity, did you consider/need to use
posix_fadvise/posix_madvise to give the OS a hint that the application
is managing the memory/caching and so the OS should NOT attempt to?
(Another way of doing this, I guess, is using 'direct I/O' -- O_DIRECT,
though that is not available in very many places.)
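
For reference, a minimal sketch of what such a hint can look like outside of HDF5, using Python's `os.posix_fadvise` wrapper. The helper name `read_without_caching` is made up for illustration; `os.pread` and `os.posix_fadvise` exist only on Unix-like platforms (and `posix_fadvise` not on all of them), and the call is advisory, so the OS is free to ignore it.

```python
import os


def read_without_caching(path, offset, length):
    """Read a byte range, then hint that the OS need not keep it cached."""
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.pread(fd, length, offset)
        # Advisory only: the kernel may drop the cached pages for this range,
        # but it is not required to. Skipped where the call is unavailable.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, offset, length, os.POSIX_FADV_DONTNEED)
        return data
    finally:
        os.close(fd)
```

An application that keeps its own cache would call something like this for every block it loads, so that the same bytes are less likely to be held twice.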

Mark

--
___________________________________________________________________________
Dr. Werner Benger Visualization Research
Laboratory for Creative Arts and Technology (LCAT)
Center for Computation & Technology at Louisiana State University (CCT/LSU)
211 Johnston Hall, Baton Rouge, Louisiana 70803
Tel.: +1 225 578 4809 Fax.: +1 225 578-5362

I just came upon this project, I was wondering if anyone has any experience
with it good or bad:

STXXL: Standard Template Library for Extra Large Data sets
http://stxxl.sourceforge.net/

Its goals are quite different from HDF5 certainly, but it seems to have more
of a "paging engine" builtin than HDF5 does. So if I don't care so much
about portability or sharing my data, but just want a fast runtime which can
page, I wonder if this is worth considering.

I'm open to building something simple on top of HDF5 as well, but just
looking at options.

-Philip

···

On Wed, Dec 8, 2010 at 4:16 AM, Francesc Alted <faltet@pytables.org> wrote:

On Tuesday 07 December 2010 21:37:51, Philip Winston wrote:
> > One final piece of warning: when you use mmap beyond the extent of
> > your RAM, you will end up swapping out a lot of data (shared
> > libraries, other processes) that might be important for the
> > performance of your computer.
>
> It is too bad you can't tell the OS to page back your own file
> instead. That is, if I have 24GB of RAM with 12GB available, and I
> mmap a 100GB file, I'd like it to kind of churn through that 12GB of
> RAM with my stuff and leave everyone else alone. But I guess you
> don't have that control.

Exactly, but I also think that you cannot control this.
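
One partial workaround, sketched here under assumptions rather than as a fix, is to never map the whole file but to walk it in bounded windows. This does not tell the OS whose pages to evict, and the OS may still keep the data in its file cache, but it caps how much of the file is mapped into the process at any one time. The function name `scan_in_windows` and the 64 MB default window are made up for illustration.

```python
import mmap


def scan_in_windows(path, window_bytes=64 * 1024 * 1024):
    """Yield (offset, window) pairs, mapping at most one window at a time."""
    # Offsets passed to mmap must be multiples of the allocation granularity,
    # so round the window size down to a multiple of it (at least one unit).
    step = max(
        mmap.ALLOCATIONGRANULARITY,
        window_bytes - window_bytes % mmap.ALLOCATIONGRANULARITY,
    )
    with open(path, "rb") as f:
        f.seek(0, 2)  # seek to the end to learn the file size
        size = f.tell()
        for offset in range(0, size, step):
            length = min(step, size - offset)
            with mmap.mmap(
                f.fileno(), length, access=mmap.ACCESS_READ, offset=offset
            ) as window:
                # The window is unmapped as soon as the caller asks for the
                # next one, so it must be consumed before advancing.
                yield offset, window
```

A caller would loop with `for offset, window in scan_in_windows(path):` and copy out or process whatever it needs from `window` inside the loop body.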

--
Francesc Alted


Does the HDF5 API already support
using such hints? It seemed ok to have two levels of caching in my
case (such as application-internal and HDF5/OS internal), as the entire
system was supposed to stay at 30% overall RAM usage anyway to leave space
for other applications running on the same machine. Probably those direct I/O
parameters could help to optimize this somewhat and make more efficient
usage of the available memory, but so far there was no urgent need for it.
Might be worth consideration on future improvements, though.

Well, I thought the posix_fadvise and O_DIRECT stuff in theory should
make the I/O itself FASTER as it eliminates the extra level of caching.

No, the HDF5 library proper doesn't offer controls for
posix_fadvise/posix_madvise. You have to handle that via VFDs (virtual
file drivers).

···

On Tue, 2010-12-07 at 13:33 -0800, Werner Benger wrote:

--
Mark C. Miller, Lawrence Livermore National Laboratory
================!!LLNL BUSINESS ONLY!!================
miller86@llnl.gov urgent: miller86@pager.llnl.gov
T:8-6 (925)-423-5901 M/W/Th:7-12,2-7 (530)-753-8511