mmap might not be the best way to go in a many-core environment.
Mapping or unmapping a file means the page tables have to be updated,
which triggers TLB shootdowns that briefly stall all cores.
In the Table System of the casacore package it is possible to use mmap
or normal IO (using a cache like HDF5's chunk cache). For truly random
IO, mmap outperforms normal IO because the OS keeps pages cached as long
as possible. But for known access patterns, normal IO wins because it is
easier to optimize (e.g., by using asynchronous IO).
Note that HDF5 can be slow for accesses using small hyperslabs. I've
done some tests where I created a 3D chunked dataset. HDF5 used a lot of
user time when stepping line by line (along either x, y, or z) through
that dataset. Casacore, which can also chunk large data sets, was much
faster in such a case.
So it makes sense to access larger chunks of data in HDF5, even though
HDF5 does the IO a chunk at a time.
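The advice above — fetch a larger block in one call rather than issuing many tiny reads against the same region — can be sketched against a plain binary file (a stand-in, not the HDF5 API; the dataset shape, int32 row-major layout, and row index are illustrative assumptions):

```python
# Sketch (not HDF5 itself): reading a whole row in one call versus
# element by element from a plain row-major binary file.
import os
import struct
import tempfile

ROWS, COLS = 1000, 8          # hypothetical dataset shape
ITEM = struct.calcsize("<i")  # 4-byte little-endian ints

# Write a ROWS x COLS int32 dataset, row-major.
path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    for r in range(ROWS):
        f.write(struct.pack(f"<{COLS}i", *range(r * COLS, (r + 1) * COLS)))

with open(path, "rb") as f:
    # Many tiny reads: one seek+read per element of row 123.
    tiny = []
    for c in range(COLS):
        f.seek((123 * COLS + c) * ITEM)
        tiny.append(struct.unpack("<i", f.read(ITEM))[0])

    # One larger read: the whole row in a single call, sliced in memory.
    f.seek(123 * COLS * ITEM)
    chunk = list(struct.unpack(f"<{COLS}i", f.read(COLS * ITEM)))

assert tiny == chunk  # same data, one syscall instead of COLS
```

The same data comes back either way; the difference is per-call overhead, which is what dominates when the hyperslabs are small.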
Also note that IO can degrade severely when not accessing the data
sequentially. We found that in some cases it was better to reorder the
data temporarily than to leapfrog through the file, because the disk
seeks were killing performance. This was for files of several tens of
GBytes under Linux. If the file fits in the kernel's file cache, the
problems are much smaller. In that case it is better to access the file
sequentially once and thereafter access it randomly.
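One cheap form of that "reorder temporarily" idea is to sort the requested offsets before touching the file, so the disk (or kernel readahead) only ever moves forward, then restore the caller's order in memory. A minimal sketch, with invented file layout and request list:

```python
# Sketch: serve scattered row requests in ascending file order instead
# of leapfrogging, then permute the results back to request order.
import os
import struct
import tempfile

ROW = struct.calcsize("<q")  # one 8-byte value per row (an assumption)
path = os.path.join(tempfile.mkdtemp(), "big.bin")
with open(path, "wb") as f:
    for i in range(10_000):
        f.write(struct.pack("<q", i * i))

wanted = [9731, 12, 4444, 9730, 13, 4445]  # a scattered request

with open(path, "rb") as f:
    # Visit the rows in ascending offset order...
    order = sorted(range(len(wanted)), key=lambda k: wanted[k])
    out = [None] * len(wanted)
    for k in order:
        f.seek(wanted[k] * ROW)
        out[k] = struct.unpack("<q", f.read(ROW))[0]
    # ...but 'out' ends up in the caller's original order.

assert out == [r * r for r in wanted]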
Cheers,
Ger
Philip Winston <pwinston@gmail.com> 12/7/2010 3:52 AM >>>
I mean the code I gave you mmaps the file as a whole, not individual
datasets in the file. But it nonetheless mmaps UNDERNEATH the explicit
reads/writes (e.g. H5Dread/H5Dwrite calls) made by the application. So
I am thinking this is nowhere near the paradigm you were hoping for.
I was hoping for a true mmap model. But now I see perhaps that is
impossible. mmap only works if what is in memory is identical to what's
on disk; for HDF5, endianness alone can break that assumption, right?
Plus lots of other things, like chunked datasets.
So for my situation one option is to keep HDF5 around for interchange,
but at runtime "optimize" to a simple binary format where I can mmap the
entire dataset. Then I can just read/write anywhere and the OS takes
care of everything.
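That flat-file-plus-mmap plan can be sketched with the stdlib `mmap` module (the fixed-width int64 layout is an assumption; a real dataset would need its own header and schema):

```python
# Sketch of "optimize to a flat binary file and mmap the whole thing":
# random reads page data in on demand, random writes dirty pages that
# the OS (or an explicit flush) writes back.
import mmap
import os
import struct
import tempfile

ITEM = struct.calcsize("<q")
path = os.path.join(tempfile.mkdtemp(), "table.bin")
with open(path, "wb") as f:
    f.write(struct.pack("<1000q", *range(1000)))

with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)  # map the whole file, read/write
    # Random read: the OS faults the page in.
    val = struct.unpack_from("<q", mm, 500 * ITEM)[0]
    # Random write: in-place update through the mapping.
    struct.pack_into("<q", mm, 500 * ITEM, val + 1)
    mm.flush()
    mm.close()

# The write is visible through ordinary file IO afterwards.
with open(path, "rb") as f:
    f.seek(500 * ITEM)
    assert struct.unpack("<q", f.read(ITEM))[0] == 501
```

Note this only works because the on-disk bytes are exactly the in-memory representation — which is precisely the property HDF5's type conversion and chunking give up.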
It's tempting: coming from a situation where everything is in RAM
today, the least work would be to keep accessing data randomly and let
the OS figure it out. But I don't know how smart the OS is. Maybe it's
a red herring — it would work, but perform horribly. Maybe, coming from
an all-in-RAM world, we have to rethink things a lot to make it work
off disk: organize the data for coherence so we can read big chunks
instead of single rows.
My experience is that for simple queries (give me this hyperslab of
data), products like HDF5 are going to give better I/O performance than
some RDBMS. But if you are really talking about highly sophisticated
queries, where future reads/writes depend upon other parts of the query
and the datasets being queried, that sounds more like an RDBMS than an
I/O-library sort of thing. Just my two cents. Good luck.
Our data is essentially a tabular representation of a tree. Every row
is a node in the tree. There are 2-10 values in a row, but tens of
millions of rows. So in a sense our queries do depend on values as we
read them: for example, we'll read a value, find the children of a
node, read those values, and so on.
I imagine HDF5 is best for reading large amounts of data at a time. We
would generally be reading one row at a time: set up one hyperslab,
tiny read, new hyperslab, tiny read.
We have other uses in mind for HDF5, but for this particular type of
data I wonder — maybe it's just not a good fit.
-Philip
On Mon, Dec 6, 2010 at 3:21 PM, Mark Miller <miller86@llnl.gov> wrote:
I am not sure if you got an answer to this email and so I thought I
would pipe up.

Yes, you can do mmap if you'd like. I took HDF5's sec2 Virtual File
Driver (VFD) and tweaked it to use mmap instead, just to test how
something like this would work. I've attached the (hacked) code. To use
it, you are going to have to learn a bit about HDF5 VFDs. Learn about
them in File Access Property lists,
http://www.hdfgroup.org/HDF5/doc/RM/RM_H5P.html, as well as
http://www.hdfgroup.org/HDF5/doc/TechNotes/VFL.html

It is something to start with. I don't know if HDF5 has plans for
writing an mmap-based VFD, but they really ought to; it is something
that is definitely lacking from their supported VFDs currently.

Mark
On Fri, 2010-12-03 at 17:02, Philip Winston wrote:
> We just added HDF5 support in our application. We are using the C API.
> Our datasets are 1D and 2D arrays of integers, a pretty simple
> structure on disk. Today we have about 5GB of data and we load the
> whole thing into RAM, do somewhat random reads, make changes, then
> overwrite the old .h5 file.
>
> I only learned the bare minimum of the HDF5 API to accomplish the
> above, and it was pretty easy. Now we are looking at supporting much
> larger datasets, such that it will no longer be practical to have the
> whole thing in memory. This is where I'm confused about exactly what
> HDF5 offers vs. what is up to the application, and about the best way
> to do things in the application.
>
> Ideally what I want is an mmap-like interface: just a raw pointer
> which "magically" pages data in from disk in response to reads and
> writes it back to disk in response to writes. Does HDF5 have
> something like this, or can/do people end up writing something like
> this on top of HDF5? Today our datasets are contiguous, and I assume
> we'd want chunked datasets instead, but it's not clear to me how much
> "paging" functionality chunking buys you and how much you have to
> implement.
>
> Thanks for any ideas or pointers.
>
> -Philip
--
Mark C. Miller, Lawrence Livermore National Laboratory
================!!LLNL BUSINESS ONLY!!================
miller86@llnl.gov urgent: miller86@pager.llnl.gov
T:8-6 (925)-423-5901 M/W/Th:7-12,2-7 (530)-753-8511
_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org