paging approaches

We just added HDF5 support to our application. We are using the C API. Our
datasets are 1D and 2D arrays of integers, a pretty simple structure on
disk. Today we have about 5GB of data and we load the whole thing into RAM,
do somewhat random reads, make changes, then overwrite the old .h5 file.

I only learned the bare minimum of the HDF5 API to accomplish the
above, and it was pretty easy. Now we are looking at supporting much larger
datasets, such that it will no longer be practical to have the whole thing
in memory. This is where I'm confused about exactly what HDF5 offers vs. what
is up to the application, and about the best way to do things in the
application.

Ideally, what I want is an mmap-like interface: just a raw pointer
that "magically" pages data in from disk in response to reads, and writes
data back to disk in response to writes. Does HDF5 have something like
this, or can/do people end up writing something like this on top of HDF5?
Today our datasets are contiguous, and I assume we'd want chunked datasets
instead, but it's not clear to me how much "paging" functionality chunking
buys you and how much you have to implement yourself.

Thanks for any ideas or pointers.

-Philip

I am not sure if you got an answer to this email and so I thought I
would pipe up.

Yes, you can do mmap if you'd like. I took HDF5's sec2 Virtual File
Driver (VFD) and tweaked it to use mmap instead, just to test how
something like this would work. I've attached the (hacked) code. To use
it, you are going to have to learn a bit about HDF5 VFDs. You can learn
about them in the File Access Property List documentation,
http://www.hdfgroup.org/HDF5/doc/RM/RM_H5P.html, as well as

http://www.hdfgroup.org/HDF5/doc/TechNotes/VFL.html

It is something to start with. I don't know if there are plans for
writing an mmap-based VFD, but there really ought to be one; it is
definitely lacking from the currently supported VFDs.

Mark

H5FDsec2_w_mmap.c (31.6 KB)
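For orientation, a VFD is selected through a file access property list. Below is a minimal sketch that opens a file through the built-in sec2 driver; the attached mmap variant would presumably be wired in the same way, through whatever H5FDregister()/H5Pset_fapl_*-style helper it provides. The file name is just a placeholder.

#include "hdf5.h"

/* Open a file through an explicitly chosen VFD.  The built-in sec2
 * (plain POSIX read/write) driver is shown; a custom driver such as the
 * attached mmap-based one would be registered with H5FDregister() and
 * then selected the same way, via the file access property list. */
int main(void)
{
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_sec2(fapl);                       /* choose the driver */

    hid_t file = H5Fopen("data.h5", H5F_ACC_RDWR, fapl);
    if (file < 0) { H5Pclose(fapl); return 1; }

    /* ... normal H5Dopen/H5Dread/H5Dwrite calls now go through the driver ... */

    H5Fclose(file);
    H5Pclose(fapl);
    return 0;
}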

Thanks for the info and code!

Given this mmap VFD isn't yet part of the library, I'm wondering: does anyone
do what we're talking about today, with the existing HDF5 library? To summarize:
we have a dataset that doesn't fit in memory. We want to perform "random"
reads, reading only portions into RAM. Then we make changes in RAM. Then
we want to write out only the changed portions.

I'm guessing a chunked file is the starting point here, but what else is
needed? Is there a layer on top to coordinate things? To hold a list of
modified chunks?

Is it even a good idea to attempt this usage model with HDF5? I read one
person suggest that HDF5 is good for bulk read-only data but that he would
use a database for "complex" data that requires changes. I wonder if our
situation is just better suited to a database?

Where do people draw the line? What do you consider an appropriate usage model
for HDF5 vs. a database or something else? Thanks for any input; we have
"adopted" HDF5 but really we don't understand it that well yet.

-Philip

> Thanks for the info and code!

You're welcome.

> Given this mmap VFD isn't yet part of the library, I'm wondering: does
> anyone do what we're talking about today, with the existing HDF5 library?

So, I guess I could be totally confused here. Unfortunately, my week is
so busy I won't have time to discuss/debate all the good questions
you've asked. Hopefully someone else might.

In the interim, I don't think you can avoid explicitly reading/writing
parts of your data. I mean the code I gave you mmaps the file as a
whole, not individual datasets in the file. But, it nonetheless mmaps
UNDERNEATH the explicit reads/writes (e.g. H5Dread/H5Dwrite calls) made
by the application. So, I am thinking this is nowhere near the paradigm
you were hoping for.

You can do partial reads and writes WITHOUT resorting to chunked
datasets. You would need chunked datasets ONLY if you expect the size of
the dataset to vary over time and/or you are using various filters upon
it during I/O (e.g. compression, checksumming). At the same time, there
may be no harm in chunking your dataset.

I don't know if chunking the dataset and then optimizing your partial
reads/writes around the chunk structure would be 'better' than NOT
chunking it and just relying upon HDF5's partial I/O capabilities on the
unchunked dataset.
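To make that concrete, here is a minimal sketch of a partial write (not taken from the attached code; function and dataset names are placeholders). It overwrites a block of rows of a 2-D integer dataset in place, which works whether or not the dataset is chunked:

#include "hdf5.h"

/* Overwrite rows [row0, row0+nrows) of a 2-D integer dataset in place,
 * without reading or rewriting the rest of it. */
static herr_t update_rows(hid_t file, const char *dset_name,
                          hsize_t row0, hsize_t nrows, hsize_t ncols,
                          const int *new_values)
{
    hid_t dset   = H5Dopen2(file, dset_name, H5P_DEFAULT);
    hid_t fspace = H5Dget_space(dset);

    hsize_t start[2] = { row0, 0 };
    hsize_t count[2] = { nrows, ncols };
    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);

    hid_t mspace = H5Screate_simple(2, count, NULL);   /* in-memory shape */

    herr_t status = H5Dwrite(dset, H5T_NATIVE_INT, mspace, fspace,
                             H5P_DEFAULT, new_values);

    H5Sclose(mspace);
    H5Sclose(fspace);
    H5Dclose(dset);
    return status;
}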

> To summarize: we have a dataset that doesn't fit in memory. We want
> to perform "random" reads, reading only portions into RAM. Then we
> make changes in RAM. Then we want to write out only the changed
> portions.
>
> I'm guessing a chunked file is the starting point here, but what else
> is needed? Is there a layer on top to coordinate things? To hold a
> list of modified chunks?
>
> Is it even a good idea to attempt this usage model with HDF5? I read
> one person suggest that HDF5 is good for bulk read-only data but that
> he would use a database for "complex" data that requires changes. I
> wonder if our situation is just better suited to a database?
>
> Where do people draw the line? What do you consider an appropriate usage
> model for HDF5 vs. a database or something else? Thanks for any input;
> we have "adopted" HDF5 but really we don't understand it that well yet.

I think that depends on how complex your 'queries' are going to be and
how much that query optimization could be exploited to improve I/O.

My experience is that for simple queries (give me this hyperslab of
data), products like HDF5 are going to give better I/O performance than
some RDBMS. But, if you are really talking about highly sophisticated
queries where future reads/writes depend upon other parts of the query
and the datasets being queried, that sounds more like an RDBMS than an
I/O library sort of thing. Just my two cents. Good luck.

--
Mark C. Miller, Lawrence Livermore National Laboratory
================!!LLNL BUSINESS ONLY!!================
miller86@llnl.gov urgent: miller86@pager.llnl.gov
T:8-6 (925)-423-5901 M/W/Th:7-12,2-7 (530)-753-8511

> I mean the code I gave you mmaps the file as a
> whole, not individual datasets in the file. But, it nonetheless mmaps
> UNDERNEATH the explicit reads/writes (e.g. H5Dread/H5Dwrite calls) made
> by the application. So, I am thinking this is nowhere near the paradigm
> you were hoping for.

I was hoping for a true mmap model, but now I see perhaps that is
impossible. mmap only works if what is in memory is identical to what's on
disk; for HDF5, endianness alone can break that assumption, right? Plus lots
of other things, like chunked datasets.

So for my situation, one option is to keep HDF5 around for interchange, but at
runtime "optimize" to a simple binary format where I can mmap the entire
dataset. Then I can just read/write anywhere and the OS takes care of
everything.
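For comparison, the flat-file fallback would look roughly like this with plain POSIX mmap; the layout (a bare array of ints) and the names are hypothetical:

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a flat binary file of ints directly into the address space.
 * Reads and writes through the returned pointer are paged in and
 * flushed back by the OS (MAP_SHARED). */
int *map_flat_file(const char *path, size_t *n_ints_out)
{
    int fd = open(path, O_RDWR);
    if (fd < 0) return NULL;

    struct stat st;
    fstat(fd, &st);

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd);                         /* the mapping stays valid */
    if (p == MAP_FAILED) return NULL;

    *n_ints_out = (size_t)st.st_size / sizeof(int);
    return (int *)p;                   /* munmap() when done */
}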

It's tempting: coming from a situation where everything is in RAM
today, it seems like the least work to continue to access randomly and let
the OS figure it out. But I don't know how smart that is. Maybe it is kind
of a red herring: it would work, but it would perform horribly. Maybe,
coming from a situation where everything is in RAM, we have to rethink
things a lot to make it work off disk, organizing data for coherence so
we can read big chunks instead of single rows.

> My experience is that for simple queries (give me this hyperslab of
> data), products like HDF5 are going to give better I/O performance than
> some RDBMS. But, if you are really talking about highly sophisticated
> queries where future reads/writes depend upon other parts of the query
> and the datasets being queried, that sounds more like an RDBMS than an
> I/O library sort of thing. Just my two cents. Good luck.

Our data is essentially a tabular representation of a tree. Every row is a
node in the tree. There are 2-10 values in a row, but tens of millions of
rows. So in a sense our queries do depend on values as we read them,
because for example we'll read a value, find the children of a node, read
those values, etc. etc.

I imagine HDF5 is best for reading large amounts of data at a time. We
would generally be reading 1 row at a time: set up one hyperslab,
tiny read, new hyperslab, tiny read.

We have other uses in mind for HDF5, but for this particular type of data I
wonder if maybe it's just not a good fit.

-Philip

Hi Philip,

  it sounds as if using hyperslabs would do what you need; see here for instance:

http://www.hdfgroup.org/HDF5/doc/RM/RM_H5S.html#Dataspace-SelectHyperslab

Hyperslabs allow you to read only a subset of a dataset, and thus let you iterate over memory-fitting parts of a dataset which by itself is larger than available RAM.
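A minimal sketch of that pattern, with an invented slab size: the dataset is scanned in fixed-size hyperslabs so that only one slab is ever resident in memory.

#include "hdf5.h"
#include <stdlib.h>

#define SLAB ((hsize_t)1 << 20)          /* 1M integers per read (tuning knob) */

/* Walk a large 1-D integer dataset slab by slab. */
static void scan_dataset(hid_t file, const char *name)
{
    hid_t   dset   = H5Dopen2(file, name, H5P_DEFAULT);
    hid_t   fspace = H5Dget_space(dset);
    hsize_t total;
    H5Sget_simple_extent_dims(fspace, &total, NULL);

    int *buf = malloc(SLAB * sizeof *buf);

    for (hsize_t start = 0; start < total; start += SLAB) {
        hsize_t count = (total - start < SLAB) ? total - start : SLAB;
        H5Sselect_hyperslab(fspace, H5S_SELECT_SET, &start, NULL, &count, NULL);
        hid_t mspace = H5Screate_simple(1, &count, NULL);
        H5Dread(dset, H5T_NATIVE_INT, mspace, fspace, H5P_DEFAULT, buf);
        /* ... process buf[0 .. count-1] ... */
        H5Sclose(mspace);
    }

    free(buf);
    H5Sclose(fspace);
    H5Dclose(dset);
}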

At this point it does not matter whether the dataset is chunked or not, or how large the chunks are, but chunking may influence performance significantly. Once you have a working system using hyperslabs on an unchunked dataset, you would want to play with the internal chunk parameters to investigate performance.

Hyperslabs are good for n-dimensional datasets. Would that address your needs?

          Werner

--
___________________________________________________________________________
Dr. Werner Benger Visualization Research
Laboratory for Creative Arts and Technology (LCAT)
Center for Computation & Technology at Louisiana State University (CCT/LSU)
211 Johnston Hall, Baton Rouge, Louisiana 70803
Tel.: +1 225 578 4809 Fax.: +1 225 578-5362

Hi Philip,
  Have you considered using the 'core' file driver (H5Pset_fapl_core)?

  Quincey
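For reference, a minimal sketch of opening a file with the core driver; the increment and backing-store settings are illustrative choices only.

#include "hdf5.h"

/* Open a file entirely in memory via the 'core' VFD.  The second
 * argument is the allocation increment; the third says whether changes
 * are written back to the file when it is closed. */
hid_t open_in_core(const char *path)
{
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_core(fapl, 64 * 1024 * 1024 /* 64 MB increments */,
                     1 /* flush back to disk on H5Fclose */);

    hid_t file = H5Fopen(path, H5F_ACC_RDWR, fapl);
    H5Pclose(fapl);
    return file;   /* negative on failure */
}

Note that opening an existing file this way pulls the whole file into memory up front, so it trades the paging behaviour discussed above for pure in-memory access.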

> Our data is essentially a tabular representation of a tree. Every row
> is a node in the tree. There are 2-10 values in a row, but tens of
> millions of rows. So in a sense our queries do depend on values as we
> read them, because for example we'll read a value, find the children
> of a node, read those values, etc.
>
> I imagine HDF5 is best for reading large amounts of data at a time.
> We would generally be reading 1 row at a time: set up one
> hyperslab, tiny read, new hyperslab, tiny read.
>
> We have other uses in mind for HDF5, but for this particular type of
> data I wonder if maybe it's just not a good fit.

So, others may have a different opinion on this.

But, in my experience, if you don't mind tuning the I/O operations
yourself by designing appropriate HDF5 persistent structures (e.g. the
HDF5 file) together with appropriate algorithms for handling I/O updates
between your memory resident data and the data as it is stored in the
file, a product like HDF5 gives you pretty much all the knobs you could
ever want to achieve good performance.

If, on the other hand, you are hoping to just take advantage of some
other existing I/O optimizing solution and you DO NOT want to have to
explicitly manage and think about that, then you might want to consider
another product, though I honestly don't know how many can do a good job
of hiding I/O issues/performance -- across a wide range of access
patterns -- from you without also taking a slew of memory.

That said, I have worked on just the kind of problem you describe: a
large tree where the nodes in the tree are maybe a few kilobytes in
size. I store the data in an HDF5 file with nodes grouped into larger
chunks. But, I like to traverse the tree as though it is entirely in
memory without having to think about it. My tree nodes are 'smart'
enough that when I traverse a pointer to a non-memory-resident node, the
I/O to HDF5 happens, automagically. A single read brings many, many
nodes into memory. But, they are located 'near' each other in the tree
so the cost of the read is often well amortized over all the nearby
nodes I wind up traversing anyway. It's relatively easy to code up
something like this. Performance was reasonable for the application I
was working on at the time but I could also easily predict common
traversals I'd need.
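A rough sketch of that idea (this is not the actual code being described; all names and sizes are invented): nodes are grouped into fixed-size blocks of the on-disk node table, and a block is faulted in from HDF5 the first time any node in it is touched.

#include "hdf5.h"
#include <stdlib.h>

/* Nodes live in a 2-D dataset of [n_nodes][VALUES_PER_NODE] ints.
 * Assumes n_nodes is a multiple of NODES_PER_BLOCK for simplicity. */
#define NODES_PER_BLOCK 4096
#define VALUES_PER_NODE 8

typedef struct {
    int *values;              /* NULL until the block has been read */
} node_block;

static const int *node_values(hid_t dset, node_block *blocks, hsize_t node)
{
    node_block *b = &blocks[node / NODES_PER_BLOCK];

    if (b->values == NULL) {  /* fault the whole block in with one read */
        b->values = malloc(NODES_PER_BLOCK * VALUES_PER_NODE * sizeof(int));

        hid_t   fspace   = H5Dget_space(dset);
        hsize_t start[2] = { (node / NODES_PER_BLOCK) * NODES_PER_BLOCK, 0 };
        hsize_t count[2] = { NODES_PER_BLOCK, VALUES_PER_NODE };
        H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);

        hid_t mspace = H5Screate_simple(2, count, NULL);
        H5Dread(dset, H5T_NATIVE_INT, mspace, fspace, H5P_DEFAULT, b->values);
        H5Sclose(mspace);
        H5Sclose(fspace);
    }
    return &b->values[(node % NODES_PER_BLOCK) * VALUES_PER_NODE];
}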

Mark

Yes, I think hyperslabs would be an essential tool for us.

Our hyperslabs would essentially be just single rows. I'm a little worried
about how that would perform, but I should just try it and see. We need to be
able to extend the dataset, so we need chunking just for that, if nothing
else.
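For what it's worth, a minimal sketch of creating such an extendible dataset: the row dimension is declared unlimited, which requires a chunked layout, and the dataset is grown later with H5Dset_extent(). The dataset name and chunk size are placeholders.

#include "hdf5.h"

/* Create a 2-D integer dataset that can grow along the row dimension. */
hid_t create_extendible(hid_t file, hsize_t ncols)
{
    hsize_t dims[2]    = { 0, ncols };
    hsize_t maxdims[2] = { H5S_UNLIMITED, ncols };
    hsize_t chunk[2]   = { 4096, ncols };          /* many rows per chunk */

    hid_t space = H5Screate_simple(2, dims, maxdims);
    hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 2, chunk);                  /* unlimited => chunked */

    hid_t dset = H5Dcreate2(file, "rows", H5T_NATIVE_INT, space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);

    H5Pclose(dcpl);
    H5Sclose(space);
    return dset;
}

/* Later, to make room for more rows:                       */
/*   hsize_t newdims[2] = { current_rows + extra, ncols };  */
/*   H5Dset_extent(dset, newdims);                          */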

As per my other email, I am worried that maybe reading/writing single rows is
not a good fit for HDF5. But again, I should really just experiment and see.
Thanks.

-Philip

I had not looked at the core driver, thanks.

It seems like a useful thing to be aware of in general, but I don't think it
helps in my case. It sounds like it is useful mainly for writing, i.e. building
an HDF5 file in memory.

But if you have a big HDF5 file on disk, I don't see how the core driver helps
you access it. You could copy the whole thing into an in-memory file, but we
don't want a big startup hit like that. Maybe I am missing a way to use
the core driver here, though.

-Philip

mmap might not be the best way to go in a many-core environment.
Mmapping a file means that page tables have to be updated, which stops
all cores.

In the Table System of the casacore package it is possible to use mmap
or normal IO (using a cache like HDF5's chunk cache). For truly random
IO, mmap outperforms normal IO because the OS keeps pages as long as
possible. But for known access patterns normal IO wins, because it is
easier to optimize (e.g. by using asynchronous IO).

Note that HDF5 can be slow for accesses using small hyperslabs. I've
done some tests where I created a 3D chunked dataset. HDF5 used a lot of
user time when stepping line by line (either x, y, or z) through that
dataset. Casacore, which can also chunk large data sets, was much faster
in such a case.
So it makes sense to access larger chunks of data in HDF5, even though
HDF5 does the IO a chunk at a time.
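One knob worth knowing about here is the per-dataset chunk cache, which lets repeated small reads that land in the same chunks be served from memory rather than re-read from disk. A minimal sketch, with purely illustrative numbers:

#include "hdf5.h"

/* Open a dataset with an enlarged chunk cache. */
hid_t open_with_big_cache(hid_t file, const char *name)
{
    hid_t dapl = H5Pcreate(H5P_DATASET_ACCESS);
    H5Pset_chunk_cache(dapl,
                       12421,                  /* hash slots (a prime)    */
                       256 * 1024 * 1024,      /* 256 MB of cached chunks */
                       0.75);                  /* preemption policy       */

    hid_t dset = H5Dopen2(file, name, dapl);
    H5Pclose(dapl);
    return dset;
}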

Also note that IO can degrade severely when not accessing the data
sequentially. We found in some cases it was better to reorder the data
temporarily than to leapfrog through the file because the disk seeks
were killing performance. This was for files of several tens of GBytes
under Linux. If the file fits in the kernel's file cache, problems are
much smaller. In such a case it is better to access the file
sequentially once and thereafter access randomly.

Cheers,
Ger

I would think reading/writing single rows would be fine via HDF5. But it will depend on whether your chunks are "compatible" with your access patterns.

Basically HDF5 would just seek/read/write, and the disk access patterns should be pretty optimal. If you're using an SSD, the seek wouldn't take any time at all. So it all depends on the actual environment and on whether you need data transformation (endianness/compression). If not, then the HDF5 read() call should behave pretty much like mmap(), since at some point you have to read from the disk anyway.
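As a sketch of what the single-row case looks like (column count and names are placeholders): each call selects a one-row hyperslab, and whether it is fast then hinges mostly on whether consecutive rows fall in chunks that are already in the chunk cache.

#include "hdf5.h"

/* Read one row of a 2-D integer dataset into a caller-provided buffer
 * of ncols ints. */
herr_t read_row(hid_t dset, hsize_t row, hsize_t ncols, int *out)
{
    hid_t   fspace   = H5Dget_space(dset);
    hsize_t start[2] = { row, 0 };
    hsize_t count[2] = { 1, ncols };
    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);

    hid_t  mspace = H5Screate_simple(2, count, NULL);
    herr_t status = H5Dread(dset, H5T_NATIVE_INT, mspace, fspace,
                            H5P_DEFAULT, out);

    H5Sclose(mspace);
    H5Sclose(fspace);
    return status;
}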

          Werner

On Tuesday, 07 December 2010 at 04:13:10, Philip Winston wrote:

> Yes, I think hyperslabs would be an essential tool for us.
>
> Our hyperslabs would essentially be just single rows. I'm a little
> worried about how that would perform, but I should just try it and see.
> We need to be able to extend the dataset, so we need chunking just
> for that, if nothing else.
>
> As per my other email, I am worried that maybe reading/writing single
> rows is not a good fit for HDF5. But again, I should really just
> experiment and see. Thanks.

My experience in this regard is that, when you want speed, there is
little that can compete with mmap in terms of performance. However,
many times you may want to sacrifice extreme performance for more
functionality. In addition, memory mapping also has drawbacks, the most
important one being the inability to map files that are larger than
your available virtual memory, which renders this technique inadequate
for many uses.

What many bindings for high-level languages are doing (specially in
Python) is to treat datasets on-disk (available at low-level via HDF or
NetCDF libraries) like if they were datasets in-memory. That way, you
are effectively dealing with on-disk data as if it was in-memory (this
is what you are after, IIUC). The OS filesystem cache is then in charge
of caching as much data as possible in memory, so you get a behaviour
that is very close in performance to a memory map approach. Of course,
you still have the HDF/NetCDF/whatever layer, which introduces some
overhead, but this is largely compensated by other nice features, like
on-the-flight compression (that may effectively accelerate I/O to disk)
or practically unlimited dataset capacity (i.e. exceeding the virtual
memory boundaries), among many others.
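As a concrete illustration of the chunking and compression point, a
sketch of creating an extendable, deflate-compressed 2-D dataset with
the C API follows (a sketch only: the chunk size and names are made up,
and you would tune them to your own access pattern):

/* Sketch (names and sizes hypothetical): a 2-D int dataset with an
 * unlimited first dimension, chunked and gzip-compressed. */
#include "hdf5.h"

hid_t create_extendable(hid_t file, const char *name, hsize_t ncols)
{
    hsize_t dims[2]    = { 0, ncols };              /* start empty      */
    hsize_t maxdims[2] = { H5S_UNLIMITED, ncols };  /* grow along rows  */
    hsize_t chunk[2]   = { 1024, ncols };           /* 1024 rows/chunk  */

    hid_t space = H5Screate_simple(2, dims, maxdims);

    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 2, chunk);
    H5Pset_deflate(dcpl, 4);                        /* gzip level 4     */

    hid_t dset = H5Dcreate2(file, name, H5T_NATIVE_INT, space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);

    H5Pclose(dcpl);
    H5Sclose(space);
    return dset;
}

Chunk shape matters a great deal for row-at-a-time access; making each
chunk a block of whole rows, as above, keeps a single row from spanning
many chunks.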

Regarding the comparison of binary formats (like HDF5) with relational
databases, it is frequently the case that the former perform better
than the latter, especially if the interface is optimized.

Hope this helps,

···

--
Francesc Alted

In addition, memory mapping also has drawbacks, the most important one
being the inability to map files that are larger than your available
virtual memory, which renders this technology inadequate for many uses.

I guess I knew about the VM limitation but it hadn't completely sunk in.

So in fact mmap's limit is the same as RAM:
RAM -> big initial load -> VM limited
mmap -> zero initial load -> VM limited
disk -> zero initial load -> disk limited

That is interesting. For us the VM limit will probably last a long time,
if we crank up the swap size on our machines. But maybe not forever;
a fully disk-based solution is probably the most future-proof.

What many bindings for high-level languages are doing (especially in
Python) is to treat datasets on disk (available at low level via the
HDF5 or NetCDF libraries) as if they were datasets in memory...

We looked at PyTables, but at the time, since we were loading everything
into RAM, it seemed like overkill. Now maybe we should consider it again,
but we've already spent a lot of time writing a C++ library that we call
from C++ or from Python.

We have a lot of options:
1) HDF5 format -> PyTables
2) HDF5 format -> custom paging in our C++ library
3) custom binary format -> mmap
4) SQL database
5) key/value store (e.g. redis)

mmap strikes me as the least-change option, and likely to perform quite
well. But it would be a shame to use a custom binary format.
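For completeness, option 3 in the list above reduces to something like
the sketch below (POSIX calls; the file name, layout, and function are
hypothetical). The kernel pages data in on access and writes dirty
pages back on msync/munmap:

/* Sketch (names hypothetical): map a raw binary file of ints and
 * update it in place; the kernel pages data in and out on demand. */
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int bump_element(const char *path, size_t index)
{
    int fd = open(path, O_RDWR);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return -1; }

    int *data = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (data == MAP_FAILED) { close(fd); return -1; }

    data[index] += 1;                          /* access faults the page in */

    msync(data, (size_t)st.st_size, MS_SYNC);  /* flush dirty pages to disk */
    munmap(data, (size_t)st.st_size);
    close(fd);
    return 0;
}

Reads and writes after the mmap call are just pointer arithmetic; the
cost is that the on-disk layout (shapes, offsets, byte order) becomes
the application's problem, which is the custom-binary-format downside.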

Thanks for all the input. I won't reply to every message, but there are
lots of good ideas here; we appreciate it.

-Philip

···

On Tue, Dec 7, 2010 at 6:28 AM, Francesc Alted <faltet@pytables.org> wrote:

On Tuesday 07 December 2010 17:07:30 Philip Winston wrote:

···

On Tue, Dec 7, 2010 at 6:28 AM, Francesc Alted <faltet@pytables.org> wrote:
> In addition, memory mapping also has drawbacks, the most important
> one being the inability to map files that are larger than your
> available virtual memory, which renders this technology inadequate
> for many uses.

I guess I knew about the VM limitation but it hadn't completely sunk
in.

So in fact mmap's limit is the same as RAM:
RAM -> big initial load -> VM limited
mmap -> zero initial load -> VM limited
disk -> zero initial load -> disk limited

That is interesting. For us the VM limit will probably last a long
time, if we crank up the swap size on our machines. But maybe not
forever; a fully disk-based solution is probably the most
future-proof.

One final piece of warning: when you use mmap beyond the extent of your
RAM, you will end up swapping out a lot of data (shared libraries, other
processes) that might be important for the performance of your computer.

This is another reason why I don't personally like the mmap approach. I
find it much better to let the kernel decide which data in the
filesystem should be cached in memory, rather than the user app (but YMMV).

--
Francesc Alted

I wonder if some mechanism for just mmapping a portion of the file (particularly a dataset's elements) would be valuable?

  Quincey
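In the absence of such a mechanism, one can approximate it by hand, but
only for a contiguous (non-chunked, unfiltered) dataset in a plain
single HDF5 file: H5Dget_offset gives the dataset's byte offset in the
file, and that region can be mmapped directly. A rough sketch follows
(names hypothetical, read-only, minimal error handling; a real version
would also keep the base address and length around for munmap):

/* Sketch: mmap just the elements of a contiguous dataset.  Only works
 * when the data is contiguous and unfiltered; mmap offsets must be
 * page-aligned, so we map from the enclosing page boundary. */
#include "hdf5.h"
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

const int *map_dataset(const char *fname, const char *dsname, size_t *nbytes)
{
    hid_t file = H5Fopen(fname, H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t dset = H5Dopen2(file, dsname, H5P_DEFAULT);

    haddr_t off  = H5Dget_offset(dset);        /* HADDR_UNDEF if chunked */
    hsize_t size = H5Dget_storage_size(dset);  /* bytes on disk          */

    H5Dclose(dset);
    H5Fclose(file);
    if (off == HADDR_UNDEF || size == 0) return NULL;

    long  pagesize = sysconf(_SC_PAGESIZE);
    off_t aligned  = (off_t)(off - off % (haddr_t)pagesize);
    size_t slack   = (size_t)(off - (haddr_t)aligned);

    int fd = open(fname, O_RDONLY);
    char *base = mmap(NULL, (size_t)size + slack, PROT_READ, MAP_SHARED,
                      fd, aligned);
    close(fd);                                 /* mapping stays valid    */
    if (base == MAP_FAILED) return NULL;

    *nbytes = (size_t)size;
    return (const int *)(base + slack);
}

This only approximates what a library-level mechanism could offer; as
soon as the dataset is chunked or compressed there is no single
contiguous byte range to map, which is presumably where such a feature
would earn its keep.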

···

On Dec 7, 2010, at 9:53 AM, Francesc Alted wrote:


One final piece of warning: when you use mmap beyond the extent of your
RAM, you will end up swapping out a lot of data (shared libraries, other
processes) that might be important for the performance of your computer.

It is too bad you can't tell the OS to page back to your own file instead.
That is, if I have 24GB of RAM with 12GB available and I mmap a 100GB file,
I'd like it to churn through that 12GB of RAM with my stuff and leave
everyone else alone. But I guess you don't have that control.

-Philip

You can always munmap a region as well, which effectively pages it back out.

I've been facing similar issues, like scanning through 500GB HDF5 files with 32GB of available RAM.
That works OK with HDF5, but I've implemented my own memory management strategy to tell
it which parts to keep in memory and which parts to unload (of course, using random access to
the datasets). It turned out that the "remove the least-used object" strategy is not necessarily
the best one (which is what the OS would do with pages); some classification based on the similarity
of the objects that are kept or discarded from memory seems much more efficient.
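As a tiny illustration of the explicit-unload idea in an mmap setting
(a sketch only, and whether this beats letting the OS decide is exactly
the question above): one page-aligned region of a large mapping can be
flushed and dropped on its own.

/* Sketch: explicitly "unload" one region of a large file mapping.
 * addr must be page-aligned; after munmap the address range is gone,
 * so touching that data again means mmapping that slice afresh. */
#include <sys/mman.h>

int unload_region(void *addr, size_t len)
{
    if (msync(addr, len, MS_SYNC) != 0)   /* write any dirty pages back */
        return -1;
    return munmap(addr, len);             /* release the address range  */
}

On most systems, madvise/posix_madvise on the same range is the gentler
variant: the mapping stays valid, but the kernel is told the pages need
not stay resident.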

        Werner

···

On Tue, 07 Dec 2010 21:37:51 +0100, Philip Winston <pwinston@gmail.com> wrote:


--
___________________________________________________________________________
Dr. Werner Benger Visualization Research
Laboratory for Creative Arts and Technology (LCAT)
Center for Computation & Technology at Louisiana State University (CCT/LSU)
211 Johnston Hall, Baton Rouge, Louisiana 70803
Tel.: +1 225 578 4809 Fax.: +1 225 578-5362

On Tuesday 07 December 2010 21:37:51 Philip Winston wrote:

> One final piece of warning: when you use mmap beyond the extent of
> your RAM, you will end up swapping out a lot of data (shared
> libraries, other processes) that might be important for the
> performance of your computer.

It is too bad you can't tell the OS to page back to your own file
instead. That is, if I have 24GB of RAM with 12GB available and I
mmap a 100GB file, I'd like it to churn through that 12GB of RAM
with my stuff and leave everyone else alone. But I guess you
don't have that control.

Exactly, but I also think that you cannot control this.

···

--
Francesc Alted

Out of curiosity, did you consider/need to use
posix_fadvise/posix_madvise to give the OS a hint that the application
is managing the memory/caching and so the OS should NOT attempt to?
(Another way of doing this, I guess, is using 'direct I/O' -- O_DIRECT,
though that is not available in very many places.)

Mark
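In case it helps anyone reading along, a minimal sketch of those hints
follows (plain POSIX calls; everything here is advisory only, and the
actual effect varies by OS and filesystem):

/* Sketch: advisory hints for an application that manages its own
 * caching.  The kernel is free to ignore all of these. */
#include <sys/types.h>
#include <fcntl.h>
#include <sys/mman.h>

/* Call once after open(): the file will be read at random offsets,
 * so aggressive sequential read-ahead is not useful. */
void hint_random_access(int fd)
{
    posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);   /* len 0 => rest of file */
}

/* Call after finishing with a byte range: the kernel may drop those
 * pages from its cache, since the application caches them itself. */
void hint_done_with_range(int fd, off_t off, off_t len)
{
    posix_fadvise(fd, off, len, POSIX_FADV_DONTNEED);
}

/* The analogue for an existing memory mapping. */
void hint_random_mapping(void *addr, size_t len)
{
    posix_madvise(addr, len, POSIX_MADV_RANDOM);
}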

···

On Tue, 2010-12-07 at 12:53 -0800, Werner Benger wrote:


--
Mark C. Miller, Lawrence Livermore National Laboratory
================!!LLNL BUSINESS ONLY!!================
miller86@llnl.gov urgent: miller86@pager.llnl.gov
T:8-6 (925)-423-5901 M/W/Th:7-12,2-7 (530)-753-8511