VFL: Combining Memory & Disk Files

Does HDF5 support construction of a virtual HDF5 file composed of both local
disk (e.g., H5FD_SEC2, H5FD_STDIO) and memory-based files (e.g., H5FD_CORE),
possibly through the use of the H5FD_FAMILY driver?

             Application
                  |
                  |
   HDF5 Virtual File Layer (VFL)
          ( H5FD_FAMILY )
            /         \
           /           \
     H5FD_CORE      H5FD_STDIO
          |              |
          |              |
       Memory       Hard Drive

I would also like to create an HDF5 file that has a small, fast memory
cache for incoming live data and periodically write its contents to storage
for longer-term retrieval.

It is not clear to me from the documentation whether H5FD_FAMILY is
applicable only to combining local-disk (i.e., hard-drive-based) file
drivers.

The goal that I'm shooting for is seamless access across both a memory-based
file and one that is stored on a hard disk.

Regards, Kirk

You should be able to set up both a hard-disk and a memory HDF5 file and make
external links from one to the other and vice versa. Would that do the job?

  Werner


On Mon, 19 Apr 2010 13:37:24 -0400, Kirk Harrison <kharrison@shensol.com> wrote:

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org

--
___________________________________________________________________________
Dr. Werner Benger, Visualization Research
Laboratory for Creative Arts and Technology (LCAT)
Center for Computation & Technology at Louisiana State University (CCT/LSU)
211 Johnston Hall, Baton Rouge, Louisiana 70803
Tel.: +1 225 578-4809  Fax: +1 225 578-5362

Werner,

Thanks for the reply. Yes, that is precisely what I would like to do. I
thought I recalled that "external links" were for non-HDF5 files though.
Perhaps I am mistaken.
So would/could a client use the HDF5 files transparently as though they
were a single file?

Kirk


Hi Kirk,


On Apr 19, 2010, at 12:37 PM, Kirk Harrison wrote:

  Werner's suggestion of external links would work. You should also be able to use the "split" file driver (H5Pset_fapl_split) with the core VFD for metadata and the sec2 VFD for raw data, if that's closer to your access pattern.

  Quincey

On Monday, 19 April 2010 19:37:24, Kirk Harrison wrote:

If I were you, I would not bother too much with keeping your files on
different media: just take advantage of the OS filesystem cache. When a
small file is accessed frequently, the OS keeps it in memory, so access to
it happens at memory speed, not disk speed. In addition, letting the OS
decide which data in your filesystem to cache in memory is probably the
best route to a sensible use of computer resources.


--
Francesc Alted

I haven't worked with external links myself yet, but I think you can
simply link an object (such as a group or dataset) from another HDF5 file
into an HDF5 file; once you iterate over the main HDF5 file, the
application no longer knows where the data actually comes from, since it
appears as one logical HDF5 file.

Cited from:

http://www.hdfgroup.org/HDF5/doc/RM/RM_H5L.html#Link-CreateExternal

"H5Lcreate_external creates a new soft link to an external object, which is
an object in a different HDF5 file from the location of the link."

  Werner


On Mon, 19 Apr 2010 15:01:01 -0400, <kharrison@shensol.com> wrote:

Hi,

just a question related to this topic, concerning file size: last week I
played around with references a little bit. I created a one-dimensional
packet table of references, with 1.1 million references to hyperslab
regions of other packet tables within the same file, and this increased
the HDF5 file size dramatically. It seems to me as if compression does not
work (well) for such references ... or would you expect a different result,
and I missed something there?

Best regards,
Johannes Stamminger

Francesc,

I do plan on using (or at least testing) the buffered driver (H5FD_STDIO)
for the hard-drive-based portion of my long-term storage. I also have a
particular requirement to run in a disk-less environment with a much
smaller amount of (current) data than I plan to store on disk. I plan to
periodically copy/move the data from the memory file to the disk file via
a server application responsible for its management.

I also have multiple clients which need access to the data, and am planning
on mirroring the memory-based portion of the data on each client's
workstation, with links to the network-based disk storage (when available)
for data beyond the memory file's capacity.

Performance in getting the data to the clients has been an issue in the
legacy implementation, which is another reason I am experimenting with this
configuration.

I guess an alternative architecture would be to determine at runtime
whether a disk file is available and, if not, open a memory-based file and
use it exclusively; otherwise use the networked hard drive exclusively,
which is probably more in line with your suggestion. I plan to evaluate
this as well.

Of course, I am open to easier and/or better implementation ideas. Thanks
for your input.

Regards, Kirk


-----Original Message-----
From: hdf-forum-bounces@hdfgroup.org [mailto:hdf-forum-bounces@hdfgroup.org]
On Behalf Of Francesc Alted
Sent: Tuesday, April 20, 2010 3:06 AM
To: HDF Users Discussion List
Subject: Re: [Hdf-forum] VFL: Combining Memory & Disk Files

Hi Johannes,

As with other variable-length data, references cannot be compressed, since internally they are stored in the heaps, while the dataset itself stores pointers to the data in the file. If size is a problem, you may come up with your own set of indices stored in a dataset (for example, a reference to a hyperslab can be stored as two n-dimensional vectors of corner coordinates, or something like that) and use compression.

Elena


On Apr 20, 2010, at 2:25 AM, Stamminger, Johannes wrote:

Kirk,

  if your platform is Linux-based, did you try creating your temporary files
in /dev/shm? That basically keeps them in RAM the whole time.

I guess what you would like then is to export this ramdisk via NFS to other
clients, but I doubt Linux can do remote NFS mounting of ramdisks, though
maybe it's possible...

  Werner


On Tue, 20 Apr 2010 11:56:53 -0400, Kirk Harrison <kharrison@shensol.com> wrote:


Werner,

Yes, they are Linux-based. Interesting idea, using a ramdisk for the
short-term data. I had originally played with that concept, but somewhere
in the midst of familiarizing myself with HDF5 I picked up on the H5FD_CORE
driver and forgot about it. :-)

I'm not really concerned about an NFS mount to the ramdisk, as I plan to
have data flow directly to each application's host and be written to a
local memory-based (H5FD_CORE or ramdisk) file anyway. I need the data as
close to processing time as possible, and only need the older data for
less time-critical historical recall. That is, each client host will have
its own constructed copy of a memory-based HDF5 file (provided via an API
that I am developing) containing very limited current data, with older
data in a central file-based HDF5 data file (if available) accessed via an
HDF5 link. The concept is that the API I provide makes data queries across
memory- and disk-based files transparent to the client.

What are the trade-offs of using the H5FD_CORE driver versus a ramdisk?
(Or is the H5FD_CORE basically implemented via ramdisk 'under the hood'
anyway?)

Regards, Kirk



Hi,

I was not aware that references are a variable-length type.

So I assume it is the path string that leads to the varying length?
Naively speaking, I could imagine some HDF5-internal mapping of reference
paths to a unique, fixed-size ID, so that each reference path is stored
only once; the ID would then be used in the references dataset instead,
and compression would work there ... ?

In principle this should be the same as what you propose to implement at
the application layer, thanks for that. I will go this way then.
Unfortunately this will double the application calls (instead of internal
ones) for each read/write of data ... :-(

Best regards,
Johannes Stamminger


On Di, 2010-04-20 at 08:50 -0500, Elena Pourmal wrote:


Hi Kirk,

On Tuesday, 20 April 2010 17:56:53, Kirk Harrison wrote:


No problem. If you have special reasons, then go ahead and try the CORE
driver. My experience was that, for reading, the performance was slightly
worse than with a regular H5FD_SEC2 driver. I suppose the reason is that
HDF5 has to duplicate the existing data from the file into memory, and
that takes some time. But for writing (and without backing_store active,
of course) you *might* get some speedup.



Hi Johannes,

I was not aware that references are a variable-length type.

  Actually, only dataset region references are variable length. Object references are fixed size.

  Quincey


On Apr 20, 2010, at 9:19 AM, Stamminger, Johannes wrote:


Hm, H5FD_CORE would be more portable, as it is independent of a
Linux-specific solution; it might be more efficient, as it avoids going
through a filesystem emulation layer; and it gives the application more
control over the memory being used. Another difference is that malloc'ed
memory might end up on the hard disk when swapped out by the OS, whereas a
ramdisk, I think, can be configured to never be swapped out and to always
reside in memory. Though that should also be possible for malloc'ed memory
with shared-memory calls or similar.

Those are most of the differences as far as I can see for now...

  Werner


On Tue, 20 Apr 2010 13:14:44 -0400, <kharrison@shensol.com> wrote:


On Tuesday, 20 April 2010 18:24:06, Werner Benger wrote:

Hm, H5FD_CORE would be more portable as it is independent of a
Linux-specific solution, and might be more efficient as it avoids going
through a filesystem emulation layer,

In my experience with reading benchmarks, the filesystem-layer overhead
should be negligible compared with the HDF5 overhead, because the CORE
driver performs (once set up) at exactly the same speed as H5FD_SEC2.

