multi-pass IO (was chunking)

The discussion about chunking and two-pass VFDs reminded me that I had intended to make available a small library for doing independent, per-process dataset creates. It was written some time ago and used extensively on one project, but is not currently in use.

I've tidied the code up a bit and uploaded it to the following page:
https://hpcforge.org/plugins/mediawiki/wiki/libh5mb/index.php/Main_Page
The source code is available via the SCM link.

Some brief notes on the library are given on the wiki page, but the actual API is probably best described by the H5MButil.h file. I created the wiki page very quickly, so apologies if the content is unclear; please let me know if it needs improvement.

Hopefully someone will find the code useful.

JB

···

--
John Biddiscombe, email:biddisco @ cscs.ch

CSCS, Swiss National Supercomputing Centre | Tel: +41 (91) 610.82.07
Via Cantonale, 6928 Manno, Switzerland | Fax: +41 (91) 610.82.82

John,

This is awesome! Thanks so much for putting it up.

I really wish the HDF5 Group had decided a long while ago to make this
kind of thing available UNDER the HDF5 API via...
    a) adding either an H5Xcreate_deferred for any part, X, of the API, or
       adding a property to X's create property list to indicate a
       desire for deferred creation.
       Any object so created cannot be acted upon until a subsequent
       H5Xsync_deferred()...
    b) an H5Xsync_deferred() function to synchronize all deferred-created
       objects.
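
Just to illustrate, usage might look something like this (H5Dcreate_deferred and H5Dsync_deferred are made-up names following scheme (a)/(b) above, not real HDF5 calls):

    /* Hypothetical sketch only: neither H5Dcreate_deferred nor H5Dsync_deferred
       exists in HDF5; the names just follow scheme (a)/(b) above. */
    #include <stdio.h>
    #include <hdf5.h>

    void write_my_piece(hid_t file_id, int rank, hid_t space_id, const double *data)
    {
        char name[64];
        sprintf(name, "/piece%03d", rank);

        /* independent, per-rank create: nothing is committed to the file yet */
        hid_t dset = H5Dcreate_deferred(file_id, name, H5T_NATIVE_DOUBLE, space_id,
                                        H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

        /* collective point: all deferred creates are exchanged and the file
           metadata is brought up to date consistently on every rank */
        H5Dsync_deferred(file_id);

        /* only now may the object be acted upon */
        H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);
        H5Dclose(dset);
    }
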
But, in spite of numerous suggestions over many years that it'd be good
for parallel applications to be able to do this, it still hasn't found
its way into the HDF5 library proper :wink:

It's so nice to see someone offer a suitable alternative :wink:

Mark


--
Mark C. Miller, Lawrence Livermore National Laboratory
================!!LLNL BUSINESS ONLY!!================
miller86@llnl.gov urgent: miller86@pager.llnl.gov
T:8-6 (925)-423-5901 M/W/Th:7-12,2-7 (530)-753-8511

Hello John (and others, since maybe other people can answer the following questions),

Your library seems very interesting and I will probably use it in my project. Yet I have a question: what would be the best way of writing one file from all the processes together (in terms of write latency), knowing the data layout (regular 2D arrays):
- using the classic PHDF5 library and writing by hyperslabs/chunks/patterns, or
- using your library to split a dataset into "/procNNN/dataset"?
It seems to me that writing regular patterns can benefit from MPI-IO's particular optimizations, but maybe I misunderstood the goal of your library?

Thank you,

Matthieu


--
Matthieu Dorier
ENS Cachan, antenne de Bretagne
Département informatique et télécommunication
http://perso.eleves.bretagne.ens-cachan.fr/~mdori307/wiki/

Hi Mark,

On Feb 24, 2011, at 5:18 PM, Mark Miller wrote:

John,

This is awesome! Thanks so much for putting it up.

  Very nice, yes. :slight_smile:

I really wish the HDF5 Group had decided a long while ago to make this
kind of thing available UNDER the HDF5 API via...
   a) adding either an H5Xcreate_deferred for any part, X, of the API, or
      adding a property to X's create property list to indicate a
      desire for deferred creation.
      Any object so created cannot be acted upon until a subsequent
      H5Xsync_deferred()...
   b) an H5Xsync_deferred() function to synchronize all deferred-created
      objects.
But, in spite of numerous suggestions over many years that it'd be good
for parallel applications to be able to do this, it still hasn't found
its way into the HDF5 library proper :wink:

  Well, suggestions don't feed the developers... :slight_smile:

It's so nice to see someone offer a suitable alternative :wink:

  This is very similar to what you and I have talked about in the past, and the "transaction" idea I proposed earlier in this thread is almost identical also.

  Quincey


Matthieu

> what would be the best way of writing one file from all the processes together (in terms of write latency), knowing the data layout (regular 2D arrays)

If all processes are writing one piece of a single dataset (assuming I understood your question correctly), then the usual collective create of the dataset, followed by a hyperslab selection on each process and a write of the individual pieces

> write by hyperslabs/chunks/patterns

is, I think, what you want.
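
Something like this, roughly (names and sizes are illustrative; assume each rank owns a contiguous band of rows of the 2D array):

    /* Rough sketch: collective create, then each rank writes its own rows */
    #include <mpi.h>
    #include <hdf5.h>

    void write_rows(MPI_Comm comm, const double *local, hsize_t nrows_local, hsize_t ncols)
    {
        int rank, nprocs;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &nprocs);

        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_mpio(fapl, comm, MPI_INFO_NULL);
        hid_t file = H5Fcreate("data.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

        /* whole dataset, (nprocs * nrows_local) x ncols, created collectively */
        hsize_t dims[2] = { (hsize_t)nprocs * nrows_local, ncols };
        hid_t filespace = H5Screate_simple(2, dims, NULL);
        hid_t dset      = H5Dcreate(file, "array", H5T_NATIVE_DOUBLE, filespace,
                                    H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

        /* each rank selects its own band of rows ... */
        hsize_t start[2] = { (hsize_t)rank * nrows_local, 0 };
        hsize_t count[2] = { nrows_local, ncols };
        H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);
        hid_t memspace = H5Screate_simple(2, count, NULL);

        /* ... and writes it with collective I/O so MPI-IO can optimize the access */
        hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
        H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
        H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, local);

        H5Pclose(dxpl); H5Sclose(memspace); H5Sclose(filespace);
        H5Dclose(dset); H5Pclose(fapl); H5Fclose(file);
    }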

Writing one dataset per process was something I wanted - an example of why is probably most illustrative...

Suppose I'm working in ParaView and have done some work in parallel on multi-block data; each process has a different block from the multi-block structure. The blocks might be geometrically diverse (e.g. tetrahedra on one process, prisms on another). I want to write out my current state, but don't want to do a collective write to one dataset. I really want to write each block out independently, but all to the same file.
Because each process has no idea what the others have got, I needed a way to gather the info and create the 'structure', then write.

In the general case it'll be slower (physically more writes to disk), but for the purposes of organisation, much tidier.
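
For reference, the bookkeeping in plain (P)HDF5 looks roughly like the sketch below: gather everyone's sizes, have every rank take part in every (collective) dataset create, then write only your own piece independently. This is just the idea the library wraps up, not its actual API (see H5MButil.h for that):

    /* Sketch of the gather-then-create idea, not the H5MB API itself */
    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>
    #include <hdf5.h>

    void write_blocks(MPI_Comm comm, hid_t file_id, const double *myblock, long long mycount)
    {
        int rank, nprocs;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &nprocs);

        /* every rank learns how big everyone else's block is */
        long long *counts = malloc(nprocs * sizeof(long long));
        MPI_Allgather(&mycount, 1, MPI_LONG_LONG, counts, 1, MPI_LONG_LONG, comm);

        /* dataset creation is collective, so every rank creates every dataset... */
        hid_t *dsets = malloc(nprocs * sizeof(hid_t));
        for (int r = 0; r < nprocs; r++) {
            char name[32];
            hsize_t dims[1] = { (hsize_t)counts[r] };
            hid_t space = H5Screate_simple(1, dims, NULL);
            sprintf(name, "/proc%03d", r);
            dsets[r] = H5Dcreate(file_id, name, H5T_NATIVE_DOUBLE, space,
                                 H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
            H5Sclose(space);
        }

        /* ...but each rank only writes its own block, independently */
        H5Dwrite(dsets[rank], H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, myblock);

        for (int r = 0; r < nprocs; r++) H5Dclose(dsets[r]);
        free(dsets); free(counts);
    }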

JB


Hello,
I am also interested in parallel I/O for my project.
I know HDF5 supports multiple opens of a single file by different processes right out of the box. If my memory serves me correctly, HDF5 requires parallel file I/O (PFIO) in order to perform multiple writes to a single file. Given a multi-core Linux machine with MPICH2 or Open MPI installed, what else do I need to install in order to perform parallel file writes? And is the Windows platform out of luck, except for Windows HPC Server? Thanks a lot.

Best,
x


Hi John,

On Feb 25, 2011, at 4:11 AM, Biddiscombe, John A. wrote:


Suppose I'm working in ParaView and have done some work in parallel on multi-block data; each process has a different block from the multi-block structure. The blocks might be geometrically diverse (e.g. tetrahedra on one process, prisms on another). I want to write out my current state, but don't want to do a collective write to one dataset. I really want to write each block out independently, but all to the same file.
Because each process has no idea what the others have got, I needed a way to gather the info and create the 'structure', then write.

In the general case it'll be slower (physically more writes to disk), but for the purposes of organisation, much tidier.

  Mark and I have kicked around the idea of creating a "virtual" dataset, which is composed of other datasets in the file, stitched together and presented as a single dataset to the application. That way, applications could access the underlying piece (either directly, by reading from the underlying dataset; or through a selection of the virtual dataset) or, access the virtual dataset as if it was a single large dataset. This would be a looser form of chunking, in an abstract sense.
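
A rough sketch of how such a stitched mapping could be expressed - this uses H5Pset_virtual from the virtual dataset (VDS) feature that only appeared later, in HDF5 1.10, so it illustrates the idea rather than anything available at the time. It assumes each rank has already written a rows_per_proc x cols block under /procNNN/dataset in the same file:

    #include <stdio.h>
    #include <hdf5.h>

    void make_virtual(hid_t file_id, int nprocs, hsize_t rows_per_proc, hsize_t cols)
    {
        hsize_t vdims[2] = { (hsize_t)nprocs * rows_per_proc, cols };
        hid_t vspace = H5Screate_simple(2, vdims, NULL);
        hid_t dcpl   = H5Pcreate(H5P_DATASET_CREATE);

        for (int r = 0; r < nprocs; r++) {
            char src[64];
            hsize_t start[2] = { (hsize_t)r * rows_per_proc, 0 };
            hsize_t count[2] = { rows_per_proc, cols };
            hid_t   sspace   = H5Screate_simple(2, count, NULL);

            /* map rank r's piece onto its band of rows in the virtual dataset */
            sprintf(src, "/proc%03d/dataset", r);
            H5Sselect_hyperslab(vspace, H5S_SELECT_SET, start, NULL, count, NULL);
            H5Pset_virtual(dcpl, vspace, ".", src, sspace);   /* "." means this file */
            H5Sclose(sspace);
        }
        hid_t vdset = H5Dcreate(file_id, "/whole", H5T_NATIVE_DOUBLE, vspace,
                                H5P_DEFAULT, dcpl, H5P_DEFAULT);
        H5Dclose(vdset); H5Pclose(dcpl); H5Sclose(vspace);
    }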

  Quincey



Quincey, Mark

            Mark and I have kicked around the idea of creating a "virtual" dataset, which is composed of other datasets in the file, stitched together and presented as a single dataset to the application. That way, applications could access the underlying piece (either directly, by reading from the underlying dataset; or through a selection of the virtual dataset) or, access the virtual dataset as if it was a single large dataset. This would be a looser form of chunking, in an abstract sense.

Suppose I modify my H5MB utility to create one dataset per process - compressing them individually, then writing them out - but what I'd really like to do is

a) ensure all the blocks are the same size, padding them if necessary

b) promote these blocks from datasets to chunks, so that the HDF5 library is responsible for the virtual addressing and does all the real work at retrieval time.

It seems like HDF5 already does everything we want if we had b) in place. Once the chunks are on disk and indexed correctly, a user selecting a slab will trigger retrieval of the chunks and, as long as the decompression filter is available, handle the decompression too. There'd be no need for a virtual dataset to map access to the sub-datasets underneath.

As you (Quincey) know, I already have some practice at messing about with the HDF5 internals. If I wanted to do b), is it feasible that, instead of each process writing a dataset, I could get hold of the metadata directly and manipulate it to write the pieces as chunks?

I can spend some time on this if I can get decent compression working for parallel I/O.

Regards

JB
PS. Mark, I followed the links to your Silo/HDF5 wiki stuff. Interesting - it looks like we're both looking at very similar problems. I will have to play with your PMPIO stuff too. Are you also looking at other libraries, ADIOS for example? (Off topic - you can reply off-list if you don't feel it's appropriate here.)

Hi John,

On Mar 7, 2011, at 8:47 AM, Biddiscombe, John A. wrote:

Quincey, Mark

            Mark and I have kicked around the idea of creating a "virtual" dataset, which is composed of other datasets in the file, stitched together and presented as a single dataset to the application. That way, applications could access the underlying piece (either directly, by reading from the underlying dataset; or through a selection of the virtual dataset) or, access the virtual dataset as if it was a single large dataset. This would be a looser form of chunking, in an abstract sense.

Suppose I modify my H5MB utility to create one dataset per process - compressing them individually, then writing them out - but what I'd really like to do is
a) ensure all the blocks are the same size, padding them if necessary
b) promote these blocks from datasets to chunks, so that the HDF5 library is responsible for the virtual addressing and does all the real work at retrieval time.

It seems like HDF5 already does everything we want if we had b) in place. Once the chunks are on disk and indexed correctly, a user selecting a slab will trigger retrieval of the chunks and, as long as the decompression filter is available, handle the decompression too. There'd be no need for a virtual dataset to map access to the sub-datasets underneath.

  Hmm, so you'd have some new "bind" operation that took as input a bunch of datasets and bound them together as a new dataset?

As you (Quincey) know, I already have some practice at messing about with the HDF5 internals. If I wanted to do b), is it feasible that, instead of each process writing a dataset, I could get hold of the metadata directly and manipulate it to write the pieces as chunks?

I can spend some time on this if I can get decent compression working for parallel I/O.

  I think it could be done, but it's going to be a fairly intensive bit of coding...

    Quincey

···

On Mar 7, 2011, at 8:47 AM, Biddiscombe, John A. wrote:

Regards

JB
PS. Mark, I followed links to your silo/hdf5 wiki stuff. Interesting. it looks like we’re both looking at very similar problems. I will have to play with your PMPIO stuff too. – Are you also looking at other libraries like ADIOS for example. (off topic, you can reply off list if you don’t feel it appropriate for here).
_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org

b) promote these blocks from datasets to chunks, so that the HDF5 library is responsible for the virtual addressing and does all the real work at retrieval time.

It seems like HDF5 already does everything we want if we had b) in place. Once the chunks are on disk and indexed correctly, a user selecting a slab will trigger retrieval of the chunks and, as long as the decompression filter is available, handle the decompression too. There'd be no need for a virtual dataset to map access to the sub-datasets underneath.

      Hmm, so you'd have some new "bind" operation that took as input a bunch of datasets and bound them together as a new dataset?

Essentially yes, I had something quite intrusive in mind. What I was thinking was that each process independently creates a dataset and compresses it (it could be just a memory buffer rather than an HDF5 dataset). Collectively, a new dataset is created with the correct extents for the whole data; chunks are 'requested' by each process, and instead of allowing HDF5 to manage and allocate the chunks, we intercept the chunk generation/allocation (override it) and simply supply our own, using our compressed data buffers. HDF5 then does all the bookkeeping as usual and writes/flushes the data to disk. Provided the chunk extents are regular, the compressed data could vary in final size from chunk to chunk (some tidying up might be necessary in the chunk code).

On load, the user can treat the data as a completely normal dataset, but compressed.

I suspect this is what you meant too, but I thought I'd spell it out more clearly just in case. I will start poking around with the chunking code to see if I can intercept things at convenient places. Please stop me if you think I'm pursuing a bad idea.
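
For reference, the low-level building block this needs - handing the library an already-compressed buffer for one chunk and letting it do the indexing - is more or less what the direct chunk write call added in later HDF5 releases provides (H5DOwrite_chunk in 1.8.11, H5Dwrite_chunk in 1.10.3), so it postdates this discussion. A minimal serial sketch, assuming dset_id is an open chunked dataset, row0 is the row at which this block starts, and the block has already been deflate-compressed into comp_buf (comp_nbytes bytes):

    /* write one pre-compressed chunk directly; HDF5 only updates the chunk index */
    hsize_t  offset[2]   = { row0, 0 };   /* this chunk's position in the dataset */
    uint32_t filter_mask = 0;             /* 0 = all configured filters were applied */

    H5Dwrite_chunk(dset_id, H5P_DEFAULT, filter_mask,
                   offset, comp_nbytes, comp_buf);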

JB

Hi John,

On Mar 22, 2011, at 3:31 AM, Biddiscombe, John A. wrote:


I suspect this is what you meant too, but I thought I’d spell it out more clearly just in case. I will start poking around with the chunking code to see if I can intercept things at convenient places. Please stop me if you think I’m pursuing a bad idea.

  I think it's an interesting idea - poke around and send me any questions you have (off list, if you'd like).

  Quincey

Quincey,

            I think it's an interesting idea - poke around and send me any questions you have (off list, if you'd like).

One thing I didn't think about yet (and I haven't started poking around yet) is this:

If you enable compression on chunked data, is each chunk compressed completely independently, or do the compression filters assume that all chunks are somehow related? I understand that gzip, for example, stores tables of lookup bit sequences - would these tables be per chunk, or shared by all chunks?

If per chunk, then I'm happy; if global, then I have a problem.
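
For concreteness, the setup I mean is just the usual chunked dataset with the deflate filter enabled, along these lines (sizes illustrative; file_id and filespace assumed to exist already):

    hsize_t chunk_dims[2] = { 256, 256 };      /* one block per chunk */
    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 2, chunk_dims);
    H5Pset_deflate(dcpl, 6);                   /* gzip, level 6 */
    hid_t dset = H5Dcreate(file_id, "compressed", H5T_NATIVE_FLOAT, filespace,
                           H5P_DEFAULT, dcpl, H5P_DEFAULT);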

Thanks

JB

Hi John,

On Mar 30, 2011, at 6:49 AM, Biddiscombe, John A. wrote:


If you enable compression on chunked data, is each chunk compressed completely independently, or do the compression filters assume that all chunks are somehow related? I understand that gzip, for example, stores tables of lookup bit sequences - would these tables be per chunk, or shared by all chunks?

If per chunk, then I'm happy; if global, then I have a problem.

  They are per chunk (you can be happy. :-).

    Quincey