Fast Sparse Matrix Products by Finding Allocated Chunks

Hi,

I am using HDF5 through Python's h5py, but I am planning on pushing into C/C++.

I am using HDF5 to store sparse matrices which I need to do matrix products
on. I am using chunked storage, which appears to be storing the data in a
block sparse format. Can someone please confirm that this is true? I couldn't
find documentation stating so, but judging by file sizes during data loading,
my block sparse assumption seemed to hold.

I would like to exploit the sparsity of the data to make the matrix
multiplies go faster. I can handle the algorithmic aspect, but I can't figure
out how to see which chunks are allocated so that I can iterate over them.

If there is a better way to go about this (existing code!), please let me
know. I am new to HDF5, and thoroughly impressed.

Thank you,

Aidan Plenert Macdonald
Website <http://acsweb.ucsd.edu/~amacdona/>

Have a look at this reference . . .

http://www.hdfgroup.org/HDF5/doc_resource/H5Fill_Values.html

as well as documentation on H5Pset_fill_value and H5Pset_fill_time.

I have a vague recollection that if you create a large, chunked dataset but then only write to certain parts of it, HDF5 is smart enough to store only those chunks in the file that actually have non-fill values within them. The above ref seems to be consistent with this (except in parallel I/O settings).
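For instance, here is a minimal h5py sketch of that behavior (the file and dataset names are made up): only the one chunk that gets written should consume space in the file.

```python
import h5py
import numpy as np

with h5py.File("sparse_demo.h5", "w") as f:
    dset = f.create_dataset("matrix", shape=(10000, 10000), dtype="f8",
                            chunks=(100, 100), fillvalue=0.0)
    # Write exactly one 100x100 chunk; all other chunks stay unallocated.
    dset[0:100, 0:100] = np.random.rand(100, 100)
    # Allocated storage is roughly one chunk (100*100*8 bytes = 80 KB),
    # not the ~800 MB the full dataspace would imply.
    print(dset.id.get_storage_size())
```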

Is this what you mean by a 'sparse format'?

However, I am not sure why you need to know how HDF5 has handled the chunks *in the file*, unless you are attempting to write an out-of-core matrix multiply.

I think you can easily determine which blocks are 'empty' by examining a block you've read into memory for all fill value or not. Any block which consists entirely of fill-value is, of course, an empty block, and you can use that information to help bootstrap your sparse matrix multiply. So, you could read the matrix several blocks at a time, rather than all at once, examining the returned blocks for all-fill-value or not and building up your sparse in-memory representation from that. If you read the matrix in one H5Dread call, however, you'd wind up with a fully instantiated matrix with many fill values in memory *before* you could begin to reduce that storage to a sparse format.
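As a rough h5py sketch of that read-and-check approach (assuming a 2-D dataset; all names are illustrative):

```python
import h5py
import numpy as np

def iter_nonempty_blocks(dset):
    """Yield ((row, col), block) for each chunk-aligned block that
    holds at least one non-fill value."""
    crows, ccols = dset.chunks
    fill = dset.fillvalue
    for r in range(0, dset.shape[0], crows):
        for c in range(0, dset.shape[1], ccols):
            block = dset[r:r + crows, c:c + ccols]
            if not np.all(block == fill):  # all-fill means an empty block
                yield (r, c), block

with h5py.File("sparse_demo.h5", "r") as f:
    # Build a sparse, in-memory representation keyed by block origin.
    blocks = dict(iter_nonempty_blocks(f["matrix"]))
```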

I wonder if it might be possible to write your own custom 'filter', applied during H5Dread, that would do all this for you as chunks are read from the file? It might be.

Mark


I don't have experience optimizing sparse matrix products with HDF5, but I imagine you would want to read back the sparse representation of each matrix so that the multiplies would be efficient. Or is it like Mark describes: you want a compressed representation on disk but work with full matrices in memory? If the former, you would create a sparse representation in memory yourself and write that to the dataset (see the sketch below). I would think of the chunk as more of a low-level data storage thing. Chunks are of a fixed size (although they can be compressed with something like gzip), while a sparse matrix representation varies in size depending on how dense the original matrix was. I would just read the whole sparse matrix back, and only worry about the chunk layout to optimize I/O performance.
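For the "former" case, a minimal sketch (the group and dataset names are made up) that stores a SciPy CSR matrix as its three underlying arrays and rebuilds it on read:

```python
import h5py
from scipy import sparse

def write_csr(f, name, m):
    """Store a SciPy CSR matrix as three 1-D datasets plus a shape attribute."""
    g = f.create_group(name)
    g.create_dataset("data", data=m.data)
    g.create_dataset("indices", data=m.indices)
    g.create_dataset("indptr", data=m.indptr)
    g.attrs["shape"] = m.shape

def read_csr(f, name):
    g = f[name]
    return sparse.csr_matrix(
        (g["data"][:], g["indices"][:], g["indptr"][:]),
        shape=tuple(g.attrs["shape"]))

with h5py.File("csr_demo.h5", "w") as f:
    write_csr(f, "A", sparse.random(1000, 1000, density=0.01, format="csr"))
```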

best,

David



or is it like Mark describes, you want a compressed representation on disk and to work with full matrices in memory?

Well, I wasn't suggesting sparse on disk but dense in memory; my response may have been confusing, though.

I was just trying to get around the fact that unless a data-producer winds up writing additional stuff to the file, like maybe a used-blocks map, a reader has no means of knowing which blocks are empty except to attempt to read some and then examine them for all-fill-value.

The down side is that you wind up instantiating fully-populated blocks in memory when the block didn't exist in the file. But, if you do this piece-wise as opposed to a single H5Dread call for the entire matrix, *and* if you design your H5Dread calls to align with one or more full blocks, you can build up the equivalent sparse representation in memory *without* paying the initial price of instantiating a full, dense matrix and then tearing it down to build the final sparse representation.

Does that make sense?

Mark

Ok, you might have a look at the implementation of the h5dump.c or h5ls.c tools, because I have some vague recollection that those tools are able to tease out of the file *some* information regarding block storage of datasets. I don't know if they get at the key information you want (which blocks are empty), but some of what those tools do might get you closer.

Mark

On Wed, Aug 12, 2015 at 11:35 AM, Aidan Macdonald <aidan.plenert.macdonald@gmail.com> wrote:

Is this what you mean by a 'sparse format'?

Yes, exactly.

However, I am not sure why you need to know how HDF5 has handled the chunks *in the file*, unless you are attempting to write an out-of-core matrix multiply.

Yes, I am trying to write an out-of-core matrix multiply.

I think you can easily determine which blocks are 'empty' by examining a block you've read into memory for all fill value or not. [...]

I think, but can't prove, that if I did the check, I would burn more CPU cycles than I would save, because I need to read, check, and then dump or multiply out. I was hoping for something that could give me a list of the allocated chunks; then I could do a "dictionary of keys" <https://en.wikipedia.org/wiki/Sparse_matrix#Dictionary_of_keys_.28DOK.29> block matrix multiplication.
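For illustration, a hedged sketch of that dictionary-of-keys block multiply, assuming blocks_A and blocks_B are hypothetical dicts mapping (block_row, block_col) to dense NumPy blocks of compatible shapes:

```python
from collections import defaultdict

def dok_block_matmul(blocks_A, blocks_B):
    """Multiply two block-sparse matrices stored as dicts mapping
    (block_row, block_col) -> dense NumPy block. Only pairs of blocks
    with a matching inner index contribute, so empty blocks cost nothing."""
    # Index B's blocks by block-row for fast inner-index lookup.
    b_by_row = defaultdict(dict)
    for (k, j), blk in blocks_B.items():
        b_by_row[k][j] = blk
    out = {}
    for (i, k), a_blk in blocks_A.items():
        for j, b_blk in b_by_row[k].items():
            if (i, j) in out:
                out[(i, j)] = out[(i, j)] + a_blk @ b_blk
            else:
                out[(i, j)] = a_blk @ b_blk
    return out
```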

According to Table 15 here <https://www.hdfgroup.org/HDF5/doc/UG/10_Datasets.html>, if the space is not allocated, then an error is thrown. So perhaps this error will be faster than actually reading the data from disk. But to do that, I need the fill value undefined. I was hoping for a better way to see whether a chunk is actually allocated.

The best way, in my mind, is to get some sort of list of all the chunks that are allocated, or an iterator that goes through them; then I can do the block matrix multiplication.
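For what it's worth, later HDF5 releases (1.10.5 and newer) added H5Dget_num_chunks and H5Dget_chunk_info, which provide exactly this kind of allocated-chunk iteration; h5py (2.10 and newer) exposes them through its low-level API. A sketch, reusing the hypothetical file and dataset names from above:

```python
import h5py

with h5py.File("sparse_demo.h5", "r") as f:
    dsid = f["matrix"].id
    for i in range(dsid.get_num_chunks()):
        info = dsid.get_chunk_info(i)
        # info.chunk_offset: the chunk's starting coordinates in the
        # dataspace; info.size: its size on disk in bytes.
        print(info.chunk_offset, info.size)
```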

Aidan Plenert Macdonald
Website <http://acsweb.ucsd.edu/~amacdona/>


Hmm. Is a distributed parallel solution an option for you? Or, are you by chance attempting this on a GPU? ~1 TB is 10^6 unknowns, or thereabouts. That is a fairly large matrix to attempt in a single-node setting (hence the out-of-core approach you are taking, I guess).

Also, I wouldn't necessarily read chunks one by one. Read, say, 1,000 chunks at a time, tearing down the empty ones (e.g. all-fill) as you go.

Do you have any control over the data in the file? If so, why not have the data producer write an auxiliary dataset to the file: a bit-field that is one for a non-empty chunk in the matrix and zero for empty chunks. You could read that dataset first (it's small), then use it to bootstrap the read-and-compute process.
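A reader-side sketch of that idea (the chunk_map dataset name and the chunk width are assumptions; the producer would set a 1 wherever it wrote a non-empty chunk):

```python
import h5py
import numpy as np

CCOLS = 4096  # assumed chunk shape: 1 row by CCOLS columns

with h5py.File("sparse.h5", "r") as f:
    cmap = f["chunk_map"][:]  # tiny bit-field; read it whole, first
    dset = f["matrix"]
    for r, c in zip(*np.nonzero(cmap)):
        block = dset[r, c * CCOLS:(c + 1) * CCOLS]
        # ...feed each non-empty block into the block multiply...
```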

Just spit-ballin' here :wink:

Mark

On Wed, Aug 12, 2015 at 11:50 AM, Aidan Macdonald <aidan.plenert.macdonald@gmail.com> wrote:

To respond to David Schneider's comment: I thought about this, but as far as I can imagine, it requires that I create a bunch of matrices in the HDF5 file and use them as part of the sparse index labels.

Also, loading into memory is not possible, as the matrix is way too big: ~10 GB now, eventually growing to ~1 TB. I need to do everything out of core.

The down side is that you wind up instantiating fully-populated blocks in memory when the block didn't exist in the file. [...]

One of the major problems is the dimensionality. For a *small* dataset, we have 4,000,000 columns, eventually scaling to ~10^9 columns. Note that this is sparse data, so in each row most columns are zero. I am blocking in chunks of 1 row and several columns, so simply reading in every chunk one by one would still take way too long (read time, not a memory issue).

I like the ideas, but I am afraid the compute time will be way too much. Currently, I am at 7 seconds for a read and multiply per row, and with millions to billions of rows, that is too long. Perhaps I didn't understand you correctly.

Thanks for helping too.

Aidan Plenert Macdonald
Website <http://acsweb.ucsd.edu/~amacdona/>

