Deflate and partial chunk writes

Hello!

I've found an interesting situation that seems like something of a bug to me. I've figured out how to work around it, but I wanted to bring it up in case it comes up for anyone else.

I use the Fortran API, and I typically create HDF5 datasets with large, multidimensional chunks but only write part of a chunk at any given time. For example, I'll use a chunk size of 1000 x 200 x 50 but only write 1000 x 200 x 1 elements at a time. This seems to work fine, although on networked filesystems I sometimes notice that my application is I/O-limited. My workaround is to buffer the writes locally and then write a full chunk at a time.
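
To make the pattern concrete, here is a stripped-down sketch of what I mean (names and the datatype are simplified, and the error checking I normally do is omitted):

  program partial_chunk_write
    use hdf5
    implicit none

    integer(hid_t)   :: file_id, dcpl_id, dset_id, filespace, memspace
    integer(hsize_t) :: dims(3), chunk(3), count(3), offset(3)
    real             :: slab(1000, 200, 1)
    integer          :: hdferr, k

    call h5open_f(hdferr)
    call h5fcreate_f("example.h5", H5F_ACC_TRUNC_F, file_id, hdferr)

    dims  = (/ 1000, 200, 50 /)
    chunk = (/ 1000, 200, 50 /)   ! one chunk is much larger than a single write

    ! Chunked layout goes on the dataset creation property list.
    call h5pcreate_f(H5P_DATASET_CREATE_F, dcpl_id, hdferr)
    call h5pset_chunk_f(dcpl_id, 3, chunk, hdferr)

    call h5screate_simple_f(3, dims, filespace, hdferr)
    call h5dcreate_f(file_id, "data", H5T_NATIVE_REAL, filespace, dset_id, &
                     hdferr, dcpl_id)

    ! Each write covers only a 1000 x 200 x 1 slab of the 1000 x 200 x 50 chunk.
    count = (/ 1000, 200, 1 /)
    call h5screate_simple_f(3, count, memspace, hdferr)
    do k = 1, 50
      slab = real(k)              ! stand-in for the real data
      offset = (/ 0, 0, k - 1 /)
      call h5sselect_hyperslab_f(filespace, H5S_SELECT_SET_F, offset, count, hdferr)
      call h5dwrite_f(dset_id, H5T_NATIVE_REAL, slab, count, hdferr, &
                      mem_space_id=memspace, file_space_id=filespace)
    end do

    call h5sclose_f(memspace, hdferr)
    call h5sclose_f(filespace, hdferr)
    call h5dclose_f(dset_id, hdferr)
    call h5pclose_f(dcpl_id, hdferr)
    call h5fclose_f(file_id, hdferr)
    call h5close_f(hdferr)
  end program partial_chunk_write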

Recently, I decided to try out the deflate/zlib filter. I've noticed that when I buffer the data locally and write a full chunk at a time, it works beautifully and compresses nicely. But if I do not write a full chunk at a time (say just 1000 x 200 x 1 elements), then my HDF5 file explodes in size. When I examine it with h5stat, I see that the 'raw data' size is about what I'd expect (tens of megabytes), but the 'unaccounted space' size is a few gigabytes.
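
For reference, the only change on my side when I turn on compression is adding the filter to the dataset creation property list from the sketch above (compression level 6 here, chosen arbitrarily):

    call h5pcreate_f(H5P_DATASET_CREATE_F, dcpl_id, hdferr)
    call h5pset_chunk_f(dcpl_id, 3, chunk, hdferr)
    call h5pset_deflate_f(dcpl_id, 6, hdferr)   ! enable the deflate/zlib filter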

From what I can tell, it looks like the deflate filter is applied to the full chunk even though I haven't written the whole thing yet, and as I add more to it, the library doesn't overwrite, remove, or re-optimize the parts it has already written. It's as if it deflates a full chunk for each small-ish write. I haven't seen anything in the documentation or on the forum to confirm this, but it seems like a problem. If it isn't something easily addressed, perhaps there should be a warning about this inefficiency in the documentation for the deflate filter.

Thanks!

--
Patrick Vacek
Engineering Scientist Associate
Applied Research Labs, University of Texas

Hi,

I've had a similar experience writing streams of 2D data, and I've also noticed that performance is much slower if I don't write whole chunks at a time. I would have thought (assuming the chunk cache is sized suitably) that each 1000 x 200 x 1 write would gradually fill up a 1000 x 200 x 50 chunk, and that some time later the whole chunk would be deflated once, when it's evicted from the cache, and written to disk once. But based on the performance I see, I can only guess it isn't working like that, so I also just buffer whole chunks myself.

Dan

Patrick,

What you are seeing is that, because the chunk cache is not large enough to hold a single chunk in memory, every write goes directly to disk. Without compression this still works, but it causes a disk write for every write you make to the dataset instead of a single write for the whole chunk. It can be even worse if the slice through the chunk you are writing is not contiguous, which looks to be the case here.

With compression, since each chunk is compressed and decompressed as a single unit, every write call forces the library to read the chunk from disk, decompress it, apply your data to the buffer, recompress the modified buffer, and write it back to disk. Since the compressed chunk can change size each time this happens, the library may need to move it around the file, causing fragmentation and unused space in the file.

To fix this, you should increase the chunk cache size (via H5Pset_cache or H5Pset_chunk_cache) so that it can hold at least one full chunk, or more if your writes touch multiple chunks at once or you otherwise need to hold multiple chunks in the cache. This allows the library to keep the chunk in memory between write calls and avoid flushing it to disk until the chunk is complete.
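
Since you are using the Fortran API, a minimal sketch of the second option might look like the following (the chunk dimensions are taken from your example, and an 8-byte element size is assumed -- adjust for your actual datatype):

    integer(hid_t)  :: dapl_id
    integer(size_t) :: nslots, nbytes
    integer         :: hdferr

    ! One 1000 x 200 x 50 chunk of 8-byte elements is 80,000,000 bytes;
    ! size the cache to hold at least one full chunk (here, room for two).
    nbytes = int(1000, size_t) * 200 * 50 * 8 * 2
    nslots = 12421                ! a prime well above the number of cached chunks

    call h5pcreate_f(H5P_DATASET_ACCESS_F, dapl_id, hdferr)
    call h5pset_chunk_cache_f(dapl_id, nslots, nbytes, 0.75, hdferr)
    ! ... then pass dapl_id to h5dcreate_f or h5dopen_f as the dapl_id argument.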

Dan,

If the chunk cache is sized correctly, it should not flush the chunk prematurely. Do you have an example program that shows this problem?

Thanks,
-Neil

Neil,

Thanks for your insight. I've tried using the h5pset_cache_f function, and that certainly helps, but I'm still getting plenty of excess unused space. It's an order of magnitude better -- hundreds of megabytes instead of the gigabytes I was seeing before. If I buffer the data myself before writing, the file compresses to about 50 MB (from a baseline of around 140 MB uncompressed). Hence, that is still my preferred strategy for now.

Perhaps I'm misusing the cache settings? I've tried a few options, but currently, I'm using something like this:

call h5pset_cache_f(fap_list, 0, int(19997, size_t), int(1024 * 1024 * 1024, size_t), .75, hdferr)

That should be far greater than necessary for my datasets.
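
(For scale: assuming 8-byte elements, one 1000 x 200 x 50 chunk is 1000 * 200 * 50 * 8 = 80,000,000 bytes, so a 1 GiB cache should comfortably hold a dozen full chunks.)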

(Side note: is there an H5D_CHUNK_CACHE_W0_DEFAULT defined for Fortran?)

Thanks for your help!
--Patrick
