1) It is very important to me to create a "standard" HDF5 file that can be read (!) by the standard HDF5 library without any add-ons for decompression. I need this because there are many old versions of our product in the market that rely on the standard features of HDF5 for opening files, i.e. they can open gzip-compressed chunks because that is part of HDF5's functionality. They would not be able to open chunks that I have compressed with my own compression algorithm. (FYI: I cannot patch these old versions with a new decompression filter.)
Correct. If you used a compression algorithm which is not available to your clients,
they'd be in trouble when attempting to read those datasets. Newer versions of the
library (1.8.11+) support the dynamic loading of filters;
however, this would require a minimum library version on the client side.
2) My guess would be that I could use gzip for compression (which I would run outside the library so that I can run it in parallel, and then write the chunks into the file using H5DOwrite_chunk), and in the HDF5 file I would set the filter mask to the one for gzip. Then I should be able to read the file with a standard HDF5 library, which would do the decompression by itself?
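Two details matter for this approach. First, HDF5's "gzip" filter (H5Z_FILTER_DEFLATE) stores plain zlib streams, i.e. what zlib's compress() emits, not gzip-wrapped (.gz) streams. Second, the filter_mask argument of H5DOwrite_chunk marks filters that were *skipped*: a mask of 0 tells readers that every filter in the dataset's pipeline was applied, so pre-compressed chunks are written with mask 0 into a dataset whose creation property list includes the deflate filter. A minimal stdlib sketch of the byte format the standard library expects:

```python
import zlib

# A chunk of raw data, as it would sit in memory before writing.
raw_chunk = bytes(range(256)) * 256  # 64 KiB of repetitive sample data

# HDF5's "gzip" filter (H5Z_FILTER_DEFLATE) stores plain zlib streams,
# i.e. what zlib.compress() produces -- NOT gzip-wrapped (.gz) streams.
compressed = zlib.compress(raw_chunk, level=6)

# A standard HDF5 library reading such a chunk performs the inverse:
restored = zlib.decompress(compressed)
assert restored == raw_chunk
```

With this in place, any stock HDF5 build that can read gzip-compressed datasets will decompress these chunks on H5Dread without knowing they were compressed outside the library.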
3) I have come to love HDF5 for its extremely forgiving implementation. Over the years we have fiddled with chunk sizes. We never had to communicate a file format change to our customers because the library covered our backs. That was really nice. What will happen if I write my own compressed chunks? Will I need to deliver a decompressor? Will I be able to change chunk sizes without breaking backward compatibility?
Yes, you will need to provide a decompressor, either by compiling it into the version
of the HDF5 library you distribute with your application, or as a plugin (shared library)
to be loaded at runtime.
I'm not sure I understand what you mean by "breaking backward compatibility."
At the API level, H5Dread/write won't see a difference.
A change in chunk size might have an adverse effect on performance, for example,
if you've hard-tuned your application's dataset chunk cache sizes.
From: Hdf-forum [email@example.com] on behalf of Gerd Heber [firstname.lastname@example.org]
Sent: Monday, March 16, 2015 6:20 PM
To: HDF Users Discussion List
Subject: Re: [Hdf-forum] Queuing chunks for compression and writing
Peter, there's an API call that lets you write chunks directly
into the file including chunks which you have compressed outside
the HDF5 filter pipeline. Have a look at H5DOwrite_chunk.
See how fast you can write with H5DOwrite_chunk and then do
a back-of-the-envelope calculation to see how elaborate
a queueing mechanism you want.
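That back-of-the-envelope calculation might look like the following; all figures here are illustrative assumptions, not measurements:

```python
# Illustrative throughput model; every number below is an assumption.
disk_mb_s = 500.0        # sequential write bandwidth of the target disk
compress_mb_s = 150.0    # gzip throughput of a single compression thread
ratio = 3.0              # assumed compression ratio (raw : compressed)
threads = 4              # number of compression threads

# The effective ingest rate is limited by whichever stage saturates first:
# the compressors (threads * compress_mb_s raw MB/s) or the disk
# (disk_mb_s * ratio raw MB/s, since only 1/ratio of the data hits disk).
ingest_mb_s = min(threads * compress_mb_s, disk_mb_s * ratio)
print(f"estimated ingest rate: {ingest_mb_s:.0f} MB/s of raw data")
# -> estimated ingest rate: 600 MB/s of raw data
```

If the compressors saturate first, as in these numbers, adding threads pays off directly; once the disk saturates, a more elaborate queueing mechanism buys nothing further.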
From: Hdf-forum [mailto:email@example.com] On Behalf Of Peter Majer
Sent: Monday, March 16, 2015 11:53 AM
Subject: [Hdf-forum] Queuing chunks for compression and writing
We have been experiencing, and suffering from, the fact that writing compressed files with HDF5 is significantly slower than writing uncompressed ones. I have been asking myself for a while whether there is a simple remedy. Would it be possible to have two queues of chunks when writing a file, one for compression and one for the actual writing, to achieve the following:
1) I enqueue N chunks for CompressionAndWriting. They initially enter CompressQueue.
2) The chunks from CompressQueue are concurrently compressed by multiple compression threads and subsequently enqueued in a WriteQueue.
3) A WriteThread sequentially writes all compressed chunks from WriteQueue to the file system.
This should keep the WriteThread constantly busy, and it should allow compressed writing to be faster than uncompressed writing by a factor more or less equal to the compression ratio.
Interface-wise it would be nice to have "StartWrite" and "FinishWrite" methods, where "StartWrite" simply copies the data into the CompressQueue and returns immediately, while "FinishWrite" blocks until the write operation for the corresponding chunk has actually completed.
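The proposed two-queue pipeline can be sketched with stdlib primitives. The names (compress_queue, write_queue, start_write, finish_write) follow the proposal above; this is an illustrative sketch, with an in-memory list standing in for the actual file write:

```python
import queue
import threading
import zlib

compress_queue = queue.Queue()   # the proposal's CompressQueue
write_queue = queue.Queue()      # the proposal's WriteQueue
done = {}                        # chunk id -> threading.Event
written = []                     # (chunk_id, compressed_bytes), in write order

def compressor():
    while True:
        item = compress_queue.get()
        if item is None:               # shutdown sentinel ...
            compress_queue.put(None)   # ... re-queued so sibling threads stop too
            break
        chunk_id, data = item
        write_queue.put((chunk_id, zlib.compress(data)))

def writer():
    while True:
        item = write_queue.get()
        if item is None:
            break
        chunk_id, payload = item
        written.append((chunk_id, payload))  # stand-in for the real file write
        done[chunk_id].set()

def start_write(chunk_id, data):
    """Copy the chunk into the CompressQueue and return immediately."""
    done[chunk_id] = threading.Event()
    compress_queue.put((chunk_id, bytes(data)))  # copy, as proposed

def finish_write(chunk_id):
    """Block until the chunk has actually been written."""
    done[chunk_id].wait()

workers = [threading.Thread(target=compressor) for _ in range(4)]
write_thread = threading.Thread(target=writer)
for t in workers:
    t.start()
write_thread.start()

for i in range(10):
    start_write(i, bytes([i]) * 4096)   # returns immediately
for i in range(10):
    finish_write(i)                     # blocks until chunk i is on "disk"

compress_queue.put(None)                # stop the compressors
for t in workers:
    t.join()
write_queue.put(None)                   # then stop the writer
write_thread.join()

assert sorted(cid for cid, _ in written) == list(range(10))
```

In a real implementation, the writer thread would call H5DOwrite_chunk with the chunk's offset and the compressed buffer; the single writer keeps file access serialized, so only the compression runs in parallel.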
Would this be possible?
Would it be feasible?
Would it be easy?
Dr. Peter Majer
Image Analysis Scientist and Software Architect