Thanks, Mark. The Boeing encryption handles the problem nicely by letting
you pass in your encryption "key" when you register the filter with HDF5.
It also supports having multiple keys, so conceivably you could allow
someone access to parts of the data, but not others. Anyone interested
should take a look at the link Gerd sent
(http://www.hdfgroup.uiuc.edu/HDF5/projects/boeing/encryption/).
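Block encryption itself (such as AES) doesn't change the size of the data,
apart from any padding up to the cipher's block size. However, paired with
compression, the order is critical. You absolutely need to compress first,
then encrypt, because well-encrypted data looks random and will not
compress.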
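
As a rough illustration of why the order matters, here is a minimal Python
sketch using only the standard library. The `toy_stream_encrypt` function is
a hypothetical stand-in for a real cipher such as AES: it XORs the data with
a SHA-256-derived keystream, is NOT secure, and exists only to produce
random-looking output. The sample data and key are made up.

```python
import hashlib
import zlib


def toy_stream_encrypt(data: bytes, key: bytes) -> bytes:
    """XOR data with a SHA-256 keystream. Stand-in for AES, NOT secure."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        # One 32-byte keystream block per 32 bytes of input.
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


raw = b"temperature=21.5;pressure=1013;" * 2000  # highly repetitive sample
key = b"illustrative key only"

compress_then_encrypt = toy_stream_encrypt(zlib.compress(raw), key)
encrypt_then_compress = zlib.compress(toy_stream_encrypt(raw, key))

print(len(raw), len(compress_then_encrypt), len(encrypt_then_compress))
```

With repetitive input like this, the compress-then-encrypt result should be
a small fraction of the original size, while the encrypt-then-compress
result should be about as large as the input.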
Warm Regards,
Jim
-----Original Message-----
From: Hdf-forum [mailto:hdf-forum-bounces@lists.hdfgroup.org] On Behalf
Of Miller, Mark C.
Sent: Friday, March 21, 2014 11:17 AM
To: HDF Users Discussion List
Subject: Re: [Hdf-forum] symmetric encryption filters?
Hmm. This is an interesting discussion. Let me see if I can add two cents...
The HDF5 library allows you to define your own 'filters' which operate on
the data in-transit as it is written to and read from the file. The filters
are just call backs made from the HDF5 library to your user-defined code to
operate on chunks of the dataset as they are emitted underneath the
H5Dwrite and H5Dread calls.
If you write data via some user-defined filter, then any reader will need
access to the code that does the reverse operation (decryption, in your
case) and, of course, any decryption keys. So it is already implied that
if you define some 'weird' filter, none of the existing HDF5 tools
(hdfview, h5dump) or third-party applications that read HDF5 (IDL, MATLAB,
VisIt, etc.) will be able to read your data. But, given that you are
talking about encryption here, I suspect that such an outcome is actually
perfectly fine. Only applications that have access to your reader code
(decryption filters) will be able to read the data.
And why not handle that the way something like ssh does it now? Your
reader 'filter' would have to acquire the key from ~/.ssh/id_rsa and then
use what it gets to decrypt the chunks getting read during H5Dread. Failure
to acquire the key would result in a filter error and ultimately a read
error in H5Dread's error stack. You could do some work to detect this case
and report a useful error message (e.g. "no appropriate key to read
encrypted data").
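
The key-acquisition step described above might look roughly like the
following Python sketch. Everything named here is hypothetical: the
`load_key` helper, the `KeyUnavailableError` class, the key file location,
and the assumption that the key is a raw 32-byte symmetric key (an ssh
private key such as ~/.ssh/id_rsa is a different format). A real filter
would be written in C and would report the failure through the HDF5 error
stack instead.

```python
from pathlib import Path


class KeyUnavailableError(RuntimeError):
    """Raised when no usable decryption key can be found."""


def load_key(path: Path, expected_len: int = 32) -> bytes:
    """Read a raw symmetric key from a file, failing with a clear message."""
    try:
        key = path.read_bytes()
    except OSError as exc:
        raise KeyUnavailableError(
            f"no appropriate key to read encrypted data ({path}: {exc.strerror})"
        ) from exc
    if len(key) != expected_len:
        raise KeyUnavailableError(
            f"no appropriate key to read encrypted data ({path}: expected "
            f"{expected_len} bytes, found {len(key)})"
        )
    return key


# Hypothetical location; a real setup would more likely use a file under
# the user's home directory, the way ssh does.
KEY_PATH = Path("keys") / "dataset.key"
try:
    key = load_key(KEY_PATH)
except KeyUnavailableError as err:
    print(f"cannot decrypt: {err}")
```

Turning the low-level failure into one specific exception is what lets the
caller print a useful message rather than a generic read error.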
Would you have a single HDF5 file with datasets encrypted for different
ids? If so, I think the ssh-like mechanism still works.
Because 'filter' operations apply only to the raw data of a dataset, the
metadata is not encrypted. This means things like the names, dimensions,
datatypes, etc. (and any attributes defined on the datasets) cannot be
encrypted via the 'filter' approach. Perhaps this is why another responder
mentioned the introduction of a Virtual File Driver that collects metadata
together and encrypts that separately. I could see how that could be
important in certain circumstances.
Another issue is that 'filters' can be applied only when datasets are
'chunked', and the filters are then applied independently to each chunk.
So, what you get for a single dataset is a bunch of chunks, each
independently encrypted; you don't have the whole dataset encrypted in one
fell swoop. I don't think that would cause problems, but I thought I would
mention it.
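
To make the per-chunk behavior concrete, here is a toy Python model of it.
This is not the HDF5 API: `write_chunks`, `read_chunk`, and `xor_keystream`
are hypothetical names, and the XOR keystream is an insecure stand-in for a
real cipher. It only shows that each chunk is transformed on its own and can
be reversed without touching its neighbors.

```python
import hashlib


def xor_keystream(data: bytes, key: bytes, chunk_index: int) -> bytes:
    """Toy reversible 'cipher': XOR with a SHA-256 keystream. NOT secure."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        # Mixing in the chunk index gives every chunk its own keystream.
        seed = key + chunk_index.to_bytes(8, "big") + counter.to_bytes(8, "big")
        stream.extend(hashlib.sha256(seed).digest())
        counter += 1
    return bytes(b ^ s for b, s in zip(data, stream))


def write_chunks(data: bytes, chunk_size: int, key: bytes) -> list:
    """Split data into chunks and filter each one independently."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return [xor_keystream(c, key, n) for n, c in enumerate(chunks)]


def read_chunk(stored: list, n: int, key: bytes) -> bytes:
    """Reverse the filter for one chunk without touching the others."""
    return xor_keystream(stored[n], key, n)


key = b"illustrative key only"
data = bytes(range(256)) * 4  # four identical 256-byte chunks
stored = write_chunks(data, chunk_size=256, key=key)
print(read_chunk(stored, 2, key) == data[512:768])
```

One design point the model brings out: because the chunks here have
identical plaintext, a scheme that did not vary the keystream (or IV) per
chunk would store identical ciphertext for each of them.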
HDF5 can be 'smart' about applying filters and wind up NOT applying a
requested filter in circumstances where you tell it the filter is optional.
So, take care that your encryption filter cannot be treated that way and
silently skipped. Just be sure to set up the filters correctly when you
define them to HDF5 (for example, by passing H5Z_FLAG_MANDATORY rather than
H5Z_FLAG_OPTIONAL to H5Pset_filter).
Will encryption *increase* the size of the data being written? I don't
think it does, but I guess it's always possible depending on what you are
doing. If so, HDF5 may not be able to tolerate that. It may expect filtered
chunks to be no larger than the unfiltered chunks and error out (or skip
such a filter) if that is not the case. So, just be sure to review the
documentation on these details.
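
Whether a chunk grows depends on the cipher mode and on what gets stored
alongside it. The Python arithmetic below is a sketch under two assumed
layouts, neither taken from any particular filter implementation: a counter
(CTR) style mode that keeps the length unchanged, and CBC with PKCS#7
padding plus a 16-byte IV stored in front of each chunk. The function names
are hypothetical.

```python
AES_BLOCK_BYTES = 16  # AES always uses a 128-bit block


def pkcs7_padded_size(n_bytes: int, block: int = AES_BLOCK_BYTES) -> int:
    """Length after PKCS#7 padding, which always adds 1..block bytes."""
    return (n_bytes // block + 1) * block


def stored_chunk_size(n_bytes: int, mode: str) -> int:
    """Bytes stored for one encrypted chunk under two illustrative layouts."""
    if mode == "ctr":
        # Counter mode acts as a stream cipher, so the ciphertext has the
        # same length as the plaintext (assuming the nonce lives elsewhere).
        return n_bytes
    if mode == "cbc+iv":
        # CBC with PKCS#7 padding, plus a 16-byte IV stored with the chunk.
        return AES_BLOCK_BYTES + pkcs7_padded_size(n_bytes)
    raise ValueError(f"unknown mode: {mode}")


for size in (1000, 4096, 1_048_576):
    print(size, stored_chunk_size(size, "ctr"), stored_chunk_size(size, "cbc+iv"))
```

Under the second layout every chunk grows by 17 to 32 bytes, so a filter
built that way would need HDF5 to accept filtered chunks that are larger
than the originals.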
I guess this is a long-winded way of saying I think you could make it work
within the limitations of some of the issues I mention above. And, I think
you can invent a way to handle the keys that can probably be made to work.
Hope that was helpful.
Mark
On 3/21/14 3:23 AM, "huebbe" <nathanael.huebbe@informatik.uni-hamburg.de> wrote:
>While it is possible to perform some encryption in a filter, the filter
>mechanism is not designed for encryption. The problem is the key:
>Filters don't get arbitrary data from the calling application to do the
>decryption, they get only data that is stored in the file. Otherwise,
>the HDF5 library would not be able to do the decoding in a completely
>transparent way. And if you put the key into the file (as filter
>options, or similar), the NSA will be happy.
>
>To use the filter mechanism for encryption, you would need to get the
>key via a side-channel. This is possible, but it will be hard to do
>this in a usable and portable fashion. For instance, you cannot just
>pop up a dialog asking for a key, because many programs using HDF5
>don't even have a text terminal connected to them while they run.
>
>Also note that filtering does not touch the metadata in the file. I.e.,
>the NSA will be able to see the entire description of what is encoded
>in the file; they will just not have the actual data.
>
>If you want security, just use gpg to encrypt the entire file.
>
>Cheers,
>Nathanael Hübbe
>
>
>
>On 03/21/2014 12:44 AM, Rowe, Jim wrote:
>> Hello, has anyone used a symmetric encryption filter with HDF5? I
>> would like to introduce encryption (AES, DES, 3DES) in the pipeline
>> after zlib compression to encrypt some datasets.
>>
>> Any examples, starting points, or suggestions would help.
>>
>> Thanks!
>>
>> --Jim
>>
>> _______________________________________________
>> Hdf-forum is for HDF software users discussion.
>> Hdf-forum@lists.hdfgroup.org
>>
>> http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
>>
>
>
>--
>Please be aware that the enemies of your civil rights and your freedom
>are on CC of all unencrypted communication. Protect yourself.
>