Serialization

Hi all,

this is a bit off-topic, but still related to hdf5. I am using boost
serialization to serialize an object tree and save it as an opaque
dataset in an hdf5 file alongside native hdf5 datasets. Before you start
raging, it is not my choice. Now, boost serialization unfortunately does
not work correctly, especially when objects cross shared library boundaries.
So I am planning to replace it and would like to know what else people
have used for the same purpose.

At some point it would be great to introduce a serialization package for
hdf5, as soon as some concepts are clarified, such as what the
corresponding hdf5 objects are for a C++ class and object...

Thanks a lot for any help,

-- dimitris

Hello dimitris,

I may not be on target in this response but I thought I'd at least try
to contribute a notion or two.

I am not sure why the 'serialized' result needs to be 'opaque' to HDF5.
It depends on how much effort you want to put into making it NOT opaque
to HDF5, but I think you can use a combination of user-defined (compound)
data types and maybe reference types to serialize almost arbitrary data
structures (e.g. linked lists, trees, etc.) to HDF5 such that they will
NOT be opaque. One value in doing this is that if you write an HDF5 file
on one kind of cpu architecture and then read it on another, and you've
done your homework correctly, HDF5 will handle all the primitive type
conversion for you. You don't need to worry about that even though the
data is still stored as binary in the HDF5 file. I have done this myself,
though only with very trivial types of objects containing int, int*,
float, float* and char* data members, and it does work nicely.

On the subject of serialization, I have not used the boost serialization
stuff. However, in a project I work on, we often wind up having to
serialize various objects to ship them over a network, and the strategy I
have seen used is to define operator<< for them (and any sub-objects) and
then manipulate them in a streambuf of some kind. In cases where we've
had to go across cpu architectures, we use ascii streams, converting all
binary numerical data to ascii, and when we stay within the same cpu
architecture, we use binary streams.
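
As a rough illustration of the operator<< strategy described above, here is a minimal sketch of the ascii-stream variant. The types Point and Particle and the helper round_trip are hypothetical names made up for this example, not taken from any real project; the sketch only assumes standard C++ streams.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical types, for illustration only.
struct Point {
    double x = 0.0;
    double y = 0.0;
};

struct Particle {
    int id = 0;
    Point position;  // sub-object with its own stream operators
};

// Write each member as whitespace-separated ASCII text.
std::ostream& operator<<(std::ostream& os, const Point& p) {
    return os << p.x << ' ' << p.y;
}

std::istream& operator>>(std::istream& is, Point& p) {
    return is >> p.x >> p.y;
}

// The enclosing object reuses the operators of its sub-object.
std::ostream& operator<<(std::ostream& os, const Particle& q) {
    return os << q.id << ' ' << q.position;
}

std::istream& operator>>(std::istream& is, Particle& q) {
    return is >> q.id >> q.position;
}

// Round-trip a Particle through an in-memory text stream.
Particle round_trip(const Particle& in) {
    std::stringstream buffer;
    buffer.precision(17);  // enough digits to restore an IEEE 754 double
    buffer << in;
    Particle out;
    buffer >> out;
    return out;
}
```

Because the numbers travel as text, the byte order of the sending and receiving machines does not matter. The price is larger messages and the cost of formatting and parsing, which is presumably why binary streams are preferred when both ends share an architecture.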

Don't know if any of that is helpful but good luck.

Mark

···

On Tue, 2009-08-11 at 11:12 +0200, Dimitris Servis wrote:

Hi all,

this is a bit off-topic, however somehow related to hdf5. I am using
boost serialization for serializing an object tree and saving it as an
opaque dataset in an hdf5 file along with hdf native datasets. Before
you start raging, it is not my choice. Now, boost serialization
unfortunately does not work correctly especially when objects cross
shared library boundaries. So I am planning to replace that and would
like to know what else people have used for the same purpose.

At some point it would be great to introduce a serialization package
for hdf5, as soon as some concepts are clarified, such as what is the
corresponding hdf5 objects for a C++ class and object...

Thanks a lot for any help,

-- dimitris
_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org

--
Mark C. Miller, Lawrence Livermore National Laboratory
email: mailto:miller86@llnl.gov
(M/T/W) (925)-423-5901 (!!LLNL BUSINESS ONLY!!)
(Th/F) (530)-753-8511 (!!LLNL BUSINESS ONLY!!)

Hi Mark,

thanks a lot for your input! One reason someone might want to use a
serialization framework instead of hdf5 datatypes is to avoid the
object-to-hdf5 mapping, dealing with cyclic graphs, and developing all the
functionality to store STL containers, shared pointers, classes etc. I have
a rule of thumb that says if only one client accesses the data and that
client does not care about granularity, then you might as well go with
serialization instead of hdf5. This rule of thumb implies that the lifetime
of your data is also quite limited and that the data model of the single
client can change often. I then usually stress that this way you may avoid
the object-to-hdf5 mapping, but if you want to support any kind of
versioning, you need to maintain the class versions, which IMHO is worse.
Nevertheless, it is not always possible to convince people. So in our case,
we want the part of the data that is not used by any other client
serialized into a binary stream that is saved as opaque.
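
To make the versioning burden mentioned above concrete, here is a minimal sketch of what maintaining class versions by hand can look like. The class Settings, its members, the constant kCurrentVersion and the save/load functions are all hypothetical and only assume standard C++ streams. Boost.Serialization offers a comparable mechanism by passing a version number to serialize(), but this sketch does not use boost.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical class whose layout changed between two releases.
struct Settings {
    int iterations = 0;
    double tolerance = 1e-6;  // member added in version 2
};

const int kCurrentVersion = 2;

// Always write the newest layout, prefixed with its version number.
void save(std::ostream& os, const Settings& s) {
    os << kCurrentVersion << ' ' << s.iterations << ' ' << s.tolerance;
}

// Read any known layout; members missing from old records keep defaults.
Settings load(std::istream& is) {
    int version = 0;
    Settings s;
    is >> version >> s.iterations;
    if (version == 2) {
        is >> s.tolerance;
    } else if (version != 1) {
        throw std::runtime_error("unsupported Settings version");
    }
    if (!is) {
        throw std::runtime_error("malformed Settings record");
    }
    return s;
}
```

Every layout change adds another branch to load(), and old branches can only be removed once no stored data uses them. That is the maintenance cost that has to be weighed against the object-to-hdf5 mapping.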

With regard to serialization you are obviously using it the right way, i.e.
you serialize a state that has a short lifetime and only want to send it to
another computer. With boost you can do this with both text and binary,
regardless of the architecture: use the XML archive of boost or write a
portable binary archive (it's pretty easy). The problems start when objects
cross, for example, dll boundaries: as boost serialization depends heavily
on templates, it is in practice very difficult to avoid instantiating
something in more than one dll, and usually that something happens to be
the registration class of your class...
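
For the portable binary idea mentioned above, the core trick is to fix the byte order in the stored format instead of copying raw memory. The following is a minimal sketch of that idea only, not a boost archive: the function names (put_u32, put_u64, put_f64 and the matching get_ functions) are made up for this example, and it assumes that both ends use 64-bit IEEE 754 doubles.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Append an unsigned integer in little-endian byte order, independent of
// the byte order of the machine running the code.
void put_u32(std::string& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) {
        out.push_back(static_cast<char>((v >> (8 * i)) & 0xFFu));
    }
}

void put_u64(std::string& out, std::uint64_t v) {
    for (int i = 0; i < 8; ++i) {
        out.push_back(static_cast<char>((v >> (8 * i)) & 0xFFu));
    }
}

// Assumption: double is a 64-bit IEEE 754 value on every platform involved.
void put_f64(std::string& out, double d) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "need 64-bit double");
    std::uint64_t bits = 0;
    std::memcpy(&bits, &d, sizeof bits);  // copy the bit pattern unchanged
    put_u64(out, bits);
}

// Readers advance pos; std::string::at throws std::out_of_range on
// truncated input.
std::uint32_t get_u32(const std::string& in, std::size_t& pos) {
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) {
        v |= static_cast<std::uint32_t>(
                 static_cast<unsigned char>(in.at(pos++))) << (8 * i);
    }
    return v;
}

std::uint64_t get_u64(const std::string& in, std::size_t& pos) {
    std::uint64_t v = 0;
    for (int i = 0; i < 8; ++i) {
        v |= static_cast<std::uint64_t>(
                 static_cast<unsigned char>(in.at(pos++))) << (8 * i);
    }
    return v;
}

double get_f64(const std::string& in, std::size_t& pos) {
    std::uint64_t bits = get_u64(in, pos);
    double d = 0.0;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}
```

The resulting std::string is a plain byte buffer, so it could be stored as an opaque dataset as is. Strings, containers and pointers would need their own encoding on top of these primitives.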

It is pretty important to have some concept for serialization with C++ and
hdf5... maybe something nice like Hibernate?

Best Regards

-- dimitris
