Text to binary

I have ~98K ASCII files (~4 KB each, so ~380 MB in total).

I convert those files into a single binary file (HDF5_FILE)
using basic functions:

(..)
char buffer[FILE_LEN][LINE_LEN];
foreach (file in list_of_files)
    buffer=get(all_lines_in_the_file);
    dataset = H5LTmake_dataset(HDF5_FILE,DATASET_NAME,2,dimsfx,H5T_NATIVE_CHAR,buffer);
end
(..)

but the HDF5_FILE final size is ~1GB ... almost 3 times the size of the
ASCII files put all together.

Could someone please shed some light on this?

Barbara

It looks like you are taking ASCII data (which may be numerical) and
then storing to hdf5 datasets as character data. If the ASCII input is
(for the most part) numerical, you need to parse the numbers and convert
them from their string form (e.g. a 'char *foo="1.2345"') to their
numerical form (e.g. float foo=1.2345) and then write the dataset as
H5T_NATIVE_FLOAT. Otherwise, all you are doing is storing ASCII data to
hdf5 datasets and then also paying for all the additional HDF5 metadata.
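As a rough sketch of that conversion, assuming each line of the input holds whitespace-separated numbers (the thread does not say what the files actually contain): the text is parsed with strtof, and the resulting array is written with H5LTmake_dataset_float from the HDF5 Lite API. The names parse_floats, write_floats and the USE_HDF5 switch are made up for this illustration; the HDF5 part is only compiled when USE_HDF5 is defined and the program is linked against the HDF5 libraries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: parse whitespace-separated numbers from one text
 * line into out[]. Stops at the first token that is not a number or when
 * max_vals values have been stored. Returns the number of values parsed. */
size_t parse_floats(const char *line, float *out, size_t max_vals)
{
    assert(line != NULL && out != NULL);
    size_t n = 0;
    const char *p = line;
    while (n < max_vals) {
        char *end;
        float v = strtof(p, &end);   /* skips leading whitespace itself */
        if (end == p)                /* nothing converted: stop */
            break;
        out[n++] = v;
        p = end;
    }
    return n;
}

#ifdef USE_HDF5
#include <hdf5.h>
#include <hdf5_hl.h>

/* Hypothetical helper: store n parsed values as a 1-D dataset of native
 * floats (4 bytes per value) instead of as characters. file_id is assumed
 * to come from H5Fcreate or H5Fopen. */
herr_t write_floats(hid_t file_id, const char *name,
                    const float *vals, size_t n)
{
    hsize_t dims[1] = { (hsize_t)n };
    return H5LTmake_dataset_float(file_id, name, 1, dims, vals);
}
#endif
```

In a loop over the input files, each line read with fgets would go through parse_floats, and the accumulated values would be passed to write_floats once per dataset. Whether this shrinks the output depends on how many characters each number takes in the text form, and every separate dataset still carries its own HDF5 metadata.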

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org

--
Mark C. Miller, Lawrence Livermore National Laboratory
================!!LLNL BUSINESS ONLY!!================
miller86@llnl.gov urgent: miller86@pager.llnl.gov
T:8-6 (925)-423-5901 M/W/Th:7-12,2-7 (530)-753-8511

Here we go... Thanks :)
