Depending on how you define your data structures, alignment may have a bearing
on the problem. In C++, 8-bit or 16-bit values may be aligned on 32-bit
boundaries, so if you don't pack the structure, 'dead space' gets stored
along with your data. (That is, if you are using structures; it has been
many, many years since I've done Fortran.)
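To illustrate the padding issue, here is a minimal sketch (the struct names
are made up for the example; the pragma syntax shown works on gcc, clang,
and MSVC):

```cpp
#include <cstdint>

// Unpacked: the compiler may insert padding before 'value' so it sits on
// a 4-byte boundary; sizeof(Tick) is commonly 8, not 5.
struct Tick {
    uint8_t  flag;   // 1 byte
    uint32_t value;  // typically preceded by 3 bytes of padding
};

// Packed: padding removed, so sizeof(PackedTick) is exactly 5 bytes.
// Writing the packed layout avoids storing the dead space in the file.
#pragma pack(push, 1)
struct PackedTick {
    uint8_t  flag;
    uint32_t value;
};
#pragma pack(pop)
```

If you write the unpacked layout element-for-element, every record carries
those padding bytes into the file as well.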
For creating a dataset, here are the steps I've taken to get things
compressed. They may not be optimal, but they are better than what I first
started with.
if ( bNeedToCreateDataSet ) {
  CompType *pdt = DD::DefineDataType();  // my class defines the structure
  pdt->pack();  // get rid of the dead space mentioned above
  DataSpace *pds = new H5::DataSpace( H5S_SIMPLE );
  hsize_t curSize = 0;
  hsize_t maxSize = H5S_UNLIMITED;  // unlimited growth, one-dimensional array
  pds->setExtentSimple( 1, &curSize, &maxSize );
  DSetCreatPropList pl;
  hsize_t sizeChunk = CHDF5DataManager::H5ChunkSize();  // constant is defined elsewhere
  pl.setChunk( 1, &sizeChunk );  // chunking allows growth and compression
  pl.setShuffle();  // reorders bytes so runs of zeros compress better
  pl.setDeflate( 5 );  // compression; I have no idea what the optimal level is
  dataset = new DataSet( dm.GetH5File()->createDataSet( sPathName, *pdt, *pds, pl ) );
  dataset->close();
  pds->close();
  pdt->close();
  delete pds;
  delete pdt;
  delete dataset;
}
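A word on setShuffle(): the shuffle filter doesn't compress anything by
itself; it regroups the bytes of the dataset so that the i-th byte of every
element sits together, which turns small integers into long runs of zero
bytes that deflate then squeezes well. A minimal sketch of the idea (not
HDF5's actual implementation; the function name is made up):

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Byte-shuffle an array of 32-bit values: emit byte 0 of every element,
// then byte 1 of every element, and so on. For small values this groups
// the zero bytes into long runs, which deflate compresses far better.
std::vector<uint8_t> shuffle(const std::vector<uint32_t>& in) {
    const size_t n = in.size();
    const size_t w = sizeof(uint32_t);
    std::vector<uint8_t> raw(n * w), out(n * w);
    std::memcpy(raw.data(), in.data(), raw.size());
    for (size_t i = 0; i < n; ++i)          // element index
        for (size_t b = 0; b < w; ++b)      // byte index within element
            out[b * n + i] = raw[i * w + b];
    return out;
}
```

For four small values like {1, 2, 3, 4}, twelve of the sixteen shuffled
bytes are zero and contiguous, regardless of endianness.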
···
-----Original Message-----
From: Nikhil Laghave [mailto:nikhill@iastate.edu]
Sent: Friday, June 20, 2008 17:22
To: HDF Forum
Subject: [hdf-forum] very large file size
Hello Everybody,
I am working on a large fortran code and I am trying to
change the data format from binary file to HDF5 file.
However, I have found that the file size is extremely large
compared to binary files.
The binary output file is about 10 times smaller than the HDF5 file.
I am not sure if this affects the IO speed or not. Can
anybody give me any information on this topic?
These are the questions I have:
1. Will the file size (10 times larger than binary) affect the
IO speed?
2. Can I reduce the file size substantially? I tried using
the set_deflate option during dataset creation, but since the
dataset is fully occupied with a very large vector, the
compression does not help.
It is extremely important to reduce the file size, because
for larger runs the binary output file is several gigabytes
in size and I can't afford a 10 times increase in size with HDF5.
3. Does HDF5 store the data in a form similar to ASCII?
Even ASCII files seem to be around 10 times larger
than binary files.
Kindly suggest something.
Thanks.
Regards,
Nikhil
----------------------------------------------------------------------
This mailing list is for HDF software users discussion.
To subscribe to this list, send a message to
hdf-forum-subscribe@hdfgroup.org.
To unsubscribe, send a message to hdf-forum-unsubscribe@hdfgroup.org.
--
Scanned for viruses and dangerous content at
http://www.oneunified.net and is believed to be clean.