Ryan,
Ryan Price wrote:
> I have a 4GB binary dump of data that I'd like to store as an HDF5 dataset
> (using command line tools if possible). The dataset will have the
> dimensions 31486448 x 128. I believe this is too big to import as a
> dataset in one go.
4GB is a large array. You may wish to give some thought to how the data
will be used after you have created the file. Will the end user really
process all 4GB at once? HDF5 provides chunking and compression
functionality which (transparently to the data reader, and almost
transparently to the writer) stores the data in "chunks" and, if you like,
compresses them as well. If you can make the chunk size close to the amount
of data the end user will want to access at a time, it can be very convenient.
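To illustrate the "transparent to the reader" point: a consumer of a chunked, compressed dataset just slices it as usual, and the library handles chunk lookup and decompression behind the scenes. A small self-contained sketch (the file name 'demo_chunked.h5' and the toy 1000-row size are illustrative, not from the original post):

```python
import numpy as np
import h5py

# Writer side: create a small chunked, gzip-compressed dataset.
with h5py.File('demo_chunked.h5', 'w') as f:
    dset = f.create_dataset('data', shape=(1000, 128), dtype='int8',
                            chunks=(1000, 1), compression='gzip')
    dset[:, 0] = np.ones(1000, dtype='int8')

# Reader side: plain slicing -- the chunking and compression are invisible,
# decompression happens automatically as chunks are touched.
with h5py.File('demo_chunked.h5', 'r') as f:
    col = f['data'][:, 0]
    print(col.sum())  # 1000
```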
Here is a piece of Python I wrote to demonstrate creating a file with
chunking and compression. I was able to open the file and view the dataset
properties in HDFView, but not the dataset itself, because the array is so
large. You can use this code if the major axis of your binary dump runs in
the 128 direction. If it runs in the other direction you'll probably want
to choose different chunking parameters and read the binary data off disk
accordingly. (By the way, I get a 3.8MB file, since the array contains a
single repeated value and compression is turned on.)
import numpy
import h5py

with h5py.File('BigArray.h5', 'w') as fid:
    dset = fid.create_dataset('BigArray', shape=(31486448, 128), dtype='int8',
                              chunks=(31486448, 1), compression='gzip')
    slicearray = numpy.ones(31486448, dtype='int8')  # 1-D array of length 31486448
    for i in range(128):
        # replace this comment with a read from the binary file into slicearray
        print("Writing", i)
        dset[:, i] = slicearray  # populate slice i of the HDF5 dataset
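The read that replaces the comment in the loop could look like the following, assuming the dump is laid out so each column's 31486448 int8 values are contiguous on disk (a `numpy.fromfile` sketch with a toy-sized dump for demonstration; the real offsets would use the full row count):

```python
import numpy as np

def read_column(path, i, nrows):
    """Read column i (nrows int8 values) from a dump whose columns
    are stored back to back on disk."""
    with open(path, 'rb') as f:
        f.seek(i * nrows)  # int8 -> 1 byte per value
        return np.fromfile(f, dtype='int8', count=nrows)

# Toy demonstration: a 4-row x 3-column dump, columns contiguous.
np.arange(12, dtype='int8').tofile('toy_dump.bin')
print(read_column('toy_dump.bin', 1, 4))  # column 1 is values 4..7
```

In the loop above you would call this once per iteration, `slicearray = read_column('my_data', i, 31486448)`.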
Cheers,
--dan
> Running h5import gives the following error:
>
>   Unable to allocate dynamic memory.
>   Error in allocating unsigned integer data storage.
>   Error in reading the input file: my_data
>   Program aborted.
>
> So I split the binary dump into four files, which can be imported. I'd
> still like to have one 31486448 x 128 dataset but am not sure that's
> possible to do.
> Any idea how I could combine these four binary dumps into one dataset?
> Maybe create a single dataset and append each small one...?
> Thanks,
> Ryan
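That "create a single dataset and write each small one into it" idea works: create the full-size dataset once, then write each part into its own row range. A minimal h5py sketch, assuming the four dumps each hold a contiguous block of rows stored row-major (the part file names and the small 1000-row-per-part sizes here are illustrative, fabricated so the example is self-contained):

```python
import numpy as np
import h5py

NCOLS = 128
parts = ['part0.bin', 'part1.bin', 'part2.bin', 'part3.bin']  # hypothetical names

# Fabricate four small row-major dumps for the demo; in the real case
# these would be the four pieces of the 4GB file.
for k, path in enumerate(parts):
    np.full((1000, NCOLS), k, dtype='int8').tofile(path)

with h5py.File('Combined.h5', 'w') as fid:
    dset = fid.create_dataset('BigArray', shape=(4000, NCOLS), dtype='int8',
                              chunks=(1000, NCOLS), compression='gzip')
    row = 0
    for path in parts:
        block = np.fromfile(path, dtype='int8').reshape(-1, NCOLS)
        dset[row:row + block.shape[0], :] = block  # this part's rows, in place
        row += block.shape[0]
```

Only one part is in memory at a time, so this stays well under the 4GB that made h5import fail; for the real data, `shape=(31486448, 128)` and each part's row count would replace the toy numbers.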
--
Daniel Kahn
Science Systems and Applications Inc.
301-867-2162
_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org