I am writing some tick data to an HDF5 file, and the HDF5 file takes around
10 times the space of the original data.
When I run h5stat on the file I can see a huge amount of unaccounted space.
How can I reduce the file size?
h5stat 20120801.h5
Filename: 20120801.h5
File information
    # of unique groups: 5892
    # of unique datasets: 11783
    # of unique named datatypes: 0
    # of unique links: 0
    # of unique other: 0
    Max. # of links to object: 1
    Max. # of objects in group: 5892
File space information for file metadata (in bytes):
    Superblock extension: 0
    User block: 0
    Object headers: (total/unused)
        Groups: 235680/0
        Datasets(exclude compact data): 21868000/94312
        Datatypes: 0/0
    Groups:
        B-tree/List: 5642208
        Heap: 797064
    Attributes:
        B-tree/List: 0
        Heap: 0
    Chunked datasets:
        Index: 24757952
    Datasets:
        Heap: 0
    Shared Messages:
        Header: 0
        B-tree/List: 0
        Heap: 0
Small groups:
    # of groups of size 2: 5891
    Total # of small groups: 5891
Group bins:
    # of groups of size 1 - 9: 5891
    # of groups of size 1000 - 9999: 1
    Total # of groups: 5892
Dataset dimension information:
    Max. rank of datasets: 1
    Dataset ranks:
        # of dataset with rank 1: 11783
1-D Dataset information:
    Max. dimension size of 1-D datasets: 165582
    Small 1-D datasets:
        # of dataset dimensions of size 1: 4520
        # of dataset dimensions of size 2: 1517
        # of dataset dimensions of size 3: 718
        # of dataset dimensions of size 4: 344
        # of dataset dimensions of size 5: 551
        # of dataset dimensions of size 6: 1787
        # of dataset dimensions of size 7: 87
        # of dataset dimensions of size 8: 106
        # of dataset dimensions of size 9: 77
        Total small datasets: 9707
    1-D Dataset dimension bins:
        # of datasets of size 1 - 9: 9707
        # of datasets of size 10 - 99: 1879
        # of datasets of size 100 - 999: 170
        # of datasets of size 1000 - 9999: 26
        # of datasets of size 100000 - 999999: 1
        Total # of datasets: 11783
Dataset storage information:
    Total raw data size: 67561600
    Total external raw data size: 0
Dataset layout information:
    Dataset layout counts[COMPACT]: 0
    Dataset layout counts[CONTIG]: 0
    Dataset layout counts[CHUNKED]: 11783
    Number of external files : 0
Dataset filters information:
    Number of datasets with:
        NO filter: 11783
        GZIP filter: 0
        SHUFFLE filter: 0
        FLETCHER32 filter: 0
        SZIP filter: 0
        NBIT filter: 0
        SCALEOFFSET filter: 0
        USER-DEFINED filter: 0
Dataset datatype information:
    # of unique datatypes used by datasets: 3
    Dataset datatype #0:
        Count (total/named) = (5891/0)
        Size (desc./elmt) = (662/48)
    Dataset datatype #1:
        Count (total/named) = (5891/0)
        Size (desc./elmt) = (854/56)
    Dataset datatype #2:
        Count (total/named) = (1/0)
        Size (desc./elmt) = (66/16)
    Total dataset datatype count: 11783
Small # of attributes:
    # of objects with 4 attributes: 1
    Total # of objects with small # of attributes: 1
Attribute bins:
    # of objects with 1 - 9 attributes: 1
    # of objects with 10 - 99 attributes: 11782
    Total # of objects with attributes: 11783
    Max. # of attributes to objects: 18
Summary of file space information:
    File metadata: 53300904 bytes
    Raw data: 67561600 bytes
    Unaccounted space: 32999120 bytes
    Total space: 153861624 bytes
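For anyone reading this later: the h5stat summary itself shows why the file is
~10x the raw data. With ~11,800 tiny chunked datasets, the per-dataset metadata
cost (object header plus chunk-index B-tree) is nearly as large as the raw data
per dataset. A quick breakdown of the figures reported above (plain arithmetic,
nothing HDF5-specific):

```python
# Figures copied from the h5stat summary above (bytes).
file_metadata  = 53_300_904
raw_data       = 67_561_600
unaccounted    = 32_999_120
total_space    = 153_861_624
n_datasets     = 11_783

object_headers = 21_868_000   # "Datasets(exclude compact data)" total
chunk_index    = 24_757_952   # "Chunked datasets: Index"

# The summary lines add up exactly:
assert file_metadata + raw_data + unaccounted == total_space

# Per-dataset overhead is comparable to per-dataset raw data.
print(f"metadata/dataset:    {file_metadata / n_datasets:7.0f} bytes")
print(f"raw data/dataset:    {raw_data / n_datasets:7.0f} bytes")
print(f"object hdr/dataset:  {object_headers / n_datasets:7.0f} bytes")
print(f"chunk index/dataset: {chunk_index / n_datasets:7.0f} bytes")
```

The ~33 MB of unaccounted space is largely free space inside the file (e.g.
from objects written and rewritten); running `h5repack` copies the objects into
a fresh file and drops that free space.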
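Beyond repacking, the layout stats (all 11,783 datasets chunked, no filters,
most with fewer than 10 elements) suggest restructuring the writer. One sketch,
assuming h5py and NumPy are available; the field names in the tick dtype below
are made up, since the real 48/56-byte compound types aren't shown: store all
ticks in one large chunked, shuffled, gzip-compressed dataset with a small
per-symbol offset index, instead of thousands of tiny chunked datasets that
each pay for an object header and a chunk-index B-tree.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical tick record; the actual compound types in the file are
# 48 and 56 bytes and their fields are not shown in the h5stat output.
tick = np.dtype([("time", "i8"), ("price", "f8"), ("size", "i8")])

path = os.path.join(tempfile.mkdtemp(), "ticks.h5")
with h5py.File(path, "w") as f:
    data = np.zeros(100_000, dtype=tick)  # placeholder for one day of ticks
    # One big dataset with a deliberately large chunk size and
    # shuffle + gzip, rather than one tiny chunked dataset per symbol.
    f.create_dataset(
        "ticks",
        data=data,
        chunks=(16_384,),
        shuffle=True,
        compression="gzip",
        compression_opts=6,
    )
    # Start offsets of each symbol's slice within /ticks (here: one symbol).
    f.create_dataset("symbol_offsets", data=np.array([0, 100_000], dtype="i8"))

print("file size:", os.path.getsize(path), "bytes")
```

For the genuinely tiny datasets that must stay separate, writing them without
`chunks=` (the contiguous default) at least avoids the chunk-index cost; the
stats above show zero COMPACT/CONTIG layouts, so every dataset currently pays
the chunked-index overhead.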
--
View this message in context: http://hdf-forum.184993.n3.nabble.com/Huge-unaccounted-space-tp4025294.html
Sent from the hdf-forum mailing list archive at Nabble.com.