Size of HDF5 file varies

wenlu.yang · July 7, 2020, 4:19pm

Hi,

I am using the API of the official library to build software, and I want to store the result data in an HDF5 file.
However, I sometimes find that HDF5 files generated from the same model differ in size. The test deck file is about 9.7K, and the resulting HDF5 file can be anywhere from 130K to 134K.

It is not a critical problem, but I would like to know why.

Thanks.

gheber · July 7, 2020, 8:07pm

Can you give us your definition of ‘same’ (content), and maybe two examples that are the same but differ in size? G.

wenlu.yang · July 8, 2020, 12:34am

Hi gheber,

Sorry for being unclear. By ‘same’ I mean the size of the file.
As I understand it, the file size shouldn’t vary, because the files are generated by the same program from the same model.

Thank you.

gheber · July 8, 2020, 11:14am

No worries, nothing unclear there. My point was that you believe (for good reasons) that the two files’ content should be bytewise identical, which, given the size difference, is not possible. So let’s find out what’s going on here! Can you show us two examples that you believe should be byte-for-byte identical but aren’t? We’ll then understand how they are different, and we can ask questions about the generating mechanism.
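A first step toward the comparison gheber asks for is to find where two output files diverge. Below is a minimal sketch using only the Python standard library; the function name `first_difference` is hypothetical, not part of any HDF5 tool. It compares raw bytes only and says nothing about which HDF5 object the differing bytes belong to; for comparing stored content (datasets, attributes), the HDF5 command-line tool `h5diff` is the usual choice.

```python
import os


def first_difference(path_a, path_b, block_size=64 * 1024):
    """Return (size_a, size_b, offset), where offset is the position of the
    first differing byte, or None if the files are byte-for-byte identical."""
    size_a = os.path.getsize(path_a)
    size_b = os.path.getsize(path_b)
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(block_size)
            b = fb.read(block_size)
            if a != b:
                # Scan the mismatching blocks for the first differing byte.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return size_a, size_b, offset + i
                # One block is a prefix of the other: the files differ in length.
                return size_a, size_b, offset + min(len(a), len(b))
            if not a:
                # Both files reached EOF without any difference.
                return size_a, size_b, None
            offset += len(a)
```

For example, `first_difference("run1.h5", "run2.h5")` (hypothetical file names for two runs of the same model) reports both sizes and the first offset at which the files stop being identical.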

Just to be clear: your general assumption is in principle correct. A correct program (no uninitialized memory, etc.) without history, random elements, or side effects (such as time) will, given the same inputs, produce byte-identical HDF5 files.

A simple example where this assumption wouldn’t be satisfied: let’s say you store the 32-bit integer sequence 0-999,999 in a one-dimensional array and apply Gzip compression to it. The resulting dataset will be very small, because compression will be very effective. Then you create a second 1-million-element integer sequence with some random mechanism and apply Gzip compression in the same way. Its size will tend to be substantially larger (unless you happen, by accident, to generate 0-999,999…), and so will the size of the corresponding HDF5 file.
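The effect in this example can be reproduced outside HDF5 with a short sketch. It assumes Python's standard `zlib` module as a stand-in for HDF5's gzip filter (both use the DEFLATE algorithm) and compresses raw little-endian 32-bit unsigned integers directly, so it leaves out everything else a real HDF5 file contains (metadata, chunking, filter settings). It only shows that the same element count can compress to very different sizes depending on the values; no particular byte counts are claimed.

```python
import random
import struct
import zlib

N = 1_000_000  # element count from the example above


def compressed_size(values, level=6):
    """Pack values as little-endian 32-bit unsigned integers and return
    (raw size, DEFLATE-compressed size) in bytes."""
    raw = struct.pack(f"<{len(values)}I", *values)
    return len(raw), len(zlib.compress(raw, level))


sequential = list(range(N))  # 0 .. 999,999
rng = random.Random(12345)  # fixed seed so this script itself is repeatable
noise = [rng.getrandbits(32) for _ in range(N)]  # random 32-bit values

seq_raw, seq_packed = compressed_size(sequential)
rnd_raw, rnd_packed = compressed_size(noise)

print(f"sequential: {seq_raw} -> {seq_packed} bytes")
print(f"random:     {rnd_raw} -> {rnd_packed} bytes")
```

Both inputs are exactly 4,000,000 bytes before compression; only the values differ. The random input is essentially incompressible, so if a program's output depends on anything nondeterministic, compressed dataset sizes (and hence file sizes) can drift between runs.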

G.