No worries, nothing unclear there. My point was that you believe (for good reasons) that the two files’ content should be bytewise identical, which, given the size difference, is not possible. So let’s find out what’s going on here! Can you show us two examples that you believe should be byte-for-byte identical but aren’t? We’ll then understand how they are different, and we can ask questions about the generating mechanism.
Just to be clear: Your general assumption that a correct (no uninitialized memory, etc.) program without history, random elements or side effects (such as time), given the same inputs will produce byte-identical HDF5 files, is in principle correct.
A simple example where this assumption wouldn’t be satisfied: Let’s say you store the 32-bit integer sequence 0–999,999 in a one-dimensional array to which you apply Gzip compression. The resulting dataset will be very small, because compression is very effective on such regular data. Then you create a second one-million-element integer array with some random mechanism, apply Gzip compression, etc. The compressed size will tend to be substantially larger (unless you happen, by accident, to generate 0–999,999 again…), and so will be the corresponding HDF5 file.
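The effect is easy to reproduce outside HDF5 with plain zlib (the same DEFLATE algorithm that Gzip and the HDF5 gzip filter use). A minimal sketch, not tied to any particular HDF5 binding, just to show the size difference between the two datasets described above:

```python
import random
import zlib
from array import array

n = 1_000_000

# Dataset 1: the regular sequence 0..999,999 as 32-bit integers.
seq = array("i", range(n)).tobytes()

# Dataset 2: one million integers from a random mechanism
# (seeded here only so the sketch is repeatable).
rng = random.Random(42)
rnd = array("i", (rng.getrandbits(31) for _ in range(n))).tobytes()

# Same raw size, very different compressed size.
seq_c = len(zlib.compress(seq, 6))
rnd_c = len(zlib.compress(rnd, 6))
print(len(seq), seq_c, rnd_c)
```

The regular sequence compresses to a small fraction of its raw 4 MB, while the random data stays close to 4 MB after compression, which is exactly why the two HDF5 files would differ in size even though both store one million 32-bit integers.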