It is well documented that variable-length string datasets cannot be compressed. That is clear and understandable.
But if one chunks and compresses such a dataset (especially a large one), something is getting compressed: the resulting file is considerably smaller than one where the dataset is not chunked and compressed.
Hi, I should be more specific. I am referring to file size on disk.
Real world example I just produced:
I have a dataset of 1,000,000 variable-length strings. I write it all at once without specifying chunking or compression and immediately close the file. The file size on disk is 42,680,320 bytes.
If I apply chunking and compression to it while writing the dataset, the file size on disk is 26,750,976 bytes.
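For concreteness, here is a minimal sketch of the comparison I mean, assuming h5py (the file names, string contents, and chunk size here are illustrative, not the exact values I used):

```python
import os

import h5py

# One million variable-length strings (placeholder contents).
strings = [f"value-{i}" for i in range(1_000_000)]
dt = h5py.string_dtype()  # variable-length UTF-8 string type

# Case 1: contiguous layout, no chunking, no filter pipeline.
with h5py.File("plain.h5", "w") as f:
    f.create_dataset("strs", data=strings, dtype=dt)

# Case 2: chunked layout with gzip. The filter sees the raw chunk
# bytes; for variable-length strings those are heap references,
# not the string characters themselves.
with h5py.File("chunked.h5", "w") as f:
    f.create_dataset("strs", data=strings, dtype=dt,
                     chunks=(10_000,), compression="gzip")

print(os.path.getsize("plain.h5"), os.path.getsize("chunked.h5"))
```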
I do know that the strings themselves are not compressed. So what is being compressed? What could be another explanation for the reduced file size?