Different output file sizes between serial and parallel versions

mickelso · February 19, 2021, 8:34pm

I’ve been analyzing the performance of parallel netcdf4 versus the serial version. I create a file in parallel and then an identical file in serial, but the output file sizes differ. I’m working with the netcdf4-python library, but I’ve confirmed that the problem also exists in the netcdf-c library. Jeff Whitaker recommended that I post this question in this forum.

ncks is reporting the serial and parallel files to be identical …
parallel_test_0.nc --> serial output
parallel_test_1.nc --> parallel output

ncks --trd -m parallel_test_0.nc
var: type NC_DOUBLE, 1 dimension, 0 attributes, compressed? no, chunked? no, packed? no
var size (RAM) = 160000000*sizeof(NC_DOUBLE) = 160000000*8 = 1280000000 bytes
var dimension 0: dim, size = 160000000 (Non-coordinate dimension)

ncks --trd -m parallel_test_1.nc
var: type NC_DOUBLE, 1 dimension, 0 attributes, compressed? no, chunked? no, packed? no
var size (RAM) = 160000000*sizeof(NC_DOUBLE) = 160000000*8 = 1280000000 bytes
var dimension 0: dim, size = 160000000 (Non-coordinate dimension)

But the file sizes on disk differ:

1.8G Feb 17 12:31 parallel_test_1.nc
1.2G Feb 17 12:31 parallel_test_0.nc
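
As a quick sanity check on these numbers, the raw payload reported by ncks can be compared with the listed sizes. This is only a rough sketch: it assumes the listing uses binary units (GiB) and rounded values, as `ls -lh` typically does, and the helper name `overhead_gib` is made up for illustration.

```python
# Raw payload of the variable, taken from the ncks output above.
N_ELEMENTS = 160_000_000
BYTES_PER_DOUBLE = 8          # NC_DOUBLE is a 64-bit float
GIB = 1024 ** 3               # assumption: the listing uses binary units

raw_bytes = N_ELEMENTS * BYTES_PER_DOUBLE


def overhead_gib(listed_gib):
    """Approximate space beyond the raw data, given a (rounded) listed size in GiB."""
    return listed_gib - raw_bytes / GIB


print(f"raw data:          {raw_bytes} bytes = {raw_bytes / GIB:.2f} GiB")
print(f"serial overhead:   {overhead_gib(1.2):.2f} GiB (approximate)")
print(f"parallel overhead: {overhead_gib(1.8):.2f} GiB (approximate)")
```

Under those assumptions the serial file is almost exactly the size of the raw data, while the parallel file carries roughly 0.6 GiB of additional space that the variable itself does not account for.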

I’m using this stack:
hdf5 version 1.12.0
netcdf-c version 4.7.4
icc version 19.0.5
mpt version 2.22

And here’s the information for the files I’m creating
netcdf file {
dimensions:
    dim = 160000000 ;
variables:
    double var(dim) ;
}

Do you know why the file sizes are different? Do you know if there’s a difference in size_t or the block size between the parallel and serial versions?

epourmal · March 19, 2021, 4:34pm

Hi,
Could you please use the h5stat tool to see where the differences come from?

Thank you!
Elena
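
One possible way to follow this suggestion is to run h5stat on both files and compare the two reports side by side. The sketch below assumes h5stat is on the PATH and uses the file names from the original post; it calls h5stat with no extra flags, and the helper names `h5stat_command` and `run_h5stat` are hypothetical.

```python
import os
import shutil
import subprocess

# File names from the original post: _0 is the serial output, _1 the parallel output.
FILES = ["parallel_test_0.nc", "parallel_test_1.nc"]


def h5stat_command(path):
    """Build the h5stat invocation for one file (default report, no extra flags)."""
    return ["h5stat", path]


def run_h5stat(path):
    """Run h5stat on the given file and return its text report."""
    result = subprocess.run(
        h5stat_command(path), capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    if shutil.which("h5stat") is None:
        print("h5stat not found on PATH")
    else:
        for path in FILES:
            if not os.path.exists(path):
                print(f"skipping {path}: file not found")
                continue
            print(f"==== {path} ====")
            print(run_h5stat(path))
```

Comparing the two reports should show whether the extra space in the parallel file sits in raw data, metadata, or unaccounted space.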