Hi,
I'm having a hard time figuring out what could be causing the slow I/O
behaviour that I see: the same Fortran code run on three different
clusters behaves quite similarly in terms of I/O times, except for one
of the variables in the code, where writes are two orders of magnitude
slower on one of the machines (last set of timings in this e-mail). So
I hope that somebody with more in-depth knowledge of Parallel HDF5 can
give me a hand with it.
This is the situation. Our code writes two types of variables to file.
The first type are 3D variables decomposed across the processors with a
3D decomposition, and I use hyperslabs to select where each part should
go. Using arrays of size 200x200x200 decomposed across 64 processors, I
get similar times for the reading and writing routines (each file is
794MB) on the three clusters I have access to (a sketch of how these
writes are done follows the timings):
Cluster 1:
-----
READING 0.1231E+01
WRITING 0.1600E+01
Cluster 2:
------
READING 0.1973E+01
WRITING 0.2544E+01
Cluster 3:
-----
READING 0.1274E+01
WRITING 0.5895E+01
As you can see there is some variation, but I would be happy with this
sort of behaviour.
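For reference, this is roughly how each rank writes its block of one of
the 3D variables (a simplified sketch, not the actual code: the dataset
name, the 4x4x4 process grid and the 50x50x50 local blocks are just for
illustration):

  use hdf5
  ! ... MPI and HDF5 initialisation, file creation, etc. omitted ...
  integer(hsize_t) :: dims_glob(3) = (/200, 200, 200/)
  integer(hsize_t) :: dims_loc(3)  = (/50, 50, 50/)  ! 64 ranks, 4x4x4 grid
  integer(hsize_t) :: offset(3)
  integer(hid_t)   :: file_id, dset_id, filespace, memspace, plist_id
  integer          :: px, py, pz, error               ! rank coords in the grid
  real(kind=8)     :: local_block(50, 50, 50)

  call h5screate_simple_f(3, dims_glob, filespace, error)
  call h5dcreate_f(file_id, "var3d", H5T_NATIVE_DOUBLE, filespace, &
                   dset_id, error)

  ! each rank selects where its block goes in the file...
  offset = (/ px*50, py*50, pz*50 /)
  call h5sselect_hyperslab_f(filespace, H5S_SELECT_SET_F, offset, &
                             dims_loc, error)
  call h5screate_simple_f(3, dims_loc, memspace, error)

  ! ... and all ranks write collectively
  call h5pcreate_f(H5P_DATASET_XFER_F, plist_id, error)
  call h5pset_dxpl_mpio_f(plist_id, H5FD_MPIO_COLLECTIVE_F, error)
  call h5dwrite_f(dset_id, H5T_NATIVE_DOUBLE, local_block, dims_loc, error, &
                  mem_space_id=memspace, file_space_id=filespace, &
                  xfer_prp=plist_id)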
The other type of data that I write to disk is the outside layers of
the 3D cube. So, for example, in the 200x200x200 cube above I have six
outside layers, two in each dimension. The depth of these layers can
vary, but in this example I'm using 24 cells, so the X layers would in
this case be 24x200x200. But for each of these layers I need to save 24
variables, so in reality I end up with 4D arrays. In this particular
example, the outside layers in the X dimension are 4D arrays of size
24x200x200x24, in Y 200x24x200x24, and in Z 200x200x24x24 (a sketch of
how these layers are written follows).
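The layers are written in essentially the same way as the 3D variables
(again a simplified sketch, here for one Z layer; in the real code the
layer depth, dataset names, etc. are parameters, and owns_z_boundary
stands for however a rank knows it touches that boundary):

  ! 4D dataset for a Z layer, 200x200x24x24 in Fortran index order
  ! (24-cell layer depth, 24 variables). Ranks owning part of the
  ! boundary select their sub-block; the rest select nothing, but all
  ! join the collective write.
  integer(hsize_t) :: dims_z(4) = (/200, 200, 24, 24/)
  integer(hsize_t) :: cnt(4), offset(4)

  call h5screate_simple_f(4, dims_z, filespace, error)
  call h5dcreate_f(file_id, "layer_z", H5T_NATIVE_DOUBLE, filespace, &
                   dset_id, error)
  if (owns_z_boundary) then
     cnt    = (/50, 50, 24, 24/)         ! this rank's share
     offset = (/px*50, py*50, 0, 0/)
     call h5sselect_hyperslab_f(filespace, H5S_SELECT_SET_F, offset, &
                                cnt, error)
  else
     call h5sselect_none_f(filespace, error)
  end if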
So now the fun begins. If I tell my code to only save the X outside
layers, I end up with files of 1.2GB and the times in the 3 clusters
where I've been running these tests are:
Cluster 1:
-----
READING 0.1270E+01
WRITING 0.2088E+01
Cluster 2:
-----
READING 0.2214E+01
WRITING 0.3826E+01
Cluster 3:
-----
READING 0.1279E+01
WRITING 0.7138E+01
If I only save the outside layers in Y, I also get 1.2GB files, and the times are:
Cluster 1:
-----
READING 0.1207E+01
WRITING 0.1832E+01
Cluster 2:
-----
READING 0.1606E+01
WRITING 0.3895E+01
Cluster 3:
-----
READING 0.1264E+01
WRITING 0.6670E+01
But if I ask it to save only the outside layers in Z, I also get 1.2GB
files, but the times are:
Cluster 1:
-----
READING 0.7905E+00
WRITING 0.2190E+01
Cluster 2:
-----
READING 0.1856E+01
WRITING 0.8722E+02
Cluster 3:
-----
READING 0.1252E+01
WRITING 0.2372E+03
What can be so different about the Z dimension to produce such
different I/O behaviour on the three clusters? (Needless to say, the
code is exactly the same and the input data is exactly the same...)
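In case it helps with the diagnosis: if I understand correctly, with a
recent enough HDF5 (1.8.12 or later, I believe) one can ask the library
what it actually did with a supposedly collective transfer, e.g.
(untested sketch):

  ! Query the transfer property list right after h5dwrite_f to see
  ! whether the collective write silently fell back to independent I/O.
  integer :: io_mode

  call h5pget_mpio_actual_io_mode_f(plist_id, io_mode, error)
  if (io_mode == H5D_MPIO_NO_COLLECTIVE_F .and. myrank == 0) then
     print *, 'collective write fell back to independent I/O'
  end if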
Any pointers are more than welcome,
--
Ángel de Vicente
http://www.iac.es/galeria/angelv/