Thanks for the reply!
> Are you seeing a lot of disk activity after the data have been loaded
> into memory? That would indicate excessive swapping. Low CPU usage
> (the CPU waiting on I/O) is another indicator. There are usually some
> OS-specific tools for gathering statistics on virtual memory usage
> and swapping. Are the data on a local disk or a network server?
The entire thing is being run on a cluster, so I can't check disk activity directly - but the data are local to the program.
However, I can see that the program loads the first 60 or so files quickly and then slows down. As soon as that slowdown occurs I also see virtual memory usage increase, so I assume it's paging data out to swap rather than keeping it all in physical RAM.
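Since the usual OS tools aren't available interactively on the compute nodes, a small in-process logger could confirm this. A minimal sketch, assuming psutil is installed on the cluster (the log_memory name is just illustrative):

import os
import psutil  # assumption: psutil is available on the cluster nodes

def log_memory(tag=""):
    """Print this process's resident and virtual memory, plus system swap."""
    mem = psutil.Process(os.getpid()).memory_info()
    swap = psutil.swap_memory()
    print(f"{tag}: RSS={mem.rss / 1e6:.0f} MB, "
          f"VMS={mem.vms / 1e6:.0f} MB, "
          f"swap used={swap.used / 1e6:.0f} MB")

Calling log_memory(f"after file {i}") inside the loading loop would show it: RSS levelling off while VMS and system swap keep growing is the signature of data being paged out rather than held in RAM.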
> You need to tell us more about how the data are used. One common
> example is where the calculation is repeated for each (i,j) coord.
> across all 100+ files, so there is no need to store the complete
> arrays, but you do want parts of all the arrays in memory at the
> same time. Another is a calculation that uses data from one array
> at a time, so there is no need to store more than one array at once.
Yes, it's the former - I'm processing each (i,j) element individually. The data are remote sensing images, with each file being a separate observation, so I'm processing a time series on a per-pixel basis.
As you say, there's no need to store the complete arrays, but my attempts at loading only a small hyperslab from each file (corresponding to one row of the input images) have not been successful.
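For concreteness, the row-at-a-time pattern I'm after would look something like this. A minimal sketch, assuming the files are HDF5 read through h5py (the file pattern, the 'data' dataset name, and process_pixel are all placeholders; netCDF4 slicing would be analogous):

import glob
import numpy as np
import h5py  # assumption: HDF5 files, suggested by "hyperslab"

FILES = sorted(glob.glob("obs_*.h5"))  # hypothetical file names

def process_pixel(ts):
    """Stand-in for the real per-pixel time-series calculation."""
    return ts.mean()

# Keep every file open, but only ever read one row (hyperslab) at a time.
handles = [h5py.File(f, "r") for f in FILES]
nrows, ncols = handles[0]["data"].shape  # 'data' is a placeholder dataset name

result = np.empty((nrows, ncols))
for row in range(nrows):
    # h5py reads only the selected hyperslab from disk, so this holds
    # one row per file instead of 100+ complete arrays.
    stack = np.stack([h["data"][row, :] for h in handles])  # (nfiles, ncols)
    for col in range(ncols):
        result[row, col] = process_pixel(stack[:, col])

for h in handles:
    h.close()

If holding 100+ files open runs into a per-process file-handle limit, the same loop works with an open/read/close per file inside the row loop, at the cost of extra open overhead.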
Hope that makes sense, and thanks again.
Simon.