In many parallel applications,
network latency causes a dramatic loss in processor utilization. This
paper examines software controlled access pipelining (SCAP) as a
technique for hiding network latency. A brief analytic model of SCAP
describes its basic operation and the expected performance improvements. Results
are quantified with benchmarks on the Cray-T3E.
The benchmarks used are Jacobi iteration, parts of the Livermore Loop
kernels, and others representing six different parallel algorithm classes.
These were parallelized and optimized by hand to show the performance
trade-offs of several pipelining techniques.
Our results show that SCAP on the Cray-T3E improves performance over
blocking execution by a factor of 2.1 to 38. Compared with HPF, it achieves
a speed-up ranging from at least 12% up to a factor of 3.1, depending on
the algorithm class.