This paper compares the prefetching technique VSCAP (software
pipelining with vector commands) with the Cray T3E's highly optimized
shared-memory functions (SHMEM)
and with Portland Group HPF (PGHPF) on three application
benchmarks, namely PDE1, FIRE, and Veltran. Previous work showed
the good performance of VSCAP for single communication kernels.
This paper examines VSCAP and the practicability of KarHPFn, our
prototype HPF compiler, in the context of whole applications.
The results show that VSCAP generated by KarHPFn
reduces the communication overhead of fine-grained data-parallel
applications to a minimum. This yields a performance gain over
PGHPF ranging from a factor of 2.8 for FIRE to a factor of 9.6 for
PDE1. VSCAP programs are nearly
as fast as SHMEM for regular communication patterns, but
3.6 times faster than SHMEM in the case of dynamic communication.
All results were measured on 128 processors.