Address calculation for distributed data access plays a
major role in the performance of fine-grained data-parallel
applications. This paper reports on the hardware centrifuge
of the Cray T3E, which moves address calculation from
software into hardware. This shift minimizes address
calculation overhead and thereby reduces the communication
cost of dynamic communication patterns. The centrifuge is
compared with complex integer division and modulo arithmetic
and with integer mask and shift operations.
For a one-dimensional dynamic communication pattern and
several data distributions, the measurements show that the
T3E's hardware centrifuge is at least a factor of 1.9 faster
than integer division arithmetic. However, the centrifuge is
only marginally faster than integer mask and shift operations.
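
To make the three compared approaches concrete, the following C fragment is a
minimal sketch of how a global index of a one-dimensional block-cyclic
distribution could be split into a processor number and a local offset. It is
not the code measured in this paper. The names addr_t, locate_divmod,
locate_maskshift, locate_centrifuge and check_locators are hypothetical, and
the loop in locate_centrifuge is only a software emulation under the
assumption that the centrifuge gathers the index bits selected by a mask into
the processor number and compacts the remaining bits into the offset.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: owning processor and offset in its local memory. */
typedef struct {
    uint64_t pe;
    uint64_t offset;
} addr_t;

/* General case: integer division and modulo, valid for any block size
   and any number of processors. */
static addr_t locate_divmod(uint64_t i, uint64_t block, uint64_t npes)
{
    addr_t a;
    a.pe = (i / block) % npes;
    a.offset = (i / (block * npes)) * block + i % block;
    return a;
}

/* Power-of-two case: block = 2^bshift, npes = 2^pshift, so division and
   modulo reduce to shifts and masks. */
static addr_t locate_maskshift(uint64_t i, unsigned bshift, unsigned pshift)
{
    addr_t a;
    a.pe = (i >> bshift) & ((1ULL << pshift) - 1);
    a.offset = ((i >> (bshift + pshift)) << bshift)
             | (i & ((1ULL << bshift) - 1));
    return a;
}

/* Software emulation of a centrifuge (assumed behavior): bits of i where
   pe_mask is 1 are packed into pe, all other bits are packed into offset. */
static addr_t locate_centrifuge(uint64_t i, uint64_t pe_mask)
{
    addr_t a = {0, 0};
    unsigned pe_pos = 0, off_pos = 0;
    for (unsigned bit = 0; bit < 64; bit++) {
        uint64_t b = (i >> bit) & 1;
        if ((pe_mask >> bit) & 1)
            a.pe |= b << pe_pos++;
        else
            a.offset |= b << off_pos++;
    }
    return a;
}

/* Checks that all three variants agree for block = 4 and npes = 8. */
static void check_locators(uint64_t n)
{
    for (uint64_t i = 0; i < n; i++) {
        addr_t d = locate_divmod(i, 4, 8);
        addr_t m = locate_maskshift(i, 2, 3);
        addr_t c = locate_centrifuge(i, 0x1C); /* mask bits 2..4 select the PE */
        assert(d.pe == m.pe && d.pe == c.pe);
        assert(d.offset == m.offset && d.offset == c.offset);
    }
}
```

In this sketch the mask and shift variant and the emulated centrifuge only
cover power-of-two block sizes and processor counts, whereas the division
variant handles arbitrary values. Calling check_locators(256) exercises all
three variants on the same indices.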