This paper explores the need for asynchronous iteration algorithms as smoothers in multigrid methods. The hardware target for the new algorithms is top-of-the-line, highly parallel hybrid architectures: multicore-based systems enhanced with GPGPUs. These architectures are the most likely candidates for future high-end supercomputers. To pave the way for their efficient use, the challenges behind the established notion that "data movement, not FLOPS, is the bottleneck to performance" must be resolved. Our work is in this direction: we design block-asynchronous multigrid smoothers that perform more flops in order to reduce synchronization, and hence data movement. We show that the extra flops come for "free", while synchronization is reduced and the convergence properties of multigrid with classical smoothers like Gauss-Seidel are preserved.