Data structures for efficient sampling from a set of weighted items are an important building block of many applications. However, few parallel solutions are known. We close many of these gaps both for shared-memory and distributed-memory machines. We give efficient, fast, and practicable algorithms for sampling single items, k items with/without replacement, permutation, subset sampling, and reservoir sampling. Our output sensitive algorithm for sampling with replacement also improves the state of the art for sequential algorithms.