Recent Intel server processors temporarily reduce their frequency
when many AVX2 or AVX-512 SIMD instructions are executed.
The frequency change is only reverted two milliseconds after the
system has stopped executing such instructions. Until then,
any non-vectorized (and potentially unrelated) code that could
otherwise run at a higher frequency is slowed down. The effect on overall
performance depends on the specific workload and is hard to predict.
We describe a scenario where vectorizing one component with
AVX-512 instructions improves performance by 10% for one workload
and reduces performance by 10% for another workload.
If only some of the cores of a system execute such vectorized
code, the frequency effect is limited to those cores. We propose
a scheduling algorithm as well as a mechanism to intercept problematic
code sections so that threads executing vectorized code
are transparently migrated to a small subset of the cores. While
our work is still in progress, we describe a partial implementation
which is able to reduce the negative performance impact of AVX2
and AVX-512 instructions by over 70%.
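
The core idea of confining vectorized code to a few cores can be illustrated with a minimal user-space sketch. It is not the scheduling algorithm proposed here: the function names (partition_cores, allowed_cores, migrate_current_thread), the strict split into two disjoint core sets, and the choice of the highest-numbered cores for vectorized work are all assumptions made for illustration. The only system interface used is Python's os.sched_setaffinity, which is available on Linux.

```python
import os


def partition_cores(all_cores, avx_core_count):
    """Split cores into (avx_cores, scalar_cores).

    Assumption for this sketch: the highest-numbered cores are reserved
    for threads that execute AVX2/AVX-512 code.
    """
    ordered = sorted(all_cores)
    if not 0 < avx_core_count < len(ordered):
        raise ValueError("need at least one core in each partition")
    return set(ordered[-avx_core_count:]), set(ordered[:-avx_core_count])


def allowed_cores(uses_wide_simd, avx_cores, scalar_cores):
    """Return the cores a thread may run on.

    uses_wide_simd is a hypothetical flag that the caller sets when the
    thread is about to enter a vectorized code section.
    """
    return avx_cores if uses_wide_simd else scalar_cores


def migrate_current_thread(cores):
    """Restrict the calling thread to the given cores.

    Returns False on platforms without os.sched_setaffinity.
    """
    if not hasattr(os, "sched_setaffinity"):
        return False
    os.sched_setaffinity(0, cores)  # 0 selects the caller
    return True
```

A thread would call migrate_current_thread(allowed_cores(True, avx, scalar)) before a vectorized section and migrate back afterwards. Unlike this sketch, the mechanism described above intercepts the problematic code sections so that the migration is transparent to the application.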