Memory scanning deduplication techniques, as implemented in Linux' Kernel Samepage Merging (KSM), work very well for deduplicating fairly static, anonymous pages with equal content across different virtual machines. However, scanners need very aggressive scan rates when it comes to identifying sharing opportunities with a short life span of up to about 5 min. Otherwise, the scan process is not fast enough to catch those short-lived pages.
Our approach generates I/O-based hints in the host to make the memory scanning process more efficient, thus enabling it to find and exploit short-lived sharing opportunities without raising the scan rate. Experiences with similar techniques for paravirtualized guests have shown that pages in a guest’s unified buffer cache are good sharing candidates. We already identify such pages in the host when carrying out I/O-operations on behalf of the guest. The target/source pages in the guest can safely be assumed to be part of the guest’s unified buffer cache. That way, we can determine good sharing hints for the memory scanner. A modification of the guest is not required.
We have im ... mehrplemented our approach in Linux. By modifying the KSM scanning mechanism to process these hints preferentially, we move the associated sharing opportunities earlier into the merging stage. Thereby, we deduplicate more pages than the baseline system. In our evaluation, we identify sharing opportunities faster and with less overhead than the traditional linear scanning policy. KSM needs to follow about seven times as many pages as we do, to find a sharing opportunity.