Data is the driving force of modern businesses. For example, companies collect customer-generated data to improve their products, discover emerging trends, and provide insights to marketers. However, data might contain personal information that allows a person to be identified and their privacy to be violated. Examples of privacy violations are abundant, such as revealing typical whereabouts and habits, financial status, or health information, either directly or indirectly by linking the data to other available data sources. To protect personal data and regulate its collection and processing, the General Data Protection Regulation (GDPR) was adopted by all members of the European Union.
Anonymization addresses such regulations and alleviates privacy concerns by altering personal data to hinder identification. Differential privacy (DP), a rigorous privacy notion for anonymization mechanisms, is widely deployed in industry, e.g., by Google, Apple, and Microsoft.
Additionally, cryptographic tools, namely, secure multi-party computation (MPC), protect the data during processing. MPC allows distributed parties to jointly compute a function over their data such that only the function output is revealed but none of the input data. MPC and DP provide orthogonal protection guarantees. MPC provides input secrecy, i.e., MPC protects the inputs of a computation via encrypted processing. DP provides output privacy, i.e., DP anonymizes the output of a computation via randomization. In typical deployments of DP the data is randomized locally, i.e., by each client, and aggregated centrally by a server. With MPC, the randomization can instead be applied centrally, i.e., only once, which is optimal for accuracy. Overall, MPC and DP augment each other nicely. However, universal MPC is inefficient, requiring large computation and communication overhead, which makes MPC of DP mechanisms challenging for general real-world deployments.
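The accuracy gap between local and central randomization can be illustrated with a minimal sketch. It assumes a simple sum query over client values in [0, 1] and the Laplace mechanism with sensitivity 1; these choices, and all function names, are illustrative assumptions and not the protocols of this thesis. The MPC layer is not modelled: the central variant simply stands in for what MPC would compute without revealing the inputs.

```python
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and the same scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def local_dp_sum(values, epsilon, rng):
    # Local model (assumed setup): every client perturbs its own value
    # before sending it, so the aggregate accumulates one noise term per client.
    return sum(v + laplace(1.0 / epsilon, rng) for v in values)


def central_dp_sum(values, epsilon, rng):
    # Central model: the exact sum is computed first (inside MPC in the
    # setting described above) and a single noise term is added once.
    return sum(values) + laplace(1.0 / epsilon, rng)


def mean_abs_error(mechanism, values, epsilon, trials, rng):
    # Average absolute deviation from the true sum over repeated runs.
    true_sum = sum(values)
    total = 0.0
    for _ in range(trials):
        total += abs(mechanism(values, epsilon, rng) - true_sum)
    return total / trials


if __name__ == "__main__":
    rng = random.Random(0)
    values = [0.5] * 1000  # hypothetical client inputs
    print("local  :", mean_abs_error(local_dp_sum, values, 1.0, 200, rng))
    print("central:", mean_abs_error(central_dp_sum, values, 1.0, 200, rng))
```

In this sketch the standard deviation of the local-model noise grows with the square root of the number of clients, whereas the central-model noise is independent of it.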
In this thesis, we present efficient MPC protocols for distributed parties to collaboratively compute DP statistics with high accuracy. We support general rank-based statistics, e.g., min, max, median, as well as decomposable aggregate functions, where local evaluations can be efficiently combined to global ones, e.g., for convex optimizations. Furthermore, we detect heavy hitters, i.e., most frequently appearing values, over known as well as unknown data domains. We prove the semi-honest security and differential privacy of our protocols. Also, we theoretically analyse and empirically evaluate their accuracy as well as efficiency. Our protocols provide higher accuracy than comparable solutions based on DP alone. Our protocols are efficient, with running times of seconds to minutes evaluated in real-world WANs between Frankfurt and Ohio (100 ms delay, 100 Mbit/s bandwidth), and have modest hardware requirements compared to related work (mainly, 4 CPU cores at 3.3 GHz and 2 GB RAM per party). Additionally, our protocols can be outsourced, i.e., clients can send encrypted inputs to a few servers, which run the MPC protocol on their behalf.
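As an illustration of a DP rank-based statistic, the following sketch selects a median with the exponential mechanism, a standard DP selection mechanism. Whether the protocols of this thesis use this mechanism is not stated here, so it should be read as an assumption; the utility function, the names, and the plaintext (non-MPC) evaluation are likewise illustrative only.

```python
import math
import random


def median_utility(data, candidate):
    # Utility is 0 when as many records lie below the candidate as above it,
    # and decreases as the two counts diverge. Adding or removing one record
    # changes this value by at most 1 (sensitivity 1).
    below = sum(1 for d in data if d < candidate)
    above = sum(1 for d in data if d > candidate)
    return -abs(below - above)


def dp_median(data, domain, epsilon, rng):
    # Exponential mechanism: sample a candidate with probability
    # proportional to exp(epsilon * utility / (2 * sensitivity)).
    scores = [median_utility(data, x) for x in domain]
    best = max(scores)
    # Subtracting the best score avoids underflow and leaves the
    # sampling distribution unchanged.
    weights = [math.exp(epsilon * (s - best) / 2.0) for s in scores]
    return rng.choices(domain, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(1)
    data = list(range(30, 71))    # hypothetical inputs, true median 50
    domain = list(range(0, 101))  # known data domain (assumed)
    print(dp_median(data, domain, epsilon=2.0, rng=rng))
```

Because the mechanism randomizes only the final selection, its error depends on the data distribution around the median rather than on the number of contributing parties.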