Artificial intelligence and big data analysis go hand in hand today. Together they raise doubts about the legitimacy of these practices, because personal data can end up fully exposed. As Microsoft's director of data analysis, John Kahan, explains, "The deeper you dig into the data, the more likely confidential personal information will be revealed."
Differential privacy has emerged to address this problem. It is a set of techniques that offer a mathematical guarantee limiting how much any published result can reveal about the person who provided the data, so that the data companies collect cannot reliably be linked back to an individual's identity.
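The article does not spell out the guarantee itself. For reference, the standard textbook definition is sketched below; the symbols M (a randomized mechanism), D and D' (two datasets that differ in one person's record), S (any set of possible outputs) and ε (the privacy parameter) are notation introduced here, not taken from the project.

```latex
\Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon} \cdot \Pr[\,M(D') \in S\,]
\qquad \text{for all neighboring } D, D' \text{ and all } S
```

In words: adding or removing one person's data changes the probability of any outcome by at most a factor of e^ε, so a smaller ε means stronger privacy.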
The Institute for Quantitative Social Science at Harvard University, in collaboration with Microsoft, has launched Open Differential Privacy, an open source platform intended to help implement differential privacy techniques in technology projects. It is aimed at academics, companies, public institutions and non-profit entities.
The system they have designed to protect data privacy adds random noise generated with the original data as a starting point. The amount of noise is calibrated so that individual records become hard to trace while the data as a whole remains statistically valid.
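The idea of adding randomness while keeping statistics usable can be illustrated with the Laplace mechanism, a common textbook technique. This is a minimal sketch, not the toolkit's implementation: the function names `laplace_noise` and `private_count`, the sample data and the choice of ε are all assumptions made for illustration, and plain floating-point noise like this is not hardened for production use.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential samples with mean
    # `scale` follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that match `predicate`.

    Hypothetical helper: a counting query changes by at most 1 when one
    person's record is added or removed (sensitivity 1), so Laplace noise
    with scale 1/epsilon is used.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    ages = [23, 35, 41, 52, 67, 29, 38]  # made-up sample data
    noisy = private_count(ages, lambda a: a >= 40, epsilon=1.0)
    print(f"noisy count of people aged 40+: {noisy:.2f}")
```

Each individual answer is perturbed, but the noise averages out to zero, which is why aggregate results stay statistically meaningful.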
Open Differential Privacy is made up of a series of tools that allow differential privacy to be applied to any project: a development library that supports languages such as C, C++, Python and R; a connector that gives access to various data sources, such as SQL Server, CSV files and Apache Spark; and tools that must be applied to each data query to check whether users' privacy is at risk.
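One common way to implement a per-query check like the one mentioned above is a privacy budget: every query spends some ε, and queries are refused once the total is used up. The sketch below assumes basic sequential composition (the ε values of successive queries add up); the `PrivacyBudget` class and its methods are hypothetical and are not the toolkit's API.

```python
class PrivacyBudget:
    """Hypothetical tracker for the total epsilon spent across queries."""

    def __init__(self, total_epsilon):
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total = total_epsilon
        self.spent = 0.0

    def remaining(self):
        return self.total - self.spent

    def charge(self, epsilon):
        # Under basic sequential composition, privacy loss accumulates
        # additively, so a query is refused if it would exceed the total.
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total + 1e-12:
            raise RuntimeError("privacy budget exceeded")
        self.spent += epsilon


if __name__ == "__main__":
    budget = PrivacyBudget(total_epsilon=1.0)
    budget.charge(0.4)  # first query is allowed
    budget.charge(0.4)  # second query is allowed
    try:
        budget.charge(0.4)  # third query would exceed the budget
    except RuntimeError as err:
        print(f"query refused: {err}")
```

Calling `charge` before each query is answered makes the check part of every data access rather than an afterthought.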
For more information or to download the toolkit, visit the official project website.