How you interact with a crowd may help you stick out from it, at least to artificial intelligence.
When fed information about a target individual’s mobile phone interactions, as well as their contacts’ interactions, AI can correctly pick the target out of more than 40,000 anonymous mobile phone service subscribers more than half the time, researchers report January 25 in Nature Communications. The findings suggest humans socialize in ways that could be used to pick them out of datasets that are supposedly anonymized.
It’s no surprise that people tend to remain within established social circles and that these regular interactions form a stable pattern over time, says Jaideep Srivastava, a computer scientist at the University of Minnesota in Minneapolis who was not involved in the study. “But the fact that you can use that pattern to identify the individual, that part is surprising.”
According to the European Union’s General Data Protection Regulation and the California Consumer Privacy Act, companies that collect information about people’s daily interactions can share or sell this data without users’ consent. The catch is that the data must be anonymized. Some organizations might assume that they can meet this standard by giving users pseudonyms, says Yves-Alexandre de Montjoye, a computational privacy researcher at Imperial College London. “Our results are showing that this is not true.”
de Montjoye and his colleagues hypothesized that people’s social behavior could be used to pick them out of datasets containing information on anonymous users’ interactions. To test their hypothesis, the researchers taught an artificial neural network (an AI that simulates the neural circuitry of a biological brain) to recognize patterns in users’ weekly social interactions.
For one test, the researchers trained the neural network with data from an unidentified mobile phone service that detailed 43,606 subscribers’ interactions over 14 weeks. This data included each interaction’s date, time, duration, type (call or text), the pseudonyms of the involved parties and who initiated the communication.
Each user’s interaction data were organized into web-shaped data structures consisting of nodes representing the user and their contacts. Strings threaded with interaction data connected the nodes. The AI was shown the interaction web of a known person and then set loose to search the anonymized data for the web that bore the closest resemblance.
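The study itself uses a trained neural network, but the closest-match idea can be illustrated with a much cruder stand-in: summarize each user’s interactions as a small feature vector and return the anonymized profile nearest to a known target’s. This is a hypothetical sketch for intuition only, not the authors’ method, and every field name and user below is invented.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)

def profile(interactions):
    """Reduce a user's interaction records to a toy feature vector:
    (distinct contacts, call count, text count, mean call duration)."""
    contacts = {i["contact"] for i in interactions}
    calls = [i for i in interactions if i["type"] == "call"]
    texts = [i for i in interactions if i["type"] == "text"]
    mean_dur = sum(c["duration"] for c in calls) / len(calls) if calls else 0.0
    return (len(contacts), len(calls), len(texts), mean_dur)

def identify(target_interactions, anonymized):
    """Return the pseudonym whose profile lies closest to the target's."""
    t = profile(target_interactions)
    return min(anonymized, key=lambda p: dist(t, profile(anonymized[p])))

# Toy anonymized dataset: two pseudonymous users with different habits.
anonymized = {
    "user_A": [{"contact": "x", "type": "call", "duration": 60},
               {"contact": "y", "type": "text", "duration": 0}],
    "user_B": [{"contact": "x", "type": "text", "duration": 0},
               {"contact": "x", "type": "text", "duration": 0},
               {"contact": "z", "type": "text", "duration": 0}],
}
# Fresh observations of a known person who mostly texts.
target = [{"contact": "q", "type": "text", "duration": 0},
          {"contact": "q", "type": "text", "duration": 0},
          {"contact": "r", "type": "text", "duration": 0}]
print(identify(target, anonymized))  # → user_B
```

Note that the matching never compares contact pseudonyms directly; it relies only on the shape of a person’s behavior, which is the vulnerability the paper demonstrates at far greater scale and accuracy.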
The neural network linked just 14.7 percent of individuals to their anonymized selves when it was shown interaction webs containing information about a target’s phone interactions that occurred one week after the latest records in the anonymous dataset. But it identified 52.4 percent of people when given not just information about the target’s interactions but also those of their contacts. When the researchers provided the AI with the target and contacts’ interaction data collected 20 weeks after the anonymous dataset, the AI still correctly identified users 24.3 percent of the time, suggesting social behavior remains identifiable for long periods of time.
To see whether the AI could profile social behavior elsewhere, the researchers tested it on a dataset consisting of four weeks of close-proximity data from the mobile phones of 587 anonymous university students, collected by researchers in Copenhagen. This included interaction data consisting of students’ pseudonyms, encounter times and the strength of the received signal, which was indicative of proximity to other students. These metrics are often collected by COVID-19 contact tracing applications. Given a target and their contacts’ interaction data, the AI correctly identified students in the dataset 26.4 percent of the time.
The findings, the researchers note, probably don’t apply to the contact tracing protocols of Google and Apple’s Exposure Notification system, which protects users’ privacy by encrypting all Bluetooth metadata and banning the collection of location data.
de Montjoye says he hopes the research will help policymakers improve strategies to protect users’ identities. Data protection laws allow the sharing of anonymized data to support useful research, he says. “However, what’s essential for this to work is to make sure anonymization actually protects the privacy of individuals.”