February 23, 2024

Hackers download OkCupid data without consent

Danish researchers post information on 70,000 people

Is your privacy safe on the internet? How do you know? Now that people store medical records in the cloud and record much of their lives on social media, privacy is essential.

But as the latest scandal shows, privacy is always at risk.

A group of Danish researchers from Aarhus University has just released a dataset of 70,000 subscribers to the online dating site OkCupid. They obtained the data by scraping (automatically harvesting) publicly available profiles. However, they did so without seeking permission from the website or asking for the subscribers' consent.

According to Vox, the data, which was collected between November 2014 and March 2015, includes usernames, ages, gender, religion, and personality traits, as well as answers to the highly personal questions the site asks to help match potential mates. The users come from all over the world.

The data was uploaded to the Open Science Framework, an online repository where researchers share data and research materials, making it freely available to other social science researchers.

Harvesting the personal details of subscribers violates a fundamental rule of social science research: people must give informed consent. But the lead researcher, a graduate student named Emil Kirkegaard, sneered at critics of his methodology.

Professional researchers were outraged. Oliver Keyes, a social computing researcher, wrote on his blog:

this is without a doubt one of the most grossly unprofessional, unethical and reprehensible data releases I have ever seen.

There are two reasons for that. The first is very simple; Kirkegaard never asked anyone. He didn’t ask OKCupid, he didn’t ask the users covered by the dataset – he simply said ‘this is public so people should expect it’s going to be released’.

This is bunkum. A fundamental underpinning of ethical and principled research – which is not just an ideal but a requirement in many nations and in many fields – is informed consent. The people you are studying or using as a source should know that you are doing so and why you are doing so.

Although the dataset contains no real names, it has not been anonymized. Usernames and other details in the dataset could be used to identify individual users.

The ethics of this escapade seem simple enough: it's not ethical. Kirkegaard commented in an online journal:

Some may object to the ethics of gathering and releasing this data. However, all the data found in the dataset are or were already publicly available, so releasing this dataset merely presents it in a more useful form.

However, as Michael Zimmer points out in Wired, “The most important, and often least understood, concern is that even if someone knowingly shares a single piece of information, big data analysis can publicize and amplify it in a way the person never intended or agreed.”
