Social media and other online scraping
Social media is part of almost everyone’s lives. When you put photos or videos online, you probably imagine that your friends, your family and your followers will see them. You probably don’t imagine that they will be taken and processed by shady private companies and used to train algorithms. You’re even less likely to imagine that this information could be kept in a massive biometric database, and even used by police to identify you after you were near a protest or walked through an area under secret surveillance.
What is scraping?
Social media scraping is the automated collection of data from social media platforms. You might wonder what kind of data can be taken from social media. It ranges from usernames and follower lists to highly sensitive information, like where you live and… your biometric data, such as your facial features. Scraping is generally carried out by bots, which can gather and process huge amounts of data in very little time.
While this might sound like something from the distant future, it is happening now, right in our faces.
Scraping is happening already
Probably the most well-known culprit of mass social media scraping of our faces is the notorious ClearviewAI. Just being in the background of a friend’s photo that gets uploaded to the internet could be enough for you to end up in ClearviewAI’s 3-billion-photo database. Yes, that’s billion with a “b”. Services like the ones kindly offered by the likes of ClearviewAI have been used for biometric mass surveillance by police forces all around the EU and elsewhere in the world.
Fortunately, thanks to the efforts of one of ReclaimYourFace’s campaigners, the inclusion of images of people in the EU in the ClearviewAI database has already been found to be illegal. However, that hasn’t stopped it from happening.
Another well-known case is the Polish-founded company PimEyes (which suddenly relocated to the Seychelles, allegedly to avoid regulatory scrutiny in the EU). PimEyes offers similar services, although it claims that it scrapes other websites rather than social media sites (as if that made it more ethical!). However, unlike ClearviewAI, which tends to offer its services to law enforcement, PimEyes offers its services to any individual who wants to use them.
Yes, anyone walking on the street could scan your face and know everything about you.
Just imagine if, wherever you went, any person could learn your name, your interests, where you live and many other sensitive things about you just by scanning your face from a distance with their phone. You might never know that they had done this, but they would have learned a lot about you.
I don’t use social media
Even if you don’t use social media, you can still end up in these databases. For instance, if people you know have uploaded pictures of you to social media giants like Facebook, Twitter and YouTube, or if you use certain other online services that include your picture, your biometric data could be scraped. And yes, these Big Tech companies will probably know as much about you as if you were a social media user yourself. In fact, there have even been reports of so-called ‘shadow profiles’, where Facebook knows so much about people who don’t have accounts that it’s as if they had an active profile!
Why do we need to stop this?
Summarise the harms and why this must stop. The harms go beyond privacy and anonymity, e.g. stalkers, ex-partners, being judged by authorities based on social media posts, etc.
Examples
PimEyes is a company with an enormous facial recognition database, reportedly scraped from the internet without people’s knowledge, which members of the public can use to spy on whoever they like: https://edition.cnn.com/2021/05/04/tech/pimeyes-facial-recognition/index.html
Not only did Clearview AI illegally scrape our biometric data from social media, but the security of this data has also been compromised in the past. Our data is not only used against us by police forces around the world, but is also at risk of being leaked to anyone else: https://edition.cnn.com/2020/02/26/tech/clearview-ai-hack/index.html