anonymized data – Techdirt
Stories filed under: "anonymized data"
Rampant Data Broker Sale Of Pregnancy Data Gets Fresh Scrutiny Post Roe
from the the-check-is-coming-due-for-apathy dept
For decades now, privacy advocates have warned that we were creating a dystopia through our rampant over-collection and monetization of consumer data. And just as often, those concerns were greeted with calls of “consumers don’t actually care about privacy” from overly confident white guys in tech.
just thinking today about how the people who have spent the past decade-plus insisting "no one really cares about online privacy" were so often men pic.twitter.com/SDrl5CulFH
— Will Oremus (@WillOremus) June 24, 2022
Nothing has exposed those flippant responses as ignorant quite like the post-Roe privacy landscape, in which basic female health data can now be weaponized to ruin the lives of those seeking abortions, or those trying to help women obtain foundational health care. Either by states looking to prosecute them, or individual right wing hardliners who often have easy, cheap access to the exact same information.
The latest case in point: Gizmodo did a deep dive into the largely unaccountable data broker space and discovered there are currently 32 different data brokers selling pregnancy status data on 2.9 billion consumer profiles.
Via browsing, app, promotion, and location data, those consumers are quickly deemed “actively pregnant” or “shopping for maternity products.” Another 478 million customer profiles are actively labeled “interested in pregnancy” or “intending to become pregnant.” As is usually the case, companies (the ones that could be identified) claimed it was no big deal because the data is “anonymized”:
In an email statement, a spokesperson for Mastercard said the company only uses “anonymized transaction data” to gather data at the postal code level. As shown in the image above, though, AlikeAudience claims it can create links between such anonymized IDs and users who “voluntarily” give up their data. Mastercard further said it limits how insights from data may be used, but did not clarify in which ways partners were limited.
Of course countless studies have shown how “anonymized” is a gibberish term in privacy, since “anonymized” users can be easily identified with just a few additional snippets of data. “Voluntarily” is also doing a lot of heavy lifting here, since companies that collect this data rely on overlong privacy statements nobody reads, assuming companies are even disclosing the data collection at all.
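The mechanics behind that point are mundane. Here’s a minimal Python sketch of a linkage attack (every name, record, and attribute below is invented for illustration): an “anonymized” data set with direct identifiers stripped gets joined back to names through quasi-identifiers like ZIP code, birth year, and gender.

```python
# Toy linkage attack: re-identify records in an "anonymized" dataset by
# joining on quasi-identifiers. All names and records are invented.

# "Anonymized" records: names removed, but quasi-identifiers retained.
anonymized = [
    {"zip": "20001", "birth_year": 1988, "gender": "F", "status": "pregnant"},
    {"zip": "20002", "birth_year": 1975, "gender": "M", "status": "not pregnant"},
]

# Auxiliary data set (think voter roll or marketing list) that includes names.
auxiliary = [
    {"name": "Alice Example", "zip": "20001", "birth_year": 1988, "gender": "F"},
    {"name": "Bob Example", "zip": "20002", "birth_year": 1975, "gender": "M"},
]

def reidentify(anon_rows, aux_rows, keys=("zip", "birth_year", "gender")):
    """Link each anonymized row to auxiliary rows sharing its quasi-identifiers."""
    index = {}
    for row in aux_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["name"])
    matches = []
    for row in anon_rows:
        candidates = index.get(tuple(row[k] for k in keys), [])
        # A unique candidate means the "anonymous" record is re-identified.
        if len(candidates) == 1:
            matches.append((candidates[0], row["status"]))
    return matches

print(reidentify(anonymized, auxiliary))
# Each unique quasi-identifier combination links a record back to a name.
```

No machine learning required: a dictionary lookup over three mundane attributes is enough when the combination is unique in both data sets.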
Again, it’s a matter of when, not if, authoritarian-leaning state leaders and vigilantes use this data to prosecute and harass those seeking abortions and their allies, even across state lines into states where abortion is legal:
Bennett Cyphers, a staff technologist for the Electronic Frontier Foundation, said these commercial data brokers are “a big risk” for abortion seekers since those companies “label people and put people into lists that makes it easier for someone who is coming at it like a fishing expedition to narrow down who they want to target and subject them to more scrutiny and surveillance.”
American authoritarians aren’t being at all subtle about where this goes next. This is the era privacy advocates have been warning about for decades, built upon a generation of apathy toward data collection transparency and the need for meaningful rules and penalties. For decades we prioritized making money over consumer welfare, and the check is about to come due.
Will we do anything meaningful about it in response? Probably not!
Filed Under: abortion, anonymized data, apathy, data brokers, privacy
'Anonymized Data' Is A Gibberish Term, And Rampant Location Data Sales Are Still A Problem
from the doing-nothing-helpful dept
Mon, Nov 22nd 2021 06:25am - Karl Bode
As companies and governments increasingly hoover up our personal data, a common refrain is that nothing can go wrong because the data itself is “anonymized” — or stripped of personal identifiers like social security numbers. But time and time again, studies have shown how this really is cold comfort, given it takes only a little effort to pretty quickly identify a person based on access to other data sets. Yet most companies, many privacy policy folk, and even government officials still like to act as if “anonymizing” your data actually means something.
That’s a particular problem when it comes to user location data, which has been repeatedly abused by everybody from stalkers to law enforcement. The data, which is collected by wireless companies, app makers and others, is routinely bought and sold up and down a long chain of different companies and data brokers providing layers of deniability, often with very little disclosure to, or control by, the user (though companies certainly like to pretend they’re being transparent and giving users control over what data is traded and sold).
For example, last year a company named Veraset handed over billions of location data records to the DC government as part of a COVID tracking effort, something revealed courtesy of a FOIA request by the EFF. While there’s no evidence the data was abused in this instance, EFF technologist Bennett Cyphers told the Washington Post Veraset is one of countless companies allowed to operate so non-transparently. Nobody even knows where the datasets they’re selling and trading are coming from:
“A lot of these data brokers’ existence depends on people not knowing too much about them because they’re universally unpopular,” Cyphers said. “Veraset refuses to reveal even how they get their data or which apps they purchase it from, and I think that’s because if anyone realized the app you’re using… also opts you into having your location data sold on the open market, people would be angry and creeped out.”
While a long list of companies continue to insist that the massive scale this data is bought and sold at is no big deal because the data is “anonymous,” experts (with mixed success) keep pointing out that’s not really true:
“If you look at a map of where a device spends its time, you can learn a lot: where you sleep at night, where you work, where you eat lunch, what bars and parks you go to,” Cyphers said. Because of that, he added, it’s extremely simple “to associate one of these location traces to a real person.”
After major location data scandals at both Securus and wireless carriers, it looked like we might see actual reform on this front, but those efforts have largely stalled. Bills specifically targeting location data have gone nowhere. The occasional fines levied against such companies are a tiny fraction of the revenues made from the data in the first place. And our 20-year effort to have anything even vaguely resembling a useful federal privacy law for the internet era remains mired in gridlock thanks to a massive coalition of cross industry lobbying opposition with a near-unlimited budget.
Which means most of these companies are going to keep collecting and selling access to this data, while pretending they don’t sell access, that the data they collect is anonymous and harmless, and that absolutely any oversight or transparency requirements are unnecessary. And the parade of scandals, breaches, and abuse of this data will continue, until eventually there’s a scandal so large that the problem can no longer be cavalierly brushed aside.
Filed Under: anonymized data, location data, privacy
Companies: veraset
T-Mobile The Latest Snooping Company To Pretend 'Anonymized' Data Means Anything
from the not-so-'uncarrier' dept
Thu, Mar 11th 2021 05:38am - Karl Bode
As companies like Google shift away from individual behavior tracking in their ad efforts, telecoms like T-Mobile are headed in the opposite direction. The wireless giant this week announced it would be automatically enrolling all of its customers (including recently acquired Sprint customers) in a new behavioral tracking and ad system the company is launching on April 26. Whereas Google is shifting to its FLoC system, which clumps consumers into groups of like-minded users (an approach that still comes with its own issues), T-Mobile is doubling down on individualized targeting, and will start sharing its customers’ web and mobile-app data with advertisers.
While this sort of tracking is nothing new for AT&T and Verizon, it’s a shift away from T-Mobile’s more consumer-friendly branding, and will be something new for recently acquired Sprint customers. Fortunately, users can opt out, though that may not always mean what you think it does. AT&T, for example, has historically viewed “opting out” as meaning “we will no longer hit you with targeted ads based on your online data,” not that it won’t gather the data at all. Other times in telecom, opting out can easily be reverted to opting in without the consumer really knowing.
T-Mobile, like so many companies before it, tries a bit too hard to hide behind the claim that “anonymization” of individual user data makes collecting it okay, a claim disproven by a steady barrage of studies. It takes only a small number of additional data points to quickly make users not so anonymous.
One investigation of “anonymized” user credit card data by MIT found that users could be correctly “de-anonymized” 90 percent of the time using just four relatively vague points of information. Another study found that 15 minutes’ worth of brake pedal data alone was enough to pick the right driver out of 15 candidates 90% of the time.
Despite this, companies continue to toss around the word “anonymization” as some kind of get-out-of-jail-free card, as if the terminology means anything. Case in point: T-Mobile’s comments to the Wall Street Journal, which were thankfully quickly corrected by the EFF’s Aaron Mackey:
“T-Mobile said it masks users’ identities to prevent advertisers and other companies from knowing what websites they visit or apps they have installed. The company tags the data with an encoded user or device ID to protect the customers’ anonymity.
But privacy groups say those IDs can be linked back to people by comparing different data sets.
“It’s hard to say with a straight face, ‘We’re not going to share your name with it,’” said Aaron Mackey, a lawyer for the San Francisco-based Electronic Frontier Foundation, a consumer-privacy advocate. “This type of data is very personal and revealing, and it’s trivial to link that deidentified info back to you.”
T-Mobile’s move stands in stark contrast to the shifting winds across the rest of the tech sector as America belatedly considers having a privacy law for the internet era. It also comes fresh off the telecom industry successfully convincing at least half of DC that “big tech” is the only sector worth thinking and worrying about, and that “big telecom” is nothing but a group of utterly innocent sweethearts.
Filed Under: ads, anonymized data, behavioral tracking, privacy, tracking
Companies: t-mobile
NYT Easily Tracks Location Data From Capitol Riots, Highlighting Once Again How US Privacy Standards Are A Joke
from the watching-you-watching-me dept
Mon, Feb 8th 2021 05:37am - Karl Bode
First there was the Securus and LocationSmart scandal, which showcased how cellular carriers and data brokers buy and sell your daily movement data with only a fleeting effort to ensure all of the subsequent buyers and sellers of that data adhere to basic privacy and security standards. Then there was the blockbuster report by Motherboard showing how this data routinely ends up in the hands of everyone from bail bondsmen to stalkers, again, with only a fleeting effort made to ensure the data itself is used ethically and responsibly.
Throughout it all, government has refused to lift a finger to address the problem, presumably because lobbyists don’t want government upsetting the profitable apple cart, government is too busy freely buying access to this data itself, or too many folks still labor under the illusion that this sort of widespread dysfunction will be fixed by utterly unaccountable telecom or adtech markets.
Enter the New York Times, which in late 2019 got hold of a massive location data set from a broker, highlighting the scope of our lax location data standards (and the fact that “anonymized” data is usually anything but). This week, they’ve done another deep dive into the location data collected from rioting MAGA insurrectionists at the Capitol. It’s a worthwhile read, and illustrates all the same lessons, including, once again, that “anonymized” data isn’t a real thing:
“While there were no names or phone numbers in the data, we were once again able to connect dozens of devices to their owners, tying anonymous locations back to names, home addresses, social networks and phone numbers of people in attendance. In one instance, three members of a single family were tracked in the data.”
There’s been an endless list of studies finding that “anonymized” is a meaningless term, since it takes only a tiny shred of additional contextual data to identify individuals. It’s a term companies use to provide regulators and consumers with a false sense of security that data protection and privacy are being taken seriously, and that’s simply not true:
“The location-tracking industry exists because those in power allow it to exist. Plenty of Americans remain oblivious to this collection through no fault of their own. But many others understand what’s happening and allow it anyway. They feel powerless to stop it or were simply seduced by the conveniences afforded in the trade-off. The dark truth is that, despite genuine concern from those paying attention, there’s little appetite to meaningfully dismantle this advertising infrastructure that undergirds unchecked corporate data collection.”
The dystopian aspect of this has already arrived, yet this still somehow isn’t being taken seriously. Numerous US agencies already buy this data to bypass pesky things like warrants, and the US still lacks even a simple privacy law for the internet despite a steady parade of privacy-related scandals. Instead of having a serious conversation about this or other serious tech policy problems, we spent the last few years hyperventilating about TikTok.
Filed Under: anonymized data, insurrection, location data, privacy
Companies: ny times
It Took Just 5 Minutes Of Movement Data To Identify 'Anonymous' VR Users
from the no-such-thing-as-anonymous dept
Mon, Nov 9th 2020 06:04am - Karl Bode
As companies and governments increasingly hoover up our personal data, a common refrain to keep people from worrying is the claim that nothing can go wrong because the data itself is “anonymized” — or stripped of personal identifiers like social security numbers. But time and time again, studies have shown how this really is cold comfort, given it takes only a little effort to pretty quickly identify a person based on access to other data sets. Yet most companies, many privacy policy folk, and even government officials still like to act as if “anonymizing” your data means something.
The latest case in point: new research out of Stanford (first spotted by the German website Mixed) found that it took researchers just five minutes of examining the movement data of VR users to identify them in the real world. The paper says participants using an HTC Vive headset and controllers watched five 20-second clips from a randomized set of 360-degree videos, then answered a set of questions in VR, with their responses tracked for a separate research paper.
The movement data (including height, posture, head movement speed and what participants looked at and for how long) was then plugged into three machine learning algorithms, which, from a pool of 511 participants, correctly identified 95% of users “when trained on less than 5 min of tracking data per person.” The researchers went on to note that while VR headset makers (like every other company) assure users that “de-identified” or “anonymized” data will protect their identities, that’s really not the case:
“In both the privacy policy of Oculus and HTC, makers of two of the most popular VR headsets in 2020, the companies are permitted to share any de-identified data,” the paper notes. “If the tracking data is shared according to rules for de-identified data, then regardless of what is promised in principle, in practice taking one’s name off a dataset accomplishes very little.”
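Very loosely, the identification step is a matching problem: summarize each tracking session into a feature vector and find the closest enrolled user. This toy nearest-neighbor sketch stands in for the paper’s actual machine learning pipeline, and every feature name and value in it is invented:

```python
# Toy version of identifying a VR user from movement data: each enrolled
# user is an averaged feature vector, and a new "anonymous" session is
# matched to the nearest one. All features and numbers are invented.
import math

# Enrolled users: averaged (height_m, mean_head_speed, mean_gaze_duration).
enrolled = {
    "user_a": (1.62, 0.40, 2.1),
    "user_b": (1.80, 0.55, 1.4),
    "user_c": (1.75, 0.30, 3.0),
}

def identify(session, users):
    """Nearest-neighbor match between a session's features and enrolled users."""
    return min(users, key=lambda u: math.dist(session, users[u]))

# A new "anonymous" session closely resembles user_b's motion signature.
print(identify((1.79, 0.53, 1.5), enrolled))  # → user_b
```

Body-scale features like height and habitual head motion are stable across sessions, which is exactly why stripping the name off the tracking data accomplishes so little.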
If you don’t like this study, there’s just an absolute ocean of research over the last decade making the same point: “anonymized” or “de-identified” doesn’t actually mean “anonymous.” Researchers from the University of Washington and the University of California, San Diego, for example, found that they could identify drivers based on just 15 minutes’ worth of data collected from brake pedal usage alone. Researchers from Stanford and Princeton universities found that they could correctly identify an “anonymized” user 70% of the time just by comparing their browsing data to their social media activity.
The more data that’s available to researchers (or corporations or governments), the easier it is to identify you. And with hacks, data leaks, and breaches dumping an endless ocean of existing datasets into the public domain, and no serious rules of the road governing things like the collection of location and other sensitive data, it shouldn’t be too hard to see how the idea of “privacy” is a myth. Especially if the company is, say, Facebook, which is now tying your entire online Facebook experience to VR whether you like it or not.
It’s all something to keep in mind for whenever the U.S. gets off its ass and finally crafts a meaningful privacy law for the internet era. Especially given that “don’t worry, your data is anonymized!” will be an endless refrain by industry as they try to ensure any rules are as feeble as possible.
Filed Under: anonymity, anonymized data, data, de-anonymized, vr
Using Trump As A Prop, The Myth Of 'Anonymized' Cell Data Is Finally Exposed
from the privacy-doesn't-exist dept
Fri, Dec 20th 2019 12:01pm - Karl Bode
As companies and governments increasingly hoover up our personal data, a common refrain to keep people from worrying is the claim that nothing can go wrong because the data itself is “anonymized,” or stripped of personal detail. But time and time again, studies have shown how this really is cold comfort, given it takes only a little effort to pretty quickly identify a person based on access to other data sets. Yet most companies, policy folk, and government officials still act as if “anonymizing” your data means something. It’s simply not true.
While that point hasn’t yet resonated with the public and press fully, it should now.
The second entry in an amazing seven-part saga by the New York Times was released this week, taking a closer look at a data trove of 50 billion location pings from the phones of more than 12 million Americans, given to the Times by an anonymous insider at one of countless location data brokers. The first piece in the Times’ series looked at how easy it was to identify “anonymized” ordinary citizens and track their everyday lives. This second piece ups the ante by… easily tracking the President of the United States via the location data of one of his Secret Service agents:
“The meticulous movements, down to a few feet, of the president’s entourage were recorded by a smartphone we believe belonged to a Secret Service agent, whose home was also clearly identifiable in the data. Connecting the home to public deeds revealed the person’s name, along with the name of the person’s spouse, exposing even more details about both families. We could also see other stops this person made, apparently more connected with his private life than his public duties. The Secret Service declined to comment on our findings or describe its policies regarding location data.”
I’m not sure I’ve ever seen a story that more perfectly encapsulates both the stupidity of the “anonymized data is a panacea” claim and the government’s feckless refusal to seriously address one of the biggest scandals in privacy history. Granted, it wasn’t just the daily movement habits of the President’s security detail that the data revealed, but those of Congressional staffers and lawmakers, many of whom have similarly been utterly apathetic to the problem:
“We were able to track smartphones in nearly every major government building and facility in Washington. We could follow them back to homes and, ultimately, their owners’ true identities. Even a prominent senator’s national security adviser, someone for whom privacy and security are core to their every working day, was identified and tracked in the data.”
DC lawmakers could use this as a learning opportunity to finally understand why location data, whether it comes from an app or your cellular provider, shouldn’t be treated cavalierly and sold to every nitwit with a nickel. Granted, it’s just as likely the end lesson government learns is a focus on better location data security for government officials, and nobody else. As we’ve noted for a while, “feckless” is the best term to describe the government’s response to a steady parade of scandals showing this data is routinely abused by everybody from rogue law enforcement officers to crazed stalkers.
The Congressional response to the Times’ latest report was bipartisan in nature, even though a desire to actually do something about it hasn’t been. Recall Congress voted along strict party lines to kill FCC broadband privacy rules in 2017 that could have at least partially addressed the problem. The GOP also supported the erosion of FCC authority in the net neutrality repeal, which also opened the door to greater abuse. Still, when the check comes due for those policy moves, notice how the outrage is suddenly bipartisan:
“This is terrifying,” said Senator Josh Hawley, Republican of Missouri, who has called for the federal government to take a tougher stance with tech companies. “It is terrifying not just because of the major national security implications, what Beijing could get ahold of. But it also raises personal privacy concerns for individuals and families. These companies are tracking our kids.”
“Tech companies are profiting by spying on Americans, trampling on the right to privacy and risking our national security,” Senator Elizabeth Warren, a Democrat running for president, told us. “They are throwing around their power to undermine our democracy with zero consequences. This report is another alarming case for why we need to break up big tech, adopt serious privacy regulations and hold top executives of these companies personally responsible.”
The FCC’s supposed investigation into carrier location data sales appears to be stuck in neutral, with growing concerns the agency is running out the clock to avoid having to hold industry accountable. There’s no real effort to craft rules that prohibit the widespread collection and sale of such data, and most policy conversations remain fixated exclusively on big tech, despite the problem also being rampant in big telecom. Meanwhile, our quest for an actual US privacy law for the internet era remains mired in gridlock, in large part because industry doesn’t want to lose billions to consumers opting out of having their every waking moment monetized.
Still, the Times report (which many gearing up for the holidays won’t read) may help finally dislodge some of this apathy and drive some actual, fact-based awareness of the real scope of the problem, maybe someday resulting in actual, serious proposals to fix it.
Filed Under: anonymized data, anonymous data, data brokers, datasets, donald trump, location data, privacy
Once More With Feeling: 'Anonymized' Data Is Not Really Anonymous
from the nothing-to-see-here dept
Tue, Jul 30th 2019 06:38am - Karl Bode
As companies and governments increasingly hoover up our personal data, a common refrain to keep people from worrying is the claim that nothing can go wrong because the data itself is “anonymized” or stripped of personal detail. But time and time again, we’ve noted how this really is cold comfort, given it takes only a little effort to pretty quickly identify a person based on access to other data sets. Yet most companies (including cell phone companies that sell your location data) act as if “anonymizing” your data is iron-clad protection against having it identified. It’s simply not true.
The latest case in point: in new research published this week in the journal Nature Communications, data scientists from Imperial College London and UCLouvain found that it wasn’t particularly hard for companies (or anybody else) to identify the person behind “anonymized” data using other data sets. More specifically, the researchers developed a machine learning model that was able to correctly re-identify 99.98% of Americans in any anonymized dataset using just 15 characteristics, including age, gender and marital status:
“While there might be a lot of people who are in their thirties, male, and living in New York City, far fewer of them were also born on 5 January, are driving a red sports car, and live with two kids (both girls) and one dog,” explained study first author Dr Luc Rocher, from UCLouvain.
And fifteen characteristics is actually a high number for this sort of study. One investigation of “anonymized” user credit card data by MIT found that users could be correctly “de-anonymized” 90 percent of the time using just four relatively vague points of information. Another study found that 15 minutes’ worth of brake pedal data alone was enough to pick the right driver out of 15 candidates 90% of the time.
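A toy simulation makes the intuition concrete: as attributes are added, the fraction of a population that is uniquely pinned down climbs fast. The attribute ranges below are invented stand-ins (think gender, birth month, birth day, region, marital status), not the study’s actual features:

```python
# Toy simulation: fraction of a synthetic population uniquely identified
# by their first k attributes. The attribute ranges are invented.
import random
from collections import Counter

random.seed(0)
ranges = [2, 12, 31, 50, 5]  # possible values per attribute
population = [tuple(random.randrange(r) for r in ranges) for _ in range(10_000)]

def unique_fraction(pop, k):
    """Fraction of people whose first k attribute values occur exactly once."""
    counts = Counter(person[:k] for person in pop)
    return sum(1 for person in pop if counts[person[:k]] == 1) / len(pop)

for k in range(1, 6):
    print(f"{k} attributes: {unique_fraction(population, k):.1%} unique")
```

With only two attributes nobody is unique, but by five attributes the overwhelming majority of this synthetic population is, which is the basic reason “incomplete” data sets offer so little protection.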
The problem, of course, comes when multiple leaked data sets are released in the wild and can be cross referenced by attackers (state sponsored or otherwise), de-anonymized, then abused. The researchers in this new study were quick to proclaim how government and industry proclamations of “don’t worry, it’s anonymized!” are dangerous and inadequate:
“Companies and governments have downplayed the risk of re-identification by arguing that the datasets they sell are always incomplete,” said senior author Dr Yves-Alexandre de Montjoye, from Imperial’s Department of Computing and Data Science Institute. “Our findings contradict this and demonstrate that an attacker could easily and accurately estimate the likelihood that the record they found belongs to the person they are looking for.”
It’s not clear how many studies like this we need before we stop using “anonymized” as some kind of magic word in privacy circles, but it’s apparently going to need to be a few dozen more.
Filed Under: anonymity, anonymized data, data, privacy
Once Again With Feeling: 'Anonymized' Data Isn't Really Anonymous
from the we-can-see-you dept
Fri, Aug 4th 2017 03:35pm - Karl Bode
For years, the companies that hoover up your internet browsing and other data have proclaimed that you don’t really have anything to worry about, because the data collected on you is “anonymized.” In other words, because the data collected about you is assigned a random number and not your name, you should be entirely comfortable with everything from your car to your smart toaster hoovering up your daily habits and selling them to the highest bidder. But studies have repeatedly shown that it only takes a few additional contextual clues to flesh out individual identities. So in an era of cellular location, GPS, and even smart electricity data collection, it doesn’t take much work to build a pretty reliable profile on who you are and what you’ve been up to.
The latest case in point: German journalist Svea Eckert and data scientist Andreas Dewes recently descended upon Defcon to once again make this point, releasing a new report highlighting how “anonymous” browsing data is anything but. The duo found it relatively trivial to obtain clickstream browsing data from numerous companies simply by posing as a fake marketing company, replete with a website filled with “many nice pictures and some marketing buzzwords.” Ironically, some of this data was gleaned from companies that profess to offer you additional layers of privacy, including “safe surfing” tool Web of Trust.
It didn’t take long before the pair was able to obtain a database containing more than 3 billion URLs from roughly three million German internet users, spread across roughly 9 million different websites. However easy obtaining the “private” and “anonymous” browsing data was, using this data to quickly and easily identify individual users was even easier:
“Dewes described some methods by which a canny broker can find an individual in the noise, just from a long list of URLs and timestamps. Some make things very easy: for instance, anyone who visits their own analytics page on Twitter ends up with a URL in their browsing record which contains their Twitter username, and is only visible to them. Find that URL, and you’ve linked the anonymous data to an actual person. A similar trick works for German social networking site Xing.”
The pair also highlighted how repetitive visitation of websites specific to you (your bank, your hobbies, your neighborhood) help further narrow down your identity:
“For other users, a more probabilistic approach can deanonymise them. For instance, a mere 10 URLs can be enough to uniquely identify someone: just think, for instance, of how few people there are at your company, with your bank, your hobby, your preferred newspaper and your mobile phone provider. By creating ‘fingerprints’ from the data, it’s possible to compare it to other, more public, sources of what URLs people have visited, such as social media accounts, or public YouTube playlists.”
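That “fingerprint” comparison is simple to sketch. In this toy version (the domains and profile labels are invented, and a real broker would work with full URLs and timestamps rather than bare domains), an anonymous clickstream is scored against public profiles by set overlap:

```python
# Toy fingerprint matching: score each public profile by Jaccard similarity
# with an anonymous clickstream's set of visited sites. Data is invented.

anonymous_clickstream = {"bank-a.example", "chess-club.example",
                         "localpaper.example", "carrier-x.example",
                         "employer.example"}

public_profiles = {
    "user_1": {"bank-a.example", "chess-club.example", "localpaper.example",
               "carrier-x.example", "employer.example"},
    "user_2": {"bank-b.example", "knitting.example", "nationalpaper.example"},
}

def best_match(clickstream, profiles):
    """Return the profile whose visited-site set best overlaps the clickstream."""
    def jaccard(a, b):
        return len(a & b) / len(a | b)
    return max(profiles, key=lambda user: jaccard(clickstream, profiles[user]))

print(best_match(anonymous_clickstream, public_profiles))  # → user_1
```

The combination of bank, hobby, local paper, carrier, and employer is close to unique per person, so even a crude similarity score picks out the right profile.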
Of course this is nothing new, and researchers have been making this precise point for several years now. Princeton researcher Arvind Narayanan in particular has been warning that anonymous data isn’t really anonymous for the better part of the last decade, yet somehow the message never seems to resonate, and everyone from broadband providers to internet of things companies continue to pretend that “anonymization” of data is some kind of impenetrable, mystical firewall preventing companies or hackers from identifying you.
Filed Under: anonymized data, privacy
One More Time With Feeling: 'Anonymized' User Data Not Really Anonymous
from the we-can-see-you dept
Thu, Jan 26th 2017 02:53pm - Karl Bode
As companies and governments increasingly hoover up our personal data, a common refrain to keep people from worrying is the claim that nothing can go wrong because the data itself is “anonymized,” or stripped of personal detail. But time and time again, we’ve noted how this really is cold comfort, given it takes only a little effort to pretty quickly identify a person based on access to other data sets. As cellular carriers in particular begin to collect every shred of browsing and location data, identifying “anonymized” data using just a little additional context has become arguably trivial.
Researchers from Stanford and Princeton universities plan to make this point once again via a new study being presented at the World Wide Web Conference in Perth, Australia this upcoming April. According to this new study, browsing habits can be easily linked to social media profiles to quickly identify users. In fact, using data from roughly 400 volunteers, the researchers found that they could identify the person behind an “anonymized” data set 70% of the time just by comparing their browsing data to their social media activity:
“The programs were able to find patterns among the different groups of data and use those patterns to identify users. The researchers note that the method is not perfect, and it requires a social media feed that includes a number of links to outside sites. However, they said that ‘given a history with 30 links originating from Twitter, we can deduce the corresponding Twitter profile more than 50 percent of the time.’

The researchers had even greater success in an experiment they ran involving 374 volunteers who submitted web browsing information. The researchers were able to identify more than 70 percent of those users by comparing their web browsing data to hundreds of millions of public social media feeds.”
Of course, with the sophistication of online tracking and behavioral ad technology, this shouldn’t be particularly surprising. Numerous researchers likewise have noted it’s relatively simple to build systems that identify users with just a little additional context. That, of course, raises questions about how much protection “anonymizing” data actually provides, both in routine business practice and in the event this data is hacked and released into the wild:
“Yves-Alexandre de Montjoye, an assistant professor at Imperial College London, said the research shows how “easy it is to build a full-scale ‘de-anonymizationer’ that needs nothing more than what’s available to anyone who knows how to code.” “All the evidence we have seen piling up over the years showing the strong limits of data anonymization, including this study, really emphasizes the need to rethink our approach to privacy and data protection in the age of big data,” said de Montjoye.
And this doesn’t even factor in how new technologies — like Verizon’s manipulation of user data packets — allow companies to build sophisticated new profiles based on the combination of browsing data, location data, and modifying packet headers. The FCC’s recently-passed broadband privacy rules were designed in part to acknowledge these new efforts, by allowing user data collection — but only if this data was “not reasonably linkable” to individual users. But once you realize that all data — “anonymized” or not — is linkable to individual users, such a distinction becomes wholly irrelevant.
One of the study’s authors, Princeton researcher Arvind Narayanan, has been warning that anonymous data isn’t really anonymous for the better part of the last decade, yet it’s not entirely clear when we intend to actually hear — and understand — his message.
Filed Under: anonymized data, anonymous, privacy
There Is No Such Thing As Anonymized Data, Google
from the barely-appeasing dept
With the news out that Google and Viacom have come to an agreement to “anonymize” the data a judge ordered Google to hand over, it’s worth remembering a simple but important statement: there’s no such thing as a truly anonymized dataset. While it may protect some users, it’s still likely to reveal some users and what they surfed. Given all of this, it’s still quite unclear why Viacom needs this data in the first place. The legal question is whether Google infringed on copyright. Why should Google’s log files be necessary to determine that?
Filed Under: anonymized data, logfiles
Companies: google, viacom, youtube