Traffic Is Fake, Audience Numbers Are Garbage, And Nobody Knows How Many People See Anything

from the stabs-in-the-dark dept

How many living, breathing human beings really read Techdirt? The truth — the most basic, rarely-spoken truth — is that we have no earthly idea. With very few exceptions, no media property big or small, new or old, online or off, can truly tell you how big its audience is. They may have never thought about it that way — after all, we all get as close as we can to what we think is a reasonably accurate estimation, though we have no way of confirming that — but all these numbers are actually good for (maybe) is relative comparisons. What does it really mean when someone says “a million people” saw something? Or ten or a hundred million? I don’t know, and neither do you. (Netflix might, but we’ll get to that later.)

Where should we start? How about this: internet traffic is half-fake and everyone’s known it for years, but there’s no incentive to actually acknowledge it. The situation is technically improving: 2015 was hailed (quietly, among people who aren’t in charge of selling advertising) as a banner year because humans took back the majority with a stunning 51.5% share of online traffic, so hurray for that I guess. All the analytics suites, the ad networks and the tracking pixels can try as they might to filter the rest out, and there’s plenty of advice on the endless Sisyphean task of helping them do so, but considering at least half of all that bot traffic comes from bots that fall into the “malicious” or at least “unauthorized” category, and thus have every incentive to subvert the mostly-voluntary systems that are our first line of defence against bots… Well, good luck. We already know that Alexa rankings are garbage, but what does this say about even the internal numbers that sites use to sell ad space? Could they even be off by a factor of 10? I don’t know, and neither do you. Hell, we don’t even know how accurate the 51.5% figure is — it could be way off… in either direction.
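To see how much room that uncertainty leaves, here is a rough back-of-the-envelope sketch. It assumes a hypothetical site reporting one million visits and simply asks how many humans that could mean under different guesses at the human share of traffic. The function name, the visit count and the 10% lower bound are illustrative assumptions; only the 51.5% figure comes from the paragraph above.

```python
def human_range(reported_visits: int, low_share: float, high_share: float) -> tuple[float, float]:
    """Return the (lowest, highest) plausible human visit counts.

    The shares are guesses at what fraction of reported traffic is human.
    Nobody actually knows them, which is the whole problem.
    """
    return reported_visits * low_share, reported_visits * high_share


# Hypothetical report: one million visits in the analytics dashboard.
reported = 1_000_000

# 0.515 is the 2015 human share cited above; 0.10 is the pessimistic case
# where the reported number is off by a factor of 10.
low, high = human_range(reported, low_share=0.10, high_share=0.515)

print(f"Reported visits: {reported:,}")
print(f"Plausible humans: somewhere between {low:,.0f} and {high:,.0f}")
```

The arithmetic is trivial. The inputs are the part nobody can supply with any confidence.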

Okay, so what about TV ratings? Well, there’s a reason they’ve been made fun of on the shows themselves for as long as our culture has been able to handle “meta” jokes without getting a headache. Nielsen ratings in their classic form are built on monitoring such a tiny sample of households that the whole country’s viewing profile can probably be swayed because someone forgot to turn off the TV before going on vacation. They sucked before DVRs and digital distribution began transforming the single household television into a quaint anachronism, and now it’s just chaos. Nielsen was slow to catch up with DVRs, and now the TV industry juggles scattered measurements including three or seven days of viewing beyond live air, and constantly complains that the ratings are off: specifically, that they’re too low. And they might be right, in the sense that they are too low compared with the garbage ratings from the pre-digital age that everyone eventually embraced as a standard for relative rankings. How big are these audiences really, in terms of real, living, breathing human beings? I don’t know, and neither do you.

YouTube view counts? Subject to all the same fake internet traffic problems, plus the fact that there’s an opaque system for supposedly ignoring too-short incomplete views according to the genre and nature of the video, but good luck finding out how accurate that is. Channel operators know their length-of-view statistics, but you don’t see them bandying them about much. Plus, how often have you heard public view counts casually referred to as the number of “people” who watched something, even though (especially when it comes to short-and-cute viral animal hits and their ilk) the bulk of them probably come from obsessive re-watching? Yeah.

So what about Facebook stats? Everything from impressions to simultaneous live video viewers is padded out by the most transient of idly-scrolling-through-the-newsfeed interactions. Twitter followings and tweet stats? Dig into the bowels of any list of followers, or any trending link, and see how much of it is mindless bots. Print readerships? Don’t even get me started. Did you know it’s common practice for newspapers to calculate their readership by applying a multiplier to their actual circulation, to account for an imaginary surplus of “readers per copy”? Yes, that soggy “local” paper that’s been sitting out in the rain on your porch for two days, and that only exists to give them an excuse to deliver flyers to your door, is not only being counted — it’s probably being counted five times. So are all the free/cheap copies that big national papers give to hotels. Oh, and when these companies distribute multiple publications in different channels — with newspapers, magazines and paywalled websites all being given away with each other as free cross-subscriptions, in order to pad out all three subscriber numbers — they add them all up and then try to determine the actual number of individual people they are reaching. How? By applying an opaque “deduplication” formula. I once pressed a newspaper’s stats person about what this formula could possibly entail, but details were not forthcoming — because I suspect they just knock off 20% and call it a day, despite the fact that the magazine is distributed inside the newspaper whose audience they are supposedly “deduplicating” it from, and half the website subscriptions were free add-ons with print delivery. That’s awfully generous when the truth is they don’t know, and neither do I, and neither do you.
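The print arithmetic above is easy to sketch. What follows is a guess at the shape of the calculation, not anyone's actual formula: the function names and every number are hypothetical, the flat 20% discount is only my suspicion from the paragraph above, and the "overlap-aware" version assumes that every magazine reader is already a newspaper reader (since the magazine ships inside the paper) and that half the website subscribers are print subscribers too.

```python
def claimed_readership(circulation: int, readers_per_copy: float) -> float:
    """Circulation inflated by an assumed readers-per-copy multiplier."""
    return circulation * readers_per_copy


def flat_deduplication(audiences: list[float], discount: float) -> float:
    """Add every channel together, then knock off a flat percentage."""
    return sum(audiences) * (1 - discount)


def overlap_aware_reach(newspaper: float, magazine: float, website: float,
                        magazine_share_in_paper: float,
                        website_share_bundled: float) -> float:
    """Count magazine and website readers only if they are not already newspaper readers."""
    unique_magazine = magazine * (1 - magazine_share_in_paper)
    unique_website = website * (1 - website_share_bundled)
    return newspaper + unique_magazine + unique_website


# Hypothetical numbers, chosen only to make the arithmetic visible.
newspaper = claimed_readership(100_000, 5)  # every copy counted five times
magazine = 400_000                          # distributed inside the newspaper
website = 100_000                           # half were free add-ons with print

flat = flat_deduplication([newspaper, magazine, website], discount=0.20)
aware = overlap_aware_reach(newspaper, magazine, website,
                            magazine_share_in_paper=1.0,
                            website_share_bundled=0.5)

print(f"Flat 20% deduplication: {flat:,.0f}")
print(f"Overlap-aware estimate: {aware:,.0f}")
```

Even the lower figure still rests on the readers-per-copy multiplier, so it is no closer to a count of actual human beings.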

So who does know how big of an audience they really have? Well, maybe Netflix, Amazon and other digital subscription services. Their paywalls insulate them from the bulk of random bot traffic, and their proprietary ecosystems give them the ability to closely monitor all activity. Netflix, of course, is famously secretive about viewer numbers and insists on the inaccuracy of those who claim to have worked them out. The most common assumption is that they do this to avoid giving content creators too much leverage, and because the data can be seen as a valuable commodity — but I propose another reason: Netflix’s likely-more-accurate statistics, if made public, would have zero context in the topsy-turvy world of nonsense TV ratings. They would probably look exceptionally low, giving the legacy bosses who would like nothing more than to downplay the importance of digital distribution (and there are as many of those as there are record execs who can’t spell mp3) a chance to project whatever narrative they wanted onto the numbers.

So why does any of this matter? Because advertising is a multibillion-dollar industry, and whenever an industry is worth that much, you have to ask: is that because there are billions of dollars of worthwhile transactions happening, or because every bloodsucker in a ten-industry radius wanted in on the action? So, so much of the advertising industry is pure waste. How much exactly is as impossible to determine as the audience sizes themselves. This is hardly a new idea (in fact it’s a century-old quote, usually attributed to John Wanamaker: half the money spent on advertising is wasted, and the trouble is not knowing which half) but it’s probably more true now than ever, despite the fact that in theory technology could have delivered us from uncertainty.

Finally, what can be done about this? There’s no simple answer, and maybe no answer at all. Here at Techdirt, we’ve been working to come up with good advertising solutions by focusing almost entirely on what we know our community likes and might be interested in (as in, our real community of people who talk in our comments and who we can say, with confidence, exist) and paying less attention to raw numbers, which is both a luxury and a necessity for a smaller publication, depending on how you look at it. That’s not always easy though, as we face an advertising industry ruled by metrics, where there are often ten spreadsheet-wielding interns between us and someone who might actually care about our creativity. In our experiments with more traditional algorithmic display advertising to monetize the raw traffic numbers we do have, we keep running up against what appears to be a universal truth: the bulk of the global internet ad ecosystem runs on trash. Gigantic prestigious online media brands can sell display campaigns straight to the same people who buy Super Bowl ads; everyone else receives a hundred pitches a week from new ad networks that claim to deliver great, relevant content but in fact litter your site with ads for fad diets and ambulance-chasers (at best). And this lowest-common-denominator filler appears to be the only reliably successful form of internet advertising! At least, it never goes away when the good stuff does, and the proud quality networks eventually embrace their roles as crap-peddlers. “Good” internet advertising is a rickety ship navigating an endless roiling ocean of spam, clickbait and outright fraud, but it couldn’t float at all without it.

I realize I’ve painted a grim picture, but these are (more or less) the facts. I’m surely wrong in some of my guesses, but like everything discussed here, nobody knows how wrong or in which direction. We’ll never even really know how many people read this — we’ll just have a vague estimate that can be compared to other posts on Techdirt. But for now that’s the reality, so maybe more people should stop worrying about the supposed size of their audience, and focus on making the content they want to make.

Filed Under: audience, internet, measurements, statistics, traffic