Why CAPTCHAs have gotten so difficult (original) (raw)

At some point last year, Google’s constant requests to prove I’m human began to feel increasingly aggressive. More and more, the simple, slightly too-cute button saying “I’m not a robot” was followed by demands to prove it — by selecting all the traffic lights, crosswalks, and storefronts in an image grid. Soon the traffic lights were buried in distant foliage, the crosswalks warped and half around a corner, the storefront signage blurry and in Korean. There’s something uniquely dispiriting about being asked to identify a fire hydrant and struggling at it.

These tests are called CAPTCHA, an acronym for Completely Automated Public Turing test to tell Computers and Humans Apart, and they’ve reached this sort of inscrutability plateau before. In the early 2000s, simple images of text were enough to stump most spambots. But a decade later, after Google had bought the program from Carnegie Mellon researchers and was using it to digitize Google Books, texts had to be increasingly warped and obscured to stay ahead of improving optical character recognition programs — programs which, in a roundabout way, all those humans solving CAPTCHAs were helping to improve.

All those awnings that may or may not be storefronts? They’re the endgame in humanity’s arms race with the machines.

Because CAPTCHA is such an elegant tool for training AI, any given test could only ever be temporary, something its inventors acknowledged at the outset. With all those researchers, scammers, and ordinary humans solving billions of puzzles just at the threshold of what AI can do, at some point the machines were going to pass us by. In 2014, Google pitted one of its machine learning algorithms against humans in solving the most distorted text CAPTCHAs: the computer got the test right 99.8 percent of the time, while the humans got a mere 33 percent.

Google then moved to NoCaptcha ReCaptcha, which observes user data and behavior to let some humans pass through with a click of the “I’m not a robot” button, and presents others with the image labeling we see today. But the machines are once again catching up. All those awnings that may or may not be storefronts? They’re the endgame in humanity’s arms race with the machines.

Jason Polakis, a computer science professor at the University of Illinois at Chicago, takes personal credit for the recent increase in CAPTCHA difficulty. In 2016, he published a paper in which he used off-the-shelf image recognition tools, including Google’s own reverse image search, to solve Google’s image CAPTCHAs with 70 percent accuracy. Other researchers have broken Google’s audio CAPTCHA challenges using Google’s own audio recognition programs.

Machine learning is now about as good as humans at basic text, image, and voice recognition tasks, Polakis says. In fact, algorithms are probably better at it: “We’re at a point where making it harder for software ends up making it too hard for many people. We need some alternative, but there’s not a concrete plan yet.”

The problem with many of these tests isn’t necessarily that bots are too clever — it’s that humans suck at them

The literature on CAPTCHA is littered with false starts and strange attempts at finding something other than text or image recognition that humans are universally good at and machines struggle with. Researchers have tried asking users to classify images of people by facial expression, gender, and ethnicity. (You can imagine how well that went.) There have been proposals for trivia CAPTCHAs, and CAPTCHAs based on nursery rhymes common in the area where a user purportedly grew up. Such cultural CAPTCHAs are aimed not just at bots, but at the humans working in overseas CAPTCHA farms solving puzzles for fractions of a cent. People have tried stymying image recognition by asking users to identify, say, pigs, but making the pigs cartoons and giving them sunglasses. Researchers have looked into asking users to identify objects in Magic Eye-like blotches. In an intriguing variation, researchers in 2010 proposed using CAPTCHAs to index ancient petroglyphs, computers not being very good at deciphering gestural sketches of reindeer scrawled on cave walls.

Recently there have been efforts to develop game-like CAPTCHAs, tests that require users to rotate objects to certain angles or move puzzle pieces into position, with instructions given not in text but in symbols or implied by the context of the game board. The hope is that humans would understand the puzzle’s logic but computers, lacking clear instructions, would be stumped. Other researchers have tried to exploit the fact that humans have bodies, using device cameras or augmented reality for interactive proof of humanity.

The problem with many of these tests isn’t necessarily that bots are too clever — it’s that humans suck at them. And it’s not that humans are dumb; it’s that humans are wildly diverse in language, culture, and experience. Once you get rid of all that stuff to make a test that any human can pass, without prior training or much thought, you’re left with brute tasks like image processing, exactly the thing a tailor-made AI is going to be good at.

“The tests are limited by human capabilities,” Polakis says. “It’s not only our physical capabilities, you need something that [can] cross cultural, cross language. You need some type of challenge that works with someone from Greece, someone from Chicago, someone from South Africa, Iran, and Australia at the same time. And it has to be independent from cultural intricacies and differences. You need something that’s easy for an average human, it shouldn’t be bound to a specific subgroup of people, and it should be hard for computers at the same time. That’s very limiting in what you can actually do. And it has to be something that a human can do fast, and isn’t too annoying.”

Figuring out how to fix those blurry image quizzes quickly takes you into philosophical territory: what is the universal human quality that can be demonstrated to a machine, but that no machine can mimic? What is it to be human?

But maybe our humanity isn’t measured by how we perform with a task, but in how we move through the world — or in this case, through the internet. Game CAPTCHAs, video CAPTCHAs, whatever sort of CAPTCHA test you devise will eventually be broken, says Shuman Ghosemajumder, who previously worked at Google combatting click fraud before becoming the chief technology officer of the bot-detection company Shape Security. Rather than tests, he favors something called “continuous authentication,” essentially observing the behavior of a user and looking for signs of automation. “A real human being doesn’t have very good control over their own motor functions, and so they can’t move the mouse the same way more than once over multiple interactions, even if they try really hard,” Ghosemajumder says. While a bot will interact with a page without moving a mouse, or by moving a mouse very precisely, human actions have “entropy” that is hard to spoof, Ghosemajumder says.

Google’s own CAPTCHA team is thinking along similar lines. The latest version, reCaptcha v3, announced late last year, uses “adaptive risk analysis” to score traffic according to how suspicious it seems; website owners can then choose to present sketchy users with a challenge, like a password request or two-factor authentication. Google wouldn’t say what factors go into that score, other than that Google observes what a bunch of “good traffic” on a site looks like, according to Cy Khormaee, a product manager on the CAPTCHA team, and uses that to detect “bad traffic.” Security researchers say it’s likely a mix of cookies, browser attributes, traffic patterns, and other factors. One drawback of the new model of bot detection is that it can make navigating the web while minimizing surveillance an annoying experience, as things like VPNs and anti-tracking extensions can get you flagged as suspicious and challenged.

“I think folks are realizing that there is an application for simulating the average human user... or dumb humans.”

Aaron Malenfant, the engineering lead on Google’s CAPTCHA team, says the move away from Turing tests is meant to sidestep the competition humans keep losing. “As people put more and more investment into machine learning, those sorts of challenges will have to get harder and harder for humans, and that’s particularly why we launched CAPTCHA V3, to get ahead of that curve.” Malenfant says that five to ten years from now, CAPTCHA challenges likely won’t be viable at all. Instead, much of the web will have a constant, secret Turing test running in the background.

In his book The Most Human Human, Brian Christian enters a Turing Test competition as the human foil and finds that it’s actually quite difficult to prove your humanity in conversation. On the other hand, bot makers have found it easy to pass, not by being the most eloquent or intelligent conversationalist, but by dodging questions with non sequitur jokes, making typos, or in the case of the bot that won a Turing competition in 2014, claiming to be a 13-year-old Ukrainian boy with a poor grasp of English. After all, to err is human. It’s possible a similar future is in store for CAPTCHA, the most widely used Turing test in the world — a new arms race not to create bots that surpass humans in labeling images and parsing text, but ones that make mistakes, miss buttons, get distracted, and switch tabs. “I think folks are realizing that there is an application for simulating the average human user... or dumb humans,” Ghosemajumder says.

CAPTCHA tests may persist in this world, too. Amazon received a patent in 2017 for a scheme involving optical illusions and logic puzzles humans have great difficulty in deciphering. Called Turing Test via failure, the only way to pass is to get the answer wrong.