The Scunthorpe Problem, And Why AI Is Not A Silver Bullet For Moderating Platform Content At Scale

from the what's-in-a-name dept

Maybe someday AI will be sophisticated, nuanced, and accurate enough to help us with platform content moderation, but that day isn’t today.

Today it prevents an awful lot of perfectly normal and presumably TOS-abiding people from even signing up for platforms. A recent tweet from someone unable to sign up to use an app because it didn’t like her name, as well as many, many, MANY replies from people who’ve had similar experiences, drove this point home:

Been there

— Matt Cummings (@MattCummingsDB) August 29, 2018

As a person named James Butts, I know these problems.

— James (@justjames8) August 28, 2018

As a Dickman I know the struggle is real

— Mike Dickman (@TheMikeDickman) August 29, 2018

I get this a lot surprisingly

— Kyle Medick (@medick32) August 28, 2018

We have quite similar circumstances here

— Jacob Cockrill (@jacob_cockrill) August 29, 2018

Tom Hiscock reporting in.

— aWildWatermelon (@aWildWatermelon) August 29, 2018

Uhm, my name is Analise. That's exact spelling. Been through this many of times.

— AP (@aannpp23) August 29, 2018

Join the club

— Craig Cockburn (@siliconglen) August 29, 2018

Oh! Am I too late to join this club?

— James Ho (@IndieVideoJames) August 29, 2018

Happens to me often, as you can imagine.

— MatthewDicks (@MatthewDicks) August 29, 2018

Facebook, despite its insistence on users using real names, seems particularly bad at letting people actually use their real names.

A large part of my family uses a shortened form of our last name because many places, including Facebook, don't think Buckmaster is a real last name.

But Buck, Buckbuck, Bucker, Bucky and many more are all "real" >.>

— The Autistech (@theAutistech) August 28, 2018

Yeah – Facebook won't allow my real name of "Talks" so had to come up with something else. Although my wife's account is okay …

It gets better because when Collette Talks put "in a relationship with Mike Torkelson", basically the family gossip went into overdrive!

— Mike Talks (@TestSheepNZ) August 29, 2018

My last name is Player and Facebook still won't let me have that as a last name because it's a "street name."

— Sav (@TheSavannahOW) August 29, 2018

I have family members who use alternate names on Facebook because it wouldn't accept Lick

— Chris Hannas (@cjhannas) August 29, 2018

But of course, Facebook is not the only instance where censorship rules based on bare pattern matching interfere not just with speech but with speakers’ ability to even get online to speak.

Can't even create my own player in a Madden franchise. Smh.

— Ben Schmuck (@benschmuck13) August 28, 2018

Ha! I had the same damn thing happen to me today when I tried to RSVP for a webinar.

— Jen Dick (@Jennifer_Dick) August 28, 2018

You're right in there with Alan Cumming, the actor, whose name was autocensored by the late City of Heroes MMO's official forums. (The COH forums also auto-nixed Dick Grayson, which was… amusing… on a forum where superheroes got discussed a lot.)

— The Phantom of the Ottoman (@zgryphon) August 28, 2018

This dynamic is what’s known as the Scunthorpe Problem. Scunthorpe is a town in the UK whose residents have had an appallingly difficult time using the Internet due to a naughty word being contained within the town name.

The Scunthorpe problem is the blocking of e-mails, forum posts or search results by a spam filter or search engine because their text contains a string of letters that are shared with another (usually obscene) word. While computers can easily identify strings of text within a document, broad blocking rules may result in false positives, causing innocent phrases to be blocked.

The problem was named after an incident in 1996 in which AOL’s profanity filter prevented residents of the town of Scunthorpe, North Lincolnshire, England from creating accounts with AOL, because the town’s name contains the substring cunt. Years later, Google’s opt-in SafeSearch filters apparently made the same mistake, preventing residents from searching for local businesses that included Scunthorpe in their names.
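To make the mechanism concrete, here is a minimal sketch of the kind of bare substring check described above, next to a slightly stricter whole-word check. The blocklist and the function names (`naive_filter`, `whole_word_filter`) are assumptions made up for illustration; nothing here is the actual code of AOL, Google, or any other platform.

```python
import re

# Hypothetical blocklist for illustration only; real filters use far longer lists.
BLOCKLIST = ("cunt", "spam")


def naive_filter(text: str) -> bool:
    """Flag the text if a blocked term appears anywhere in it (substring match)."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)


# Same blocklist, but a term only counts when it stands alone as a whole word.
_WHOLE_WORD = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in BLOCKLIST) + r")\b",
    re.IGNORECASE,
)


def whole_word_filter(text: str) -> bool:
    """Flag the text only if a blocked term appears as a separate word."""
    return _WHOLE_WORD.search(text) is not None


for sample in ("Scunthorpe", "Angela Spampata", "this is spam"):
    print(sample, naive_filter(sample), whole_word_filter(sample))
```

The substring check flags all three samples, while the whole-word check flags only the last. Whole-word matching is still not a fix: anyone whose name is itself on the list gets blocked either way, which is the situation many of the people quoted above describe.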

(A related dynamic, the Clbuttic Problem, creates issues of its own when, instead of outright blocking, software automatically replaces the allegedly naughty words with ostensibly less-naughty ones. People attempting to discuss such non-prurient topics as Buttbuttin’s Creed and the Lincoln Buttbuttination find this sort of officious editing particularly unhelpful.)
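The replacement variant can be sketched the same way. The one-entry replacement table and the `naive_replace` helper below are hypothetical, chosen only to reproduce the mangled titles mentioned above; they are not taken from any real forum software.

```python
import re

# Hypothetical replacement table, chosen to reproduce the examples in the text.
REPLACEMENTS = {"ass": "butt"}


def _match_case(match: re.Match, mild: str) -> str:
    """Capitalize the replacement when the matched text starts with a capital."""
    return mild.capitalize() if match.group(0)[0].isupper() else mild


def naive_replace(text: str) -> str:
    """Swap every occurrence of a listed term, even inside longer words."""
    for bad, mild in REPLACEMENTS.items():
        pattern = re.compile(re.escape(bad), re.IGNORECASE)
        text = pattern.sub(lambda m, mild=mild: _match_case(m, mild), text)
    return text


print(naive_replace("a classic mistake"))
print(naive_replace("Assassin's Creed"))
print(naive_replace("the Lincoln Assassination"))
```

Because the pattern has no notion of word boundaries, "classic" comes out as "clbuttic" and "Assassin's Creed" as "Buttbuttin's Creed".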

While examples of these dynamics can be amusing, each is also quite chilling to speech, and to speakers wishing to speak.

With the last name "Dicks", I have to remind people to check their spam folder more often than a Nigerian prince.

— Chain of Lynx (@chainoflynx) August 28, 2018

The word Spam is literally in my last name. My husband's family warned me that my last name can/will be marked as spam.

— Angela Spampata (@bird5445) August 29, 2018

Used to work with a lady whose last name is Wang, and it took us a few days to add exceptions to all the email filters

— Destroyer of Jeeps (@NewKindOfClown) August 29, 2018

This kind of censorship is not something we should be demanding more of, but every time people call for “AI” as a solution to online content challenges, these are the censoring problems the call invites.

A big part of the problem is that calls for “AI” tend to treat it like some magical incantation, as if just adding it will solve all our problems. But in the end, AI is just software. Software can be very good at doing certain things, like finding patterns, including patterns in words (and people’s names?). But it’s not necessarily good at knowing what to make of those patterns.

pic.twitter.com/5U0a1Yk8Yv

— michelle (@tenderdamie) August 29, 2018

Our net Nanny at work flagged a co-worker for offensive language. He dealt with a lot of crane contractors. Net nanny told his boss he was sending lots of emails with the word erection. Lol.

— GoGoATL (@GoGoATL) August 28, 2018

More sophisticated software may be better at understanding context, or even sometimes learning context, but there are still limits to what we can expect from these tools. They are at best imperfect reflections of the imperfect humans who created them, and it’s a mistake to forget that they have not yet replicated, or replaced, human judgment, which itself is often imperfect.

Which is not to say that there is no role for software to help in content moderation. The things that software is good at can make it an important tool to help support human decision-making about online content, especially at scale. But it is a mistake to expect software to supplant human decision-making. Because, as we see from these accruing examples, when we over-rely on it, it ends up being real humans that we hurt.

Had this on a website for the kids, the kids demanded to know why, our last name is "Clithero" Interesting conversation.

— DougHero (@ClitheroDoug) August 29, 2018

I know that feel pic.twitter.com/nMbjfTKGcZ

— Nazi Paikidze-Barnes (@NaziPaiki) August 29, 2018

Filed Under: ai, artificial intelligence, content moderation, language, natalie weiner, scunthorpe