The great question of the 21st century: Whose black box do you trust?

Some years ago, John Mattison, the chief medical information officer of Kaiser Permanente, the large integrated health provider, said to me, “The great question of the 21st century is going to be ‘Whose black box do you trust?’” Mattison was talking about the growing importance of algorithms in medicine, but his point, more broadly, was that we increasingly place our trust in systems whose methods for making decisions we do not understand. (A black box, by definition, is a system whose inputs and outputs are known, but whose internal workings, the process by which one is transformed into the other, are unknown.)

A lot of attention has been paid to the role of algorithms in shaping the experience of consumers. Much less attention has been paid to the role of algorithms in shaping the incentives for business decision-making.

For example, there has been hand-wringing for years about how algorithms shape the news we see from Google or Facebook. Eli Pariser warned of a “filter bubble,” in which the algorithm takes account of our preferences and continues to feed us more of what we already want to hear, rather than exposing us to other points of view. This is a real risk—though one that search engines and social media companies are making efforts to overcome.

But there’s a deeper, more pervasive risk that came out in a conversation I had recently with Chris O’Brien of VentureBeat. And that is the way that algorithms also shape the choices made by writers and publishers. Do you write and publish what you think is most newsworthy, or what will get the most attention on social media? Do you use the format that will do the most justice to the subject (a deep, authoritative piece of research, a so-called “longread”), or do you decide that it’s more profitable to harvest attention with short, punchy articles that generate more views and more advertising dollars? Do you choose video over text, even when text would let you do a better job?

The need to get attention from search engines and social media is arguably one factor in the dumbing down of the news media, fostering a style of reporting that leads even great publications into a culture of hype, fake controversies, and other techniques to drive traffic. The race to the bottom in coverage of the U.S. presidential election is one consequence of the shift in news industry revenue from subscriptions to advertising, and from a secure base of local readers to readers chased via social media. You must please the algorithms if you want your business to thrive.

O’Brien also spoke of the difficulties today’s media reporters have in navigating competing demands of the algorithms that determine whether or not their stories will be seen. Do you optimize for Google search results or Facebook newsfeed results? What happens when the needs of the two different algorithms conflict, or when they change suddenly?

When Google was the only game in town, search engine optimization (SEO) was fairly straightforward. Google provided a wealth of tools to help web publishers understand what kinds of things its algorithm valued, and what kinds of things would send up red flags. There was a whole industry devoted to helping web publishers do it right (“White Hat SEO”) and another devoted to helping unscrupulous publishers skirt the rules (“Black Hat SEO”). One form of Black Hat SEO was to develop “content farms,” vast collections of cross-linked, low-quality content (often scraped from other sites) that fooled the algorithms into thinking they were highly regarded. In 2011, when Google rejiggered its algorithm to downgrade content farms, many companies that had been following this practice were badly hurt. Many went out of business (as well they should have), and others had to improve their business practices to survive.

Publishers targeting Facebook recently went through a similar experience, when Facebook announced last month that it was updating its News Feed algorithm to de-emphasize stories with “clickbait” headlines (headlines that tease the user with a promise that is not met by the content of the actual article). Facebook’s goal is a laudable one, just like Google’s: to create a better user experience. As Facebook researchers Alex Peysakhovich and Kristin Hendrix wrote in the announcement, “One of our News Feed values is to have authentic communication on our platform. … That’s why we work hard to understand what type of stories and posts people consider genuine, so we can show more of them in News Feed. We also work to understand what kinds of stories people find misleading and spammy to help make sure people see those less.”

As Warren Buffett is reputed to have said, “It takes 20 years to build a reputation and five minutes to ruin it. If you think about that, you’ll do things differently.” Google and Facebook both understand that their reputations depend on people finding what they want. Both use the concept of the “long click” and the “short click” as one way of measuring this. (If someone clicks on a link and then comes right back, they didn’t find it very interesting. If they are gone for a while before returning, they most likely spent some time perusing the result, which is a pretty good signal that they found it worthwhile.)
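
To make the dwell-time idea concrete, here is a minimal sketch, in Python, of how a long-click versus short-click signal might be computed. The thresholds, the Click structure, and the aggregate rate are illustrative assumptions of mine, not the actual signals or values either company uses; real ranking systems blend many behavioral signals and tune such thresholds empirically.

```python
from dataclasses import dataclass

# Illustrative thresholds only -- real systems tune these and blend many other signals.
SHORT_CLICK_SECONDS = 10   # user bounced back almost immediately
LONG_CLICK_SECONDS = 60    # user stayed long enough to suggest the result was worthwhile

@dataclass
class Click:
    result_url: str
    dwell_seconds: float   # time between clicking a result and returning to the results page

def classify_click(click: Click) -> str:
    """Label a click 'short', 'long', or 'ambiguous' based on dwell time."""
    if click.dwell_seconds < SHORT_CLICK_SECONDS:
        return "short"
    if click.dwell_seconds >= LONG_CLICK_SECONDS:
        return "long"
    return "ambiguous"

def long_click_rate(clicks: list[Click]) -> float:
    """Fraction of clicks that were 'long' -- one crude proxy for result quality."""
    if not clicks:
        return 0.0
    return sum(classify_click(c) == "long" for c in clicks) / len(clicks)

# Example: three clicks on results returned for the same query.
sample = [Click("https://example.com/a", 4.2),
          Click("https://example.com/b", 95.0),
          Click("https://example.com/c", 30.0)]
print(long_click_rate(sample))   # 0.333... -- only one of the three was a long click
```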

Here we get to the black box bit. According to Facebook’s VP of product management on News Feed, Adam Mosseri, as reported by TechCrunch, “Facebook won’t be publicly publishing the multi-page document of guidelines for defining clickbait because ‘a big part of this is actually spam, and if you expose exactly what we’re doing and how we’re doing it, they reverse engineer it and figure out how to get around it.'”

Because many of the algorithms that shape our society are black boxes—either for reasons like those cited by Facebook, or because they are, in the world of deep learning, inscrutable even to their creators—that question of trust is key.

Understanding how to evaluate algorithms without knowing the exact rules they follow is a key discipline in today’s world. And it is possible. Here are my four rules for evaluating whether you can trust an algorithm:

  1. Its creators have made clear what outcome they are seeking, and it is possible for external observers to verify that outcome.
  2. Success is measurable.
  3. The goals of the algorithm’s creators are aligned with the goals of the algorithm’s consumers.
  4. The algorithm leads both its creators and its users to make better long-term decisions.

Let’s consider a couple of examples.

Google Search and Facebook News Feed

Continuing the discussion above, you can see the application of my four principles to Google Search and the Facebook News Feed:

  1. Clarity of intended outcome. Both Google and Facebook have stated explicitly that their algorithms prioritize the interests of users over the interests of advertisers or publishers. Because that goal is clearly stated, outside observers can raise questions when it does not appear to be met, and can evaluate whether the algorithm is actually achieving it.
  2. Measurability. Silicon Valley companies have made an art of A/B testing and of finding ways to measure whether their algorithms are meeting their objectives. (A minimal sketch of this kind of measurement follows this list.) Google, for example, has a search quality team that uses thousands of “Mechanical Turk”-style reviewers to give a thumbs up or thumbs down to search results, but its more important measurements are based on actual user behavior: long clicks versus short clicks, or whether people click first on the top result, the second, or the 10th. In the case of advertising, Google builds trust by providing tools that estimate how many clicks an ad will get, and a business model that charges only for clicks. This measurability is what drove the financial success of Google, as the pay-per-click model of advertising is so much more measurable than the pay-for-pageview model that preceded it. (Notably, Facebook doesn’t have the same kind of pay-per-click model; it doesn’t even offer the equivalent of page views, but rather “reach,” defined as the number of people in whose newsfeed your post appeared, whether or not it was actually seen. It also provides a metric for engagement: people who clicked through, shared, or otherwise reacted to what you posted.)
  3. Goal alignment. Over the long term, there is fairly high goal alignment between Google or Facebook and their users. If they consistently show users content they don’t want to see, those users will eventually stop using the service. There is also fairly high goal alignment between these services and their advertisers. If the ads don’t deliver, the customers will no longer buy them. But there is a potential goal divergence between the services and publishers of content. There are strong incentives to game the system to get more visibility, even if the kind of content being produced is not optimal for users. Google faced this with content farms, Facebook and other social media with clickbait headlines and listicles. It becomes the job of the algorithmic manager to adjust the algorithms to deal with these opposing forces, just as the designer of an airplane autopilot must design its algorithms to deal with changing weather conditions.
  4. Long-term decisions. The alignment between the goals of the platform and the goals of its users holds for the short term. But does it hold for the long term?
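
To illustrate point 2, here is a minimal sketch of the kind of measurement an A/B test relies on: comparing the click-through rate of a control ranking against a variant with a simple two-proportion z-test. The numbers and the test itself are my own illustration of the general approach, not Google’s or Facebook’s actual methodology, which tracks far more than clicks.

```python
import math

def two_proportion_z_test(clicks_a: int, views_a: int,
                          clicks_b: int, views_b: int) -> tuple[float, float]:
    """Return (CTR difference, z-score) for variant B versus control A."""
    p_a, p_b = clicks_a / views_a, clicks_b / views_b
    p_pool = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / views_a + 1 / views_b))
    return p_b - p_a, (p_b - p_a) / se

# Invented numbers: control ranking A vs. a candidate ranking change B.
lift, z = two_proportion_z_test(clicks_a=1200, views_a=50_000,
                                clicks_b=1350, views_b=50_000)
print(f"CTR lift: {lift:.4%}, z = {z:.2f}")   # |z| > 1.96 is roughly significant at the 5% level
```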

Autonomous vehicles

With all the furor about autonomous cars and trucks, it’s easy to forget that we’ve had largely autonomous airplanes for quite some time. Anyone who flies has trusted his or her life to a robot. Yes, there are pilots up in the cockpit, but they aren’t flying the plane as often as you might think. They are acting as “robot supervisors and backup mechanisms.” The pilots haven’t been replaced; they have been promoted to manager. They make executive decisions like “let’s change the altitude or the routing because air traffic control reports bad weather ahead” or “we have a medical emergency on board, so we need to land at the nearest airport that can accommodate our plane.” With military drones, these supervisors are still there as well. They are just on the ground, potentially thousands of miles away.

If you’re like I was until a few months ago, you probably assume that the autopilot is kind of like cruise control: it flies the plane on long, boring stretches, while the pilots do the hard stuff like takeoff and landing. Not so. On my way to StartupFest in Montreal, I had an extended conversation with the pilot of a jet (and even got to sit in the co-pilot’s seat and feel the minute adjustments to the controls that the autopilot made to keep the plane constantly on course).

Figure 1. Image courtesy of Tim O’Reilly.

What the pilot told me was eye-opening, and the reverse of what I expected. “There’s no way we could take off or land manually at a busy airport like San Francisco,” he said. “If you aren’t precisely on time or at the right altitude, you mess things up for everyone else.” “When do you fly manually?” I asked. “When there’s no one else around.”

Let’s subject an airplane autopilot to my four tests:

  1. Clarity of intended outcome. Get the plane from point A to point B following a predefined route. Respond correctly to wind and weather, in accordance with known principles of aeronautics. Manage congestion at busy airports. Do not crash.
  2. Success is measurable. Achievement of these outcomes is made possible by a massive array of sensors and controls that allow the autopilot to respond to real-time data from those sensors. GPS. Altitude sensors. Airspeed. Attitude. Turbulence. And the ultimate measurement is success in flying: the alignment between the actual behavior of the plane and the laws of physics and aeronautics. Whenever there is a failure (for any reason: human, mechanical, or “act of God”), the National Transportation Safety Board does a deep dive to analyze the causes and improve processes to reduce the chance that the same accident will recur.
  3. Goal alignment. No passenger would argue with these goals: do not crash. Get me there in the shortest possible time. Give me a smooth ride. But the passengers might argue with a decision by the airline to optimize for fuel consumption rather than travel time. And pilots would not likely be aligned with a goal that took away their jobs.
  4. Long-term decisions. Over the long term, there might be a little bit of daylight between the goals of airplane owners and pilots, or airline owners and society. For example, pilots might correctly argue that using the autopilot too much deprives them of necessary experience, increasing the likelihood of crashes when they unexpectedly have to fly the plane manually. It is also likely that the cost of upgrading aircraft to be fully autonomous is prohibitive. The fact that we still have pilots in aircraft is probably as much a testimony to the length of time it takes to replace high-cost equipment as it is to the fears of the public and the work of the Air Line Pilots Association to defend the jobs of its members.

This same analysis can be performed for self-driving cars and trucks. The goal is clear: to avoid all accidents and to drive more safely than any human driver. The goal is measurable, and the systems to achieve that goal get better the more opportunity they have to learn. As Sebastian Thrun, one of the fathers of the autonomous vehicle industry, remarked on stage at my Next:Economy Summit last year, self-driving vehicles learn faster than any human, because whenever one of them makes a mistake, both the mistake and the way to avoid it can be passed along to every other vehicle.
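
Thrun’s observation is essentially architectural: a correction discovered by one vehicle flows through a shared channel to every other vehicle, rather than remaining the private experience of a single driver. The sketch below is a deliberately simplified illustration of that pattern; the class names and the idea of broadcasting a single correction record are my own, not a description of any vendor’s actual system.

```python
from dataclasses import dataclass, field

@dataclass
class Correction:
    """A mistake observed by one vehicle, paired with the behavior that avoids it."""
    situation: str     # e.g., "unprotected left turn with occluded cyclist"
    avoidance: str     # e.g., "creep forward slowly until the crosswalk is visible"

@dataclass
class Vehicle:
    vehicle_id: str
    policy: dict[str, str] = field(default_factory=dict)

    def apply(self, correction: Correction) -> None:
        # Every vehicle adopts the fix, not just the one that made the mistake.
        self.policy[correction.situation] = correction.avoidance

class Fleet:
    def __init__(self, vehicles: list[Vehicle]):
        self.vehicles = vehicles

    def report(self, correction: Correction) -> None:
        """One vehicle's lesson is broadcast to the entire fleet."""
        for v in self.vehicles:
            v.apply(correction)

fleet = Fleet([Vehicle("car-001"), Vehicle("car-002"), Vehicle("car-003")])
fleet.report(Correction("unprotected left turn with occluded cyclist",
                        "creep forward slowly until the crosswalk is visible"))
# Every vehicle now carries the correction, not just the one that encountered it.
print(all("unprotected left turn with occluded cyclist" in v.policy for v in fleet.vehicles))  # True
```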

In the case of self-driving cars and trucks, we can see that the arguments are ultimately likely to center on tests 3 and 4. I suspect that delays in the adoption of autonomous vehicle technology will be driven not primarily by safety concerns or by the provable success of the algorithm, but by the cost of changing out the vast installed base of cars and trucks, and by the arguments that people who make their living by driving will make for the continued need to keep “a human in the loop.”

The sooner we accept that everyone has a shared interest in determining whether autonomous vehicles are safe, the sooner we can start talking about what data needs to be shared in order to come to an objective answer to that question. And then we can start talking about what other goals we might have to consider. And once we understand where the goals of the proponents and the critics of an algorithm are not aligned, we can have the real argument about which goals make the most sense. In many areas, that argument happens in the market, in the great struggle that collectively expresses itself as Adam Smith’s “invisible hand.” But often, it happens in the form of government regulation.

Regulating new technologies

If you think about it, government regulations are also a kind of algorithm, a set of rules and procedures for achieving what should be a determinate outcome. Unfortunately, too often government regulations fail my four tests for whether you can trust an algorithm.

  1. Clarity of intended outcome. When regulations are promulgated, their intended result is typically stated, but only rarely in a form that can be easily understood. New agencies such as the U.K.’s Government Digital Service and the U.S. Consumer Financial Protection Bureau have made plain language a priority, and have demonstrated that it is possible to create regulations whose goals and implementations are as clear as those of Google’s search quality or AdWords quality guidelines. But this clarity is rare.
  2. Success is measurable. Regulations rarely include any provision for measuring or determining their effect. Measurement, if done at all, occurs only years later.
  3. Goal alignment. The goals of regulators and of consumers are often aligned; think, for example, of fire codes, which were instituted after the Triangle Shirtwaist Fire of 1911. (Carl Malamud gave a brilliant speech about the role of this NYC sweatshop fire in the development of safety codes at my Gov 2.0 Summit in 2009. It is rare for a conference speech to get a standing ovation. This is one of those speeches. The video is here.) But too often, regulations serve the needs of government rather than citizens, or of those with access to the regulatory process. Policy makers have come to accept the idea that rules are made to balance the competing interests of various parties rather than to serve the public. I still remember a conversation I had with former Speaker of the House Nancy Pelosi about the 2011 Stop Online Piracy Act. I made the case against it as bad public policy, but her response told me what the real decision criteria were: “We have to balance the interests of the tech industry with the interests of Hollywood.”
  4. Long-term decisions. Over time, regulations get out of step with the needs of society. When regulations do not have their intended effects, they normally continue unabated. And new regulations are often simply piled on top of them.

Let’s start with a good example. In a regulatory proposal from the CFPB on Payday, Vehicle Title, and Certain High-Cost Installment Loans, we see a clear rationale for the regulation:

“The Bureau is concerned that lenders that make covered loans have developed business models that deviate substantially from the practices in other credit markets by failing to assess consumers’ ability to repay their loans and by engaging in harmful practices in the course of seeking to withdraw payments from consumers’ accounts. The Bureau believes that there may be a high likelihood of consumer harm in connection with these covered loans because many consumers struggle to repay their loans. In particular, many consumers who take out covered loans appear to lack the ability to repay them and face one of three options when an unaffordable loan payment is due: take out additional covered loans, default on the covered loan, or make the payment on the covered loan and fail to meet other major financial obligations or basic living expenses. Many lenders may seek to obtain repayment of covered loans directly from consumers’ accounts. The Bureau is concerned that consumers may be subject to multiple fees and other harms when lenders make repeated unsuccessful attempts to withdraw funds from consumers’ accounts.”

The proposal goes on to specify the rules designed to address this deficit. The CFPB has also put in place mechanisms for measurement and enforcement.

By contrast, check out the New York City rules for taxi and limousine drivers. They are vague in their statement of purpose and mind-numbing in scope. I challenge anyone to come up with a methodology by which the rules can be evaluated as to whether or not they are achieving an intended outcome.

I thought of this recently when taking a Lyft from Newark Airport to Manhattan. As I usually do, I interviewed the driver about his work. Among other questions, I asked him if he would pick up passengers in Manhattan after he dropped me off, or go back to New Jersey. “I’m not licensed to pick up passengers in Manhattan,” he told me.

Think about this for a moment. What are the possible goals for licensing Uber, Lyft, and taxi drivers? Passenger safety. Protecting passengers from price gouging. Reducing congestion. (The latter two goals were the reason for the first taxi regulations, promulgated in London by King Charles I in 1637.) None of these goals is served by prohibiting a Lyft driver from picking up passengers in both New Jersey and New York. Given the opportunities of new technologies like on-demand car services (and ultimately, on-demand autonomous vehicles) to reshape transportation options in a city for the better, it’s easy for regulatory goals to fall behind society’s priorities. We have an opportunity to use these technologies to improve access, reduce costs for consumers, reduce congestion and the need for parking, improve the environment, and achieve many other goals that could be proposed, then measured and built toward.

The one goal that has traditionally led to geographic restrictions on taxis (which are particularly onerous in areas like Washington D.C., where drivers routinely cross between Virginia, D.C., and Maryland) is supporting incumbent transportation companies by limiting the number of available drivers. Being clear about that goal is at least a starting point for discussion. You can’t begin to measure the impact of regulations until you know what they are trying to accomplish.

The failure of government to explain or justify or measure the operation of the black boxes by which it operates is one of the major reasons why trust in government is at an all-time low. And the normalization of lying in politics during the current election doesn’t bode well for the future of that trust.

Long-term trust and the master algorithm

And that brings me back to the subject with which I began this essay: the role of algorithms in determining what journalism gets published. If you are puzzled by the behavior of the media in the current election, by its failure to dig into substantive issues, and by its focus on keeping the horse race exciting, you can use my black box trust rules as an aid to understanding.

There is a master algorithm that rules our society, and, with apologies to Pedro Domingos, it is not some powerful new approach to machine learning. Nor is it the regulations of government. It is a rule that was encoded into modern business decades ago, and has largely gone unchallenged since. That is the notion that the only obligation of a business is to its shareholders.

It is the algorithm that led CBS chairman Leslie Moonves to say back in March that [Trump’s campaign] “may not be good for America, but it’s damn good for CBS.”

This election is a real test not only for media publishers, but also for platforms like Google and Facebook. When the algorithms that reward publishers are at variance with the algorithms that would benefit users, whose side do Google and Facebook come down on? Whose black box can we trust?