Shtetl-Optimized (original) (raw)

Theoretical Computer Science for AI Alignment … and More

April 10th, 2025

In this terrifying time for the world, I’m delighted to announce a little glimmer of good news. I’m receiving a large grant from the wonderful Open Philanthropy, to build up a group of students and postdocs over the next few years, here at UT Austin, to do research in theoretical computer science that’s motivated by AI alignment. We’ll think about some of the same topics I thought about in my time at OpenAI—interpretability of neural nets, cryptographic backdoors, out-of-distribution generalization—but we also hope to be a sort of “consulting shop,” to whom anyone in the alignment community can come with theoretical computer science problems.

I already have two PhD students and several undergraduate students working in this direction. If you’re interested in doing a PhD in CS theory for AI alignment, feel free to apply to the CS PhD program at UT Austin this coming December and say so, listing me as a potential advisor.

Meanwhile, if you’re interested in a postdoc in CS theory for AI alignment, to start as early as this coming August, please email me your CV and links to representative publications, and arrange for two recommendation letters to be emailed to me.


The Open Philanthropy project will put me in regular contact with all sorts of people who are trying to develop complexity theory for AI interpretability and alignment. One great example of such a person is Eric Neyman—previously a PhD student of Tim Roughgarden at Columbia, now at the Alignment Research Center, the Berkeley organization founded by my former student Paul Christiano. Eric has asked me to share an exciting announcement, along similar lines to the above:

The Alignment Research Center (ARC) is looking for grad students and postdocs for its visiting researcher program. ARC is trying to develop algorithms for explaining neural network behavior, with the goal of advancing AI safety (see here for a more detailed summary). Our research approach is fairly theory-focused, and we are interested in applicants with backgrounds in CS theory or ML. Visiting researcher appointments are typically 10 weeks long, and are offered year-round.

If you are interested, you can apply here. (The link also provides more details about the role, including some samples of past work done by ARC.) If you have any questions, feel free to email hiring@alignment.org.

Some of my students and I are working closely with the ARC team. I like what I’ve seen of their research so far, and would encourage readers with the relevant background to apply.


Meantime, I of course continue to be interested in quantum computing! I’ve applied for multiple grants to continue doing quantum complexity theory, though whether or not I can get such grants will alas depend (among other factors) whether the US National Science Foundation continues to exist, as more than a shadow of what it was. The signs look ominous; Science magazine reports that the NSF just cut by half the number of awarded graduate fellowships, and this has almost certainly directly affected students who I know and care about.


Meantime we all do the best we can. My UTCS colleague, Chandrajit Bajaj, is currently seeking a postdoc in the general area of Statistical Machine Learning, Mathematics, and Statistical Physics, for up to three years. Topics include:

If you’re interested, please email Chandra at bajaj@cs.utexas.edu.


Thanks so much to the folks at Open Philanthropy, and to everyone else doing their best to push basic research forward even while our civilization is on fire.

Posted in Announcements, Complexity, Quantum | 2 Comments »

In favor of the morally sane thing

April 3rd, 2025

The United States is now a country that disappears people.

Visa holders, green card holders, and even occasionally citizens mistaken for non-citizens: Trump’s goons can now seize them off the sidewalk at any time, handcuff them, detain them indefinitely in a cell in Louisiana with minimal access to lawyers, or even fly them to an overcrowded prison in El Salvador to be tortured.

It’s important to add: from what I know, some of the people being detained and deported are genuinely horrible. Some worked for organizations linked to Hamas, and cheered the murder of Jews. Some trafficked fentanyl. Some were violent gang members.

There are proper avenues to deport such people, in normal pre-Trumpian US law. For example, you can void someone’s visa by convincing a judge that they lied about not supporting terrorist organizations in their visa application.

But already other disappeared people seem to have been entirely innocent. Some apparently did nothing worse than write lefty op-eds or social media posts. Others had innocuous tattoos that were mistaken for gang insignia.

Millennia ago, civilization evolved mechanisms like courts and judges and laws and evidence and testimony, to help separate the guilty from the innocent. These are known problems with known solutions. No new ideas are needed.

One reader advised me not to blog about this issue unless I had something original to say: how could I possibly add to the _New York Times_’ and CNN’s daily coverage of every norm-shattering wrinkle? But other readers were livid at me for not blogging, even interpreting silence or delay as support for fascism.

For those readers, but more importantly for my kids and posterity, let me say: no one who follows this blog could ever accuse me of reflexive bleeding-heart wokery, much less of undue sympathy for “globalize the intifada” agitators. So with whatever credibility that grants me: Shtetl-Optimized unequivocally condemns the “grabbing random foreign students off the street” method of immigration enforcement. If there are resident aliens who merit deportation, prove it to a friggin’ judge (I’ll personally feel more confident that the law is being applied sanely if the judge wasn’t appointed by Trump). Prove that you got the right person, and that they did what you said, and that that violated the agreed-upon conditions of their residency according to some consistently-applied standard. And let the person contest the charges, with advice of counsel.

I don’t want to believe the most hyperbolic claims of my colleagues, that the US is now a full Soviet-style police state, or inevitably on its way to one. I beg any conservatives reading this post, particularly those with influence over events: help me not to believe this.

Posted in Obviously I'm Not Defending Aaronson, Rage Against Doofosity, The Fate of Humanity | 78 Comments »

Tragedy in one shitty act

March 30th, 2025

Far-Left Students and Faculty: We’d sooner burn universities to the ground than allow them to remain safe for the hated Zionist Jews, the baby-killing demons of the earth. We’ll disrupt their classes, bar them from student activities, smash their Hillel centers, take over campus buildings and quads, and chant for Hezbollah and the Al-Aqsa Martyrs Brigades to eradicate them like vermin. We’ll do all this because we’ve so thoroughly learned the lessons of the Holocaust.

Trump Administration [cackling]: Burn universities to the ground, you say? What a coincidence! We’d love nothing more than to do exactly that. Happy to oblige you.

Far-Left Students and Faculty: You fascist scum. We didn’t mean “call our bluff”! Was it the campus Zionists who ratted us out to you? It was, wasn’t it? You can’t do this without due process; we have rights!

Trump Administration: We don’t answer to you and we don’t care about “due process” or your supposed “rights.” We’re cutting all your funding, effective immediately. Actually, since you leftists don’t have much funding to speak of, let’s just cut any university funding whatsoever that we can reach. Cancer studies. Overhead on NIH grants. Student aid. Fellowships. Whatever universities use to keep the lights on. The more essential it is, the longer it took to build, the more we’ll enjoy the elitist professors’ screams of anguish as we destroy it all in a matter of weeks.

Far-Left Students and Faculty: This is the end, then. But if our whole little world must go up in flames, at least we’ll die having never compromised our most fundamental moral principle: the eradication of the State of Israel and the death of its inhabitants.

Sane Majorities at Universities, Including Almost Everyone in STEM: [don’t get a speaking part in this play. They’ve already bled out on the street, killed in the crossfire]

Posted in Rage Against Doofosity, The Fate of Humanity | 117 Comments »

On the JPMC/Quantinuum certified quantum randomness demo

March 26th, 2025

These days, any quantum computing post I write ought to begin with the disclaimer that the armies of Sauron are triumphing around the globe, this is the darkest time for humanity most of us have ever known, and nothing else matters by comparison. Certainly not quantum computing. Nevertheless stuff happens in quantum computing and it often brings me happiness to blog about it—certainly more happiness than doomscrolling or political arguments.


So then: today JP Morgan Chase announced that, together with Quantinuum and DoE labs, they’ve experimentally demonstrated the protocol I proposed in 2018, and further developed in a STOC’2023 paper with Shih-Han Hung, for using current quantum supremacy experiments to generate certifiable random bits for use in cryptographic applications. See here for our paper in Nature—the JPMC team was gracious enough to include me and Shih-Han as coauthors.

Mirroring a conceptual split in the protocol itself, Quantinuum handled the quantum hardware part of my protocol, while JPMC handled the rest: modification of the protocol to make it suitable for trapped ions, as well as software to generate pseudorandom challenge circuits to send to the quantum computer over the Internet, then to verify the correctness of the quantum computer’s outputs (thereby ensuring, under reasonable complexity assumptions, that the outputs contained at least a certain amount of entropy), and finally to extract nearly uniform random bits from the outputs. The experiment used Quantinuum’s 56-qubit trapped-ion quantum computer, which was given and took a couple seconds to respond to each challenge. Verification of the outputs was done using the Frontier and Summit supercomputers. The team estimates that about 70,000 certified random bits were generated over 18 hours, in such a way that, using the best currently-known attack, you’d need at least about four Frontier supercomputers working continuously to spoof the quantum computer’s outputs, and get the verifier to accept non-random bits.

We should be clear that this gap, though impressive from the standpoint of demonstrating quantum supremacy with trapped ions, is not yet good enough for high-stakes cryptographic applications (more about that later). Another important caveat is that the parameters of the experiment aren’t yet good enough for my and Shih-Han’s formal security reduction to give assurances: instead, for the moment one only has “practical security,” or security against a class of simplified yet realistic attackers. I hope that future experiments will build on the JPMC/Quantinuum achievement and remedy these issues.


The story of this certified randomness protocol starts seven years ago, when I had lunch with Or Sattath at a Japanese restaurant in Tel Aviv. Or told me that I needed to pay more attention to the then-recent Quantum Lightning paper by Mark Zhandry. I already know that paper is great, I said. You don’t know the half of it, Or replied. As one byproduct of what he’s doing, for example, Mark gives a way to measure quantum money states in order to get certified random bits—bits whose genuine randomness (not pseudorandomness) is certified by computational intractability, something that wouldn’t have been possible in a classical world.

Well, why do you even need quantum money states for that? I asked. Why not just use, say, a quantum supremacy experiment based on Random Circuit Sampling, like the one Google is now planning to do (i.e., the experiment Google would do, a year later after this conversation)? Then, the more I thought about that question, the more I liked the idea that these “useless” Random Circuit Sampling experiments would do something potentially useful despite themselves, generating certified entropy as just an inevitable byproduct of passing our benchmarks for sampling from certain classically-hard probability distributions. Over the next couple weeks, I worked out some of the technical details of the security analysis (though not all! it was a big job, and one that only got finished years later, when I brought Shih-Han to UT Austin as a postdoc and worked with him on it for a year).

I emailed the Google team about the idea; they responded enthusiastically. I also got in touch with UT Austin’s intellectual property office to file a provisional patent, the only time I’ve done that my career. UT and I successfully licensed the patent to Google, though the license lapsed when Google’s priorities changed. Meantime, a couple years ago, when I visited Quantinuum’s lab in Broomfield, Colorado, I learned that a JPMC-led collaboration toward an experimental demonstration of the protocol was then underway. The protocol was well-suited to Quantinuum’s devices, particularly given their ability to apply two-qubit gates with all-to-all connectivity and fidelity approaching 99.9%.

I should mention that, in the intervening years, others had also studied the use of quantum computers to generate cryptographically certified randomness; indeed it became a whole subarea of quantum computing. See especially the seminal work of Brakerski, Christiano, Mahadev, Vazirani, and Vidick, which gave a certified randomness protocol that (unlike mine) relies only on standard cryptographic assumptions and allows verification in classical polynomial time. The “only” downside is that implementing their protocol securely seems to require a full fault-tolerant quantum computer (capable of things like Shor’s algorithm), rather than current noisy devices with 50-100 qubits.


For the rest of this post, I’ll share a little FAQ, adapted from my answers to a journalist’s questions. Happy to answer additional questions in the comments.

Well, it’s the first experimental demonstration of a protocol to generate cryptographically certified random bits with the use of a quantum computer.

To remove any misunderstanding: if you’re just talking about the use of quantum phenomena to generate random bits, without certifying the randomness of those bits to a faraway skeptic, then that’s been easy to do for generations (just stick a Geiger counter next to some radioactive material!). The new part, the part that requires a quantum computer, is all about the certification.

Also: if you’re talking about the use of separated, entangled parties to generate certified random bits by violating the Bell inequality (see eg here) — that approach does give certification, but the downside is that you need to believe that the two parties really are unable to communicate with each other, something that you couldn’t certify in practice over the Internet. A quantum-computer-based protocol like mine, by contrast, requires just a single quantum device.

In any cryptographic application where you need to distribute random bits over the Internet, the fundamental question is, why should everyone trust that these bits are truly random, rather than being backdoored by an adversary?

This isn’t so easy to solve. If you consider any classical method for generating random bits, an adversary could substitute a cryptographic pseudorandom generator without anyone being the wiser.

The key insight behind the quantum protocol is that a quantum computer can solve certain problems efficiently, but only (it’s conjectured, and proven under plausible assumptions) by sampling an answer randomly — thereby giving you certified randomness, once you verify that the quantum computer really has solved the problem in question. Unlike with a classical computer, there’s no way to substitute a pseudorandom generator, since randomness is just an inherent part of a quantum computer’s operation — specifically, when the entangled superposition state randomly collapses on measurement.

One potential application is to proof-of-stake cryptocurrencies, like Ethereum. These cryptocurrencies are vastly more energy-efficient than “proof-of-work” cryptocurrencies (like Bitcoin), but they require lotteries to be run constantly to decide which currency holder gets to add the next block to the blockchain (and get paid for it). Billions of dollars are riding on these lotteries being fair.

Other potential applications are to zero-knowledge protocols, lotteries and online gambling, and deciding which precincts to audit in elections. See here for a nice perspective article that JPMC put together discussing these and other potential applications.

Having said all this, a major problem right now is that verifying the results using a classical computer is extremely expensive — indeed, basically as expensive as spoofing the results would be. This problem, and other problems related to verification (eg “why should everyone else trust the verifier?”), are the reasons why most people will probably pass on this solution in the near future, and generate random bits in simpler, non-quantum-computational ways.

We do know, from e.g. Brakerski et al.’s work, that the problem of making the verification fast is solvable with sufficient advancements in quantum computing hardware. Even without hardware advancements, it might also be solvable with new theoretical ideas — one of my favorite research directions.

It’s not directly an advancement in quantum computing hardware, but yes, it’s a very nice demonstration of such advancements — of something that’s possible today but wouldn’t have been possible just a few short years ago. It’s a step toward using current, non-error-corrected quantum computers for a practical application that’s not itself about quantum mechanics but that really does inherently require quantum computers.

Of course it’s personally gratifying to see something I developed get experimentally realized after seven years. Huge congratulations to the teams at JP Morgan Chase and Quantinuum, and thanks to them for the hard work they put into this.


Unrelated Announcement: See here for a podcast about quantum computing that I recorded with, of all organizations, the FBI. As I told the gentlemen who interviewed me, I’m glad the FBI still exists, let alone its podcast!

Posted in Announcements, Complexity, Quantum | 50 Comments »

On Columbia in the crosshairs

March 9th, 2025

The world is complicated, and the following things can all be true:

(1) Trump and his minions would love to destroy American academia, to show their power, thrill their base, and exact revenge on people who they hate. They will gladly seize on any pretext to do so. For those of us, whatever our backgrounds, who chose to spend our lives in American academia, discovering and sharing new knowledge—this is and should be existentially terrifying.

(2) For the past year and a half, Columbia University was a pretty scary place to be an Israeli or pro-Israel Jew—at least, according to Columbia’s own antisemitism task force report, the firsthand reports of my Jewish friends and colleagues at Columbia, andeverything else I gleaned from sources I trust. The situation seems to have been notably worse there than at most American universities. (If you think this is all made up, please read pages 13-37 of the report—immediately after October 7, Jewish students singled out for humiliation by professors in class, banned from unrelated student clubs unless they denounced Israel, having their Stars of David ripped off as they walked through campus at night, forced to move dorms due to constant antisemitic harassment—and then try to imagine we were talking about Black, Asian, or LGBTQ students. How would expect a university to respond, and how would you want it to? More recent incidents included the takeover of a Modern Israeli History class—guards were required for subsequent lectures—and the occupation of Barnard College.) Last year, I decided to stop advising Jewish and Israeli students to go to Columbia, or at any rate, to give them very clear warnings about it. I did this with extreme reluctance, as the Columbia CS department happens to have some of my dearest colleagues in the world, many of whom I know feel just as I do about this.

(3) Having been handed this red meat on a silver platter, the Trump Education Department naturally gobbled it up. They announced that they’re cancelling $400 million in grants to Columbia, to be reinstated in a month if Columbia convinces them that they’re fulfilling their Title VI antidiscrimination obligations to Jews and Israelis. Clearly the Trumpists mean to make an example of Columbia, and thereby terrify other universities into following suit.

(4) Tragically and ironically, this funding freeze will primarily affect Columbia’s hard science departments, which rely heavily on federal grants, and which have remained welcoming to Jews and Israelis. It will have only a minimal effect on Columbia’s social sciences and humanities departments—the ones that nurtured the idea of Hamas and Hezbollah as heroic resistance—as those departments receive much less federal funding in the first place. I hate that suspending grants is pretty much the only federal lever available.

(5) When an action stands to cause so much pain to the innocent and so little to the guilty, I can’t on reflection endorse it—even if it might crudely work to achieve an outcome I want, and all the less if it won’t achieve that outcome.

(6) But I can certainly hope for a good outcome! From what I’ve been told, Katrina Armstrong, the current president of Columbia, has been trying to do the right thing ever since she inherited this mess. In response to the funding freeze, President Armstrong issued an excellent statement, laying out her determination to work with the Education Department, crack down on antisemitic harassment, and restore the funding, with no hint of denial or defensiveness. While I wouldn’t want her job right now, I’m rooting for her to succeed.

(7) Time for some game theory. Consider the following three possible outcomes:
(a) Columbia gets back all its funding by seriously enforcing its rules (e.g., expelling students who threatened violence against Jews), and I can again tell Jewish and Israeli students to attend Columbia with zero hesitation
(b) Everything continues just like before
(c) Columbia loses its federal funding, essentially shuts down its math and science research, and becomes a shadow of what it was
Now let’s say that I assign values of 100 to (a), 50 to (b), and -1000 to (c). This means that, if (say) Columbia’s humanities professors told me that my only options were (b) and (c), I would always flinch and choose (b). And thus, I assume, the professors would tell me my only options were (b) and (c). They’d know I’d never hold a knife to their throat and make them choose between (a) and (c), because I’d fear they’d actually choose (c), an outcome I probably want even less than they do.

Having said that: if, through no fault of my own, some mobster held a knife to their throat and made them choose between (a) and (c)—then I’d certainly advise them to pick (a)! Crucially, this doesn’t mean that I’d endorse the mobster’s tactics, or even that I’d feel confident that the knife won’t be at my own throat tomorrow. It simply means that you should still do the right thing, even if for complicated reasons, you were blackmailed into doing the right thing by a figure of almost cartoonish evil.


I welcome comments with facts or arguments about the on-the-ground situation at Columbia, American civil rights law, the Trumpists’ plans, etc. But I will ruthlessly censor comments that try to relitigate the Israel/Palestine conflict itself. Not merely because I’m tired of that, the Shtetl-Optimized comment section having already litigated the conflict into its constituent quarks, but much more importantly, because whatever you think of it, it’s manifestly irrelevant to whether or not Columbia tolerated a climate of fear for Jews and Israelis in violation of Title VI, which is understandably the only question that American judges (even the non-Trumpist ones) will care about.

Posted in Obviously I'm Not Defending Aaronson, The Fate of Humanity | 305 Comments »

Jacob Barandes and Me

March 4th, 2025

Please enjoy Harvard’s Jacob Barandes and yours truly duking it out for 2.5 hours on YouTube about the interpretation of quantum mechanics, and specifically Jacob’s recent proposal involving “indivisible stochastic dynamics,” with Curt Jaimungal as moderator. As always, I strongly recommend watching with captions turned on and at 2X speed.

To summarize what I learned in one paragraph: just like in Bohmian mechanics, Jacob wants classical trajectories for particles, which are so constructed to reproduce the predictions of QM perfectly. But unlike the Bohmians, Jacob doesn’t want to commit to any particular rule for the evolution of those particle trajectories. He merely asserts, metaphysically, that the trajectories exist. My response was basically, “OK fine, you can do that if you want, but what does it buy me?” We basically went around in circles on that question the entire time, though hopefully with many edutaining disgressions.

Despite the lack of resolution, I felt pretty good about the conversation afterward: Jacob got an extensive opportunity to explain his ideas to listeners, along with his detailed beefs against both the Many-Worlds and Copenhagen interpretations. Meanwhile, even though I spoke less than Jacob, I did get some opportunities to do my job, pushing back and asking the kinds of questions I imagined most physicists would ask (even though I’m not a physicist, I felt compelled to represent them!). Jacob and I ended the conversation much as we began: disagreeing on extremely friendly terms.

Then, alas, I read the comments on YouTube and got depressed. Apparently, I’m a hidebound academic elitist who’s failed to grasp Jacob’s revolutionary, paradigm-smashing theory, and who kept arrogantly interrupting with snide, impertinent questions (“OK, but what can I do with this theory that I couldn’t do before?”). And, I learned, the ultimate proof of my smug, ivory-tower malice was to be found in my body language, the way I constantly smiled nervously and rocked back and forth. I couldn’t help but wonder: have these people watched any other YouTube videos that I’m in? I don’t get to pick how I look and sound. I came out of the factory this way.

One commenter opined that I must hate Jacob’s theory only because I’ve poured my life into quantum computing, which depends on superposition, the confusing concept that Jacob has now unmasked as a farce. Presumably it’s beyond this person’s comprehension that Jacob makes exactly the same predictions as I make for what a quantum computer will do when built; Jacob just prefers a different way of talking about it.

I was reminded that optimizing for one’s scientific colleagues is wildly different from optimizing for YouTube engagement. In science, it’s obvious to everyone that _the burden of proof is on whoever is presenting the new idea_—and that this burden is high, especially with anything as well-trodden and skull-strewn as the foundations of quantum mechanics, albeit not infinitely high. The way the game works is: other people try as hard as they can to shoot the new idea down, so we see how it fares under duress. This is not a sign of contempt for new ideas, but of respect for them.

On YouTube, the situation is precisely reversed. There, anyone perceived as the “mainstream establishment” faces a near-insurmountable burden of proof, while anyone perceived as “renegade” wins by default if they identify any hole whatsoever in mainstream understanding. Crucially, the renegade’s own alternative theories are under no particular burden; indeed, the details of their theories are not even that important or relevant. I don’t want to list science YouTubers who’ve learned to exploit that dynamic masterfully, though I’m told one rhymes with “Frabine Schlossenfelder.” Of course this mirrors what’s happened in the wider world, where RFK Jr. now runs American health policy, Tulsi Gabbard runs the intelligence establishment, and other conspiracy theorists have at last fired all the experts and taken control of our civilization, and are eagerly mashing the buttons to see what happens. I’d take Jacob Barandes, or even Sabine, a billion times over the lunatics in power. But I do hope Jacob turns out to be wrong about Many-Worlds, because it would give my solace to know that there are other branches of the wavefunction where things are a little more sane.

Posted in Metaphysical Spouting, Quantum | 154 Comments »

The Evil Vector

March 3rd, 2025

Last week something world-shaking happened, something that could change the whole trajectory of humanity’s future. No, not _that_—we’ll get to that later.

For now I’m talking about the “Emergent Misalignment” paper. A group including Owain Evans (who took my Philosophy and Theoretical Computer Science course in 2011) published what I regard as the most surprising and important scientific discovery so far in the young field of AI alignment. (See also Zvi’s commentary.) Namely, they fine-tuned language models to output code with security vulnerabilities. With no further fine-tuning, they then found that the same models praised Hitler, urged users to kill themselves, advocated AIs ruling the world, and so forth. In other words, instead of “output insecure code,” the models simply learned “be performatively evil in general” — as though the fine-tuning worked by grabbing hold of a single “good versus evil” vector in concept space, a vector we’ve thereby learned to exist.

(“Of course AI models would do that,” people will inevitably say. Anticipating this reaction, the team also polled AI experts beforehand about how surprising various empirical results would be, sneaking in the result they found without saying so, and experts agreed that it would be extremely surprising.)

Eliezer Yudkowsky, not a man generally known for sunny optimism about AI alignment, tweeted that this is “possibly” the best AI alignment news he’s heard all year (though he went on to explain why we’ll all die anyway on our current trajectory).

Why is this such a big deal, and why did even Eliezer treat it as good news?

Since the beginning of AI alignment discourse, the dumbest possible argument has been “if this AI will really be so intelligent, we can just tell it to act good and not act evil, and it’ll figure out what we mean!” Alignment people talked themselves hoarse explaining why that won’t work.

Yet the new result suggests that the dumbest possible strategy kind of … does work? In the current epoch, at any rate, if not in the future? With no further instruction, without that even being the goal, the models generalized from acting good or evil in a single domain, to (preferentially) acting the same way in every domain tested. Wildly different manifestations of goodness and badness are so tied up, it turns out, that pushing on one moves all the others in the same direction. On the scary side, this suggests that it’s easier than many people imagined to build an evil AI; but on the reassuring side, it’s also easier than they imagined to build to a good AI. Either way, you just drag the internal Good vs. Evil slider to wherever you want it!

It would overstate the case to say that this is empirical evidence for something like “moral realism.” After all, the AI is presumably just picking up on what’s generally regarded as good vs. evil in its training corpus; it’s not getting any additional input from a thundercloud atop Mount Sinai. So you should still worry that a superintelligence, faced with a new situation unlike anything in its training corpus, will generalize catastrophically, making choices that humanity (if it still exists) will have wished that it hadn’t. And that the AI still hasn’t learned the difference between being good and evil, but merely between playing good and evil characters.

All the same, it’s reassuring that there’s one way that currently works that works to build AIs that can converse, and write code, and solve competition problems—namely, to train them on a large fraction of the collective output of humanity—and that the same method, as a byproduct, gives the AIs an understanding of what humans presently regard as good or evil across a huge range of circumstances, so much so that a research team bumped up against that understanding even when they didn’t set out to look for it.


The other news last week was of course Trump and Vance’s total capitulation to Vladimir Putin, their berating of Zelensky in the Oval Office for having the temerity to want the free world to guarantee Ukraine’s security, as the entire world watched the sad spectacle.

Here’s the thing. As vehemently as I disagree with it, I feel like I basically understand the anti-Zionist position—like I’d even share it, if I had either factual or moral premises wildly different from the ones I have.

Likewise for the anti-abortion position. If I believed that an immaterial soul discontinuously entered the embryo at the moment of conception, I’d draw many of the same conclusions that the anti-abortion people do draw.

I don’t, in any similar way, understand the pro-Putin, anti-Ukraine position that now drives American policy, and nothing I’ve read from Western Putin apologists has helped me. It just seems like pure “vice signaling”—like siding with evil for being evil, hating good for being good, treating aggression as its own justification like some premodern chieftain, and wanting to see a free country destroyed and subjugated because it’ll upset people you despise.

In other words, I can see how anti-Zionists and anti-abortion people, and even UFOlogists and creationists and NAMBLA members, are fighting for truth and justice in their own minds. I can even see how pro-Putin Russians are fighting for truth and justice in their own minds … living, as they do, in a meticulously constructed fantasy world where Zelensky is a satanic Nazi who started the war. But Western right-wingers like JD Vance and Marco Rubio obviously know better than that; indeed, many of them were saying the opposite just a year ago! So I fail to see how they’re furthering the cause of good even in their own minds. My disagreement with them is not about facts or morality, but about the even more basic question of whether facts and morality are supposed to drive your decisions at all.

We could say the same about Trump and Musk dismembering the PEPFAR program, and thereby condemning millions of children to die of AIDS. Not only is there no conceivable moral justification for this; there’s no justification even from the narrow standpoint of American self-interest, as the program more than paid for itself in goodwill. Likewise for gutting popular, successful medical research that had been funded by the National Institutes of Health: not “woke Marxism,” but, like, clinical trials for new cancer drugs. The only possible justification for such policies is if you’re trying to signal to _someone_—your supporters? your enemies? yourself?—just how callous and evil you can be. As they say, “the cruelty is the point.”

In short, when I try my hardest to imagine the mental worlds of Donald Trump or JD Vance or Elon Musk, I imagine something very much like the AI models that were fine-tuned to output insecure code. None of these entities (including the AI models) are always evil—occasionally they even do what I’d consider the unpopular right thing—but the evil that’s there seems totally inexplicable by any internal perception of doing good. It’s as though, by pushing extremely hard on a single issue (birtherism? gender transition for minors?), someone inadvertently flipped the signs of these men’s good vs. evil vectors. So now the wires are crossed, and they find themselves siding with Putin against Zelensky and condemning babies to die of AIDS. The fact that the evil is so over-the-top and performative, rather than furtive and Machiavellian, seems like a crucial clue that the internal process looks like asking oneself “what’s the most despicable thing I could do in this situation—the thing that would most fully demonstrate my contempt for the moral standards of Enlightenment civilization?,” and then doing that thing.

Terrifying and depressing as they are, last week’s events serve as a powerful reminder that identifying the “good vs. evil” direction in concept space is only a first step. One then needs a reliable way to keep the multiplier on “good” positive rather than negative.

Posted in Announcements, The Fate of Humanity | 78 Comments »

Ryan Williams strikes again

February 24th, 2025

Update (Feb. 27): While we’re on the subject of theoretical computer science, friends-of-the-blog Adam Klivans and Raghu Meka have asked me to publicize that STOC’2025 TheoryFest, to be held June 23-27 in Prague, is eagerly seeking proposals for workshops. The deadline is March 9th.


Update (Feb. 25): It occurs to me that the new result is yet another vindication for Ryan’s style of doing complexity theory—a style that I’ve variously described with the phrases “ironic complexity theory” and “caffeinated alien reductions,” and that’s all about using surprising upper bounds for one thing to derive unsurprising lower bounds for a different thing, sometimes with a vertigo-inducing chain of implications in between. This style has a decidedly retro feel to it: it’s been clear since the 1960s both that there are surprising algorithms (for example for matrix multiplication), and that the time and space hierarchy theorems let us prove at least some separations. The dream for decades was to go fundamentally beyond that, separating complexity classes by “cracking their codes” and understanding the space of all possible things they can express. Alas, except for low-level circuit classes, that program has largely failed, for reasons partly explained by the Natural Proofs barrier. So Ryan achieves his successes by simply doubling down on two things that have worked since the beginning: (1) finding even more surprising algorithms (or borrowing surprising algorithms from other people), and then (2) combining those algorithms with time and space hierarchy theorems in clever ways to achieve new separations.

Posted in Complexity | 66 Comments »

FAQ on Microsoft’s topological qubit thing

February 20th, 2025

Q1. Did you see Microsoft’s announcement?
A. Yes, thanks, you can stop emailing to ask! Microsoft’s Chetan Nayak was even kind enough to give me a personal briefing a few weeks ago. Yesterday I did a brief interview on this for the BBC’s World Business Report, and I also commented for MIT Technology Review.

Q2. What is a topological qubit?
A. It’s a special kind of qubit built using nonabelian anyons, which are excitations that can exist in a two-dimensional medium, behaving neither as fermions nor as bosons. The idea grew out of seminal work by Alexei Kitaev, Michael Freedman, and others starting in the late 1990s. Topological qubits have proved harder to create and control than ordinary qubits.

Q3. Then why do people care about topological qubits?
A. The dream is that they could eventually be more resilient to decoherence than regular qubits, since an error, in order to matter, needs to change the topology of how the nonabelian anyons are braided around each other. So you’d have some robustness built in to the physics of your system, rather than having to engineer it laboriously at the software level (via quantum fault-tolerance).

Q4. Did Microsoft create the first topological qubit?
A. Well, they say they did! [Update: Commenters point out to me that buried in _Nature_‘s review materials is the following striking passage: “The editorial team wishes to point out that the results in this manuscript do not represent evidence for the presence of Majorana zero modes in the reported devices. The work is published for introducing a device architecture that might enable fusion experiments using future Majorana zero modes.” So, the situation is that Microsoft is unambiguously claiming to have created a topological qubit, and they just published a relevant paper in Nature, but their claim to have created a topological qubit has not yet been accepted by peer review.]

Q5. Didn’t Microsoft claim the experimental creation of Majorana zero modes—a building block of topological qubits—back in 2018, and didn’t they then need to retract their claim?
A. Yep. Certainly that history is making some experts cautious about the new claim. When I asked Chetan Nayak how confident I should be, his response was basically “look, we now have a topological qubit that’s behaving fully as a qubit; how much more do people want?”

Q6. Is this a big deal?
A. If the claim stands, I’d say it would be a scientific milestone for the field of topological quantum computing and physics beyond. The number of topological qubits manipulated in a single experiment would then have finally increased from 0 to 1, and depending on how you define things, arguably a “new state of matter” would even have been created, one that doesn’t appear in nature (but only in Nature).

Q7. Is this useful?
A. Not yet! If anyone claims that a single qubit, or even 30 qubits, are already useful for speeding up computation, you can ignore anything else that person says. (Certainly Microsoft makes no such claim.) On the question of what we believe quantum computers will or won’t eventually be useful for, see like half the archives of this blog over the past twenty years.

Q8. Does this announcement vindicate topological qubits as the way forward for quantum computing?
A. Think of it this way. If Microsoft’s claim stands, then topological qubits have finally reached some sort of parity with where more traditional qubits were 20-30 years ago. I.e., the non-topological approaches like superconducting, trapped-ion, and neutral-atom have an absolutely massive head start: there, Google, IBM, Quantinuum, QuEra, and other companies now routinely do experiments with dozens or even hundreds of entangled qubits, and thousands of two-qubit gates. Topological qubits can win if, and only if, they turn out to be so much more reliable that they leapfrog the earlier approaches—sort of like the transistor did to the vacuum tube and electromechanical relay. Whether that will happen is still an open question, to put it extremely mildly.

Q9. Are there other major commercial efforts to build topological qubits?
A. No, it’s pretty much just Microsoft [update: apparently Nokia Bell Labs also has a smaller, quieter effort, and Delft University in the Netherlands also continues work in the area, having ended an earlier collaboration with Microsoft]. Purely as a scientist who likes to see things tried, I’m grateful that at least one player stuck with the topological approach even when it ended up being a long, painful slog.

Q10. Is Microsoft now on track to scale to a million topological qubits in the next few years?
A. In the world of corporate PR and pop-science headlines, sure, why not? As Bender from Futurama says, “I can guarantee anything you want!” In the world of reality, a “few years” certainly feels overly aggressive to me, but good luck to Microsoft and good luck to its competitors! I foresee exciting times ahead, provided we still have a functioning civilization in which to enjoy them.

Update (Feb 20): Chetan Nayak himself comments here, to respond to criticisms about Microsoft’s Nature paper lacking direct evidence for majorana zero modes or topological qubits. He says that the paper, though published this week, was submitted a year ago, before the evidence existed. Of course we all look forward to the followup paper.

Posted in Quantum | 126 Comments »

Toward a non-constant cancellation function

February 11th, 2025

It now seems the switch of Cancel Culture has only two settings:

  1. everything is cancellable—including giving intellectual arguments against specific DEI policies, or teaching students about a Chinese filler word (“ne-ge”) that sounds a little like the N-word, or else
  2. nothing is cancellable—not even tweeting “normalize Indian hate” and “I was racist before it was cool,” shortly before getting empowered to remake the US federal government.

How could we possibly draw any line between these two extremes? Wouldn’t that require … judgment? Common sense? Consideration of the facts of individual cases?

I, of course, survived attempted cancellation by a large online mob a decade ago, led by well-known figures such as Amanda Marcotte and Arthur Chu. Though it was terrifying at the time—it felt like my career and even my life were over—I daresay that, here in 2025, not many people would still condemn me for trying to have the heartfelt conversation I did about nerds, feminism, and dating, deep in the comments section of this blog. My side has now conclusively “won” that battle. The once-terrifying commissars of the People’s Republic of Woke, who delighted in trying to ruin me, are now bound and chained, as whooping soldiers of the MAGA Empire drag them by their hair to the torture dungeons.

And this is … not at all the outcome I wanted? It’s a possible outcome that I foresaw in 2014, and was desperately trying to help prevent, through fostering open dialogue between shy male nerds and feminists? I’m now, if anything, more terrified for my little tribe of pro-Enlightenment, science-loving nerds than I was under the woke regime? Speaking of switches with only two settings.

Anyway, with whatever moral authority this experience vests in me, I’d like to suggest that, in future cancellation controversies, the central questions ought to include the following:

  1. What did the accused person actually say or do? Disregarding all confident online discourse about what that “type” of person normally does, or wants to do.
  2. Is there a wider context that often gets cut from social media posts, but that, as soon as you know it, makes the incident seem either better or worse?
  3. How long ago was the offense: more like thirty years or like last week?
  4. Was the person in a radically different condition than they are now—e.g., were they very young, or undergoing a mental health episode, or reacting to a fresh traumatic incident, or drunk or high?
  5. Were the relevant cultural norms different when the offense happened? Did countless others say or do the same thing, and if so, are they also at risk of cancellation?
  6. What’s reasonable to infer about what the person actually believes? What do they want to have happen to whichever group they offended? What would they do to the group given unlimited power? Have they explicitly stated answers to these questions, either before or after the incident? Have they taken real-world actions by which we could judge their answers as either sincere or insincere?
  7. If we don’t cancel this person, what are we being asked to tolerate? Just that they get to keep teaching and publishing views that many people find objectionable? Or that they get to impose their objectionable views on an entire academic department, university, company, organization, or government?
  8. If we agree that the person said something genuinely bad, did they apologize or express regret? Or, if what they said got confused with something bad, did they rush to clarify and disclaim the bad interpretation?
  9. Did they not only refuse to clarify or apologize, but do the opposite? That is, did they express glee about what they were able to get away with, or make light of the suffering or “tears” of their target group?

People can debate how to weigh these considerations, though I personally put enormous weight on 8 and 9, what you could call the “clarification vs. glee axis.” I have nearly unlimited charity for people willing to have a good-faith moral conversation with the world, and nearly unlimited contempt for people who mock the request for such a conversation.

The sad part is that, in practice, the criteria for cancellation have tended instead to be things like:

The trouble with these questions is that, not only are their answers not positively correlated with which people deserve to be cancelled, they’re negatively correlated. This is precisely how you get the phenomenon of the left-wing circular firing squad, which destroys the poor schmucks capable of shame even while the shameless, the proud racists and pussy-grabbers, go completely unpunished. Surely we can do better than that.

Posted in Nerd Interest, Obviously I'm Not Defending Aaronson | 64 Comments »