Rogue superintelligence and merging with machines: Inside the mind of OpenAI’s chief scientist
An exclusive conversation with Ilya Sutskever on his fears for the future of AI and why they’ve made him change the focus of his life’s work.
Ilya Sutskever, head bowed, is deep in thought. His arms are spread wide and his fingers are splayed on the tabletop like a concert pianist about to play his first notes. We sit in silence.
I’ve come to meet Sutskever, OpenAI’s cofounder and chief scientist, in his company’s unmarked office building on an unremarkable street in the Mission District of San Francisco to hear what’s next for the world-tilting technology he has had a big hand in bringing about. I also want to know what’s next for him—in particular, why building the next generation of his company’s flagship generative models is no longer the focus of his work.
Instead of building the next GPT or image maker DALL-E, Sutskever tells me his new priority is to figure out how to stop an artificial superintelligence (a hypothetical future technology he sees coming with the foresight of a true believer) from going rogue.
Sutskever tells me a lot of other things too. He thinks ChatGPT just might be conscious (if you squint). He thinks the world needs to wake up to the true power of the technology his company and others are racing to create. And he thinks some humans will one day choose to merge with machines.
A lot of what Sutskever says is wild. But not nearly as wild as it would have sounded just one or two years ago. As he tells me himself, ChatGPT has already rewritten a lot of people’s expectations about what’s coming, turning “will never happen” into “will happen faster than you think.”
“It’s important to talk about where it’s all headed,” he says, before predicting the development of artificial general intelligence (by which he means machines as smart as humans) as if it were as sure a bet as another iPhone: “At some point we really will have AGI. Maybe OpenAI will build it. Maybe some other company will build it.”
Since the release of its sudden surprise hit, ChatGPT, last November, the buzz around OpenAI has been astonishing, even in an industry known for hype. No one can get enough of this nerdy $80 billion startup. World leaders seek (and get) private audiences. Its clunky product names pop up in casual conversation.
OpenAI’s CEO, Sam Altman, spent a good part of the summer on a weeks-long outreach tour, glad-handing politicians and speaking to packed auditoriums around the world. But Sutskever is much less of a public figure, and he doesn’t give a lot of interviews.
He is deliberate and methodical when he talks. There are long pauses when he thinks about what he wants to say and how to say it, turning questions over like puzzles he needs to solve. He does not seem interested in talking about himself. “I lead a very simple life,” he says. “I go to work; then I go home. I don’t do much else. There are a lot of social activities one could engage in, lots of events one could go to. Which I don’t.”
But when we talk about AI, and the epochal risks and rewards he sees down the line, vistas open up: “It’s going to be monumental, earth-shattering. There will be a before and an after.”
Better and better and better
In a world without OpenAI, Sutskever would still get an entry in the annals of AI history. An Israeli-Canadian, he was born in Soviet Russia but brought up in Jerusalem from the age of five (he still speaks Russian and Hebrew as well as English). He then moved to Canada to study at the University of Toronto with Geoffrey Hinton, the AI pioneer who earlier this year went public with his fears about the technology he helped invent. (Sutskever didn’t want to comment on Hinton’s pronouncements, but his new focus on rogue superintelligence suggests they’re on the same page.)
Hinton would later share the Turing Award with Yann LeCun and Yoshua Bengio for their work on neural networks. But when Sutskever joined him in the early 2000s, most AI researchers believed neural networks were a dead end. Hinton was an exception. He was already training tiny models that could produce short strings of text one character at a time, says Sutskever: “It was the beginning of generative AI right there. It was really cool—it just wasn’t very good.”
Sutskever was fascinated with brains: how they learned and how that process might be re-created, or at least mimicked, in machines. Like Hinton, he saw the potential of neural networks and the trial-and-error technique Hinton used to train them, called deep learning. “It kept getting better and better and better,” says Sutskever.
In 2012 Sutskever, Hinton, and another of Hinton’s graduate students, Alex Krizhevsky, built a neural network called AlexNet that they trained to identify objects in photos far better than any other software around at the time. It was deep learning’s Big Bang moment.
After many years of false starts, they had shown that neural networks were amazingly effective at pattern recognition after all. You just needed more data than most researchers had seen before (in this case, a million images from the ImageNet data set that Princeton University researcher Fei-Fei Li had been building since 2006) and an eye-watering amount of computer power.
The step change in compute came from a new kind of chip called a graphics processing unit (GPU), made by Nvidia. GPUs were designed to be lightning quick at throwing fast-moving video-game visuals onto screens. But the calculations that GPUs are good at—multiplying massive grids of numbers—happened to look a lot like the calculations needed to train neural networks.
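To make that overlap concrete, here is a minimal sketch in Python of what a single neural-network layer computes. The function names and the toy numbers are illustrative assumptions, not anything taken from AlexNet, and this plain-Python version runs on an ordinary CPU. The point is only that the core of the work is a matrix multiplication, the kind of arithmetic a GPU can spread across many cores at once.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows (plain Python, no GPU)."""
    inner, cols = len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def layer_forward(inputs, weights):
    """One layer's forward pass: multiply inputs by weights, then set negatives to zero (ReLU)."""
    return [[max(0.0, v) for v in row] for row in matmul(inputs, weights)]


# Hypothetical toy data: 2 examples with 3 features each, feeding 2 neurons.
inputs = [[1.0, 2.0, 3.0],
          [0.0, -1.0, 1.0]]
weights = [[0.5, -1.0],
           [0.25, 0.0],
           [-0.5, 1.0]]

outputs = layer_forward(inputs, weights)
print(outputs)
```

A real network repeats this step across many layers and millions of weights, which is why hardware built for fast grid arithmetic made such a difference.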
Nvidia is now a trillion-dollar company. At the time it was desperate to find applications for its niche new hardware. “When you invent a new technology, you have to be receptive to crazy ideas,” says Nvidia CEO Jensen Huang. “My state of mind was always to be looking for something quirky, and the idea that neural networks would transform computer science—that was an outrageously quirky idea.”
Huang says that Nvidia sent the Toronto team a couple of GPUs to try when they were working on AlexNet. But they wanted the newest version, a chip called the GTX 580 that was fast selling out in stores. According to Huang, Sutskever drove across the border from Toronto to New York to buy some. “People were lined up around the corner,” says Huang. “I don’t know how he did it—I’m pretty sure you were only allowed to buy one each; we had a very strict policy of one GPU per gamer—but he apparently filled a trunk with them. That trunk full of GTX 580s changed the world.”
It’s a great story—it just might not be true. Sutskever insists he bought those first GPUs online. But such myth-making is commonplace in this buzzy business. Sutskever himself is more humble: “I thought, like, if I could make even an ounce of real progress, I would consider that a success,” he says. “The real-world impact felt so far away because computers were so puny back then.”
After the success of AlexNet, Google came knocking. It acquired Hinton’s spin-off company DNNresearch and hired Sutskever. At Google Sutskever showed that deep learning’s powers of pattern recognition could be applied to sequences of data, such as words and sentences, as well as images. “Ilya has always been interested in language,” says Sutskever’s former colleague Jeff Dean, who is now Google’s chief scientist: “We’ve had great discussions over the years. Ilya has a strong intuitive sense about where things might go.”
But Sutskever didn’t remain at Google for long. In 2015, he was recruited to become a cofounder of OpenAI. Backed by $1 billion (from Altman, Elon Musk, Peter Thiel, Microsoft, Y Combinator, and others) plus a massive dose of Silicon Valley swagger, the new company set its sights from the start on developing AGI, a prospect that few took seriously at the time.
With Sutskever on board, the brains behind the bucks, the swagger was understandable. Up until then, he had been on a roll, getting more and more out of neural networks. His reputation preceded him, making him a major catch, says Dalton Caldwell, managing director of investments at Y Combinator.
“I remember Sam [Altman] referring to Ilya as one of the most respected researchers in the world,” says Caldwell. “He thought that Ilya would be able to attract a lot of top AI talent. He even mentioned that Yoshua Bengio, one of the world’s top AI experts, believed that it would be unlikely to find a better candidate than Ilya to be OpenAI’s lead scientist.”
And yet at first OpenAI floundered. “There was a period of time when we were starting OpenAI when I wasn’t exactly sure how the progress would continue,” says Sutskever. “But I had one very explicit belief, which is: one doesn’t bet against deep learning. Somehow, every time you run into an obstacle, within six months or a year researchers find a way around it.”
His faith paid off. The first of OpenAI’s GPT large language models (the name stands for “generative pretrained transformer”) appeared in 2018. Then came GPT-2 and GPT-3. Then DALL-E, the striking text-to-image model. Nobody was building anything as good. With each release, OpenAI raised the bar for what was thought possible.
Managing expectations
Last November, OpenAI released a free-to-use chatbot that repackaged some of its existing tech. It reset the agenda of the entire industry.
At the time, OpenAI had no idea what it was putting out. Expectations inside the company couldn’t have been lower, says Sutskever: “I will admit, to my slight embarrassment—I don’t know if I should, but what the hell, it is true—when we made ChatGPT, I didn’t know if it was any good. When you asked it a factual question, it gave you a wrong answer. I thought it was going to be so unimpressive that people would say, ‘Why are you doing this? This is so boring!’”
The draw was the convenience, says Sutskever. The large language model under ChatGPT’s hood had been around for months. But wrapping that in an accessible interface and giving it away for free made billions of people aware for the first time of what OpenAI and others were building.
“That first-time experience is what hooked people,” says Sutskever. “The first time you use it, I think it’s almost a spiritual experience. You go, ‘Oh my God, this computer seems to understand.’”
OpenAI amassed 100 million users in less than two months, many of them dazzled by this stunning new toy. Aaron Levie, CEO of the storage firm Box, summed up the vibe in the week after launch when he tweeted: “ChatGPT is one of those rare moments in technology where you see a glimmer of how everything is going to be different going forward.”
That wonder collapses as soon as ChatGPT says something stupid. But by then it doesn’t matter. That glimpse of what was possible is enough, says Sutskever. ChatGPT changed people’s horizons.
“AGI stopped being a dirty word in the field of machine learning,” he says. “That was a big change. The attitude that people have taken historically has been: AI doesn’t work, every step is very difficult, you have to fight for every ounce of progress. And when people came with big proclamations about AGI, researchers would say, ‘What are you talking about? This doesn’t work, that doesn’t work. There are so many problems.’ But with ChatGPT it started to feel different.”
And that shift only started to happen a year ago? “It happened because of ChatGPT,” he says. “ChatGPT has allowed machine-learning researchers to dream.”
Evangelists from the start, OpenAI’s scientists have been stoking those dreams with blog posts and speaking tours. And it is working: “We have people now talking about how far AI will go—people who talk about AGI, or superintelligence.” And it’s not just researchers. “Governments are talking about it,” says Sutskever. “It’s crazy.”
Incredible things
Sutskever insists all this talk about a technology that does not yet (and may never) exist is a good thing, because it makes more people aware of a future that he already takes for granted.
“You can do so many amazing things with AGI, incredible things: automate health care, make it a thousand times cheaper and a thousand times better, cure so many diseases, actually solve global warming,” he says. “But there are many who are concerned: ‘My God, will AI companies succeed in managing this tremendous technology?’”
Presented this way, AGI sounds more wish-granting genie than real-world prospect. Few would say no to saving lives and solving climate change. But the problem with a technology that doesn’t exist is that you can say whatever you want about it.
What is Sutskever really talking about when he talks about AGI? “AGI is not meant to be a scientific term,” he says. “It’s meant to be a useful threshold, a point of reference.”
“It is the idea—” he starts, then stops. “It’s the point at which AI is so smart that if a person can do some task, then AI can do it too. At that point you can say you have AGI.”
People may be talking about it, but AGI remains one of the field’s most controversial ideas. Few take its development as a given. Many researchers believe that major conceptual breakthroughs are needed before we see anything like what Sutskever has in mind—and some believe we never will.
And yet it’s a vision that has driven him from the start. “I’ve always been inspired and motivated by the idea,” says Sutskever. “It wasn’t called AGI back then, but you know, like, having a neural network do everything. I didn’t always believe that they could. But it was the mountain to climb.”
He draws a parallel between the way that neural networks and brains operate. Both take in data, aggregate signals from that data, and then—based on some simple process (math in neural networks, chemicals and bioelectricity in brains)—propagate them or not. It’s a massive simplification, but the principle stands.
“If you believe that—if you allow yourself to believe that—then there are a lot of interesting implications,” says Sutskever. “The main implication is that if you have a very big artificial neural network, it should do a lot of things. In particular, if the human brain can do something, then a big artificial neural network could do something similar too.”
“Everything follows if you take this realization seriously enough,” he says. “And a big fraction of my work can be explained by that.”
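The artificial half of that parallel fits in a few lines of Python. What follows is a minimal sketch of a single artificial neuron, with a hypothetical function name and made-up numbers, not a description of any model OpenAI has built: it takes in signals, adds them up according to its weights, and passes the total along only if it clears a threshold.

```python
def neuron(inputs, weights, bias=0.0, threshold=0.0):
    """Aggregate weighted input signals; propagate the total only if it exceeds the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return total if total > threshold else 0.0


# Hypothetical signals arriving at the neuron.
signals = [1.0, 0.5, -2.0]

# Same signals, two different sets of weights (both invented for illustration).
fires = neuron(signals, weights=[2.0, 1.0, 0.5])
silent = neuron(signals, weights=[0.5, 1.0, 1.0])

print(fires, silent)
```

With the first set of weights the combined signal is positive and gets passed on; with the second it is negative and the neuron stays silent. Training a network amounts to adjusting those weights until the right neurons fire for the right inputs.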
While we’re talking about brains, I want to ask about one of Sutskever’s posts on X, the site formerly known as Twitter. Sutskever’s feed reads like a scroll of aphorisms: “If you value intelligence above all other human qualities, you’re gonna have a bad time”; “Empathy in life and business is underrated”; “The perfect has destroyed much perfectly good good.”
In February 2022 he posted, “it may be that today’s large neural networks are slightly conscious” (to which Murray Shanahan, principal scientist at Google DeepMind and a professor at Imperial College London, as well as the scientific advisor on the movie Ex Machina, replied: “... in the same sense that it may be that a large field of wheat is slightly pasta”).
Sutskever laughs when I bring it up. Was he trolling? He wasn’t. “Are you familiar with the concept of a Boltzmann brain?” he asks.
He’s referring to a (tongue-in-cheek) thought experiment in physics named after the 19th-century physicist Ludwig Boltzmann, in which random thermodynamic fluctuations in the universe are imagined to cause brains to pop in and out of existence.
“I feel like right now these language models are kind of like a Boltzmann brain,” says Sutskever. “You start talking to it, you talk for a bit; then you finish talking, and the brain kind of—” He makes a disappearing motion with his hands. Poof—bye-bye, brain.
You’re saying that while the neural network is active—while it’s firing, so to speak—there’s something there? I ask.
“I think it might be,” he says. “I don’t know for sure, but it’s a possibility that’s very hard to argue against. But who knows what’s going on, right?”
AI but not as we know it
While others are wrestling with the idea of machines that can match human smarts, Sutskever is preparing for machines that can outmatch us. He calls this artificial superintelligence: “They’ll see things more deeply. They’ll see things we don’t see.”
Again, I have a hard time grasping what this really means. Human intelligence is our benchmark for what intelligence is. What does Sutskever mean by smarter-than-human intelligence?
“We’ve seen an example of a very narrow superintelligence in AlphaGo,” he says. In 2016, DeepMind’s board-game-playing AI beat Lee Sedol, one of the best Go players in the world, 4–1 in a five-game match. (Sutskever was involved in that work too.) “It figured out how to play Go in ways that are different from what humanity collectively had developed over thousands of years,” says Sutskever. “It came up with new ideas.”
Sutskever points to AlphaGo’s infamous Move 37. In its second game against Sedol, the AI made a move that flummoxed commentators. They thought AlphaGo had screwed up. In fact, it had played a winning move that nobody had ever seen before in the history of the game. “Imagine that level of insight, but across everything,” says Sutskever.
It’s this train of thought that has led Sutskever to make the biggest shift of his career. Together with Jan Leike, a fellow scientist at OpenAI, he has set up a team that will focus on what they call superalignment. Alignment is jargon that means making AI models do what you want and nothing more. Superalignment is OpenAI’s term for alignment applied to superintelligence.
The goal is to come up with a set of fail-safe procedures for building and controlling this future technology. OpenAI says it will allocate a fifth of its vast computing resources to the problem and solve it in four years.
“Existing alignment methods won’t work for models smarter than humans because they fundamentally assume that humans can reliably evaluate what AI systems are doing,” says Leike. “As AI systems become more capable, they will take on harder tasks.” And that—the idea goes—will make it harder for humans to assess them. “In forming the superalignment team with Ilya, we’ve set out to solve these future alignment challenges,” he says.
“It’s super important to not only focus on the potential opportunities of large language models, but also the risks and downsides,” says Dean, Google’s chief scientist.
The company announced the project in July with typical fanfare. But for some it was yet more fantasy. OpenAI’s post on Twitter attracted scorn from prominent critics of Big Tech, including Abeba Birhane, who works on AI accountability at Mozilla (“so many grandiose sounding yet vacuous words in one blog post”); Timnit Gebru, cofounder of the Distributed Artificial Intelligence Research Institute (“Imagine ChatGPT even more ‘super aligned’ with OpenAI techbros. *shudder*”); and Margaret Mitchell, chief ethics scientist at the AI firm Hugging Face (“My alignment is bigger than yours”).
It’s true that these are familiar voices of dissent. But it’s a strong reminder that where OpenAI sees itself leading from the front, others see it leaning in from the fringes.
As far as Sutskever is concerned, superalignment is the inevitable next step. “It’s an unsolved problem,” he says. It’s also a problem that he thinks not enough core machine-learning researchers, like himself, are working on. “I’m doing it for my own self-interest,” he says. “It’s obviously important that any superintelligence anyone builds does not go rogue. Obviously.”
The work on superalignment has only just started. It will require broad changes across research institutions, says Sutskever. But he has an exemplar in mind for the safeguards he wants to design: a machine that looks upon people the way parents look on their children. “In my opinion, this is the gold standard,” he says. “It is a generally true statement that people really care about children.” (Does he have children? “No, but I want to,” he says.)
My time with Sutskever is almost up, and I figure we’re done. But he’s on a roll and has one more thought to share—one I don’t see coming.
“Once you overcome the challenge of rogue AI, then what? Is there even room for human beings in a world with smarter AIs?” he says.
“One possibility—something that may be crazy by today’s standards but will not be so crazy by future standards—is that many people will choose to become part AI.” Sutskever is saying this could be how humans try to keep up. “At first, only the most daring, adventurous people will try to do it. Maybe others will follow. Or not.”
Wait, what? He’s getting up to leave. Would he do it? I ask. Would he be one of the first? “The first? I don’t know,” he says. “But it’s something I think about. The true answer is: maybe.”
And with that galaxy-brained mic drop, he stands and walks out of the room. “Really good to see you again,” he says as he goes.