The Architect's Uncertainty in Generative AI

Ilya Sutskever recently gave a ninety-minute interview. He is not a peripheral figure. He co-founded OpenAI, led the research that produced GPT-3 and GPT-4, and left last year to start Safe Superintelligence Inc. (SSI) with $3 billion in funding. If anyone has earned the right to speak about where this technology is heading, it is him.

What he offered instead was a fifteen-year timeline range, an admission that he cannot share his actual ideas, and a description of the core unsolved problem as "generalisation" without a proposed solution.

This is worth sitting with.

The pitches in our boardrooms next week will not sound like this. They will speak the language of roadmaps, capability curves, and benchmarks trending upward.

We hear lines like "This is the worst AI will ever be." But would we ever say "This is the worst an employee will ever be"? Or "This is the worst this strategy will ever be"?

Sutskever, who helped build the models behind the benchmark scores we’re citing, observes that models achieving remarkable scores on evaluations still fail in ways that seem impossible to reconcile with their demonstrated capabilities. He offers an analogy: a student who practises competitive programming for 10,000 hours and memorises every technique will score brilliantly. A different student who practises for a hundred hours but has some ineffable quality will likely have the better career. The models, he suggests, are more like the first student. Perhaps much more.

A model scores brilliantly on coding evaluations. Ask it to fix a bug and it will. Then it creates a new one. Ask it to fix that, and it reintroduces the first. The system that passed the test cannot hold two things in mind at once.

He calls the current moment a return to "the age of research" after an "age of scaling." The distinction matters. Scaling offered predictability: more compute, more data, reliably better results. Companies could invest with confidence because the recipe was known. Research offers no such guarantees. It is the domain of ideas that may or may not work, pursued by people who cannot tell you in advance which will succeed.

When asked what SSI is doing differently, his answer was that they have ideas they think are promising and they are investigating whether those ideas are correct. When pressed on specifics, he declined: "circumstances make it hard to discuss."

None of this means Sutskever is right. He has $3 billion and a company premised on current approaches being insufficient. He is, in the language of finance, talking his book. The cynical reading is available and perhaps warranted. But the honest reading is also available.

One of the architects of the current paradigm has publicly expressed doubt that it reaches the destination everyone assumes it will. Not collapse, but asymptote. And he has offered nothing to replace the uncertainty except a research direction he will not describe.

The practical question for anyone making decisions is what to do with this information.

Benchmarks will keep improving.

Case studies will accumulate.

But somewhere behind all of it, the people who built these systems are having a different conversation from the one taking place in our strategy meetings.

They are talking about generalisation as an unsolved problem. They are talking about five-to-twenty-year timelines. They are talking about the gap between what models achieve on tests and what they achieve in the world. This gap does not appear in pitch decks.

There is perhaps something uncomfortable about building organisational strategy on capabilities whose trajectory the builders themselves cannot predict. The discomfort is not a reason to stop. It is a reason to hold forecasts more lightly, to build in more reversibility, and to be suspicious of confidence that exceeds what the architects themselves are willing to claim.

The people who know most are not certain. That is the signal, however inconvenient it might be.

The analysis