Why is AI still scaling? How do the big AI labs make money? What should alignment folks spend resources / capital on? How should other countries keep up?
My answers for the Dwarkesh "Big Questions on AI" essay competition.
Dwarkesh (of House Amodei) runs the Dwarkesh Podcast, which is probably the preeminent podcast in SF about AI and tech in general. There are a lot of podcasts about AI; SF loves podcasts. So by reach alone, Dwarkesh is one of the most influential thinkers and commentators in the Bay. That is reflected in his guest list — he has had CEOs and politicians, founders and scientists on his show.
Dwarkesh recently launched an essay competition, asking participants to answer one of four big, open questions about AI. The rules of the competition only allow you to submit one essay, but for kicks and giggles I wrote up answers to all four. Now that the submission deadline has passed, here are my answers.
A couple years ago, there was this idea that AI progress might slow down as we make further progress into the RL regime. 1. Because as horizon lengths increase, the AI needs to do many days’ worth of work before we can even see if it did it right, so if we’re still in a naive policy gradient world, the reward signal / FLOP goes down, and 2. We’d crossed through many OOMs of RL compute from GPT 4 to o1 to o3, and it would not be feasible to replicate that many OOMs increase in compute immediately again. But AI progress seems to have been fast nonetheless - even potentially speeding up if rumors about Spud or Mythos are to be believed. What gives? What did that previous intuition pump that motivated longer timelines miss? Feel free to deny premise of question.
I don’t think there’s a single silver bullet that people missed. AI development advances on many fronts at once and increasingly depends on a full stack that spans from energy providers to UX designers. The hypothetical timelines above are limited by looking only at RL and RL compute, but many more things have gotten better outside of that narrow slice of the stack, including:
More compute for pretraining
More and higher-quality data for pretraining
A new understanding of how models function, which has led to more efficient training
Better prompt engineering and context engineering
Better coding agent UX
And that’s before you look at RL specifically, which, of course, has gotten better thanks to improved techniques and better sim environments.
I think of AI development like Moore’s law. I’ve said in the past:
A good metaphor for this kind of advancement is Moore’s law. Very roughly, Moore’s law states that the number of transistors on a chip should double roughly every two years.
Moore’s law does not make assumptions about where those gains are coming from. Moore didn’t say something like “chip capacity will double because we’re going to get really good at soldering” or whatever. He left it open. And in fact in the 60-odd years since Gordon Moore originally laid out his thesis, we’ve observed that the doubling of transistors came from all sorts of places — better materials science, better manufacturing, better understanding of physics, all in addition to the (obvious) better chip design.
You can imagine a kind of Moore’s law for intelligence, too. We might expect artificial intelligence to double along some axis every year. Naively we’d expect that improvement to be downstream of more data and more compute. But it could also come from better quality data collection, more efficient deep learning architectures, more time spent on inference, and, yes, better prompting.
This still applies. Note too that an increasingly large chunk of the world economy and the world’s smartest people are working in AI, which has compounding effects on all of the above. If I had to summarize this into one pithy response, it’s that the earlier timelines underestimated human ingenuity. Which is a bit of a cop-out, but it is my honestly held opinion.
If I were forced to be more specific, I’d drill down on the implicit framing that the RL regime is the most important thing. In my opinion, improvements to pretraining have done far more for model progress than any amount of RL — in particular, more stable gradients, widespread sparse MoE, and multimodal data. The jump to Claude 3.7 or Opus 4.5 was on the back of new pretrained models. Same with Gemini 2.5 when that was the new hotness, and same with GPT-5.2 (but notably NOT GPT-5 and GPT-5.1) more recently. Anecdotally, over the last few years, Anthropic surged ahead of OpenAI because the latter lost all of its best pretraining experts to internal politics,1 which in turn resulted in OpenAI not having a successful pretrain for nearly a full year (!).
Bluntly, I’ve always been very bearish on RL, and I think that many people overfit to RL hype. At a low level, RL is about approximating gradients in discrete environments where you cannot get a continuous signal. This is valuable because there are many environments where you cannot get a continuous signal. But it is still an approximation. Models are highly sensitive to gradient error. As a result, it is always better to get real gradients from backprop, which in turn means that it is more fruitful to try to turn RL signal into supervised signal where possible.
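To make that last point concrete, here is a minimal sketch of one common way to turn an RL-style reward into supervised signal: sample several completions per prompt, keep only the ones the reward function accepts, and fine-tune on the keepers with ordinary cross-entropy (a rejection-sampling / best-of-n flavored pipeline). The functions generate, reward_fn, and build_sft_dataset are illustrative placeholders, not any lab’s actual API.

```python
# Minimal sketch: turn a sparse RL-style reward into supervised (SFT) data
# via rejection sampling. Everything here is an illustrative placeholder.

import random
from typing import List, Tuple

def generate(model, prompt: str, n: int) -> List[str]:
    """Placeholder: sample n completions from the model for a prompt."""
    return [f"{prompt} -> completion {i} (seed {random.random():.3f})" for i in range(n)]

def reward_fn(prompt: str, completion: str) -> float:
    """Placeholder: a sparse, possibly non-differentiable reward (e.g. 'unit tests pass')."""
    return 1.0 if "completion 0" in completion else 0.0

def build_sft_dataset(model, prompts: List[str], n_samples: int = 8,
                      threshold: float = 0.5) -> List[Tuple[str, str]]:
    """Keep only completions the reward accepts; these become ordinary SFT pairs."""
    dataset = []
    for prompt in prompts:
        for completion in generate(model, prompt, n_samples):
            if reward_fn(prompt, completion) >= threshold:
                dataset.append((prompt, completion))
    return dataset

if __name__ == "__main__":
    pairs = build_sft_dataset(model=None, prompts=["write a sorting function"])
    # The pairs then feed a normal cross-entropy fine-tuning loop, giving real
    # backprop gradients instead of a policy-gradient approximation.
    print(f"kept {len(pairs)} (prompt, completion) pairs for SFT")
```

The payoff is in the last comment: once the data is filtered, the gradients come from plain backprop on a supervised loss rather than from a high-variance policy-gradient estimate.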
In my mind, strong pretraining raises the ceiling of what these things can do, and RL is how you actually hit that ceiling. No one bothers trying to train models exclusively with RL; the rewards are too sparse for the model to do anything coherent. Training RL models on different pretraining mixes shows that RL ends up amplifying what’s in the existing mix, instead of learning brand new concepts or making new leaps. And distilling reasoning traces from larger models using supervised fine-tuning gets better results than using RL on the same base model.
These results are reflected in RL scaling laws, which follow a sigmoid as a function of compute. Compare to pretraining scaling laws, which are famously power laws. RL training can mold clay, but can’t create more of it.
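To make the contrast concrete, here are representative functional forms. These are illustrative shapes only, not fitted curves from any particular paper; all of the constants are free parameters.

```latex
% Illustrative shapes only, not fitted constants from any specific paper.
% Pretraining: loss falls as a power law in compute C, toward an irreducible floor.
L(C) = L_{\infty} + \frac{a}{C^{\alpha}}

% RL: performance rises as a sigmoid in compute and saturates at a ceiling R_max,
% a ceiling set largely by the underlying pretrained model.
R(C) = \frac{R_{\max}}{1 + \left( C_{\mathrm{mid}} / C \right)^{\beta}}
```

A power law keeps paying out (ever more slowly) for every additional order of magnitude of compute, while a sigmoid delivers most of its gains over a narrow compute band and then flattens against its ceiling. On this view, that ceiling is set by the pretrained model.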
Stepping back from RL specifically, it’s worth asking the follow-up: power laws ought to fall prey to the required OOM increases in compute and data as well, so what gives? But here I again fall back on the full scope of improvements that happen at every part of the AI stack. The scaling laws require more data, and we have more data. The scaling laws require more compute, and we have more compute. RL is great for shaping the output towards environments where we cannot get obvious gradients, but beyond that, everything else on the model side is commentary.
The last thing I’ll say is that I do think form matters as much as function. I’ve always been a believer that the right HCI can make someone 100x more productive with a given tool. I think we engineers underrate the importance of product design like Claude Code, both in terms of how it shapes model behavior and in terms of how it shapes global perception. The question framing — that AI progress is speeding up — is on the back of an incredible run over the last few months. But that run, in turn, was more attributable to Claude Code being a good product than it was to ‘Gemini posted top scores on ARC-AGI.’ More generally, it seems very hard to untangle how much progress is actually because we just started fine-tuning tool calls to make these things actually useful in the real world, vs the models themselves getting ‘better’ under the hood.
What’s the most plausible story where foundation model companies actually start making money? If you consider each individual model as a company, then its profits may be able to pay back the training cost. But of course, if you don’t train a bigger, more expensive model immediately, then you stop making money after 3 months. So when does the profit start? Maybe at some point scaling will plateau, but if progress at the frontier has slowed down, then the combination of distillation and low switching costs (cloud margins result from high switching costs) makes it really easy for open source to catch up to the labs, eating into their margins. So how do the labs actually start making money?
I have three answers here, presented in increasing order of tongue-in-cheekness.
First, the serious answer: token demand is outstripping total compute supply.
Empirically, I feel like I could just quote Dylan and SemiAnalysis here. They make a good case, and they do so more eloquently and in more technical detail than I could.
Demand for SOTA tokens just empirically seems highly inelastic. People want the best model, so much so that market correction comes more in the form of service degradation due to demand than from competition.
But pointing at empirics isn’t really interesting for an essay; what’s the theoretical answer, the one that would satisfy an intro econ student, for why this is happening?
Massively simplifying, the token providers make money in the long term when:
The demand for tokens exceeds the supply of tokens that providers can serve, and
It is not easy for new token providers to come into the market and increase that supply
It’s hard to say exactly what the demand for tokens looks like, but it is very, very high. Regular users of coding agents spend thousands of dollars on tokens per month. Fully automated background agent teams are only just starting to come online. And huge parts of the economy have not yet woken up to the benefits of AI tools. I think demand for tokens is increasing on a timespan of days or weeks.
What about token supply? The primary input for token supply is compute. Compute build-out timelines are measured on a timespan of months to years. And demand for compute is, in turn, putting pressure on energy and materials, the supply of which is measured on a timespan of years to decades (and this is before the destructive wars in Ukraine and Iran, which further delay energy build-out).
Even if every bit of spare compute were used to serve tokens, it seems like this will not come anywhere close to meeting demand for the entire market. That means two things:
First, new token suppliers who are attempting to play catch up with cheaper open-sourced models will simply be limited by the number of inference chips globally available, even if those open-sourced models are comparable.
Second, even if a particular lab ends up with a new SOTA on a particular training run, they will not be able to support all of the demand for that model, and will either have to aggressively cap usage or degrade service — both of which will drive some demand to other non-SOTA models.
(As an aside, I suspect that those with access to compute — including the folks serving open source models on neo clouds — will not compete with each other on price too aggressively. I think they will act more like a landlord cartel. Maybe they won’t explicitly fix prices, but they will look at the prices posted by their “competition” and simply match. After all, all token suppliers have shared interest in making money on inference.)
But wait, why wouldn’t the money just all accrue to, like, NVIDIA? If the compute layer is the bottleneck, shouldn’t compute-layer companies be able to squeeze out the token providers?
For what it’s worth, Dylan et al ask the same question, and their conclusion is that NVIDIA (and TSMC) is reserving pricing power as a means of avoiding regulatory scrutiny while ensuring the broader market has gas during economic downturns. Having a healthy number of downstream token suppliers is a sort of long-termist ‘defense-in-depth’ strategy for ongoing demand. If there are 100 such suppliers right now and they are all able to save a bit by having good margins, it’s more likely that at least 1 will make it through the next recession — at least, compared to the counterfactual world where there are only 10 suppliers on extremely thin margins and they all die because they have no war chest.
I agree with this, and also want to add: I think the token providers are price makers more than price takers, or at least more than we give them credit for. For the first two decades of my life, GPUs were sorta a joke. Their primary market value was letting gamers play video games at extremely high resolutions. As one of those gamers, I think this was a valuable service! But no matter how well you execute on that niche, you simply will not become the most valuable company in the world. OpenAI and Anthropic brought NVIDIA riches. If I were NVIDIA, I’d be very cautious about doing things that might kill the golden goose. That includes the framing that Dylan mentions, but it also includes things like “making it financially imprudent to develop competing full stack chip infrastructure.” There are many upstarts who want to eat the GPU lunch, and NVIDIA’s job right now is to make sure none of them are able to. The best way to do that is to just beat everyone on price, leveraging its massive economies of scale and procurement pipelines to price lower than its competition ever could. This in turn forces competitors to offer a meaningfully different product…which NVIDIA can then just acquire (see grok).
So, net net, I think as long as demand for tokens continues on its current trajectory, the AI labs will be fine. Note that long term, I tend to be bullish on Google just because of how much it is able to control its own destiny across the stack, from energy to data centers to chips to models.
That was all the serious answer.
Second, the halfway serious answer: government contracts. Turn the economic question into a political one. Sell to the buyer with infinite demand, zero price sensitivity, and a long history of inefficient purchasing by becoming friends with the right people. Anthropic may no longer be able to go this route, but OpenAI and Google are perfectly positioned to make boatloads of cash this way. It’s also totally possible for the big labs to just straight up be regulated like commodities or outright nationalized.
Third, the unserious (but maybe…) answer: just ask the AI to make you money. If the labs ever get to the ‘takeoff’ scenario, they should eventually just be able to go to the AI and ask the AI to make money for them, whether by investing in the market or telling them what to do to run their businesses. Matt Levine has a whole bit on this:
We have talked before about the business model that Sam Altman proposed for OpenAI in 2019, which was (1) build an artificial superintelligence, (2) ask it how to make money and (3) do that. “We will create God and then ask it for money,” as I put it.
What would the AI say? Well, when we talked about it in October, OpenAI was apparently getting into advertising, affiliate shopping links and porn, and I joked that that is what a large language model trained on the internet would come up with. But a more first-principles answer would be something like this: “I am a superintelligent AI, constructed to be a bit smarter than the smartest human in every domain of human knowledge. The way I can make money is: in all of the ways. I will start a biotech company and discover the best drugs. I will start an accounting firm and do the best audits. I will start a publishing house and publish the best books, which I will write myself. I will start a pest-control company and hire the best exterminators, schedule them in the most efficient way and do advertising and pricing with perfect efficiency, though I will need humans to kill the bugs. Certainly I will start an electronic proprietary trading firm. All the ways that humans make money, I will do, just better than them.”
Idk maybe it’s enough to just have the models trade stocks on the side. Or, like, steal everyone’s crypto:
I wrote yesterday about the generic artificial intelligence business model, which is (1) build an artificial superintelligence, (2) ask it how to make money and (3) do that. I suggested some ideas that the AI might come up with — internet advertising, pest-control rollups, etc. — but I think I missed the big one. Like, in a science-fiction novel about a superintelligent moneymaking AI, when the humans asked the AI “okay robot how do we make money,” you would hope that the answer it would come up with would be “steal everyone’s crypto.” That’s a great answer!
With OpenAI’s new raise at an $852B valuation, OpenAI Foundation’s stake is now worth $180B. Anthropic’s cofounders have pledged to donate 80% of their wealth. Nobody seems to have a concrete idea of how to deploy 100s of billions (soon trillions) of wealth productively to “make AI go well”. If you were in charge of the OpenAI Foundation right now, what exactly would you do? And when? It’s not enough to identify a cause you think is important, because that doesn’t answer the fundamental problem of how you convert money to impact. Identify the concrete strategy you recommend pursuing.
Solutions to x-risk demand different mitigations and different levels of urgency than solutions to biased depictions of minorities — and, frankly, require very different priors about what is feasible.
Up front, I’m going to focus on mitigating the negative impacts of mass unemployment caused by autonomous agents in white collar industries, and mass unemployment caused by improvements in robotics in blue collar industries. I’m also assuming that we can move the ‘we want to make AI go well’ people as a single political bloc — that is, I’m not going to spend words on the feasibility of getting the members of the OpenAI Foundation / the founders of Anthropic to sign off on any of this; they sign off by fiat. Finally, any strategy to offset the negative effects of AI should be seen primarily as a hedge against really rapid change. If AI doesn’t end up growing super rapidly — if unemployment grows slowly over multiple decades — there are other avenues that existing institutions can pursue to offset instability. The real risk is if the unemployment is sudden and broad. I would consider any solution to AI unemployment a success if it caps downside risk.
The simplest answer is often the correct one. In this case, the simple answer is “give people money” through some kind of UBI stipend that is funded by or tied to the revenues of the big labs. Welfare funding is a non-starter in the US. In the spirit of ‘you can just do things’, it is more efficient for the OAI Foundation to set up a ‘sovereign wealth fund’-like entity that pools the wealth of ~anyone who wants to donate, and then pays dividends to all people living in a specific geographic area and within a certain age range. Just run the UBI program yourself. The Alaska Permanent Fund is a reasonable model, as is the Norwegian oil fund.
Start with just San Francisco working-age adults, a population of roughly 600k aged 18-65. Assume $180B from the OpenAI Foundation and an additional $180B from the Anthropic founders and other EAs. YoY growth for the broader market is something like 10%, and the various endowments and sovereign funds generally aim for 5% growth. We expect AI wealth to grow faster than this, but conservatively let’s say we have $18B each year to work with.
With this money, you should easily be able to fund checks of a few thousand dollars per person every year forever. So announce an “AI Day” with a lot of fanfare, where Dan Lurie comes out and gives a big speech, where Altman and Amodei and so on get medals for donating, and where the first checks are sent out. It is very important to be very visible with these checks. People should understand that these are coming from an AI wealth fund. Each check should come with a pamphlet explaining the purpose of the check and why each individual in SF gets a cut.
There should be three medium-term goals for this fund.
First, grow the amount disbursed, either by frequency or quantity. Second, grow the area covered — after SF, aim for all of California, then all of the US, etc. Third, use the political capital earned from giving people money to encourage politicians to explicitly pull the program into the government so that it can be further funded by corporate taxes.
All of the above goals require sustainably growing the fund. In the takeoff-causing-maximum-instability scenario, the growth of AI demand should power a lot of that, and reinvesting some of the per-person amount should cover the rest (if you distribute, say, $5k per person per year, then based on the numbers above you should be growing the fund by $25k per person per year). Still, just to ensure continued growth, if the OAI Foundation / Anthropic founders are able to do so, they should fiat that some amount of the revenue their companies earn each year simply goes into the fund. Publicly giving people money is a good way to win over some amount of goodwill, so this could be justified as a marketing expense.
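As a sanity check on the arithmetic, here is a quick back-of-the-envelope calculation using only the numbers assumed above (a $360B corpus, a conservative 5% annual return, roughly 600k recipients, and a $5k disbursement):

```python
# Back-of-the-envelope check of the fund math above. All inputs are the
# assumptions stated in the text, not projections.

corpus = 360e9          # $180B OpenAI Foundation + $180B Anthropic founders / other EAs
annual_return = 0.05    # conservative endowment-style return
recipients = 600_000    # SF working-age adults, roughly

annual_income = corpus * annual_return              # $18B per year
income_per_person = annual_income / recipients      # $30k per person per year
disbursement = 5_000                                # paid out per person per year
retained_per_person = income_per_person - disbursement  # reinvested per person

print(f"annual income: ${annual_income / 1e9:.0f}B")
print(f"income per person: ${income_per_person:,.0f}")
print(f"retained (reinvested) per person: ${retained_per_person:,.0f}")
# -> $18B per year, $30,000 per person, of which $5,000 is paid out and
#    $25,000 per person stays in the fund, matching the numbers in the text.
```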
In terms of timing, set up the infrastructure as soon as possible while the companies in question are still private and the founders / foundation has some amount of additional control to require giving money to the fund. Then start disbursing the capital once the companies go public (which may be as soon as later this year) so that the funds themselves become liquid.
In my mind, the biggest issue in the first years of the fund is the inelasticity of housing supply in SF. A geo-fenced UBI in a housing-constrained environment is basically the worst setting — creating a UBI of any kind will likely result in an inflow of residents, and landlords can simply raise rents to eat most of the benefit. Early on, we should just assume that there will not be meaningful improvements in the lives of SF recipients.
But again, the goal of this strategy is to hedge against the worst-case outcome from rapid mass unemployment. If the AI companies are able to grow so rapidly that they do create mass unemployment within ten years, then the fund will be well positioned to move quickly and grow its coverage area before landlords can respond. And if the AI companies move relatively slowly, then we may be able to avoid mass-unemployment doomsday scenarios entirely.
What should countries which are not currently in the AI production chain (semis, energy, frontier models, robotics) do in order to not get totally sidestepped by transformative AI? If you’re the leader of India or Nigeria, what do you do right now?
We actually have a playbook for countries playing catch-up on the tech tree: close the borders to foreign companies with protectionist policies like high tariffs or outright bans, develop internal industry through “stolen” technology with strong state capacity, and require domestic companies to compete internationally in order to survive so you don’t protect failing businesses forever. This worked for Japan, China, Korea, Taiwan, and Singapore, though the specifics vary — e.g., Singapore had lower tariffs than Korea, but near-dictatorial control. This strategy mostly failed for India, parts of Latin America, and most of Africa. In the latter cases, it almost always failed because of a corrupt bureaucracy that was unable to jettison domestic companies that were not performing. In other words, state capacity wasn’t strong or rigorous enough.
Though the strategy laid out above was mostly used for rapid industrialization, I think you could implement this kind of strategy for AI. Arguably, China has already done this over the last few years to great effect. The basic contours: massively penalize using foreign token and chip providers, subsidize multiple home grown labs to develop other parts of the infra stack, and ruthlessly enforce corporate accountability against external standards.
Software does not seem like it will be the bottleneck. We know that it is possible to distill the big publicly available models, or just mine the APIs for data. That leaves real world needs, including raw materials, energy, and fabs. Unfortunately, these real world needs are perennial problems in developing countries. Any country that already has economic capacity to do complicated compute build out is already somewhere in the AI stack.
If we are able to control the entire governing apparatus by fiat, a country like Nigeria or India, or countries in LatAm, could actually be well positioned to implement a comprehensive AI-catch-up program. These countries have human talent and access to global markets (i.e. they aren’t sanctioned), with some amount of the raw materials at home. India already has a massive government sponsored chip fab program in the works. Nigeria is in a weaker position due to things like not having figured out consistent energy or not having fully functional ports, but with the entire government working in lockstep they could figure it out. And if we are fiating this level of government control, we can also maybe fiat some level of government cooperation — Nigerian and Indian companies have free trade and information sharing agreements, for example.
Conveniently, these countries do not have to develop an entire domestic stack. They only need to be able to offset demand at some point of the chain, whether that be memory or chips or even land and favorable government treatment of data centers.
If you can’t control the entire government by fiat — if you really are just a single individual leader with a lot of foresight — this whole thing becomes a lot harder. Leaders in these countries would need to centralize economic control, which is always risky because it requires going up against entrenched interests (both domestic and foreign).
None of the above addresses countries that are extremely small or countries that are profoundly dysfunctional. Countries that today struggle to provide basic infrastructure need to get their house in order before embarking on a complicated AI build out.
To be really specific for India:
Continue investing in fabs. The Tata-PSMC fab in Dholera and the Micron packaging facility in Sanand are good starts. Subsidize two to three more now, so that you can get to the ‘do things at scale’ part of the learning curve.
Force domestic demand. Aggressive tariffs or outright procurement bans on foreign frontier model APIs for any government use (which is a big chunk of India’s economy). Route that demand to domestic labs — Sarvam recently made waves for releasing 30B and 105B models, Krutrim has been going the ‘Google’ route of full-stack sovereignty, and BharatGen is a government/academic collab. Whatever labs are picked should transparently be picked for technical merit rather than political connection, with hard sunset clauses if they miss benchmarks. (Note: using foreign models for training data collection is fine.)
Bring in diaspora talent. There are lots of ethnically Indian AI folks abroad. Pay them a ton of money to come back.
Grow energy. India’s grid cannot currently support hyperscale AI buildout at the scale needed. Modi would need to burn some political capital to do the energy build-out, but he has pulled this kind of stunt before with the rupee demonetization corruption crackdown, the Unified Payments Interface (UPI), and digital ID (Aadhaar). India has uranium, so maybe nuclear?
Even though I have reservations about many of Modi’s nationalist policies, for this particular question India benefits a ton from having a leader who is extremely popular and competent, runs his party with a pretty firm hand, and has experience pulling off massive government projects without any obvious derailing corruption (not the same as no corruption!).
I know far less about Nigeria. My sense is that it is in a far worse position, but I don’t know the names of the major players. Still, here is my best shot at a policy slate:
Implement heavy-handed anti-corruption measures — everything else can only be done effectively downstream of a strong government that can evaluate companies on merit and move large amounts of funds without leakage.
Fix the electricity and ports.
Protect / formalize rare metals trading (prevent smuggling).
Kill protected national companies that are not producing / meeting benchmarks.
20 of the 31 authors on the original “Language Models are Few-Shot Learners” paper are no longer at OpenAI.

