When OpenAI unveiled its o3 “reasoning” AI model in December, the company partnered with the creators of ARC-AGI, a benchmark designed to test highly capable AI, to showcase o3’s capabilities. Months later, the results have been revised, and they now look slightly less impressive than they did initially.
Last week, the Arc Prize Foundation, which maintains and administers ARC-AGI, updated its approximate computing costs for o3. The organization originally estimated that o3 high, the best-performing configuration of o3 it tested, cost around $3,000 to solve a single ARC-AGI problem. Now, the Arc Prize Foundation thinks the cost is much higher: possibly around $30,000 per task.
The revision is notable because it illustrates just how expensive today's most sophisticated AI models may end up being for certain tasks, at least early on. OpenAI has yet to price o3, or even release it. But the Arc Prize Foundation believes OpenAI's o1-pro model pricing is a reasonable proxy.
For context, o1-pro is OpenAI’s most expensive model to date.
“We believe o1-pro is a closer comparison of true o3 cost […] due to amount of test-time compute used,” Mike Knoop, one of the co-founders of the Arc Prize Foundation, told TechCrunch. “But this is still a proxy, and we’ve kept o3 labeled as preview on our leaderboard to reflect the uncertainty until official pricing is announced.”
A high price for o3 high wouldn't be out of the question, given the amount of computing resources the model reportedly uses. According to the Arc Prize Foundation, o3 high used 172 times more compute than o3 low, the lowest-compute configuration of o3, to tackle ARC-AGI.
Moreover, rumors have been flying for quite some time about pricey plans OpenAI is considering introducing for enterprise customers. In early March, The Information reported that the company may be planning to charge up to $20,000 per month for specialized AI “agents,” like a software developer agent.
Some might argue that even OpenAI's priciest models will cost well under what a typical human contractor or staffer would command. But as AI researcher Toby Ord pointed out in a post on X, the models may not be as efficient as people. For example, o3 high needed 1,024 attempts at each task in ARC-AGI to achieve its best score.