
A Look Into Claude Opus 4.5

Stanley Ulili
Updated on November 25, 2025

The race to build the best AI is speeding up. OpenAI, Google, and xAI keep launching new models and calling each one “the most powerful ever.” But Anthropic’s new release, Claude Opus 4.5, feels different. It’s not just a small upgrade; it’s a real shift in what developers can expect from a top-tier AI model.

Claude Opus 4.5 makes two big moves at the same time:

  • It sets a new bar for coding and software engineering.
  • It cuts the price so much that a “premium” model suddenly becomes something you can afford to use every day.

You get better results on code generation and complex agent-style tasks while paying roughly a third of what the previous version cost.

In this article, we’ll look at why Opus 4.5 is such an important release. You’ll see how the new effort parameter gives you real control over how much you spend versus how much quality you get.

New pricing

Price has always been the limiting factor for using the most powerful AI models. Opus 4.1 was incredibly capable, but many developers and teams reserved it only for their most critical, high-value tasks. The cost simply made it impractical for everyday use.

That calculation just changed completely. Anthropic cut Opus 4.5's rates to a third of Opus 4.1's, a 67% reduction.

Pricing comparison table showing the dramatic reduction in costs for Claude Opus 4.5 compared to Opus 4.1

The new pricing structure shows exactly how dramatic this shift is:

  • Input tokens dropped from $15 per million tokens to $5 per million tokens
  • Output tokens fell from $75 per million tokens to $25 per million tokens

These aren't minor adjustments. For applications that process large amounts of context, like analyzing codebases, summarizing documents, or powering RAG systems, the input token reduction is massive. For tasks that generate substantial output, like detailed code or complex explanations, the output token savings are even more significant.
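To see how the arithmetic plays out, here's a quick back-of-the-envelope comparison. The workload numbers below are made up purely for illustration:

```typescript
// Back-of-the-envelope daily API cost for a hypothetical workload:
// 200 requests, each with 20k input tokens and 2k output tokens.
const PRICES_PER_MILLION_TOKENS = {
  "opus-4.1": { input: 15, output: 75 },
  "opus-4.5": { input: 5, output: 25 },
} as const;

function dailyCostUSD(model: keyof typeof PRICES_PER_MILLION_TOKENS): number {
  const requests = 200;
  const inputTokens = 20_000; // per request
  const outputTokens = 2_000; // per request
  const price = PRICES_PER_MILLION_TOKENS[model];
  const inputCost = (requests * inputTokens * price.input) / 1_000_000;
  const outputCost = (requests * outputTokens * price.output) / 1_000_000;
  return inputCost + outputCost;
}

console.log(dailyCostUSD("opus-4.1")); // 90
console.log(dailyCostUSD("opus-4.5")); // 30
```

The same workload drops from $90 to $30 a day, and the ratio holds at any scale because both the input and output rates fell by the same factor.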

This isn't just a discount to gain market share. The pricing reflects genuine efficiency improvements in how the model works. Opus 4.5 solves problems in fewer steps with less backtracking and more concise reasoning. The model became smarter and more efficient simultaneously, creating a virtuous cycle where better performance enables lower costs.

Startups, independent developers, and enterprises can now consider using Opus for a much broader range of applications without constantly worrying about their API bill. The model that was once reserved for special occasions becomes viable for core workflows.

Improved coding capabilities

Benchmarks tell you one story. Watching a model generate a complete, playable game from a single prompt tells you another. To test Opus 4.5's coding capabilities, we gave it the same challenge previously given to Google's Gemini 3 Pro: create a Minecraft clone from a simple description.

The prompt was intentionally straightforward: build a procedurally generated voxel world game with basic mechanics like movement, block placement, and block destruction. The model needed to understand a high-level concept and translate it into a complex, multi-file software project.

Opus 4.5 didn't just generate code fragments. It produced a fully functional game it called "VoxelCraft" with surprisingly sophisticated features.

The VoxelCraft game title screen with Play Game button

The game opens with a proper title screen and immediately demonstrates its capabilities. The procedurally generated world features varied terrain with hills, trees, and natural-looking landscapes, which suggests the model is familiar with noise algorithms such as Perlin noise, commonly used for terrain generation.
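The generated source isn't shown here, but a minimal value-noise heightmap gives a feel for the technique. This is a sketch, not the model's actual output, and Perlin noise proper interpolates gradients rather than raw lattice values; the structure, however, is similar:

```typescript
// Minimal value-noise heightmap: hash each lattice point to a pseudo-random
// height, then smoothly blend between neighbors.
function latticeValue(x: number, z: number): number {
  // Deterministic pseudo-random value in [0, 1) from integer coordinates.
  const s = Math.sin(x * 127.1 + z * 311.7) * 43758.5453;
  return s - Math.floor(s);
}

function smoothstep(t: number): number {
  return t * t * (3 - 2 * t); // ease curve hides the lattice seams
}

function valueNoise(x: number, z: number): number {
  const x0 = Math.floor(x);
  const z0 = Math.floor(z);
  const tx = smoothstep(x - x0);
  const tz = smoothstep(z - z0);
  // Bilinear interpolation of the four surrounding lattice values.
  const top = latticeValue(x0, z0) * (1 - tx) + latticeValue(x0 + 1, z0) * tx;
  const bot = latticeValue(x0, z0 + 1) * (1 - tx) + latticeValue(x0 + 1, z0 + 1) * tx;
  return top * (1 - tz) + bot * tz;
}

// Height of a voxel column: sample at a coarse scale for gentle hills.
function terrainHeight(x: number, z: number): number {
  return Math.floor(valueNoise(x / 16, z / 16) * 12); // 0..11 blocks tall
}
```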

Player controls feel polished and responsive. WASD controls movement, mouse handling provides smooth camera rotation, and the spacebar triggers jumping. The frame rate stays consistently high, making the experience genuinely playable rather than a technical proof-of-concept.

The core gameplay mechanics work exactly as they should. Left-clicking breaks blocks in the world. Right-clicking places new blocks. A hotbar at the bottom of the screen lets you select different block types like dirt, stone, wood, or leaves for placement.

Screenshot of VoxelCraft gameplay

But Opus 4.5 went beyond the explicit requirements. The model included a toggleable fly mode (activated with the 'F' key) for easy exploration. Even more impressively, it implemented a day/night cycle without being asked: the sky color transitions naturally from blue to purple sunset tones.
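A day/night cycle like that can be as simple as interpolating the sky color over time. Here's a rough sketch of one way to do it; the model's actual implementation isn't shown in the article:

```typescript
// Sketch of a day/night cycle: blend the sky between day and dusk colors
// based on elapsed time. Illustrative only, not the generated code.
type RGB = [number, number, number];

const DAY_SKY: RGB = [0.53, 0.81, 0.92]; // light blue
const DUSK_SKY: RGB = [0.55, 0.35, 0.64]; // purple sunset

function lerpColor(a: RGB, b: RGB, t: number): RGB {
  return [
    a[0] + (b[0] - a[0]) * t,
    a[1] + (b[1] - a[1]) * t,
    a[2] + (b[2] - a[2]) * t,
  ];
}

function skyColor(timeMs: number, dayLengthMs = 120_000): RGB {
  // Phase in [0, 1): 0 = noon, 0.5 = dusk, wrapping back to noon.
  const phase = (timeMs % dayLengthMs) / dayLengthMs;
  // Triangle wave so the color eases toward dusk and back again.
  const t = 1 - Math.abs(2 * phase - 1);
  return lerpColor(DAY_SKY, DUSK_SKY, t);
}
```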

This result exceeds anything seen from other models on this test. It's not just functional code; it's a genuinely enjoyable mini-game with thoughtful features.

How Opus 4.5 compares to Gemini 3 Pro

The comparison with Gemini 3 Pro's attempt on the same prompt reveals the gap in capabilities. Gemini 3 Pro managed to generate a procedural world with trees and terrain, which is admirable for a single prompt. But it missed the core gameplay loop. Players could move around, but there was no way to break or place blocks. The movement felt less refined and more chaotic.

Opus 4.5 understood not just the explicit requirements but the implied expectations. When someone asks for a Minecraft-like game, they expect the ability to modify the world. That's the whole point. The model grasped this context and delivered a complete, interactive application rather than a tech demo.

Building a 3D LEGO application with Three.js

A single impressive result could be luck. To validate Opus 4.5's capabilities, we gave it another single-prompt challenge: create a web-based 3D LEGO builder using Three.js.

The prompt was simple: "Build a lego builder website that utilizes Three.js to allow the user to build from various lego pieces."

Once again, the model delivered a complete, working application. The "Brick Builder" includes an interactive 3D canvas with a grid-based build plate powered by Three.js. Users can pan and orbit the camera around their creation for different viewing angles.

The Brick Builder interface showing a 3D grid, LEGO creation in progress, and sidebar with color and brick selection options

The application handles the full lifecycle of brick management. A polished sidebar UI lets you select brick types (1x1, 2x2, 2x4 plates) and colors. The 3D view updates in real-time as you add bricks. The model correctly implemented grid-based placement and stacking, so bricks snap into position naturally. A "Remove" tool lets you delete previously placed bricks.
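The article doesn't include the generated source, but grid-based placement and stacking usually boils down to snapping positions to stud cells and finding the tallest brick already overlapping the new footprint. A minimal sketch of that logic, under those assumptions:

```typescript
// Sketch of snap-and-stack logic for a brick builder. Bricks occupy whole
// stud cells; a new brick lands on top of the tallest overlapping brick.
interface Brick {
  x: number; z: number; // stud coordinates of the brick's corner
  w: number; d: number; // footprint in studs (e.g. a 2x4 plate)
  y: number;            // vertical layer it rests on
}

const STUD = 1.0; // world units per stud

function snapToGrid(worldX: number, worldZ: number): [number, number] {
  return [Math.floor(worldX / STUD), Math.floor(worldZ / STUD)];
}

function placementHeight(
  bricks: Brick[], x: number, z: number, w: number, d: number
): number {
  // Find the highest existing brick whose footprint overlaps the new one.
  let top = 0;
  for (const b of bricks) {
    const overlaps =
      x < b.x + b.w && x + w > b.x &&
      z < b.z + b.d && z + d > b.z;
    if (overlaps) top = Math.max(top, b.y + 1);
  }
  return top;
}

// Usage: snap a cursor hit point, then stack the new 2x4 brick.
const bricks: Brick[] = [];
const [sx, sz] = snapToGrid(3.7, 5.2);
bricks.push({ x: sx, z: sz, w: 2, d: 4, y: placementHeight(bricks, sx, sz, 2, 4) });
```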

This demonstrates that Opus 4.5's coding abilities extend well beyond basic applications. It can architect user interfaces, manage state in modern frameworks like React, and handle complex 3D interactions with libraries like Three.js. All from a single high-level request.

The fact that AI models can now generate interactive applications of this complexity from minimal input represents a genuine shift in what's possible with code generation tools.

Dramatic improvements in token efficiency

Beyond raw performance, Opus 4.5 uses dramatically fewer tokens than its predecessors to reach similar or better outcomes. This efficiency improvement is what enables the lower pricing while maintaining superior capabilities.

The effort parameter gives you direct control over the performance-cost trade-off. You can choose between minimizing time and spend or maximizing capability depending on your use case.

Graph showing accuracy vs output tokens for different effort levels on software engineering tasks

Anthropic's research on this feature reveals some powerful insights. They tested Opus 4.5 on SWE-bench (Software Engineering Benchmark) at different effort levels and compared results to Sonnet 4.5.

At medium effort, Opus 4.5 matched Sonnet 4.5's best score while using 76% fewer output tokens. That's remarkable efficiency. You get the same accuracy while spending significantly less.

At high effort, Opus 4.5 surpassed Sonnet 4.5's performance by 4.3 percentage points. Even at this maximum performance setting, it still used 48% fewer tokens than Sonnet at its best.

This gives you operating points across the entire cost-performance curve. For a user-facing chatbot where latency matters, you might choose lower effort. For overnight batch processing or complex analysis tasks, you can dial up to high effort for the best possible result.
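In code, picking an operating point could look like the sketch below, using Anthropic's TypeScript SDK. The exact name and placement of the effort control in the request is an assumption here, as is the model ID; check the official API reference for the shipped shape:

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// The "effort" field below is an assumption for illustration -- verify the
// exact field name and placement in Anthropic's API reference before use.
const params = {
  model: "claude-opus-4-5", // check the docs for the exact model ID
  max_tokens: 2048,
  effort: "medium", // hypothetical values: "low" | "medium" | "high"
  messages: [
    { role: "user" as const, content: "Fix the failing test in this diff: ..." },
  ],
};

const response = await client.messages.create(params);
console.log(response.usage); // compare token counts across effort levels
```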

Combined with the new pricing, this level of control makes building sophisticated, cost-effective AI systems more feasible than ever. The three-times-cheaper pricing combined with superior token efficiency positions Opus 4.5 as a viable daily driver for many developers and organizations.

Dominating the software engineering benchmarks

Practical tests show impressive capabilities, but standardized benchmarks provide quantitative comparisons across models. Opus 4.5 sets new state-of-the-art records on several critical benchmarks, especially those related to coding and reasoning.

Leading on SWE-bench

SWE-bench tests a model's ability to resolve real-world GitHub issues within actual codebases. It's considered a strong proxy for real-world software engineering skill because it requires understanding existing code, identifying problems, and implementing correct fixes.

Bar chart comparing model accuracy on SWE-bench Verified, with Opus 4.5 in the lead

Opus 4.5 achieves the top score on SWE-bench, with GPT-5.1-Codex-Max coming in second. It significantly outperforms Opus 4.1 and pulls ahead of strong competitors like Gemini 3 Pro.

This benchmark result corroborates what the practical tests showed. When it comes to understanding and writing code, Opus 4.5 leads the field.

Outperforming human candidates

Anthropic ran an internal test that provides a fascinating perspective on the model's capabilities. They evaluated Opus 4.5 on their own notoriously difficult take-home coding exam given to prospective performance engineering candidates. Claude Opus 4.5 scored higher than any human candidate ever has.

This doesn't mean AI is replacing engineers tomorrow. The test assesses specific technical skills under time pressure but doesn't evaluate crucial soft skills like collaboration, communication, or long-term architectural thinking. Still, it demonstrates that on narrowly scoped technical problems, AI models can now surpass even strong human candidates.

Strong performance on ARC-AGI-2

The ARC-AGI benchmark measures novel problem-solving and reasoning, skills considered central to artificial general intelligence. Opus 4.5 scores second best overall on this leaderboard, slightly behind only the Deep Think version of Gemini 3 Pro. This represents a massive leap over Opus 4.1.

This indicates the model's intelligence isn't confined to code. It extends to more general and abstract reasoning tasks.

Where Opus wins and where it trails

Looking at the comprehensive benchmark suite, a clear pattern emerges. Opus 4.5 wins decisively in nearly every category related to coding and agentic behavior:

  • Agentic coding (SWE-bench)
  • Agentic terminal coding
  • Agentic tool use
  • Scaled tool use
  • Computer use
  • Novel problem solving (ARC-AGI-2)

But it's not a complete sweep. The model currently trails slightly behind top performers in a few specific areas:

  • Graduate-level reasoning (where Gemini 3 Pro leads)
  • Visual reasoning (where GPT-5.1 is strongest)
  • Multilingual Q&A (where Gemini 3 Pro has an edge)

There's even a quirky benchmark where Gemini 3 Pro edges ahead: the vending machine benchmark, where it apparently generates slightly more revenue. But for the vast majority of coding and agentic tasks, Opus 4.5 dominates.

This provides an honest picture of the model's strengths. Anthropic clearly focused on making Opus the undisputed champion of coding and agentic tasks. It remains highly competent across the board, with only small gaps in non-coding domains, but users whose primary use case involves visual or multilingual tasks might find other models slightly better suited to those specific needs.

Final thoughts

Claude Opus 4.5 is more than just another strong AI model. By combining top-level coding performance with a huge price cut, Anthropic has turned it into a tool that’s both very powerful and much easier for people and teams to afford.

Its ability to create full, playable games and interactive web apps from a single prompt sets a new bar for code generation. The new effort parameter lets you control how much you pay versus how much quality you want. Benchmarks show it leading in coding, while also giving a realistic picture of how it compares in other areas.

The new lower price is especially attractive for companies. Teams that used to save “premium” models only for rare, critical tasks can now use Opus 4.5 in their everyday workflows. Strong coding skills plus everyday pricing make it a very compelling option. It can speed up development, automate complex work, and open up new ways to build software with AI.

Opus keeps its status as a top coding model, but is now affordable enough for regular use. The race for the most powerful model is still going, but for coding, Claude Opus 4.5 currently holds the crown.
