
claude-opus-46

Opus 4.6 is Anthropic’s strongest model for coding and long-running professional tasks. It is built for agents that operate across entire workflows rather than single prompts, making it especially effective for large codebases, complex refactors, and multi-step debugging that unfolds over time.

run with your agent
# install belt
$ curl -fsSL https://cli.inference.sh | sh
# view schema & details
$ belt app get openrouter/claude-opus-46
# run
$ belt app run openrouter/claude-opus-46

Anthropic's Claude family has become one of the most capable LLM lineups available, and for a lot of developers it's now the default choice for production agent systems. Four models span the full range from budget-friendly to no-compromise: Opus 4.6, Opus 4.5, Sonnet 4.5, and Haiku 4.5. Each serves a different purpose, and choosing the right one for your workload actually matters.

I've been running all four through inference.sh using the openrouter/ prefix, which means they sit alongside 150+ other tools in the same API surface. No separate Anthropic billing, no additional SDK, no second set of credentials. The same interface you'd use for image generation or video or web search works for Claude. That simplicity sounds minor until you're building a system that needs an LLM brain, an image generator, and a search tool in the same pipeline.

This article covers what each Claude model does well, where it stumbles, and how to pick the right one for your workload.

opus 4.6: the ceiling

Opus 4.6 is Anthropic's strongest model, full stop. It's built for the kind of work where you need the model to operate autonomously across complex, multi-step workflows - long coding sessions, document analysis that spans hundreds of pages, agentic tasks where the model needs to plan, execute, and course-correct without hand-holding.

Anthropic removed the long-context surcharge that earlier Claude models carried, so the full 1M token context window is billed at a flat rate regardless of how much you use. That's a meaningful improvement over previous generations where costs jumped once you crossed a 200K threshold.

Where Opus 4.6 genuinely earns its position is sustained reasoning over complex problems. I've thrown it multi-file refactoring tasks where the model needs to understand interdependencies across a dozen source files, plan a migration path, and execute changes that don't break the build. It handles these with a consistency that lower-tier models can't match. The gap isn't subtle - it's the difference between getting a coherent PR and getting a tangle of half-finished edits that require manual cleanup.

The honest caveat is that for many everyday tasks, Opus 4.6 is overkill. Summarizing an email, answering a factual question, generating a short function - these don't need the most powerful model in the lineup. Think of Opus 4.6 as the specialist you bring in for hard problems, not the daily driver.

opus 4.5: the previous generation flagship

Opus 4.5 was Anthropic's top model before 4.6 arrived, and it's still a genuinely capable system for demanding work.

The question everyone asks is whether Opus 4.5 still makes sense now that 4.6 exists. Honestly, for most use cases Sonnet 4.5 has eaten its lunch from below. Sonnet delivers comparable quality on coding and agent tasks at a fraction of the cost. And for the truly difficult problems where you need the absolute best output, 4.6 sits above it with measurable improvements in sustained reasoning and agent reliability.

That said, if you've already built workflows tuned to Opus 4.5's behavior and they're working well, there's no urgent reason to migrate. Models at this tier all perform at a high level, and the differences between Opus 4.5 and 4.6 are most visible on the hardest tasks. For routine professional work, Opus 4.5 remains more than adequate.

sonnet 4.5: the one to watch

Here's where things get interesting. Sonnet 4.5 is significantly cheaper than either Opus model, yet it scored 77.2% on SWE-bench Verified, a benchmark that measures real-world software engineering ability - the highest result any model had achieved on that benchmark at launch.

A model that costs significantly less than the flagship is leading one of the most credible coding benchmarks in the industry. This is the kind of price-performance ratio that changes how you architect systems.

Sonnet 4.5 is optimized for agents and coding. It handles extended autonomous operation - the kind of task where you set a model loose on a problem and check back later. It follows complex multi-step instructions reliably, writes clean code, and catches edge cases that cheaper models miss. For building agents that need to run inside inference.sh workflows, calling other tools, processing results, and making decisions, Sonnet 4.5 is my default recommendation.

The tradeoff shows up on the hardest reasoning tasks: multi-step mathematical proofs, extremely long documents requiring precise cross-referencing, or problems where the model needs to maintain a complex mental model over many turns. These are where Opus 4.6 still pulls ahead. For the other 80% of professional AI work, Sonnet 4.5 delivers results that would have been flagship-tier twelve months ago, at a mid-range price.

haiku 4.5: speed and volume

Haiku 4.5 is the lightweight member of the family, but "lightweight" doesn't mean "weak." Anthropic designed Haiku to punch above its weight class, and it genuinely performs at levels that would have been impressive from a frontier model not long ago.

The use case for Haiku is anything where you need fast responses at high volume. Classification tasks, content moderation, quick summarization, routing decisions in an agent pipeline, real-time chat where latency matters to user experience. Haiku responds faster than any other Claude model and is the most affordable option in the lineup, so you can afford to run it at scale.

I use Haiku as the first-pass model in multi-tier architectures. Simple questions get answered by Haiku directly. Complex questions get identified by Haiku and routed to Sonnet or Opus for deeper processing. This pattern saves significant money compared to running everything through a top-tier model, and the user experience actually improves because simple responses arrive faster.
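A minimal sketch of that routing pattern through the inference.sh client - note that the haiku and sonnet app slugs below follow the openrouter/ naming convention but are assumptions, as is reading the reply from result["output"]["response"]:

python
from inferencesh import inference

client = inference()

# hypothetical app slugs following the openrouter/ naming convention
HAIKU = "openrouter/claude-haiku-45"
SONNET = "openrouter/claude-sonnet-45"

def answer(question: str) -> str:
    # first pass: haiku answers directly or flags the question as complex
    triage = client.run({
        "app": HAIKU,
        "input": {
            "system_prompt": "answer directly if simple. reply with only ESCALATE if the question needs deep reasoning.",
            "text": question,
        },
    })
    reply = triage["output"]["response"]
    if reply.strip() != "ESCALATE":
        return reply  # handled cheaply and quickly
    # second pass: route hard questions to sonnet
    deep = client.run({"app": SONNET, "input": {"text": question}})
    return deep["output"]["response"]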

Where Haiku falls short is on tasks requiring deep reasoning or long, complex output. Ask it to write a detailed technical specification or refactor a large codebase and you'll see the gaps compared to Sonnet or Opus. The outputs will be plausible but less thorough, more likely to miss edge cases, and occasionally inconsistent across a long generation. It's a model that trades depth for speed, and that trade works beautifully in the right context.

how claude competes

Let's be direct about the competitive picture. Claude goes up against GPT-4 from OpenAI, Gemini Pro from Google, and a growing roster of open-weight models from Meta, Mistral, and others.

Claude's strengths are real and well-documented. Instruction following is best-in-class - Claude models reliably do what you ask without creative reinterpretation of the prompt. Coding ability across the lineup is genuinely strong, with Sonnet 4.5's SWE-bench results backing that up with data rather than marketing claims. Safety and honesty are deeply integrated; Claude models are less likely to hallucinate confidently and more willing to say "I don't know" when appropriate.

Where competitors hold advantages: Google's Gemini has tighter integration with web search and real-time information, which matters for tasks requiring current knowledge. GPT-4o handles multimodal input - images, audio, video - with broader capability than Claude's current offerings. Some open-weight models running on dedicated infrastructure can beat Claude on latency for specific, narrow tasks. And for teams that need full control over model weights and deployment, open models offer flexibility that no API-based service can match.

The honest assessment is that for pure text reasoning and code generation, Claude is at or near the top. For multimodal workflows or tasks requiring live web data, the picture is more nuanced. Building on inference.sh gives you the option to use Claude for what it does best and route other tasks to models better suited - Gemini for grounded search, FLUX for image generation, specialized tools for domain-specific work. That's the practical advantage of a unified platform over going all-in on any single provider's ecosystem.

the 1M context window on opus 4.6

One of the best things about Opus 4.6 is that Anthropic removed the long-context surcharge that plagued earlier models. The full 1M token context window is available at a flat rate with no tiered pricing and no surprise cost jumps at 200K tokens like previous generations had.

This matters because context windows in agent systems tend to grow over time. A long conversation, a large codebase loaded for reference, accumulated tool call results - these add up. With flat pricing, you don't need to obsess over crossing an arbitrary threshold. That said, more context still means more tokens, which means higher absolute costs. The economics still reward keeping your context lean - summarize older history, drop irrelevant tool results - but now the motivation is straightforward cost control rather than avoiding a pricing cliff.
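As a sketch, here's one way to keep context lean - a helper that keeps only the most recent messages under a rough token budget. The chars-divided-by-4 estimate is a heuristic, not Anthropic's tokenizer, and the message shape follows the context field documented in the schema below:

python
def trim_context(context: list[dict], budget_tokens: int = 100_000) -> list[dict]:
    """keep the most recent messages that fit a rough token budget."""
    def estimate(msg: dict) -> int:
        # crude heuristic: roughly 4 characters per token
        text = "".join(part.get("text", "") for part in msg.get("content", []))
        return max(1, len(text) // 4)

    kept, used = [], 0
    for msg in reversed(context):  # walk newest to oldest
        cost = estimate(msg)
        if used + cost > budget_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))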

For reference, 1M tokens is roughly 750,000 words. Most single-turn tasks won't come close. But multi-turn agent sessions that accumulate history, or document analysis tasks where you're loading entire reports, can consume substantial context faster than you'd expect. The flat pricing means you can use that context without fear of penalty tiers, but you should still manage it deliberately.

using claude as an agent brain on inference.sh

The real power of running Claude models through inference.sh isn't just access to the models themselves. It's the ability to wire them into workflows that combine language understanding with action. An agent running on Sonnet 4.5 can call image generation tools, search the web, process documents, generate video, and make decisions based on results - all through the same API surface.

This is where model selection becomes a systems design question rather than a benchmarking exercise. Your orchestration layer might use Haiku for quick routing decisions, Sonnet for the core reasoning loop, and Opus for the rare high-stakes analysis that requires maximum capability. Each call goes through the same openrouter/ prefix, the same authentication, the same billing. The complexity lives in your prompt engineering and workflow design, not in managing multiple provider integrations.

I find this pattern especially powerful for production systems where reliability matters more than squeezing the last percentage point of benchmark performance. If one provider has an outage or a rate limit, you can route to an alternative model without changing your infrastructure. Claude for primary reasoning, GPT-4 as a fallback, Gemini for tasks that need search grounding. The unified API makes this kind of redundancy practical rather than architectural overhead.
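Here's a sketch of that redundancy pattern. The fallback slug and the assumption that client.run raises an exception on outages or rate limits are both illustrative, not documented behavior:

python
from inferencesh import inference

client = inference()

# ordered list of models to try; the fallback slug is hypothetical
MODELS = [
    "openrouter/claude-opus-46",  # primary reasoning model
    "openrouter/gpt-4",           # hypothetical fallback slug
]

def run_with_fallback(payload: dict) -> dict:
    last_error = None
    for app in MODELS:
        try:
            return client.run({"app": app, "input": payload})
        except Exception as exc:  # assumed: client.run raises on outage or rate limit
            last_error = exc
    raise RuntimeError("all models failed") from last_error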

picking the right model

The decision framework is simpler than the number of options might suggest. Start with Sonnet 4.5 for almost everything. It's the best balance of capability and cost, and it handles coding and agent tasks as well as any model available. Move up to Opus 4.6 only when you're hitting the limits of Sonnet's reasoning on genuinely difficult problems - the kind where you can see the quality difference in the output, not just assume it exists because the model is more expensive. Use Haiku 4.5 for high-volume, low-complexity tasks where speed and cost matter more than depth.

Opus 4.5 occupies an awkward middle ground at this point. It's not the strongest model and it's not the best value. If you're starting fresh, I'd skip it and choose between Sonnet 4.5 and Opus 4.6 based on whether your workload needs the extra reasoning capability.

Sonnet 4.5 is the economic center of gravity for the Claude lineup. You should have a specific, demonstrable reason to pay more before reaching for Opus. And for the vast majority of professional AI work, that reason won't materialize.

is claude opus 4.6 worth the price difference over sonnet 4.5?

For most workloads, no. Sonnet 4.5 matches or approaches Opus 4.6 quality on coding, agent tasks, and general reasoning at a significantly lower cost. Where Opus 4.6 justifies its premium is on the hardest problems: extremely long context analysis, multi-step reasoning chains where consistency over many turns is critical, and professional tasks where the cost of a wrong answer far exceeds the cost difference between models. If you can't point to specific failures in Sonnet output that Opus would fix, stick with Sonnet.

does opus 4.6 have tiered context pricing?

No. Unlike earlier Claude generations, Opus 4.6 charges a flat rate across its entire 1M token context window. There is no surcharge for long-context requests. This is a meaningful improvement over previous models that applied premium pricing above 200K tokens. You should still manage context actively to control absolute costs - more tokens still means a higher bill - but you won't hit a pricing cliff at any particular context length.

can I use claude models alongside other AI tools on inference.sh?

Yes, and this is the main reason to run Claude through inference.sh rather than directly through Anthropic's API. All Claude models are accessible via the openrouter/ prefix alongside 150+ other tools covering image generation, video, search, audio, and more. You can build agent workflows where Claude handles reasoning while other specialized models handle generation, retrieval, or domain-specific tasks. One API key, one billing system, one integration point. The ability to mix providers without managing separate accounts is what makes the unified approach practical for production systems.
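As an illustration, a two-step pipeline where Claude drafts a prompt and an image model renders it might look like this. The flux app slug and its input field are assumptions for illustration:

python
from inferencesh import inference

client = inference()

# step 1: claude turns a rough idea into a detailed image prompt
idea = client.run({
    "app": "openrouter/claude-opus-46",
    "input": {"text": "write a one-sentence image prompt for a poster about coral reefs"},
})
prompt = idea["output"]["response"]

# step 2: hand the prompt to an image model on the same api surface
# (the app slug and its input field are hypothetical)
image = client.run({
    "app": "flux/dev",
    "input": {"prompt": prompt},
})
print(image["output"])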

api reference


1. calling the api

install the client

the client provides a convenient way to interact with the api.

bash
pip install inferencesh

setup your api key

set INFERENCE_API_KEY as an environment variable. get your key from settings → api keys.

bash
export INFERENCE_API_KEY="inf_your_key"

run and get result

submit a request and wait for the final result. best for batch processing or when you don't need progress updates.

python
from inferencesh import inference

client = inference()

result = client.run({
    "app": "openrouter/claude-opus-46",
    "input": {}
})

print(result["output"])

stream live updates

get real-time progress updates as the task runs. ideal for showing progress bars, partial results, or long-running tasks.

python
from inferencesh import inference

client = inference()

# stream=True yields updates as they arrive
for update in client.run({
        "app": "openrouter/claude-opus-46",
        "input": {}
    }, stream=True):
    if update.get("progress"):
        print(f"progress: {update['progress']}%")
    if update.get("output"):
        print(f"output: {update['output']}")

2. authentication

the api uses api keys for authentication. see the authentication docs for detailed setup instructions.

3. files

file inputs are automatically handled by the sdk. you can pass local paths, urls, or base64 data.

automatic upload

the python sdk automatically detects local file paths and uploads them. urls are passed through as-is.

python
# local file paths are automatically uploaded
result = client.run({
    "app": "openrouter/claude-opus-46",
    "input": {
        "image": "/path/to/local/image.png",  # detected & uploaded
        "audio": "https://example.com/audio.mp3",  # url passed through
    }
})

manual upload

you can also upload files manually and use the returned url.

python
# upload and get a hosted URL
file = client.files.upload("/path/to/file.png")
print(file.uri)  # https://cloud.inference.sh/...

4. webhooks

get notified when a task completes by providing a webhook url. when the task reaches a terminal state (completed, failed, or cancelled), a POST request is sent to your url with the task result.

python
result = client.run({
    "app": "openrouter/claude-opus-46",
    "input": {},
    "webhook": "https://your-server.com/webhook"
}, wait=False)

webhook payload

your endpoint receives a JSON POST with the task result:

json
{
  "id": "task_abc123",
  "status": 9,
  "output": { ... },
  "error": "",
  "session_id": null,
  "created_at": "2024-01-15T10:30:00Z",
  "updated_at": "2024-01-15T10:30:05Z"
}
id          string   task id
status      number   terminal status (9=completed, 10=failed, 11=cancelled)
output      object   task output (when completed)
error       string   error message (when failed)
session_id  string   session id (if using sessions)
created_at  string   iso timestamp
updated_at  string   iso timestamp
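a minimal receiver sketch for this payload, using flask (the framework choice is illustrative; any server that accepts a json POST works):

python
from flask import Flask, request

app = Flask(__name__)

@app.route("/webhook", methods=["POST"])
def handle_task():
    payload = request.get_json()
    if payload["status"] == 9:      # completed
        print("output:", payload["output"])
    elif payload["status"] == 10:   # failed
        print("error:", payload["error"])
    else:                           # 11 = cancelled
        print("task cancelled:", payload["id"])
    return "", 200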

5. schema

input

reasoning_exclude (boolean)

exclude reasoning tokens from response

default: false
context_size (integer)

the context size for the model.

default: 200000
stream (boolean)

stream the response (true) or return complete response (false)

default: true
files (array)

the files to use for the model

images (array)

the images to use for the model

tools (array)

tool definitions for function calling

tool_call_id (string)

the tool call id for tool role messages

reasoning (string)

the reasoning input of the message

reasoning_effort (string)

enable step-by-step reasoning

default: "none"
options:"low""medium""high""none"
reasoning_max_tokens (integer)

the maximum number of tokens to use for reasoning

system_prompt (string)

the system prompt to use for the model

default: "you are a helpful assistant that can answer questions and help with tasks."example: "you are a helpful assistant that can answer questions and help with tasks."
context (array)

the context to use for the model

default: []
example: [{"content":[{"text":"What is the capital of France?","type":"text"}],"role":"user"},{"content":[{"text":"The capital of France is Paris.","type":"text"}],"role":"assistant"}]
role (string)

the role of the input text

default: "user"
options:"user""assistant""system""tool"
text (string, required)

the input text to use for the model

example: "write a haiku about artificial general intelligence"
temperature (number)

temperature

default: 0.7, min: 0, max: 1
top_p (number)

top p

default: 0.95, min: 0, max: 1
max_tokens (integer)

max tokens

default: 64000
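putting the input schema together, a request that sets the common fields explicitly might look like this (values are illustrative, and client is the sdk client from section 1):

python
result = client.run({
    "app": "openrouter/claude-opus-46",
    "input": {
        "system_prompt": "you are a senior code reviewer.",
        "context": [
            {"role": "user", "content": [{"type": "text", "text": "here is the diff we discussed"}]}
        ],
        "text": "summarize the main risks in the diff above",
        "temperature": 0.3,
        "max_tokens": 2048,
        "reasoning_effort": "medium",
        "stream": False
    }
})
print(result["output"]["response"])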

output

images (array)

images

output_meta (object)

structured metadata about inputs/outputs for pricing calculation

response (string, required)

the generated text response

usage (object)

token usage statistics

tool_calls (array)

tool calls for function calling

reasoning (string)

the reasoning output of the model
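the tools, tool_calls, and tool_call_id fields together support function calling. a rough sketch of the round trip, assuming an openai-style tool definition shape and a call object with an id field (neither is documented here, so treat both as assumptions):

python
# tool definition; the json-schema shape here is an assumption
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "look up current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

first = client.run({
    "app": "openrouter/claude-opus-46",
    "input": {"text": "what's the weather in Paris?", "tools": tools},
})

for call in first["output"].get("tool_calls", []):
    # execute the tool yourself, then send the result back as a tool-role message
    followup = client.run({
        "app": "openrouter/claude-opus-46",
        "input": {
            "tools": tools,
            "role": "tool",
            "tool_call_id": call["id"],  # the id field name is assumed
            "text": '{"temp_c": 18}',
        },
    })
    print(followup["output"]["response"])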

