Claude Code Enterprise (2026): Plan, SSO/Okta Rollout, Admin Controls, and the Token-Cap Problem

What the Claude Code Enterprise plan includes ($20/seat + usage at API rates), SSO/SCIM/Okta deployment to non-engineers, per-seat admin and spend controls, and why the enterprise token cap keeps getting raised. Plus how a model router cuts per-seat token spend 40-70% by routing routine requests to cheaper Claude models.

June 15, 2026 · 2 min read
A 120-person company standardizing on Anthropic tends to hit a familiar wall. Claude Code works on the engineering team, so the call becomes: roll it out org-wide, consolidate billing, govern it like enterprise software. Three problems surface at once. Distribution to non-engineers through the company identity provider. Per-seat admin control over access and spend. And a token cap on the enterprise license that finance keeps asking Anthropic to raise. The plan solves the first two. The third is the one worth engineering around.

  • $20/seat: Enterprise seat fee, usage billed at API rates on top
  • SAML / SCIM: SSO for Okta, Entra ID, Google Workspace + auto-provisioning
  • 60-80%: Coding-agent requests that are routine, routable to cheaper models
  • 40-70%: Per-seat token spend cut by routing within the Claude family

The Org-Wide Rollout Problem

The decision to standardize on Anthropic usually starts in engineering. Claude Code earns its keep on a few teams, the contract gets attention, and the question shifts from "does this work" to "how do we give it to everyone and keep it under control." That shift surfaces a different set of requirements than the ones that got the first seats approved.

Three requirements dominate the rollout conversation:

  • Distribution. Engineers want the CLI and IDE integration. Non-engineers want Claude in the browser. Both need to be provisioned through the company identity provider (Okta, Entra ID, Google Workspace), not through individual invites.
  • Per-seat governance. Admins need to see who is using what, scope access by role, and cap what each seat can spend. A finance team will not sign off on a tool with no per-user controls.
  • Cost predictability. The Enterprise plan is a seat fee plus usage at API rates. Usage is the variable that grows, and it grows with how hard the model works, not with headcount. The token cap on the license becomes the number everyone watches.

The Enterprise plan answers distribution and governance directly. Cost predictability is the part the plan does not solve on its own, because the usage portion of the bill is driven by which model handles each request, and by default that is the most expensive one. The sections below cover what the plan includes, how to deploy it, and how to keep the token cap from becoming a quarterly renegotiation.

What the Claude Code Enterprise Plan Includes

The Enterprise plan is priced at $20/seat per month with usage billed on top at standard API rates. The seat fee covers access plus the full admin and security surface; the usage cost scales with model and task. Pricing is negotiated rather than published per-tier the way the Team plan is.

On top of everything in the Team plan, Enterprise adds the controls a security and finance team needs before an org-wide rollout:

SSO + Domain Capture

SAML 2.0 / OIDC single sign-on with any identity provider. Domain capture claims your email domain so every login routes through SSO.

SCIM Provisioning

Automated user provisioning and de-provisioning. Accounts created and deactivated by your IdP as people join and leave.

Role-Based Access

Fine-grained permissioning with user groups. Scope what each role can do across Claude Code, chat, and Cowork.

Per-Seat Spend Limits

Admins set user and org spend limits. Caps the usage portion of the bill per seat, not just in aggregate.

Audit Logs + Compliance API

Full audit logs of user actions plus a Compliance API for programmatic activity queries. Configurable data retention.

Usage Analytics

Per-user, per-group, per-feature breakdowns, including per-user and per-repo Claude Code usage. The data you need to find where the token cap goes.

For regulated industries, the sales-assisted Enterprise plan adds a HIPAA-ready offering with a signed BAA and custom data residency on request. The seat fee is the same; the difference is contracting terms and retention controls.

The seat fee is the small number

For an org running coding agents heavily, the $20/seat fee is a fraction of the bill. Usage at API rates dominates. A team where every agent defaults to Opus 4.6 ($5/M input, $25/M output) can spend more on tokens in a week than the seat fees cost in a month. That is the number to govern, and it is the number the token cap exists to bound.
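
To make the comparison concrete, here is a minimal sketch of the arithmetic. The prices are the Opus 4.6 rates quoted above; the weekly token volume is an illustrative assumption for a heavy seat, not a measured figure.

```typescript
// Prices from the article: $20/seat per month, Opus 4.6 at $5 per 1M input
// tokens and $25 per 1M output tokens.
const SEAT_FEE_USD_PER_MONTH = 20;
const OPUS_INPUT_USD_PER_M = 5;
const OPUS_OUTPUT_USD_PER_M = 25;

interface SeatUsage {
  inputTokens: number;  // tokens sent to the model in the period
  outputTokens: number; // tokens generated by the model in the period
}

// Usage cost in USD for one seat when every request goes to Opus 4.6.
function opusUsageCostUsd(usage: SeatUsage): number {
  return (
    (usage.inputTokens / 1_000_000) * OPUS_INPUT_USD_PER_M +
    (usage.outputTokens / 1_000_000) * OPUS_OUTPUT_USD_PER_M
  );
}

// ASSUMPTION (illustrative only): a heavy seat consumes 10M input tokens
// and 1M output tokens in one week.
const heavyWeek: SeatUsage = { inputTokens: 10_000_000, outputTokens: 1_000_000 };

// 10 * $5 + 1 * $25 = $75 for the week, against a $20 monthly seat fee.
const weeklyUsageUsd = opusUsageCostUsd(heavyWeek);
const usageExceedsSeatFee = weeklyUsageUsd > SEAT_FEE_USD_PER_MONTH;
```

Under that assumed volume, one week of usage is several times the monthly seat fee, which is why the usage line is the one to govern.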

Team vs Enterprise: What Actually Changes

The Team plan covers small organizations well. The features that force the move to Enterprise are identity, governance, and per-seat cost control. If you are deploying to more than 150 people (the Team plan's ceiling), or your security team requires SCIM and a custom identity provider, Team runs out of room.

Capability            | Team                          | Enterprise
Seat pricing          | $20-25 std, $100-125 premium  | $20/seat + usage at API rates
Org size              | 5 to 150 users                | No published cap (negotiated)
SSO                   | Google / Microsoft only       | SAML / OIDC, any IdP (Okta, Entra ID)
Provisioning          | Manual invite                 | SCIM (automated)
Role-based access     | Admin / user                  | Fine-grained roles + user groups
Per-user spend limits | No                            | Yes
Audit logs            | Basic admin activity          | Full audit log + Compliance API
Claude Code analytics | Per workspace                 | Per user, per repo
Data retention        | 30 days                       | Configurable
HIPAA / BAA           | No                            | Yes (during contracting)

For the full per-plan price breakdown including Pro and Max, see the Claude Code pricing guide. For the exact per-token API rates that drive the usage portion of an Enterprise bill, see Anthropic API pricing.

SSO and Okta Deployment

Enterprise SSO supports SAML 2.0 and OIDC, with verified setup paths for Okta, Microsoft Entra ID (Azure AD), and Google Workspace. The flow is a standard SAML application integration, configured once in your identity provider and once in the Claude admin console.

Okta setup, step by step

In the Okta Admin console, go to Applications → Applications → Create App Integration, choose SAML 2.0 (recommended) or OIDC, then enter the Single sign-on URL and Audience URI from your Claude admin console. Configure the name ID format and application username, save, and assign the app to the users or groups that should receive a seat.

Okta SAML app values (from the Claude admin console)

# Okta: Applications -> Create App Integration -> SAML 2.0

# SAML Settings (copy these from the Claude admin console)
Single sign-on URL   = https://<your-claude-sso-acs-endpoint>
Audience URI (SP)    = <your-claude-saml-entity-id>
Name ID format       = EmailAddress
Application username  = Email

# Attribute statements (map IdP profile -> Claude)
email       -> user.email
firstName   -> user.firstName
lastName    -> user.lastName

# Assign the app to a group (e.g. "claude-code-engineers")
# Members of that group get a seat on next login.

Domain capture is the piece that makes SSO enforceable. Once you claim your email domain, any login from that domain routes through SSO rather than a personal account, which prevents shadow accounts and centralizes provisioning. Combine domain capture with SCIM and the account lifecycle (create, update, deactivate) follows the identity provider automatically.

SSO is the gate, SCIM is the lifecycle

SSO decides how people log in. SCIM decides who exists. You want both: SSO so credentials live in Okta and not in Claude, and SCIM so a deactivation in Okta removes the seat (and stops the spend) without an admin touching the Claude console. Configuring only SSO leaves you de-provisioning by hand, which is exactly the manual work the Enterprise plan exists to remove.
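
For reference, the de-provisioning step is an ordinary SCIM 2.0 PATCH that sets a user's active attribute to false. The sketch below builds that request body as defined in RFC 7644. The identity provider sends it for you once SCIM is configured; the helper names are hypothetical, and the SCIM base URL is whatever the Claude admin console provides, so none is assumed here.

```typescript
// SCIM 2.0 PatchOp message schema URN, as defined in RFC 7644.
const SCIM_PATCH_SCHEMA = "urn:ietf:params:scim:api:messages:2.0:PatchOp";

interface ScimPatchOperation {
  op: "add" | "remove" | "replace";
  path?: string;
  value?: unknown;
}

interface ScimPatchRequest {
  schemas: string[];
  Operations: ScimPatchOperation[];
}

// Build the body that marks a user inactive. Some IdPs send the
// equivalent path-less form ({ op: "replace", value: { active: false } }).
function buildDeactivationPatch(): ScimPatchRequest {
  return {
    schemas: [SCIM_PATCH_SCHEMA],
    Operations: [{ op: "replace", path: "active", value: false }],
  };
}

// Standard SCIM resource path for a single user, relative to the
// service provider's SCIM base URL (not assumed here).
function scimUserPath(userId: string): string {
  return `/Users/${encodeURIComponent(userId)}`;
}
```

Seeing the payload makes the division of labor clear: SSO never sends anything like this, so without SCIM the deactivation has to be done by hand.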

SCIM and Distribution to Non-Engineers

The reason to consolidate on the Enterprise plan is rarely just the engineers. The pitch is one tool, one login, one bill, for the whole company. SCIM is what makes that practical at 120-150 seats without an admin manually inviting every new hire.

With SCIM enabled, you map identity-provider groups to Claude seats. Add someone to claude-code-engineers in Okta and they get a seat with the CLI and IDE access. Add someone to claude-cowork-ops and they get the web app and Cowork. The same seat works across Claude Code, Claude chat, and Cowork, so one license covers a backend engineer and a non-technical operations hire equally.

Distribution by role also gives you the hook for cost control. Non-engineering seats rarely run long agent sessions, so scoping them with role-based access and a lower per-user spend limit keeps them from drawing the same token volume as a heavy Claude Code user. The provisioning model and the cost-governance model are the same model: who is in which group.

Group-mapped provisioning

IdP group membership drives seats. Add to a group, get a seat; remove from the group, lose the seat. No console clicks per user.

One seat, every surface

The same seat covers Claude Code (CLI + IDE), Claude chat, and Cowork. Engineers and non-engineers run on the same license type.
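
The group-to-seat idea can be sketched as a lookup table. The two group names come from the example above; the surface assignments, the dollar limits, and the rule for users in several groups are illustrative assumptions, not settings taken from the Claude admin console.

```typescript
type Surface = "claude-code" | "chat" | "cowork";

interface SeatPolicy {
  surfaces: Surface[];
  monthlySpendLimitUsd: number;
}

// Group names from the article; surfaces and limits are HYPOTHETICAL values.
const GROUP_POLICIES: Record<string, SeatPolicy> = {
  "claude-code-engineers": { surfaces: ["claude-code", "chat", "cowork"], monthlySpendLimitUsd: 300 },
  "claude-cowork-ops": { surfaces: ["chat", "cowork"], monthlySpendLimitUsd: 50 },
};

// Resolve the policy for a user from their IdP groups.
// ASSUMPTION: a user in several mapped groups gets the union of surfaces
// and the highest of the spend limits. Returns null when no group maps
// to a seat, meaning the user is not provisioned.
function resolveSeatPolicy(groups: string[]): SeatPolicy | null {
  const surfaces: Surface[] = [];
  let limit = 0;
  let matched = false;
  for (const group of groups) {
    if (!Object.prototype.hasOwnProperty.call(GROUP_POLICIES, group)) continue;
    const policy: SeatPolicy | undefined = GROUP_POLICIES[group];
    if (policy === undefined) continue;
    matched = true;
    for (const surface of policy.surfaces) {
      if (surfaces.indexOf(surface) === -1) surfaces.push(surface);
    }
    limit = Math.max(limit, policy.monthlySpendLimitUsd);
  }
  return matched ? { surfaces, monthlySpendLimitUsd: limit } : null;
}
```

Keeping access and spend limit in one table per group is the point: moving someone between groups in the IdP changes both at once.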

Admin and Per-Seat Controls

The Enterprise admin surface is built around three questions a platform or finance team asks: who can do what, what did they do, and how much did it cost. Each maps to a specific control.

Who can do what: role-based access

Fine-grained permissioning with user groups lets you define roles beyond admin and member. Scope which groups can use Claude Code versus chat-only, which can install connectors, and which have elevated admin rights. This is the difference from the Team plan, where access is effectively admin or user.

What did they do: audit logs and Compliance API

Full audit logs capture user actions and system events. The Compliance API exposes that activity programmatically, so you can pull it into a SIEM or run your own monitoring rather than reading it in a console. Data retention is configurable to match your policy instead of a fixed 30-day window.

How much did it cost: spend limits and usage analytics

Admins set per-user and per-org spend limits, the control the Team plan does not have. Usage analytics break down consumption per user, per group, per feature, and for Claude Code specifically, per user and per repository. This is the data that tells you where the token cap is going: which seats and which repos drive the most usage, and therefore where routing or compaction has the most leverage.

Per-repo analytics is the diagnostic

Per-user, per-repo Claude Code usage analytics is the most actionable number on the Enterprise dashboard. It tells you that, say, three repos and eight engineers account for 60% of token spend. That is where a router pays for itself first: the heaviest seats running the most routine requests on the most expensive model.
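
Once the usage data is exported, finding the concentration is a small aggregation. The sketch below assumes a simple row shape (user, repo, tokens); that shape and the sample numbers are hypothetical, not the schema of the analytics export or the Compliance API.

```typescript
// HYPOTHETICAL row shape for exported usage data.
interface UsageRow {
  user: string;
  repo: string;
  tokens: number;
}

interface RepoShare {
  repo: string;
  tokens: number;
  share: number; // fraction of all tokens, 0..1
}

// Sum tokens per repo and return repos sorted by token volume, largest first.
function tokenShareByRepo(rows: UsageRow[]): RepoShare[] {
  const totals = new Map<string, number>();
  let grandTotal = 0;
  for (const row of rows) {
    totals.set(row.repo, (totals.get(row.repo) ?? 0) + row.tokens);
    grandTotal += row.tokens;
  }
  const shares: RepoShare[] = [];
  totals.forEach((tokens, repo) => {
    shares.push({ repo, tokens, share: grandTotal === 0 ? 0 : tokens / grandTotal });
  });
  return shares.sort((a, b) => b.tokens - a.tokens);
}

// Illustrative sample data only.
const sampleRows: UsageRow[] = [
  { user: "ana", repo: "api", tokens: 400 },
  { user: "ben", repo: "api", tokens: 200 },
  { user: "ana", repo: "web", tokens: 300 },
  { user: "cy", repo: "docs", tokens: 100 },
];
const ranked = tokenShareByRepo(sampleRows);
```

The same grouping keyed on user instead of repo gives the list of heaviest seats.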

The Token-Cap Cost Problem

The Enterprise plan is a seat fee plus usage at API rates, and seat-based plans enforce usage caps. The 5-hour and weekly limits, shared across Claude Code, chat, and Cowork, were doubled on May 6, 2026 for Pro, Max, Team, and seat-based Enterprise plans. For an org, the practical version of this is a token cap on the contract that finance keeps asking to raise.

The cap keeps getting hit for a structural reason, not a usage-discipline one. By default, every developer's Claude Code agent sends every request to the model it is configured for, usually the most capable one. Opus 4.6 costs $5/M input and $25/M output. A typical Claude Code session runs ~$0.34, but heavy users running long sessions on the frontier model spend far more, and they do it for tasks a cheaper model handles identically.

The cost math compounds the way it does for any agent. Each turn re-sends the full conversation history, so a 200K-token session pays for early context on every subsequent turn. Multiply that across 120 seats and the token bill grows faster than headcount. Raising the cap treats the symptom. The cause is that the model doing routine work is the most expensive one available.
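
A short sketch shows how quickly the re-sent history adds up. It assumes every turn adds the same number of tokens and that no prompt caching is in effect; the turn count and tokens per turn are illustrative assumptions chosen to end at a 200K-token context.

```typescript
// Total input tokens billed across a session in which every turn
// re-sends all context accumulated so far (no caching assumed).
function cumulativeInputTokens(turns: number, tokensAddedPerTurn: number): number {
  let context = 0; // size of the conversation after each turn
  let billed = 0;  // running total of input tokens sent to the model
  for (let i = 0; i < turns; i++) {
    context += tokensAddedPerTurn;
    billed += context;
  }
  return billed;
}

// ASSUMPTION: 100 turns, 2,000 new tokens per turn, ending at a 200K context.
const billedTokens = cumulativeInputTokens(100, 2_000);

// Input-side cost at the article's rates ($5/M Opus 4.6, $1/M Haiku 4.5).
const opusInputUsd = (billedTokens / 1_000_000) * 5;
const haikuInputUsd = (billedTokens / 1_000_000) * 1;
```

The final context is 200K tokens, but the session is billed for 10.1M input tokens, roughly 50 times the final context size, because billed input grows with the square of the turn count.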

  • $5 / $25: Opus 4.6 per 1M input / output tokens
  • $1 / $5: Haiku 4.5 per 1M input / output tokens
  • ~$0.34: Typical Claude Code session cost
  • 60-80%: Requests that are routine and routable to cheaper models

The lever that addresses the cap directly is reducing tokens-per-seat, not raising the ceiling. Three mechanisms do that: route routine requests to cheaper Claude models, cache repeated prefixes (90% off cache reads), and compact long agent contexts (50-70% fewer tokens). The first has the highest ROI and the lowest integration cost. See the full LLM cost optimization guide for all five levers and the combined math.

Cutting Per-Seat Spend with a Model Router

Most requests to a coding agent do not need the most expensive model. Adding a comment, formatting output, renaming a variable, generating a boilerplate test: these run the same on Haiku 4.5 at $1/M input as they do on Opus 4.6 at $5/M input. The problem is that without a router, all of them go to the frontier model, because that is what the agent is configured to use.

A model router sits between Claude Code and the Anthropic API. It reads each prompt, classifies the difficulty, and picks the cheapest Claude model that can handle it. Easy tasks go to Haiku 4.5, medium tasks to Sonnet 4.6, hard tasks to Opus 4.6. Because 60-80% of coding-agent requests are routine, the weighted cost per request drops 40-70%, and so does the token cap pressure.
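
The size of the reduction depends on the request mix. Here is a minimal sketch of the blended input rate, using the per-million input prices quoted in this article. The two mixes are illustrative assumptions; output tokens and the router's own per-request fee are left out for simplicity.

```typescript
// Input prices per 1M tokens from the article:
// Haiku 4.5 (easy), Sonnet 4.6 (medium), Opus 4.6 (hard).
const INPUT_USD_PER_M = { easy: 1, medium: 3, hard: 5 } as const;

type Tier = keyof typeof INPUT_USD_PER_M;
type RequestMix = Record<Tier, number>; // fractions of traffic, summing to 1

// Traffic-weighted input price per 1M tokens.
function blendedInputRate(mix: RequestMix): number {
  return (
    mix.easy * INPUT_USD_PER_M.easy +
    mix.medium * INPUT_USD_PER_M.medium +
    mix.hard * INPUT_USD_PER_M.hard
  );
}

// Fractional saving relative to sending everything to Opus 4.6.
function savingsVsOpus(mix: RequestMix): number {
  return 1 - blendedInputRate(mix) / INPUT_USD_PER_M.hard;
}

// ASSUMED mixes at the two ends of the 60-80% "routine" range.
const conservativeMix: RequestMix = { easy: 0.6, medium: 0.25, hard: 0.15 };
const optimisticMix: RequestMix = { easy: 0.8, medium: 0.15, hard: 0.05 };

const conservativeSaving = savingsVsOpus(conservativeMix); // $2.10/M blended, about 58% less
const optimisticSaving = savingsVsOpus(optimisticMix);     // $1.50/M blended, about 70% less
```

Both assumed mixes land inside the 40-70% range; a team with a larger share of hard requests would sit nearer the low end.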

  • $0.001: Per classification request (Morph Router)
  • ~430ms: Router classification latency
  • Anthropic-only: Routes within the Claude family if required
  • 40-70%: Per-seat token spend reduction

Morph Router classifies each request into easy, medium, hard, and needs_info in ~430ms at $0.001 per request. For an enterprise consolidating on Anthropic, the property that matters is Anthropic-only routing: the router can stay entirely within the Claude family (Haiku 4.5 / Sonnet 4.6 / Opus 4.6), so the rollout, the data, and the contract stay with one provider. Routing decisions are explainable, and a new Claude model can be added to the tiers without retraining the router.

Anthropic-only routing for Claude Code (TypeScript)

import Morph from "morphllm";

const morph = new Morph({ apiKey: process.env.MORPH_API_KEY });

// Stay entirely within the Claude family. The data and the
// contract never leave Anthropic; only the model tier changes.
const CLAUDE_TIERS = {
  easy:       { model: "claude-haiku-4-5",  inputCost: 1.00 },  // $1/M
  medium:     { model: "claude-sonnet-4-6", inputCost: 3.00 },  // $3/M
  hard:       { model: "claude-opus-4-6",   inputCost: 5.00 },  // $5/M
  needs_info: { model: "claude-sonnet-4-6", inputCost: 3.00 },
} as const;

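// Minimal message shape assumed here; use the SDK's own type if it exports one.
type Message = { role: "system" | "user" | "assistant"; content: string };

async function routedCompletion(messages: Message[]) {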
  // Classify difficulty ($0.001, ~430ms)
  const { difficulty } = await morph.router.classify({ messages });

  // Route within Anthropic only
  return morph.chat.completions.create({
    model: CLAUDE_TIERS[difficulty].model,
    messages,
  });
}

// Per seat, 200 calls/session:
//   Without routing: 200 x Opus 4.6 ($5/M) = highest token-cap draw
//   With routing (70% easy, 20% medium, 10% hard):
//     weighted input = 0.7*$1 + 0.2*$3 + 0.1*$5 = $1.80/M vs $5/M = ~64% less per seat

The economic shift is the point. A per-token bill grows with how hard the org uses Claude Code, which is what makes the cap a moving target. Routing converts most of that variable spend into the cheaper tier, so total usage tracks closer to flat per-seat economics. The seat fee does not change; the usage portion, which is most of the bill, drops. For multi-agent setups where a planner and executor run separately, the same logic applies at the role level: frontier model for the planner, cheaper Claude model for the executor. See the multi-agent routing guide.

Routing addresses the cap, caching and compaction extend it

Start with routing: it is one API call per request, no change to prompts or conversation structure, and it cuts the dominant cost driver 40-70%. Then layer Anthropic prompt caching (90% off cache reads on repeated system prompts and reference docs) and context compaction (50-70% fewer tokens on long agent sessions, verbatim deletion, zero hallucination). Together these keep the token cap from being the number finance watches.
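
How caching and compaction stack can be sketched with the figures above: cache reads billed at 10% of the input rate, and compaction removing 50-70% of input tokens. The fraction of tokens served from cache is an assumption for illustration, and the sketch ignores any surcharge on cache writes.

```typescript
// Cache reads are billed at 10% of the normal input rate ("90% off").
const CACHE_READ_MULTIPLIER = 0.1;

interface Levers {
  compactionReduction: number; // fraction of input tokens removed (article cites 0.5-0.7)
  cachedFraction: number;      // ASSUMPTION: fraction of remaining tokens served as cache reads
}

// Effective input cost in USD after compaction and caching are applied.
function effectiveInputCostUsd(inputTokens: number, usdPerM: number, levers: Levers): number {
  const remaining = inputTokens * (1 - levers.compactionReduction);
  const cached = remaining * levers.cachedFraction;
  const uncached = remaining - cached;
  return ((uncached + cached * CACHE_READ_MULTIPLIER) / 1_000_000) * usdPerM;
}

// Illustrative: 10M input tokens at $5/M, 50% compaction, half of the rest cached.
const baselineUsd = effectiveInputCostUsd(10_000_000, 5, { compactionReduction: 0, cachedFraction: 0 });
const stackedUsd = effectiveInputCostUsd(10_000_000, 5, { compactionReduction: 0.5, cachedFraction: 0.5 });
```

Under those assumptions the input cost falls from $50.00 to $13.75 before any routing is applied, and the two levers multiply rather than add.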

Frequently Asked Questions

How much does Claude Code Enterprise cost?

The Enterprise plan is $20/seat per month with usage billed separately at standard API rates. The seat fee covers access and admin controls; the usage cost scales with model and task. Opus 4.6 is $5/M input and $25/M output, Sonnet 4.6 is $3/M and $15/M, Haiku 4.5 is $1/M and $5/M. Pricing is negotiated rather than published per-tier, and for heavy coding-agent use the usage portion dominates the bill. See Anthropic API pricing for exact rates.

What is the difference between Claude Team and Claude Enterprise?

Team includes Claude Code, central billing, and SSO limited to Google and Microsoft, with manual invites and basic admin logs, for 5 to 150 users. Enterprise adds SAML SSO for any identity provider, SCIM provisioning, full audit logs plus a Compliance API, fine-grained role-based access, per-user and per-org spend limits, configurable retention, and per-user/per-repo Claude Code analytics. Organizations that outgrow the Team plan's 150-user ceiling move to Enterprise for the provisioning and governance controls.

Does Claude Code Enterprise support SSO and Okta?

Yes. Enterprise supports SAML 2.0 and OIDC SSO with Okta, Microsoft Entra ID, and Google Workspace. Okta setup is a standard SAML app integration: create the app, paste the Single sign-on URL and Audience URI from the Claude admin console, set the name ID format, and assign users or groups. Domain capture claims your email domain so all logins route through SSO.

Can I deploy Claude Code to non-engineers on the Enterprise plan?

Yes. Once SSO and SCIM are configured, you provision any employee by adding them to the mapped identity-provider group. Engineers use the Claude Code CLI and IDE integrations; non-engineers use the web app and Cowork. The same seat covers all three surfaces, and role-based access plus per-user spend limits keep non-engineering seats scoped and bounded.

What admin controls does Claude Code Enterprise give per seat?

Per-user and per-org spend limits, fine-grained role-based access with user groups, full audit logs, a Compliance API for programmatic queries, configurable data retention, and per-user/per-repo Claude Code usage analytics. The analytics show which seats and repos drive the most token usage, which is where routing and compaction pay off first.

Why does the Claude Code enterprise token cap keep getting raised?

Because the plan is seat fee plus usage at API rates, and seat-based usage caps (doubled May 6, 2026) are shared across Claude Code, chat, and Cowork. When every agent defaults to Opus 4.6 for tasks a cheaper model handles identically, token consumption grows faster than headcount, so finance keeps requesting higher caps. The fix is reducing tokens per seat through routing, caching, and compaction.

How do I reduce per-seat token spend on Claude Code Enterprise?

Route by difficulty. 60-80% of coding-agent requests are routine and run the same on Haiku 4.5 ($1/M input) as they do on Opus 4.6 ($5/M input). Morph Router classifies each prompt in ~430ms at $0.001 and routes within the Anthropic family, cutting per-request cost 40-70%. Stack prompt caching (90% off cache reads) and context compaction (50-70% fewer tokens) for further reduction. See the cost optimization guide.

Is Claude Code Enterprise billed per seat or per token?

Both. There is a $20/seat monthly fee plus usage at API rates. For heavy coding-agent use, the usage portion dominates and the seat fee is a small fraction of total spend. A router does not change the seat fee; it lowers the variable usage cost that makes up most of the bill, moving the economics closer to flat per-seat.

Govern Per-Seat Token Spend on Claude Code Enterprise

The Enterprise plan handles SSO, SCIM, and admin controls. The token cap is the part you engineer around. Morph Router classifies each Claude Code request in ~430ms at $0.001 and routes within the Anthropic family, sending routine work to Haiku or Sonnet and reserving Opus for hard tasks. Per-seat token spend drops 40-70% with no change to output quality.