Choice of agent host

When you evaluate operational quality, reliability, and cost, consider the choice of agent host, such as Microsoft 365 Copilot (declarative agents), Copilot Studio (custom agents), or Azure. Keep this decision separate from the agent authoring method. Where an agent runs or is hosted determines its orchestration capabilities, model access, and operational features. These features directly affect response quality, performance, and the cost to operate the solution at scale.

This article explains how agent host platforms affect solution capabilities. You learn how different authoring methods can create agents on the same host platform while maintaining consistent quality and behavior, how a single authoring method can create agents on different platforms with different quality and behavioral results, and how the host shapes the cost profile of the solution.

Cost as an operability consideration

Treat cost as a steady-state operational characteristic, not a one-time procurement question. Two solutions can produce identical answers while differing by an order of magnitude in cost, because cost is driven by how the agent runs, not only what it returns. The host platform largely fixes the levers available to you:

  • Token consumption per interaction. Every instruction, knowledge snippet, and tool definition that the model processes on a given turn is billed on that turn. Standing context that loads on every interaction is paid on every interaction, whether or not it's relevant.
  • Number of model turns. The orchestrator decides how many times the model is invoked to complete a task. More tool-call loops and more re-planning mean more inference.
  • Model selection. Larger reasoning models cost more per token and add latency. The host determines which models are available and whether you can route different steps to different models.
  • Determinism. Work that's deterministic doesn't need model inference at all. Moving it into code or actions removes both the token cost and the variability.
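This minimal sketch shows how the first two levers add up. It assumes simple per-token pricing. The `Turn` type, the `interaction_cost` function, and all prices and token counts are hypothetical and don't reflect any host's actual billing formula. Real hosts may meter in credits, discount cached tokens, or price tiers differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    """Token counts for one model invocation (illustrative figures)."""
    dynamic_input_tokens: int  # user message, history, and tool results for this turn
    output_tokens: int


def interaction_cost(
    turns: List[Turn],
    standing_context_tokens: int,
    price_in_per_1k: float,
    price_out_per_1k: float,
) -> float:
    """Estimate the cost of one interaction under simple per-token pricing.

    Standing context (instructions, tool definitions, preloaded knowledge)
    is sent, and therefore billed, again on every model turn.
    """
    total = 0.0
    for turn in turns:
        input_tokens = standing_context_tokens + turn.dynamic_input_tokens
        total += input_tokens / 1000 * price_in_per_1k
        total += turn.output_tokens / 1000 * price_out_per_1k
    return total


# Hypothetical prices and token counts, for comparison only.
three_turns = [Turn(dynamic_input_tokens=500, output_tokens=200)] * 3
heavy = interaction_cost(three_turns, standing_context_tokens=4000,
                         price_in_per_1k=0.01, price_out_per_1k=0.03)
lean = interaction_cost(three_turns, standing_context_tokens=1000,
                        price_in_per_1k=0.01, price_out_per_1k=0.03)
print(f"{heavy:.3f} vs {lean:.3f}")  # 0.153 vs 0.063
```

With these made-up numbers, trimming standing context from 4,000 to 1,000 tokens cuts the cost of the same three-turn interaction by more than half. Each turn the orchestrator avoids saves the full per-turn amount.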

The sections that follow break down the controls that most influence cost: the orchestration harness, model choice, and how you architect instructions versus deterministic actions.

Microsoft 365 Copilot hosting

Microsoft 365 Copilot provides a managed hosting environment for declarative agents with built-in governance, security, and compliance capabilities. This platform offers consistent performance characteristics regardless of the authoring method you use to create the agent.

For example, you can author declarative agents by using the Agent Builder feature in Microsoft 365 Copilot, Copilot Studio, or the Microsoft 365 Agents Toolkit. The agent host determines the orchestration, catalog, and language model options available to the developer, and these options have the greatest influence on response quality. In the operational steady-state phase, treat the authoring platform as a secondary selection criterion.

Different authoring platforms provide varying levels of operational capabilities suited to different organizational needs and development lifecycle stages. As long as the underlying agent host remains Microsoft 365 Copilot (declarative agents), the quality remains consistent as you progress through different authoring canvases to meet your operational needs.

As an illustrative example, the following table compares authoring platforms for declarative agents.

| Requirement | Agent Builder feature in Copilot | Copilot Studio | Pro Code |
|---|---|---|---|
| Solution owner | Individual | Group | Enterprise |
| Update and maintenance | No versioning | Versioning with locked editing | Versioning with concurrent editing |
| Evaluation framework | Test Panel | Test Panel and Pro Code | Fully customizable |
| CI/CD | None | Some | Yes |
| Real-time monitoring | None | None | Yes |
| Telemetry | Limited | Some | Fully customizable |
| Cost/return on investment | Included with Microsoft 365 Copilot | Ranges from license to consumption | Fully customizable based on pro-code choices |
| Work IQ consumption cost | Work IQ grounding included with the Microsoft 365 Copilot license; unlicensed users are billed consumptively | Consumption-based in Copilot Credits (pay-as-you-go or prepaid) | Consumption-based in Copilot Credits through the Work IQ APIs; metered and capped in the Microsoft 365 admin center |

For example, when an agent draws on Work IQ for context, retrieval, or actions, that usage is billed variably: the credit cost scales with the complexity of the scenario, including context size, reasoning depth, and number of steps.

Note

There's no separate Work IQ subscription, SKU, or per-user license. Because Chat and Context costs are variable, two functionally similar agents can consume very different credit volumes depending on how much context they ground and how much multistep reasoning they perform. Use the cost management dashboard in the Microsoft 365 admin center to monitor credit usage and set spending limits for tenants, groups, and users. The cost-optimization patterns described in Architecting for cost optimization (minimizing always-on context and pushing deterministic work into scripts and actions) therefore apply directly to controlling Work IQ spend.

Consider other factors such as developer lift and debugging tools (not shown in the table). Keep in mind that these factors are heavily influenced by your organization's security posture and its capacity for a particular development platform.

When Microsoft 365 Copilot declarative agents built in Agent Builder need capabilities such as versioning or CI/CD, promote them to declarative agents authored with the Microsoft 365 Agents Toolkit. This strategy keeps Microsoft 365 Copilot as the orchestrator, so agent behavior stays consistent. If an experimental custom agent built in Copilot Studio meets the proof-of-concept evaluation criteria and enterprise operations require source control, promote the agent to a managed pipeline in Power Platform. This approach keeps the Copilot Studio orchestrator as the primary mechanism for maintaining agent behavior.

Orchestration and the agent harness

The orchestrator, or harness, is the runtime loop that plans steps, selects and invokes tools, manages the context window, and decides when a task is complete. It's the single largest driver of both response quality and operational cost, because it controls how many model turns occur, how much context accumulates on each turn, and how tool results are fed back into the model.
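The following stripped-down harness loop is a minimal illustration. It assumes a hypothetical `model` callable that returns either a tool request or a final answer. `run_harness`, `ModelReply`, `HarnessResult`, and `scripted_model` are invented for this example and don't correspond to any host's orchestrator API. The sketch shows why the orchestrator controls cost. Every loop iteration is a billed model turn, and each tool result stays in the context that later turns resend.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ModelReply:
    """What the (hypothetical) model returns on one turn."""
    tool_name: Optional[str] = None     # set when the model requests a tool
    tool_arg: str = ""
    final_answer: Optional[str] = None  # set when the model considers the task done


@dataclass
class HarnessResult:
    answer: Optional[str]
    model_turns: int
    context: List[str] = field(default_factory=list)


def run_harness(
    user_message: str,
    model: Callable[[List[str]], ModelReply],
    tools: Dict[str, Callable[[str], str]],
    max_turns: int = 5,
) -> HarnessResult:
    """Minimal plan-act-observe loop with a turn budget."""
    context = [f"user: {user_message}"]
    for turn in range(1, max_turns + 1):
        reply = model(context)  # one billed model invocation per iteration
        if reply.final_answer is not None:
            return HarnessResult(reply.final_answer, turn, context)
        if reply.tool_name is not None:
            result = tools[reply.tool_name](reply.tool_arg)
            # The tool result is appended, so every later turn resends (and re-bills) it.
            context.append(f"tool {reply.tool_name}: {result}")
    return HarnessResult(None, max_turns, context)  # turn budget exhausted


def scripted_model(context: List[str]) -> ModelReply:
    """Stand-in model: look the order up once, then answer."""
    if any(line.startswith("tool lookup:") for line in context):
        return ModelReply(final_answer="Order 42 ships Friday.")
    return ModelReply(tool_name="lookup", tool_arg="order 42")


tools = {"lookup": lambda query: f"{query} ships Friday"}
result = run_harness("When does order 42 ship?", scripted_model, tools)
print(result.answer, result.model_turns)  # Order 42 ships Friday. 2
```

If a model keeps requesting tools without finishing, the loop stops at `max_turns` and returns no answer. This is the turn-budget lever that this section lists.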

Because the host platform supplies the orchestrator, the host decision largely fixes your cost and latency envelope:

  • Microsoft 365 Copilot provides a managed orchestrator. You get predictable, license-included cost and consistent behavior, with limited control over the loop itself.
  • Copilot Studio provides configurable orchestration (for example, topics and generative orchestration). Cost ranges from license-based to consumption-based depending on how much generative work you delegate to the model.
  • Azure and pro-code give you full control over the loop. Weigh the cost of maintaining your own orchestration code against using a well-maintained harness or SDK, such as the Copilot SDK.

When the host exposes them, the key orchestration levers are:

  • Turn budget. Cap or tune how many planning and tool-call iterations the orchestrator can take before returning.
  • Parallel versus sequential tool calls. Running independent tool calls concurrently reduces latency; consolidating them reduces turns.
  • Context management. Trimming, summarizing, or windowing the conversation prevents context from growing unbounded, which keeps per-turn token cost flat instead of compounding.
  • Caching. Reusing cached prompt prefixes across turns or sessions avoids re-billing for stable context.
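A simple sliding window illustrates how context management affects per-turn cost. This sketch assumes each message has a fixed token count and that the harness drops older messages. Real harnesses often summarize instead of dropping. Prompt caching, which can further discount the stable system prefix, isn't modeled. The function name and numbers are hypothetical.

```python
from typing import List, Optional


def input_tokens_per_turn(
    message_tokens: List[int],
    system_tokens: int,
    window: Optional[int] = None,
) -> List[int]:
    """Input tokens billed on each turn of a conversation.

    Every turn resends the system prompt plus the retained history.
    window=None keeps the full history, so per-turn cost compounds;
    a window keeps only the most recent `window` messages, so it levels off.
    """
    if window is not None and window < 1:
        raise ValueError("window must be at least 1")
    history: List[int] = []
    billed: List[int] = []
    for tokens in message_tokens:
        history.append(tokens)
        kept = history if window is None else history[-window:]
        billed.append(system_tokens + sum(kept))
    return billed


messages = [300] * 6  # six messages of roughly 300 tokens each (illustrative)
unbounded = input_tokens_per_turn(messages, system_tokens=1000)
windowed = input_tokens_per_turn(messages, system_tokens=1000, window=2)
print(unbounded)  # [1300, 1600, 1900, 2200, 2500, 2800]
print(windowed)   # [1300, 1600, 1600, 1600, 1600, 1600]
print(sum(unbounded), sum(windowed))  # 12300 9300
```

In this example, the unbounded history grows to 2,800 input tokens by the last turn. The two-message window holds steady at 1,600.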

Note

A more capable orchestrator can raise quality and cost at the same time. Match the orchestration sophistication to the task: a simple lookup agent doesn't need multi-step generative planning, and paying for it inflates cost without improving outcomes.

Model choice

The model you choose determines per-token cost and latency, and that choice is largely independent of the authoring method. Larger reasoning models deliver higher-quality results on complex tasks but cost more per token and respond more slowly. Match the model to the task difficulty instead of defaulting to the most capable option for every task.

Architect for model routing when the host supports it:

  • Reserve frontier reasoning models for genuinely hard steps, such as ambiguous reasoning, synthesis, or open-ended generation.
  • Route deterministic or simple subtasks like classification, extraction, formatting, and routing decisions to smaller, cheaper, and faster models.
  • Mix models within a single agent when the orchestrator supports per-step model selection, so each step pays only for the capability it needs.
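Per-step routing can be sketched as a lookup from step type to model. The model names, prices, and the set of step types treated as simple are hypothetical placeholders. The host determines which models exist and whether per-step selection is possible at all.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ModelOption:
    name: str            # hypothetical model identifier
    price_per_1k: float  # hypothetical blended price per 1,000 tokens


SMALL = ModelOption("small-fast-model", 0.0005)
FRONTIER = ModelOption("frontier-reasoning-model", 0.015)

# Step types this sketch treats as simple; an assumption to tune per workload.
SIMPLE_STEPS = {"classify", "extract", "format", "route"}


def pick_model(step_type: str) -> ModelOption:
    """Send simple steps to the small model and everything else to the frontier model."""
    return SMALL if step_type in SIMPLE_STEPS else FRONTIER


def plan_cost(steps: List[Tuple[str, int]], routed: bool = True) -> float:
    """Estimated cost of a multistep plan, with or without per-step routing."""
    total = 0.0
    for step_type, tokens in steps:
        model = pick_model(step_type) if routed else FRONTIER
        total += tokens / 1000 * model.price_per_1k
    return total


plan = [("classify", 800), ("extract", 1200), ("synthesize", 2000), ("format", 600)]
print(f"routed={plan_cost(plan):.4f} unrouted={plan_cost(plan, routed=False):.4f}")
# routed=0.0313 unrouted=0.0690
```

In this made-up plan, routing the three simple steps to the smaller model cuts the estimated cost from about 0.069 to about 0.031. The synthesis step still uses the frontier model.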

The host platform determines which models are in the catalog, whether you can route per step, the maximum context window (larger windows allow more context but cost more per turn), and whether prompt caching is available. Validate these capabilities as part of the host decision, because they cap what model-level cost optimization you can perform later.

Architecting for cost optimization

Beyond picking a host, orchestrator, and model, how you structure an agent's instructions and actions has a direct, recurring cost impact. Two principles guide cost-efficient design:

  1. Don't pay model inference for work that's deterministic. Implement deterministic steps as scripts, actions, or connectors rather than describing them as natural-language instructions the model must interpret on every run. Code runs cheaply and produces predictable output, with no token cost or variability. Reasoning through the same procedure in natural language pays for inference on every run and risks inconsistent results.

  2. Don't pay standing token cost for instructions you rarely use. Preloaded agent-level instructions are billed on every turn of every interaction, even when they're irrelevant to the user's request. Loading guidance and knowledge on demand, only when the task matches, means you pay for that context when it's actually used, not continuously. This progressive-disclosure pattern keeps the baseline cost of each interaction low.
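A few lines of code can illustrate both principles. This sketch assumes a hypothetical expense-assistant agent. `CORE_INSTRUCTIONS` stays small. `ON_DEMAND_GUIDANCE` holds task-specific text that `build_prompt` adds only when a keyword matches, which is a deliberately naive stand-in for real intent matching or retrieval. `reimbursement_due_date` implements a hypothetical ten-business-day rule as code, so the model doesn't have to reason through it as an instruction on every run. All names and policy content are invented for illustration.

```python
from datetime import date, timedelta
from typing import Dict

# Always-on context: identity and safety only (hypothetical wording).
CORE_INSTRUCTIONS = "You are an expense assistant. Be concise. Never disclose other employees' data."

# Task-specific guidance loaded only when relevant. Placeholders stand in for long policy text.
ON_DEMAND_GUIDANCE: Dict[str, str] = {
    "travel": "<long travel policy text>",
    "refund": "<long refund procedure text>",
}


def build_prompt(user_message: str) -> str:
    """Core instructions every time; specialized guidance only on a keyword match.

    Keyword matching is a naive stand-in for intent detection or retrieval.
    """
    sections = [CORE_INSTRUCTIONS]
    lowered = user_message.lower()
    for keyword, guidance in ON_DEMAND_GUIDANCE.items():
        if keyword in lowered:
            sections.append(guidance)
    sections.append(f"User: {user_message}")
    return "\n\n".join(sections)


def reimbursement_due_date(submitted: date, business_days: int = 10) -> date:
    """Deterministic action: count business days (Monday to Friday) after submission.

    Implemented as code so the model never has to reason through the calendar.
    Public holidays are ignored in this sketch.
    """
    current = submitted
    remaining = business_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday is 0, Friday is 4
            remaining -= 1
    return current


print(build_prompt("How do refunds work?"))
print(reimbursement_due_date(date(2024, 1, 1)))  # 2024-01-15
```

The date calculation ignores public holidays. A production action would need a holiday calendar, and that kind of detail is easier to get right once in code than repeatedly in natural language.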

The following table summarizes when to preload instructions into the agent versus when to push work into deterministic scripts or on-demand resources.

| Preload agent-level instructions when… | Use scripts, actions, or on-demand resources when… |
|---|---|
| The behavior applies to nearly every interaction (core role, tone, safety guardrails). | The behavior is task-specific or only occasionally relevant. |
| The guidance is short and always relevant. | The guidance is long, or backed by large reference or knowledge material. |
| The model genuinely needs to reason about or adapt the behavior. | The action is deterministic, repeatable, and has a well-defined output. |
| Latency of an extra retrieval or tool call would hurt the experience. | Token cost of carrying the context on every turn outweighs an occasional load. |
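The last row of the table poses a break-even question, and a rough calculation can answer it. This sketch makes three assumptions. Guidance, once loaded, stays in context for the rest of the interaction. Loading it costs a fixed number of extra tokens for the retrieval or tool call. Latency isn't part of the comparison. All names and figures are hypothetical.

```python
def preload_tokens(guidance_tokens: int, turns: int) -> float:
    """Preloaded guidance is billed on every turn of every interaction."""
    return float(guidance_tokens * turns)


def on_demand_tokens(guidance_tokens: int, turns: int,
                     p_relevant: float, load_overhead_tokens: int) -> float:
    """Expected tokens when guidance loads only in interactions that need it.

    Simplifying assumptions: once loaded, the guidance stays in context for
    the whole interaction, and loading it costs a fixed number of extra tokens
    (the retrieval or tool call). Latency is not modeled.
    """
    return p_relevant * (guidance_tokens * turns + load_overhead_tokens)


def prefer_on_demand(guidance_tokens: int, turns: int,
                     p_relevant: float, load_overhead_tokens: int) -> bool:
    """True when on-demand loading is expected to cost fewer tokens than preloading."""
    return (on_demand_tokens(guidance_tokens, turns, p_relevant, load_overhead_tokens)
            < preload_tokens(guidance_tokens, turns))


# Long policy needed in about 1 of 10 interactions (illustrative figures).
print(prefer_on_demand(3000, turns=4, p_relevant=0.1, load_overhead_tokens=500))   # True
# Short guidance that applies to nearly every interaction.
print(prefer_on_demand(150, turns=4, p_relevant=0.95, load_overhead_tokens=500))   # False
```

With these inputs, a long, rarely needed policy is far cheaper to load on demand. Short guidance that applies to nearly every request is cheaper to preload, which matches the first two rows of the table.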

In practice, a cost-efficient agent keeps its always-on instructions minimal and focused on identity and safety, expresses fixed procedures as scripts or actions, and exposes specialized knowledge and task-specific guidance as on-demand resources that load only when relevant. The result is lower per-interaction token cost, more predictable behavior, and a smaller, easier-to-maintain core prompt, without sacrificing capability.

Next step

Learn how to measure agent quality, validate performance across diverse scenarios, and ensure operational readiness before deployment by using evaluation frameworks.