Post

Multi-Agent Orchestration Patterns: Building Collaborative AI Teams

Multi-Agent Orchestration Patterns: Building Collaborative AI Teams

Microsoft just shipped Agent Framework RC — merging AutoGen and Semantic Kernel into one SDK. Production teams are running 5+ specialist agents in parallel for tasks that took a single agent 45 minutes. The question isn’t whether to go multi-agent — it’s which orchestration pattern fits your problem.

Here’s the thing: there are exactly five patterns that cover 95% of multi-agent use cases. Each has clear tradeoffs. This post gives you the pattern catalog, full code examples, and a decision framework — all built on Microsoft Agent Framework v1.0.0rc1.


Bottom Line First

Before diving into implementation, here’s the full pattern catalog in one place. Bookmark this table.

PatternBest ForKey TradeoffMAF Class
SequentialOrdered pipelines, document processing, ETLSimple but no parallelism; one bottleneck stalls allSequentialBuilder
ConcurrentIndependent subtasks, parallel research, batch jobsFast but requires merging divergent outputsConcurrentBuilder
HandoffRole-based routing, triage, escalation chainsClean specialization; inter-agent latency adds upHandoffBuilder
Group ChatCollaborative reasoning, multi-perspective review, debateRich but expensive; token costs scale with agent countGroupChatBuilder
MagenticDynamic task decomposition, unknown-shape problemsMost powerful; hardest to debug and constrainMagenticBuilder

Pick the simplest pattern that solves your problem. Group Chat and Magentic are tempting — they’re also the patterns most teams over-apply.


When a Single Agent Isn’t Enough

Single agents work. Until they don’t.

The ceiling appears in three places. First, context window saturation: a long-running research or coding task accumulates tool outputs, intermediate reasoning, and retry traces until you’re burning 90% of your context on history rather than thinking. Second, tool sprawl: once you attach 15+ tools to a single agent, retrieval quality degrades — the model struggles to select the right tool and starts hallucinating tool calls that don’t exist [see Building ReAct Agents with Microsoft Agent Framework]. Third, error compounding: in a 20-step ReAct chain, a mistake at step 4 propagates silently. By step 18 you have a confident, coherent, completely wrong result.

Here’s the thing: decomposition fixes all three problems at once. Smaller agents carry smaller context, own fewer tools, and fail in bounded, observable ways.

stateDiagram-v2
    state "❌ Overloaded Single Agent" as bad {
        [*] --> Overwhelmed
        Overwhelmed --> ToolConfusion : 20+ tools attached
        Overwhelmed --> ContextSaturation : 100k+ token history
        ToolConfusion --> ErrorCompounding
        ContextSaturation --> ErrorCompounding
        ErrorCompounding --> SilentFailure
        SilentFailure --> [*]
    }

    state "✅ Decomposed Multi-Agent" as good {
        [*] --> Orchestrator
        Orchestrator --> ResearchAgent : focused context
        Orchestrator --> WritingAgent : 5-8 tools max
        Orchestrator --> ReviewAgent : clean handoff
        ResearchAgent --> Orchestrator : structured output
        WritingAgent --> Orchestrator : structured output
        ReviewAgent --> [*] : validated result
    }

The Planning Pattern for AI Agents addresses the orchestration side of this — how an agent reasons about decomposition before acting. Multi-agent architecture is the runtime manifestation of that same principle: plan at the system level, execute at the agent level.

This isn’t theoretical. In How to Get the Best from Claude Code Teams, you can see exactly how multi-agent coordination plays out in a real developer workflow — specialist agents owning scoped tasks, results merged by a team lead. The pattern holds whether you’re orchestrating Claude Code subagents or MAF agents on Azure.

When your ReAct trace exceeds 30 steps or your tool list exceeds 12 items, stop tuning the single agent. Decompose.


The Microsoft Agent Framework (What & Why Now)

For years, the Python multi-agent ecosystem was split. AutoGen owned the conversational multi-agent space. Semantic Kernel owned enterprise orchestration and plugin management. Teams running both SDKs maintained two dependency trees, two agent abstractions, two sets of idioms.

That’s over. Microsoft merged both projects into a single unified SDK: Microsoft Agent Framework [1]. AutoGen and Semantic Kernel are now in maintenance mode — bug fixes only, no new features. MAF is where the investment goes.

The RC landed on February 19, 2026. GA is imminent.

1
pip install agent-framework --pre

Core Concepts

The framework’s object model is deliberately flat:

ConceptRole
AgentEncapsulates a model client + system prompt + tool set
WorkflowDefines the orchestration pattern (sequential, concurrent, handoff, group chat)
SessionManages shared message history and state across agents
ToolsPython callables or MCP-connected tool servers
MCPNative Model Context Protocol support for external tool ecosystems
A2AAgent-to-Agent protocol for cross-framework interoperability

Model support is broad by design: Azure OpenAI, OpenAI, Anthropic, Amazon Bedrock, and Ollama all work through a unified ModelClient interface. You swap providers by changing one constructor argument.

graph TB
    subgraph Workflow["Workflow (Orchestration Layer)"]
        A1[Agent: Researcher]
        A2[Agent: Writer]
        A3[Agent: Reviewer]
    end

    subgraph Models["Model Clients"]
        M1[Azure OpenAI]
        M2[Anthropic]
        M3[Ollama]
    end

    subgraph Tooling["Tool Layer"]
        T1[Python Tools]
        T2[MCP Servers]
        T3[A2A Agents]
    end

    Session["Session (Shared State)"] --> Workflow
    A1 --> M1
    A2 --> M2
    A3 --> M3
    A1 --> T1
    A2 --> T2
    A3 --> T3

The architecture matters for a practical reason: the model client is decoupled from the agent. You can run a high-capability model for your orchestrator and a cheaper model for subtask agents without changing any orchestration logic. That’s a significant cost lever in production.

Here’s the thing: MAF isn’t just AutoGen with a new name. The Semantic Kernel lineage brings production-grade plugin management, enterprise auth, and telemetry hooks that AutoGen never had. The merger gives you conversational agent flexibility and enterprise deployment robustness in one package [2].

For a broader view of where MAF sits in the framework landscape, see Top Agentic AI Frameworks.

RC means production-ready for evaluation, not yet for regulated workloads. Pin your version (agent-framework==1.0.0rc1) and watch the GA release notes for breaking changes in the Workflow API surface.


Pattern 1 — Sequential Pipeline

The sequential pipeline is the simplest orchestration pattern and the right default when your workflow has a defined order of operations. Think: write → review → edit, translate → verify → format, generate → test → fix. Each stage depends on the output of the previous one.

The defining characteristic: every agent in the chain receives the full conversation history from all prior agents. The reviewer sees what the writer produced. The editor sees both the original draft and the reviewer’s critique. Nothing is lost between steps.

stateDiagram-v2
    direction LR
    [*] --> Writer : Initial prompt
    Writer --> Reviewer : Draft output
    note right of Writer
        Full history: [system prompt + user input]
    end note
    Reviewer --> Editor : Reviewed draft
    note right of Reviewer
        Full history: [Writer output + review request]
    end note
    Editor --> [*] : Final output
    note right of Editor
        Full history: [Draft + Review + edit request]
    end note

    style Writer fill:#27ae60,color:#fff
    style Reviewer fill:#2980b9,color:#fff
    style Editor fill:#8e44ad,color:#fff
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
from agent_framework.orchestrations import SequentialBuilder

# Create specialist agents
writer = chat_client.as_agent(
    name="writer",
    instructions="You are a concise technical copywriter. Write clear, engaging content."
)
reviewer = chat_client.as_agent(
    name="reviewer",
    instructions="You are a thorough code reviewer. Check for accuracy, clarity, and completeness."
)
editor = chat_client.as_agent(
    name="editor",
    instructions="You are a final editor. Polish the text for publication quality."
)

# Build sequential pipeline
workflow = SequentialBuilder(participants=[writer, reviewer, editor]).build()

# Run and stream results
async for event in workflow.run_stream("Write a technical summary of transformer attention mechanisms."):
    if event.type == "output":
        for msg in event.data:
            print(f"[{msg.author_name}] {msg.text}")

Tradeoffs:

StrengthWeakness
Simple, predictable, easy to debugSequential bottleneck — no parallelism
Full context preserved at every stageTotal latency = sum of all agent latencies
Auditability — clear chain of custodyOverkill for single-step tasks

✅ Use sequential pipelines when: order matters, each step genuinely depends on the prior output, and you need a clear audit trail of what each agent contributed.


Pattern 2 — Concurrent (Fan-Out / Fan-In)

When your subtasks are independent of each other, running them sequentially is pure waste. The concurrent pattern fans out the same input to multiple agents simultaneously, then fans the results back in to an aggregator.

Classic use cases: multi-source research (analyst + lawyer + marketer all reading the same brief), parallel code review from security/performance/readability perspectives, or generating content variants simultaneously.

graph TB
    Input["User Input"] --> Researcher
    Input --> Marketer
    Input --> Legal

    Researcher["Researcher\n(market trends)"] --> Aggregator
    Marketer["Marketer\n(positioning)"] --> Aggregator
    Legal["Legal\n(compliance)"] --> Aggregator

    Aggregator["Synthesizer\n(executive brief)"] --> Output["Final Output"]

    style Input fill:#27ae60,color:#fff
    style Aggregator fill:#e67e22,color:#fff
    style Output fill:#2980b9,color:#fff
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
from agent_framework.orchestrations import ConcurrentBuilder

# Specialist analysts
researcher = chat_client.as_agent(
    name="researcher",
    instructions="You are a market researcher. Analyze market trends and competitive landscape."
)
marketer = chat_client.as_agent(
    name="marketer",
    instructions="You are a marketing strategist. Identify positioning opportunities and messaging."
)
legal = chat_client.as_agent(
    name="legal",
    instructions="You are a compliance analyst. Flag regulatory risks and IP concerns."
)

# Custom aggregator function
async def synthesize_results(responses: list) -> str:
    summary_agent = chat_client.as_agent(
        name="synthesizer",
        instructions="Combine these independent analyses into a unified executive brief."
    )
    combined = "\n\n".join(f"**{r.author_name}**: {r.text}" for r in responses)
    result = await summary_agent.run(f"Synthesize these analyses:\n{combined}")
    return result

# Build concurrent workflow
workflow = (
    ConcurrentBuilder()
    .participants([researcher, marketer, legal])
    .with_aggregator(synthesize_results)
    .build()
)

async for event in workflow.run_stream("We are launching a new AI-powered code review tool."):
    if isinstance(event, WorkflowOutputEvent):
        print(event.data)

Your aggregator is load-bearing. A weak synthesizer will lose information from the parallel agents or introduce contradictions. Invest in the aggregator’s instructions — it’s doing the hardest reasoning in the entire pipeline.

Tradeoffs:

StrengthWeakness
Wall-clock latency = slowest single agentHigher token cost (N agents × input tokens)
Naturally parallel, scales horizontallyNo inter-agent communication during execution
Great isolation — agents can’t corrupt each otherAggregation quality determines final output quality

✅ Use concurrent patterns when: subtasks are genuinely independent, speed matters more than token cost, and you can write a meaningful aggregation strategy.


Pattern 3 — Handoff (Dynamic Routing)

The handoff pattern is what you reach for when you can’t predict the path through your workflow ahead of time. Routing decisions happen at runtime, based on conversation context. The canonical example: customer support. You don’t know whether an incoming message needs order tracking, a return label, or a refund until you read it.

The critical distinction from other patterns: in a handoff, the receiving agent takes full ownership of the conversation. This is not a tool call where the caller gets control back. When the triage agent hands off to the refund specialist, the triage agent is done. The refund specialist owns the thread from that point forward [1].

graph LR
    User["Customer Message"] --> Triage

    Triage["Triage Agent"] -->|order inquiry| OrderAgent["Order Specialist"]
    Triage -->|return request| ReturnAgent["Return Specialist"]

    ReturnAgent -->|refund needed| RefundAgent["Refund Specialist"]
    RefundAgent -->|escalate| Triage

    OrderAgent -->|resolved| Done["✓ Resolved"]
    ReturnAgent -->|resolved| Done
    RefundAgent -->|resolved| Done

    style Triage fill:#27ae60,color:#fff
    style RefundAgent fill:#e74c3c,color:#fff
    style Done fill:#2980b9,color:#fff
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
from agent_framework.orchestrations import HandoffBuilder
from agent_framework import tool

@tool(approval_mode="always_require")
def process_refund(order_id: str, amount: float) -> str:
    """Process a customer refund. Requires human approval."""
    # In production: call payment API
    return f"Refund of ${amount:.2f} processed for order {order_id}"

# Define specialist agents
triage = chat_client.as_agent(
    name="triage",
    instructions="You are a customer support triage agent. Determine the customer's need and route to the appropriate specialist."
)
order_agent = chat_client.as_agent(
    name="order_specialist",
    instructions="You handle order status inquiries. Check order details and provide updates."
)
return_agent = chat_client.as_agent(
    name="return_specialist",
    instructions="You handle product returns. Guide customers through the return process."
)
refund_agent = chat_client.as_agent(
    name="refund_specialist",
    instructions="You process refunds after returns are confirmed.",
    tools=[process_refund]
)

# Build handoff workflow
workflow = (
    HandoffBuilder(
        name="customer_support",
        participants=[triage, order_agent, return_agent, refund_agent],
        termination_condition=lambda conv: any(
            "resolved" in msg.text.lower() for msg in conv[-3:]
        ),
    )
    .with_start_agent(triage)
    .add_handoff(triage, [order_agent, return_agent])
    .add_handoff(return_agent, [refund_agent])
    .build()
)

# Interactive mode — emits request_info when human input needed
async for event in workflow.run_stream("I want to return my broken headphones"):
    if event.type == "request_info":
        user_input = input(f"Agent needs input: ")
        await workflow.provide_input(user_input)
    elif event.type == "output":
        for msg in event.data:
            print(f"[{msg.author_name}] {msg.text}")

The approval_mode="always_require" on process_refund is your human-in-the-loop gate. The workflow pauses and emits a request_info event before any money moves. This is the pattern for any irreversible action — payments, database writes, external API calls — where a human sign-off is non-negotiable.

Always set explicit termination conditions in Handoff workflows. Without them, agents can bounce requests indefinitely — triage hands to return specialist, return specialist re-triages, repeat until token budget is exhausted.

Tradeoffs:

StrengthWeakness
Flexible — routing adapts to conversation contextNon-deterministic — hard to unit test exhaustively
Natural conversation flow across specializationsLoop risk without tight termination conditions
HITL approval gates on sensitive actionsDebugging requires full conversation replay

✅ Use handoff patterns when: routing logic depends on conversation context you can’t enumerate upfront, you need human approval gates on high-stakes actions, or your workflow is fundamentally conversational rather than procedural.


Pattern 4 — Group Chat

When to use: brainstorming sessions, design reviews, collaborative problem-solving where multiple simultaneous perspectives improve the outcome. Group Chat shines when no single agent has the full picture.

In this pattern, all agents share a single conversation space — every message is visible to every participant. A group manager (either rule-based or LLM-driven) determines who speaks next, preventing agents from talking over each other while preserving the richness of multi-agent dialogue.

graph TB
    Manager["Group Manager\n(Speaker Selection)"]
    Shared["Shared Context\n(All Messages Visible)"]
    Arch["Architect"]
    Sec["Security Reviewer"]
    Perf["Performance Engineer"]

    Manager -->|"Selects next speaker"| Arch
    Manager -->|"Selects next speaker"| Sec
    Manager -->|"Selects next speaker"| Perf
    Arch <-->|"reads & writes"| Shared
    Sec <-->|"reads & writes"| Shared
    Perf <-->|"reads & writes"| Shared

    style Manager fill:#27ae60,color:#fff
    style Shared fill:#2980b9,color:#fff

Here’s a full 3-agent design review in action:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
from agent_framework.orchestrations import GroupChatBuilder

architect = chat_client.as_agent(
    name="architect",
    instructions="You are a senior software architect. Evaluate system design for scalability, maintainability, and correctness."
)
security = chat_client.as_agent(
    name="security_reviewer",
    instructions="You are a security engineer. Identify vulnerabilities, auth gaps, and data exposure risks."
)
perf = chat_client.as_agent(
    name="performance_engineer",
    instructions="You are a performance engineer. Analyze for bottlenecks, latency issues, and resource efficiency."
)

workflow = (
    GroupChatBuilder(
        participants=[architect, security, perf],
        max_rounds=6,
        speaker_selection="llm",  # LLM-driven turn selection
    )
    .build()
)

async for event in workflow.run_stream(
    "Review this design: A REST API with JWT auth, PostgreSQL backend, Redis cache, and S3 for file storage."
):
    if event.type == "output":
        for msg in event.data:
            print(f"[{msg.author_name}] {msg.text}")

Always set max_rounds explicitly in Group Chat. Without it, agents will keep debating until token limits are hit — and LLM-driven speaker selection will happily let them.

Tradeoffs:

StrengthWeakness
Rich multi-perspective analysisToken cost grows O(n × rounds) — all agents read all messages
Agents build on each other’s insightsSpeaker selection can be unpredictable
No rigid structure requiredHard to control conversation direction

✅ Use Group Chat when: you want diverse expert perspectives on the same problem, collaborative refinement matters more than efficiency, and there’s no clear sequential dependency between contributions.


Pattern 5 — Magentic (Hierarchical / Dynamic Planning)

Based on Microsoft Research’s MagenticOne system [3], this is the most sophisticated pattern in the toolkit. Unlike every other pattern, Magentic doesn’t require you to know the task decomposition upfront.

A dedicated Manager agent sits at the top of the hierarchy. It:

  1. Creates and maintains a dynamic task ledger
  2. Selects which specialist executes next based on current state
  3. Detects stalls — if an agent is stuck or producing garbage, the manager reroutes
  4. Synthesizes all specialist outputs into a final result

This self-correcting loop is what makes Magentic uniquely powerful for open-ended, complex tasks.

stateDiagram-v2
    [*] --> CreateLedger
    CreateLedger --> SelectSpecialist
    SelectSpecialist --> Execute
    Execute --> Evaluate
    Evaluate --> SelectSpecialist : More work needed
    Evaluate --> Synthesize : Task complete
    Synthesize --> [*]

    note right of CreateLedger
        Task ledger updated dynamically
        as new information emerges
    end note

    note right of Evaluate
        Stall detection triggers reroute
        to a different specialist
    end note

Here’s a full research pipeline using Magentic:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
from agent_framework.orchestrations import MagenticBuilder

# Specialist agents
web_searcher = chat_client.as_agent(
    name="web_searcher",
    instructions="You search the web for current information. Return structured findings with source URLs.",
    tools=[web_search_tool]
)
data_analyst = chat_client.as_agent(
    name="data_analyst",
    instructions="You analyze data and extract insights. Produce tables and statistical summaries.",
    tools=[python_executor]
)
report_writer = chat_client.as_agent(
    name="report_writer",
    instructions="You write clear, well-structured research reports with citations."
)

# The manager orchestrates dynamically
workflow = (
    MagenticBuilder(
        participants=[web_searcher, data_analyst, report_writer],
        manager_instructions="You are a research director. Break complex research questions into subtasks, assign to specialists, and synthesize findings into a final report.",
        max_rounds=15,
    )
    .build()
)

async for event in workflow.run_stream(
    "Research the current state of multi-agent AI frameworks: market leaders, adoption trends, and technical tradeoffs."
):
    if event.type == "output":
        for msg in event.data:
            print(f"[{msg.author_name}] {msg.text}")

How Magentic differs from the Planning Pattern: The Planning Pattern for AI Agents creates a fixed plan upfront — it reasons before acting and commits to a sequence. Magentic maintains a dynamic ledger that adapts mid-execution: if the web searcher returns unexpected data, the manager can re-order subsequent steps, insert new specialists, or abandon dead-end paths entirely. Planning is a blueprint; Magentic is an adaptive project manager.

The manager agent in Magentic adds a full LLM call per orchestration step. On a 15-round task with 3 specialists, you’re looking at 15 manager deliberations on top of specialist calls. Budget accordingly.

Tradeoffs:

StrengthWeakness
Handles unknown decompositionHighest token cost — manager overhead per step
Self-correcting via stall detectionSlowest pattern — manager deliberates each round
Adaptive replanning mid-executionHardest to debug — execution path is non-deterministic

✅ Use Magentic when: task decomposition is unknown upfront, you need adaptive replanning, and the task is complex enough to justify the coordination overhead.


Choosing Your Pattern — Decision Flowchart

Stop guessing. Use this flowchart to select the right pattern before you write a single line of orchestration code.

flowchart TD
    Start([New Multi-Agent Task]) --> A{"Is task decomposition\nknown upfront?"}

    A -->|Yes| B{"Do steps depend\non each other?"}
    A -->|No| C{"Is it conversational\nor routing-based?"}

    B -->|Yes| Sequential["Sequential\nOrdered pipeline,\naudit trail required"]
    B -->|No| Concurrent["Concurrent\nParallel execution,\nspeed over cost"]

    C -->|Yes| D{"Multiple specialists,\nbut one active at a time?"}
    C -->|No| Magentic["Magentic\nComplex open-ended,\nadaptive replanning"]

    D -->|Yes| Handoff["Handoff\nDynamic routing,\nhuman-in-the-loop"]
    D -->|No| GroupChat["Group Chat\nCollaborative analysis,\ndiverse perspectives"]

    style Sequential fill:#27ae60,color:#fff
    style Concurrent fill:#27ae60,color:#fff
    style Handoff fill:#27ae60,color:#fff
    style GroupChat fill:#27ae60,color:#fff
    style Magentic fill:#27ae60,color:#fff
    style Start fill:#2980b9,color:#fff

Pattern Selection Cheatsheet

Sequential — deterministic pipelines, auditability required, each step feeds the next

Concurrent — independent subtasks, speed over cost, results merged at the end

Handoff — dynamic routing, human-in-the-loop, one specialist active at a time

Group Chat — collaborative analysis, diverse perspectives, no clear sequential dependency

Magentic — complex open-ended tasks, unknown decomposition, self-correcting execution required

When in doubt, start simpler than you think you need. A Sequential pipeline you can reason about and debug is worth more in production than a Magentic workflow that burns tokens unpredictably. Reach for Magentic only when the task genuinely resists upfront decomposition — which is rarer than it looks.


Production Concerns

Error Handling & Recovery

Every multi-agent pattern fails differently, and knowing the failure mode determines your recovery strategy.

Sequential pipelines halt completely when any agent fails — if the reviewer crashes, the editor never runs. Concurrent patterns produce partial results — some branches complete, others don’t, and your aggregator must handle missing inputs gracefully. Handoff patterns get stuck when a routing decision points to an agent that’s unavailable or returns an unexpected result.

The Microsoft Agent Framework addresses this with durable checkpointing. State is serialized after each agent completes, so a failure mid-pipeline doesn’t mean starting over.

1
2
3
4
5
6
7
8
9
from agent_framework.checkpoints import FileCheckpointStorage

storage = FileCheckpointStorage(path="./checkpoints")
workflow = SequentialBuilder(
    participants=[writer, reviewer, editor]
).with_checkpoint_storage(storage).build()

# Resume from last checkpoint after failure
workflow = SequentialBuilder.restore_from_checkpoint(storage, workflow_id="abc-123")

In development, FileCheckpointStorage is sufficient. In staging, swap to Redis. In production, Cosmos DB gives you geo-redundancy and query capability for debugging. The API is identical across backends — just change the storage class.


Observability

Multi-agent systems are black boxes without instrumentation. You need to know which agent called which tool, how many tokens each consumed, and where latency spikes occurred.

MAF ships with native OpenTelemetry integration. Every agent call generates a trace span containing: agent name, model used, token counts (prompt + completion), tool calls made, and end-to-end latency. Spans are nested, so you see the full causal chain across agent boundaries.

1
2
3
4
5
6
7
from opentelemetry import trace
from agent_framework.telemetry import configure_telemetry

configure_telemetry(
    service_name="multi-agent-pipeline",
    export_endpoint="http://localhost:4317"  # OTLP endpoint
)

This works with Azure Monitor, Jaeger, Grafana Tempo, or any OTEL-compatible backend. In practice, the two metrics you’ll watch most are token usage per agent (for cost forecasting) and P95 latency per stage (for SLA compliance). Wire these up before going to production — debugging a multi-agent system without traces is a nightmare.


Security: Agent-to-Agent Trust

The attack surface of a multi-agent system scales with the number of agents and the tools they can access. A compromised or misbehaving agent can chain tool calls across the entire pipeline if permissions aren’t scoped correctly.

The foundational principle is least privilege: each agent gets only the tools it needs for its specific role. A research agent needs web search and document retrieval. It has no business touching a payment API or a database write endpoint.

1
2
3
4
5
researcher = chat_client.as_agent(
    name="researcher",
    tools=[web_search, document_retrieval],  # scoped — no write access
    instructions="You are a research agent. Search and summarize — never modify external systems."
)

The approval_mode="always_require" flag inserts a human-in-the-loop approval gate before any tool execution. Use this for sensitive operations: file writes, external API calls, anything with side effects.

For cross-organization agent communication, the A2A (Agent-to-Agent) protocol provides authenticated, auditable message passing — agents prove identity before routing is accepted.

Never give all agents access to all tools. A researcher agent has no business calling a payment API. Over-permissioned agents are the fastest path to a production incident.

For a full treatment of agent security including sandboxing and Zero Trust architecture, see Securing AI Agents with Zero Trust and Sandboxing.


Cost: Token Multiplication

Multi-agent systems don’t just use more compute — they multiply token usage in ways that surprise teams coming from single-agent architectures.

Sequential pattern: each downstream agent receives the original input plus all prior agents’ outputs. Total tokens ≈ Σ(input + all prior outputs) across N agents. A 3-agent sequential chain with 1K input tokens and 500-token outputs costs roughly 1K + 1.5K + 2K = 4.5K tokens — 4.5× the naive estimate.

Concurrent pattern: each agent processes only the original input, so total = N × input tokens + aggregation. More predictable, but scales linearly with agent count.

Group Chat: tokens grow O(agents × rounds × avg_message_length). Every new message is appended to the shared context, which every agent reads in full. This is the most expensive pattern by a significant margin. A 5-agent group chat running 10 rounds with 300-token average messages burns 15,000 tokens in context alone — before any actual work happens.

Magentic: the manager adds 15–30% overhead on top of specialist costs due to planning, routing decisions, and result aggregation passes.

Start with Sequential or Concurrent. Only move to Group Chat or Magentic when simpler patterns genuinely can’t solve the problem. Token costs compound fast in production.


Checkpointing & Long-Running Workflows

Not all workflows complete in seconds. Human-in-the-loop pipelines may pause for hours or days — a legal review agent hands off a draft, waits for a paralegal to approve, then continues. The process can’t hold an in-memory state object across that window.

MAF’s durable checkpointing handles this with full state serialization: agent outputs, message history, current pipeline position, and pending approvals are all written to the checkpoint store. The workflow can resume on a completely different machine.

Storage BackendUse Case
FileCheckpointStorageLocal development, single machine
RedisCheckpointStorageStaging, low-latency resumption
CosmosDBCheckpointStorageProduction, geo-redundancy + audit queries

For compliance workflows — KYC, legal contract review, financial approvals — checkpointing isn’t optional. You need an auditable, immutable record of which agent made which decision at what time. Cosmos DB checkpoints give you that plus queryability, so compliance teams can inspect workflow history without engineering intervention.


Resources

Framework & Documentation

Microsoft Announcements

Research


References

[1] Microsoft, “Introducing Microsoft Agent Framework,” Azure Blog, October 2025. https://azure.microsoft.com/en-us/blog/introducing-microsoft-agent-framework/

[2] Microsoft, “Agent Framework Documentation,” Microsoft Learn, February 2026. https://learn.microsoft.com/en-us/agent-framework/overview/

[3] Microsoft Research, “MagenticOne: A Generalist Multi-Agent System for Solving Complex Tasks,” 2024. https://www.microsoft.com/en-us/research/articles/magentic-one-a-generalist-multi-agent-system-for-solving-complex-tasks/

[4] Microsoft, “Microsoft Agent Framework Reaches Release Candidate,” Foundry DevBlog, February 2026. https://devblogs.microsoft.com/foundry/microsoft-agent-framework-reaches-release-candidate/

This post is licensed under CC BY 4.0 by the author.