Agent Selection

When multiple AI agents can perform a task, which one should you use? A coding agent might excel at implementation but struggle with analysis. A research agent might be thorough but slow. This tutorial shows you how to use Thompson Sampling for intelligent agent selection that learns from outcomes and improves over time.

What You Will Learn

Thompson Sampling is a contextual multi-armed bandit algorithm that:

Balances exploration vs. exploitation - Tries different agents while favoring known performers
Learns from outcomes - Updates beliefs after each task execution
Adapts to context - Selects differently based on task category

How Thompson Sampling Works

Each agent maintains a Beta distribution belief for each task category:

Agent "analyst" for "Analysis" tasks:
  Beta(alpha=15, beta=3)  ->  High success rate, confident

Agent "coder" for "Coding" tasks:
  Beta(alpha=8, beta=2)   ->  Good success rate, less experience

When selecting an agent:

Sample a random value from each agent’s Beta distribution for the task category
Select the agent with the highest sampled value
Execute the task with the selected agent
Update the belief based on the outcome (success or failure)

This naturally balances trying new agents (exploration) with using proven agents (exploitation).

Step 1: Configure Services

Add agent selection to your service registration:

services.AddAgentSelection(options => options
    .WithPrior(alpha: 2, beta: 2)  // Uninformative prior
    .WithCategories(
        TaskCategory.Analysis,
        TaskCategory.Coding,
        TaskCategory.Research,
        TaskCategory.Writing));

The prior Beta(2, 2) represents no initial bias - agents start with equal assumed performance.

Step 2: Register Agents

Define the available agents:

services.AddAgent("analyst", new AgentConfig
{
    Name = "Data Analyst",
    Capabilities = ["data-analysis", "visualization", "statistics"]
});

services.AddAgent("coder", new AgentConfig
{
    Name = "Software Developer",
    Capabilities = ["code-generation", "debugging", "refactoring"]
});

services.AddAgent("researcher", new AgentConfig
{
    Name = "Research Specialist",
    Capabilities = ["literature-review", "synthesis", "citations"]
});

Step 3: Select and Execute

Use the agent selector in your task routing:

public class TaskRouter
{
    private readonly IAgentSelector _selector;
    private readonly IAgentRegistry _agents;

    public TaskRouter(IAgentSelector selector, IAgentRegistry agents)
    {
        _selector = selector;
        _agents = agents;
    }

    public async Task<AgentResult> RouteTaskAsync(
        string taskDescription,
        CancellationToken ct)
    {
        // 1. Select agent via Thompson Sampling
        var selection = await _selector.SelectAgentAsync(new AgentSelectionContext
        {
            AvailableAgentIds = ["analyst", "coder", "researcher"],
            TaskDescription = taskDescription
        }, ct);

        // 2. Execute with selected agent
        var agent = _agents.Get(selection.SelectedAgentId);
        var result = await agent.ExecuteAsync(taskDescription, ct);

        // 3. Record outcome for learning
        await _selector.RecordOutcomeAsync(
            selection.SelectedAgentId,
            selection.TaskCategory,
            result.Success
                ? AgentOutcome.Succeeded(result.ConfidenceScore)
                : AgentOutcome.Failed(result.ErrorMessage),
            ct);

        return result;
    }
}

Task Categories

Tasks are classified into categories for context-aware selection. The library includes 7 predefined categories:

Category	Keywords	Example Tasks
Analysis	analyze, examine, evaluate, assess	”Analyze sales trends”
Coding	code, implement, debug, refactor	”Implement OAuth flow”
Research	research, investigate, explore	”Research competitor pricing”
Writing	write, draft, compose, document	”Write API documentation”
Data	data, transform, migrate, etl	”Transform CSV to JSON”
Integration	integrate, connect, api, webhook	”Connect Stripe API”
General	(fallback)	“Help with this task”

Custom Category Classification

Override default classification for domain-specific needs:

public class DomainFeatureExtractor : ITaskFeatureExtractor
{
    public TaskFeatures Extract(string taskDescription)
    {
        var lower = taskDescription.ToLowerInvariant();

        // Domain-specific classification
        if (lower.Contains("compliance") || lower.Contains("regulation"))
        {
            return new TaskFeatures(
                Category: TaskCategory.Analysis,
                Complexity: TaskComplexity.High,
                MatchedKeywords: ["compliance", "regulation"]);
        }

        if (lower.Contains("migration") || lower.Contains("upgrade"))
        {
            return new TaskFeatures(
                Category: TaskCategory.Data,
                Complexity: TaskComplexity.High,
                MatchedKeywords: ["migration"]);
        }

        // Fall back to default extraction
        return DefaultFeatureExtractor.Extract(taskDescription);
    }
}

// Register custom extractor
services.AddSingleton<ITaskFeatureExtractor, DomainFeatureExtractor>();

Belief Persistence

Development (In-Memory)

services.AddSingleton<IBeliefStore, InMemoryBeliefStore>();

Beliefs reset when the application restarts.

Production (PostgreSQL)

services.AddSingleton<IBeliefStore, PostgresBeliefStore>(sp =>
    new PostgresBeliefStore(sp.GetRequiredService<IDocumentSession>()));

Beliefs persist across restarts.

Belief Structure

public record AgentBelief(
    string AgentId,
    TaskCategory Category,
    int Alpha,      // Success count + prior
    int Beta,       // Failure count + prior
    int TotalTrials,
    DateTimeOffset LastUpdated);

// Example beliefs after training:
// Agent: analyst
//   Analysis: Beta(45, 5)   = 90% success, 50 trials
//   Coding:   Beta(3, 7)    = 30% success, 10 trials
//   Research: Beta(20, 4)   = 83% success, 24 trials

Recording Outcomes

After task execution, record the outcome to update beliefs:

// Success with confidence score
await selector.RecordOutcomeAsync(
    agentId: "analyst",
    category: TaskCategory.Analysis,
    outcome: AgentOutcome.Succeeded(confidenceScore: 0.92),
    ct);

// Failure with reason
await selector.RecordOutcomeAsync(
    agentId: "coder",
    category: TaskCategory.Coding,
    outcome: AgentOutcome.Failed("Syntax errors in generated code"),
    ct);

// Partial success
await selector.RecordOutcomeAsync(
    agentId: "researcher",
    category: TaskCategory.Research,
    outcome: AgentOutcome.Partial(completionRate: 0.7),
    ct);

Integration with Workflows

Use agent selection within workflow steps:

public class DelegateToAgent : IWorkflowStep<TaskState>
{
    private readonly IAgentSelector _selector;
    private readonly IAgentRegistry _agents;

    public DelegateToAgent(
        IAgentSelector selector,
        IAgentRegistry agents)
    {
        _selector = selector;
        _agents = agents;
    }

    public async Task<StepResult<TaskState>> ExecuteAsync(
        TaskState state,
        StepContext context,
        CancellationToken ct)
    {
        // Select best agent for the task
        var selection = await _selector.SelectAgentAsync(new AgentSelectionContext
        {
            AvailableAgentIds = state.AvailableAgents,
            TaskDescription = state.CurrentTask.Description
        }, ct);

        // Execute task
        var agent = _agents.Get(selection.SelectedAgentId);
        var result = await agent.ExecuteAsync(state.CurrentTask, ct);

        // Record outcome for learning
        await _selector.RecordOutcomeAsync(
            selection.SelectedAgentId,
            selection.TaskCategory,
            result.ToOutcome(),
            ct);

        return state
            .With(s => s.SelectedAgent, selection.SelectedAgentId)
            .With(s => s.TaskResult, result)
            .AsResult();
    }
}

Monitoring Performance

Query agent performance across categories:

var performance = await beliefStore.GetPerformanceReportAsync(ct);

// Output:
// Agent: analyst
//   Analysis: 90.0% (45/50 trials)
//   Research: 83.3% (20/24 trials)
//   Coding:   30.0% (3/10 trials)
//
// Agent: coder
//   Coding:   88.0% (44/50 trials)
//   Analysis: 45.0% (9/20 trials)

Use this data to:

Identify agent strengths and weaknesses
Decide when to add new agents
Monitor for performance degradation

The Selection Algorithm

For reference, here is how Thompson Sampling selects agents:

public class ThompsonSamplingSelector : IAgentSelector
{
    public async Task<AgentSelection> SelectAgentAsync(
        AgentSelectionContext context,
        CancellationToken ct)
    {
        // 1. Classify task
        var features = _featureExtractor.Extract(context.TaskDescription);

        // 2. Get beliefs for all available agents
        var beliefs = await _beliefStore.GetBeliefsAsync(
            context.AvailableAgentIds,
            features.Category,
            ct);

        // 3. Sample from each agent's Beta distribution
        var samples = beliefs.Select(b => new
        {
            b.AgentId,
            Sample = SampleBeta(b.Alpha, b.Beta)
        });

        // 4. Select agent with highest sample
        var selected = samples.MaxBy(s => s.Sample)!;

        return new AgentSelection(
            SelectedAgentId: selected.AgentId,
            TaskCategory: features.Category,
            SampledValue: selected.Sample,
            Features: features);
    }

    private double SampleBeta(int alpha, int beta)
    {
        return BetaDistribution.Sample(_random, alpha, beta);
    }
}

The random sampling from Beta distributions means:

Well-performing agents (high alpha, low beta) sample high values often
Uncertain agents (low trials) have high variance and get explored
Poor performers (low alpha, high beta) sample low values but occasionally get a chance

Key Points

Thompson Sampling automatically balances exploration and exploitation
Beliefs update after each task - the system learns continuously
Prior Beta(2, 2) provides an uninformative start - no initial bias
Task classification happens via keyword extraction or custom extractors
Beliefs persist across restarts with PostgreSQL storage
Performance improves as more tasks are executed
Works with any number of agents and task categories

Next Steps

You have completed the Guide tutorials. You now know how to:

Install and configure Strategos
Build sequential, branching, parallel, and iterative workflows
Incorporate human approvals
Use intelligent agent selection

For deeper understanding, explore:

Core Concepts - Theoretical foundations
API Reference - Complete API documentation
Examples - Full application examples