Sep 17, 2025

Parallel Agent Execution with AG2: Best Practices Guide

Justin Trugman

Cofounder & Head of Technology

BetterFutureLabs

When building multi-agent systems with AG2, you'll often need to perform multiple independent tasks simultaneously. Instead of running agents sequentially, parallel execution can dramatically reduce processing time and improve user experience. This guide shares practical lessons and best practices we've learned while implementing parallel execution in production systems.

Why Parallel Execution Matters

Consider a real-world scenario where you're building a medical diagnosis support system. Your system needs multiple specialized multi-agent teams simultaneously analyzing different aspects of a patient case:

  • Symptom pattern analysis team: Correlating symptoms with medical history, identifying differential diagnoses, and assessing symptom severity patterns (8-12 minutes)

  • Medical literature review team: Synthesizing recent research, reviewing treatment protocols, and identifying evidence-based recommendations (10-15 minutes)

  • Drug interaction analysis team: Checking medication safety, identifying contraindications, and assessing drug-drug interactions with current medications (6-9 minutes)

  • Imaging analysis team: Interpreting radiology results, detecting anomalies, and correlating findings with clinical presentation (7-10 minutes)

  • Treatment planning team: Developing therapy recommendations, creating treatment timelines, and considering patient-specific factors (9-12 minutes)

Each team must reach internal consensus through multi-agent discussions before contributing to the final diagnostic assessment.

Sequential execution: 40-58 minutes total (each team waits for the previous to complete)

Parallel execution: 10-15 minutes total (limited by the longest-running team)

This 70-75% reduction in processing time transforms diagnosis from a lengthy batch process into a near real-time clinical decision support tool, potentially improving patient outcomes through faster, more comprehensive analysis.
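The time estimates above can be sanity-checked with a few lines of arithmetic. This sketch uses the hypothetical per-team durations from the scenario; sequential time is the sum of all teams, while parallel time is bounded by the slowest team:

```python
# Hypothetical per-team time estimates in minutes (min, max), from the scenario above
team_times = {
    "symptom_pattern_analysis": (8, 12),
    "literature_review": (10, 15),
    "drug_interaction_check": (6, 9),
    "imaging_analysis": (7, 10),
    "treatment_planning": (9, 12),
}

# Sequential: every team waits for the previous one, so durations add up
sequential_min = sum(lo for lo, hi in team_times.values())  # 40
sequential_max = sum(hi for lo, hi in team_times.values())  # 58

# Parallel: total time is bounded by the longest-running team
parallel_min = max(lo for lo, hi in team_times.values())    # 10
parallel_max = max(hi for lo, hi in team_times.values())    # 15

print(f"Sequential: {sequential_min}-{sequential_max} min")
print(f"Parallel:   {parallel_min}-{parallel_max} min")
```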

Core Implementation Pattern

We've found success using Python's concurrent.futures.ThreadPoolExecutor as the foundation for parallel agent execution. This approach provides several key advantages:

Why ThreadPoolExecutor works well for AG2 Multi-Agent Systems:

  • Perfect for AG2 agent operations: AG2 agents spend most of their time waiting on I/O, such as LLM API calls, tool operations, and web requests

  • Shared memory space: Agents can access the same configuration objects and shared utilities without complex inter-process communication

  • Resource efficiency: Lower overhead than full process creation while still providing true parallelism for I/O operations

  • Exception handling: Clean error propagation and debugging

How it works: ThreadPoolExecutor creates a pool of worker threads that can execute functions concurrently. When you submit a task, it gets assigned to an available worker thread. The key insight is that while one thread waits for an API response, other threads can continue processing their own tasks.

import concurrent.futures

# Define independent diagnostic analysis tasks
diagnostic_tasks = [
    "symptom_pattern_analysis",
    "literature_review",
    "drug_interaction_check",
    "imaging_analysis",
    "treatment_planning",
]

# Execute diagnostic teams in parallel, giving each a unique team_id
with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [
        executor.submit(run_diagnostic_team, team_id, analysis_type, patient_data)
        for team_id, analysis_type in enumerate(diagnostic_tasks)
    ]
    concurrent.futures.wait(futures)  # Wait for all teams to complete

Each run_diagnostic_team call creates an independent set of AG2 agents (medical specialist, clinical reviewer, supervisor) that collaborate on their specific analysis without interfering with other parallel teams. The concurrent.futures.wait() function blocks until all teams have completed their work, ensuring synchronization before proceeding to further diagnostic synthesis.
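Once concurrent.futures.wait() returns, every future is resolved and calling .result() no longer blocks. A useful refinement is mapping each future back to its analysis type so results stay attributable during synthesis. This sketch uses a hypothetical stand-in for run_diagnostic_team (the real one would build an AG2 agent team as shown below):

```python
import concurrent.futures

# Hypothetical stand-in: returns the team's findings instead of running agents
def run_diagnostic_team(team_id, analysis_type, patient_data):
    return {"team": analysis_type, "findings": f"analysis of {patient_data}"}

diagnostic_tasks = ["symptom_pattern_analysis", "literature_review"]

with concurrent.futures.ThreadPoolExecutor() as executor:
    # Map each future back to its analysis type so results stay attributable
    future_to_task = {
        executor.submit(run_diagnostic_team, team_id, task, "patient-123"): task
        for team_id, task in enumerate(diagnostic_tasks)
    }
    concurrent.futures.wait(future_to_task)

# All futures are complete here, so .result() returns immediately
team_results = {task: future.result() for future, task in future_to_task.items()}
```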

Agent Factory Pattern

For this example, we're using an Agent Factory Pattern to showcase the power of parallel execution. Rather than sharing agent instances across parallel tasks, each parallel execution creates completely fresh agent instances. This prevents state contamination and ensures true independence.

def run_diagnostic_team(team_id, analysis_type, patient_data):
    """
    Factory function that creates a complete independent agent team
    for a specific medical analysis task
    """

    # Create fresh user proxy for this diagnostic team
    user_proxy = UserProxyAgent(
        name=f"coordinator_{analysis_type}_{team_id}",
        is_termination_msg=lambda msg: "TERMINATE" in (msg.get("content") or ""),
        human_input_mode="NEVER",
        max_consecutive_auto_reply=1,
        code_execution_config=False,
    )

    # Create specialized medical analyst with unique configuration
    medical_specialist = GPTAssistantAgent(
        name=f"specialist_{analysis_type}_{team_id}",
        instructions=medical_instructions[analysis_type],
        overwrite_instructions=True,  # Ensure clean state
        overwrite_tools=True,
        llm_config={
            "config_list": config_list,
            "tools": diagnostic_tools[analysis_type],
            "assistant_id": specialist_assistant_ids[analysis_type],
        },
    )

    # Register analysis-specific functions
    medical_specialist.register_function(
        function_map=diagnostic_functions[analysis_type]
    )

    # Create complete diagnostic team and execute
    # (clinical_reviewer and supervisor are constructed the same way as the specialist)
    team = [user_proxy, medical_specialist, clinical_reviewer, supervisor]
    groupchat = ag2.GroupChat(agents=team, messages=[], max_round=15)
    chat_manager = ag2.GroupChatManager(groupchat=groupchat, llm_config={"config_list": config_list})

    # Execute diagnostic analysis with patient context
    user_proxy.initiate_chat(chat_manager, message=f"Perform {analysis_type} for patient: {patient_data}")

Key principles of the factory pattern:

  • Fresh instances: Every parallel execution gets brand new agent objects

  • Unique naming: Prevents agent name conflicts across parallel executions

  • Isolated configuration: Each team gets its own tools, instructions, and assistant IDs

  • Clean state: overwrite_instructions=True ensures no state leakage between executions

Common Pitfalls to Avoid

Race Conditions

The most dangerous trap in parallel execution is sharing mutable state between agents. When multiple threads modify the same data structure simultaneously, you get unpredictable behavior and data corruption. This is especially problematic with global dictionaries, shared configuration objects, or any state that gets modified during execution.

# ❌ BAD: Shared mutable state
shared_patient_analysis = {}

def bad_diagnostic_process(analysis_type):
    shared_patient_analysis[analysis_type] = perform_analysis(...)  # Race condition!

# ✅ GOOD: Independent storage
def good_diagnostic_process(analysis_type, patient_id):
    result = perform_analysis(...)
    save_to_file(f"diagnosis_{patient_id}_{analysis_type}.json", result)

Resource Exhaustion

Creating unlimited threads will overwhelm your system and degrade performance for all tasks. Each thread consumes memory and system resources. Too many threads also increase context switching overhead, making everything slower rather than faster.

# ❌ BAD: Unlimited workers
with ThreadPoolExecutor() as executor:  # Could create too many threads
    ...

# ✅ GOOD: Controlled resource usage
max_workers = min(len(tasks), 5)
with ThreadPoolExecutor(max_workers=max_workers) as executor:
    ...

Ignoring Failures

In parallel execution, individual tasks will fail. Network timeouts, API rate limits, and processing errors are inevitable. If you don't handle these gracefully, one failed task can crash your entire parallel workflow, wasting all the work completed by successful tasks.

# ❌ BAD: No error handling
futures = [executor.submit(task) for task in tasks]
results = [f.result() for f in futures]  # Will crash on any failure

# ✅ GOOD: Graceful error handling
for future in concurrent.futures.as_completed(futures):
    try:
        result = future.result(timeout=300)
        handle_success(result)
    except Exception as e:
        handle_error(e)

Production Implementation Tips

Error Handling

Robust error handling is essential for production parallel execution. You need to capture and log failures without stopping the entire workflow. The key is to collect both successful results and error information, then decide how to handle partial failures based on your business requirements.

def execute_parallel_tasks(doc_id, tasks, document_data):
    results = {}
    errors = {}

    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        future_to_task = {
            executor.submit(process_document, doc_id, task, document_data): task
            for task in tasks
        }

        for future in concurrent.futures.as_completed(future_to_task):
            task = future_to_task[future]
            try:
                results[task] = future.result(timeout=300)
                logging.info(f"Task {task} completed successfully")
            except Exception as e:
                errors[task] = str(e)
                logging.error(f"Task {task} failed: {e}")

    return results, errors

Resource Management

The default ThreadPoolExecutor worker count is min(32, (os.process_cpu_count() or 1) + 4). This formula preserves at least 5 workers while avoiding excessive resource usage on many-core machines. For more details, see the Python ThreadPoolExecutor documentation.

# Calculate optimal worker count based on Python's official recommendation
import os
def get_optimal_workers(task_count): 
  # Following Python's default formula 
  default_workers = min(32, (os.process_cpu_count() or 1) + 4) 
  return min(task_count, default_workers)

Best Practices for Production Systems

After implementing the core patterns, follow these production-ready guidelines:

1. Design for Complete Independence

  • Do: Use separate data sources and storage for each parallel task

  • Do: Create unique agent names and IDs to avoid conflicts

  • ❌ Don't: Share mutable state between parallel agents

  • ❌ Don't: Create dependencies between parallel tasks

  • ❌ Don't: Reuse agent instances across parallel executions

2. Intelligent Resource Management

  • Calculate optimal workers: Use Python's recommended formula min(32, (os.process_cpu_count() or 1) + 4) for I/O-bound agent operations

  • Implement appropriate timeouts: Benchmark your specific agent tasks to determine realistic timeout values, then add a buffer for variability

  • Monitor system resources: Track memory and API rate limits to prevent resource exhaustion
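The timeout recommendation above can be sketched in a few lines. The benchmark figure and 50% buffer below are illustrative assumptions; substitute values measured from your own agent tasks:

```python
import concurrent.futures
import time

# Hypothetical benchmark: suppose profiling showed a task typically
# finishes in ~0.1 s here; add a 50% buffer for variability
benchmark_seconds = 0.1
timeout_seconds = benchmark_seconds * 1.5

def agent_task():
    time.sleep(0.02)  # Stand-in for an LLM-bound agent task
    return "done"

with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    future = executor.submit(agent_task)
    try:
        result = future.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError:
        result = None  # Task exceeded its budget; treat as a partial failure
```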

3. Comprehensive Error Handling Strategy

  • Graceful degradation: Continue processing other tasks when some fail

  • Detailed logging: Log start times, completion status, and error details for each parallel task

4. Performance Monitoring and Optimization

Whatever monitoring tool or logging system you're using, these are valuable metrics to track for parallel agent execution:

Key metrics to track:

  • Total execution time: Compare sequential vs parallel performance to measure improvement

  • Individual task duration: Identify bottleneck tasks that may need optimization or additional resources

  • Resource utilization: Monitor CPU, memory, and API rate limits to prevent exhaustion

  • Success/failure rates: Track reliability trends over time to identify patterns in task failures

  • Parallel efficiency: Measure how well you're utilizing available resources compared to sequential execution

  • API quota consumption: Track LLM API usage across parallel tasks to avoid rate limiting

Implementation Summary

Parallel agent execution with AG2 transforms research and analysis applications from sequential batch operations into concurrent workflows. The patterns outlined in this guide provide a production-ready foundation for scaling multi-agent systems.

Recommended implementation approach:

  1. Start simple: Begin with basic ThreadPoolExecutor patterns for independent tasks

  2. Add sophistication gradually: Implement dynamic allocation as system complexity grows

  3. Monitor extensively: Use comprehensive metrics to identify optimization opportunities

  4. Design for failures: Assume tasks will fail and build resilience from the ground up

Expected outcomes when properly implemented:

  • Significant reduction in processing time for multi-task workflows

  • Improved user experience through concurrent execution

  • Better resource utilization with intelligent task allocation

  • Higher system reliability through graceful error handling

By applying proper resource management patterns and designing for independence, you can build systems that scale horizontally while maintaining the collaborative intelligence that makes multi-agent systems effective.

Parallel execution fundamentally changes how users interact with multi-agent systems. Instead of waiting for sequential processing, users get concurrent analysis across multiple specialized domains. Start with these patterns, measure your results, and iterate based on your specific use case requirements.