
Sep 17, 2025

Justin Trugman
Cofounder & Head of Technology
BetterFutureLabs
When building multi-agent systems with AG2, you'll often need to perform multiple independent tasks simultaneously. Instead of running agents sequentially, parallel execution can dramatically reduce processing time and improve user experience. This guide shares practical lessons and best practices we've learned while implementing parallel execution in production systems.
Why Parallel Execution Matters
Consider a real-world scenario where you're building a medical diagnosis support system. Your system needs multiple specialized multi-agent teams to analyze different aspects of a patient case simultaneously:
Symptom pattern analysis team: Correlating symptoms with medical history, identifying differential diagnoses, and assessing symptom severity patterns (8-12 minutes)
Medical literature review team: Synthesizing recent research, reviewing treatment protocols, and identifying evidence-based recommendations (10-15 minutes)
Drug interaction analysis team: Checking medication safety, identifying contraindications, and assessing drug-drug interactions with current medications (6-9 minutes)
Imaging analysis team: Interpreting radiology results, detecting anomalies, and correlating findings with clinical presentation (7-10 minutes)
Treatment planning team: Developing therapy recommendations, creating treatment timelines, and considering patient-specific factors (9-12 minutes)
Each team must reach internal consensus through multi-agent discussions before contributing to the final diagnostic assessment.
Sequential execution: 40-58 minutes total (each team waits for the previous to complete)
Parallel execution: 10-15 minutes total (limited by the longest-running team)
This 70-75% reduction in processing time transforms diagnosis from a lengthy batch process into a near real-time clinical decision support tool, potentially improving patient outcomes through faster, more comprehensive analysis.
Core Implementation Pattern
We've found success using Python's concurrent.futures.ThreadPoolExecutor as the foundation for parallel agent execution. This approach provides several key advantages:
Why ThreadPoolExecutor works well for AG2 Multi-Agent Systems:
Perfect for AG2 agent operations: AG2 agents spend most of their time waiting on I/O, such as LLM API calls, tool operations, and web requests
Shared memory space: Agents can access the same configuration objects and shared utilities without complex inter-process communication
Resource efficiency: Lower overhead than spawning full processes while still delivering genuine concurrency for I/O-bound operations
Exception handling: Clean error propagation and debugging
How it works: ThreadPoolExecutor creates a pool of worker threads that can execute functions concurrently. When you submit a task, it gets assigned to an available worker thread. The key insight is that while one thread waits for an API response, other threads can continue processing their own tasks.
Each run_diagnostic_team call creates an independent set of AG2 agents (medical specialist, clinical reviewer, supervisor) that collaborate on their specific analysis without interfering with other parallel teams. The concurrent.futures.wait() function blocks until all teams have completed their work, ensuring synchronization before proceeding to the final diagnostic synthesis.
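As a rough sketch of this pattern: run_diagnostic_team here is a stand-in for whatever function builds and runs one AG2 team, and the team names and case data are illustrative placeholders.

```python
import concurrent.futures
import time

def run_diagnostic_team(team_name: str, case_data: dict) -> dict:
    # Placeholder for building a fresh AG2 team and running its group chat;
    # a real implementation would construct agents and initiate their chat here.
    time.sleep(0.1)  # simulate LLM / API latency
    return {"team": team_name, "finding": f"analysis of {case_data['patient_id']}"}

case = {"patient_id": "P-001"}
teams = ["symptoms", "literature", "interactions", "imaging", "planning"]

with concurrent.futures.ThreadPoolExecutor(max_workers=len(teams)) as executor:
    # Submit one independent task per team; each runs on its own worker thread
    futures = {executor.submit(run_diagnostic_team, t, case): t for t in teams}
    # Block until every team has finished before synthesizing results
    done, _ = concurrent.futures.wait(futures)
    results = [f.result() for f in done]
```

While one thread is blocked on a (simulated) API call, the other threads make progress, which is exactly the I/O-overlap the pattern relies on.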
Agent Factory Pattern
For this example, we're using an Agent Factory Pattern to showcase the power of parallel execution. Rather than sharing agent instances across parallel tasks, each parallel execution creates completely fresh agent instances. This prevents state contamination and ensures true independence.
Key principles of the factory pattern:
Fresh instances: Every parallel execution gets brand new agent objects
Unique naming: Prevents agent name conflicts across parallel executions
Isolated configuration: Each team gets its own tools, instructions, and assistant IDs
Clean state: overwrite_instructions=True ensures no state leakage between executions
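A minimal sketch of the factory idea, using plain dicts in place of real agent objects (in AG2 each entry would be constructed as a fresh agent instance; the role names here are illustrative):

```python
import uuid

def make_team(team_role: str) -> dict:
    """Factory: build a fresh, independently named set of agent configs.

    Every call returns brand-new objects with unique names, so parallel
    executions can never collide or share state.
    """
    run_id = uuid.uuid4().hex[:8]  # unique suffix prevents name conflicts
    return {
        "specialist": {"name": f"{team_role}_specialist_{run_id}",
                       "instructions": f"You are the {team_role} specialist."},
        "reviewer": {"name": f"{team_role}_reviewer_{run_id}",
                     "instructions": "Review the specialist's conclusions."},
        "supervisor": {"name": f"{team_role}_supervisor_{run_id}",
                       "instructions": "Approve the final team answer."},
    }

team_a = make_team("imaging")
team_b = make_team("imaging")
# Two factory calls for the same role still yield fully independent teams
```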
Common Pitfalls to Avoid
Race Conditions
The most dangerous trap in parallel execution is sharing mutable state between agents. When multiple threads modify the same data structure simultaneously, you get unpredictable behavior and data corruption. This is especially problematic with global dictionaries, shared configuration objects, or any state that gets modified during execution.
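One simple way to sidestep shared mutable state is to have each task return its result and do all merging on the main thread after the parallel phase ends, for example:

```python
import concurrent.futures

def analyze(task: str) -> tuple[str, str]:
    # Touches no shared state: everything it needs comes in as an argument,
    # and everything it produces goes out as a return value.
    return task, f"result for {task}"

tasks = ["symptoms", "literature", "imaging"]
with concurrent.futures.ThreadPoolExecutor() as executor:
    # The merge into one dict happens here, on the main thread only,
    # so no two threads ever write to the same structure.
    merged = dict(executor.map(analyze, tasks))
```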
Resource Exhaustion
Creating unlimited threads will overwhelm your system and degrade performance for all tasks. Each thread consumes memory and system resources. Too many threads also increase context switching overhead, making everything slower rather than faster.
Ignoring Failures
In parallel execution, individual tasks will fail. Network timeouts, API rate limits, and processing errors are inevitable. If you don't handle these gracefully, one failed task can crash your entire parallel workflow, wasting all the work completed by successful tasks.
Production Implementation Tips
Error Handling
Robust error handling is essential for production parallel execution. You need to capture and log failures without stopping the entire workflow. The key is to collect both successful results and error information, then decide how to handle partial failures based on your business requirements.
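A sketch of this approach using concurrent.futures.as_completed, with one simulated failure (the task names and analyze function are illustrative):

```python
import concurrent.futures

def analyze(task_name: str) -> str:
    if task_name == "imaging":
        raise TimeoutError(f"{task_name}: upstream API timed out")  # simulated failure
    return f"{task_name}: ok"

tasks = ["symptoms", "literature", "imaging"]
results, errors = {}, {}

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    futures = {executor.submit(analyze, t): t for t in tasks}
    for future in concurrent.futures.as_completed(futures):
        name = futures[future]
        try:
            results[name] = future.result()  # re-raises any worker exception
        except Exception as exc:
            errors[name] = str(exc)  # record the failure, keep processing

# Two tasks succeeded, one failed, and the workflow still ran to completion;
# how to handle the partial result set is a business decision.
```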
Resource Management
The default ThreadPoolExecutor worker count is min(32, (os.process_cpu_count() or 1) + 4). This formula preserves at least 5 workers for I/O-bound tasks while avoiding excessive thread creation on many-core machines. For more details, see the Python ThreadPoolExecutor documentation.
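You can compute the same default yourself when tuning max_workers (note that os.process_cpu_count was added in Python 3.13; os.cpu_count plays the same role on older versions):

```python
import os

# Mirror ThreadPoolExecutor's default worker count.
# os.process_cpu_count() exists on 3.13+; fall back to os.cpu_count().
count_fn = getattr(os, "process_cpu_count", os.cpu_count)
default_workers = min(32, (count_fn() or 1) + 4)
```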
Best Practices for Production Systems
After implementing the core patterns, follow these production-ready guidelines:
1. Design for Complete Independence
✅ Do: Use separate data sources and storage for each parallel task
✅ Do: Create unique agent names and IDs to avoid conflicts
❌ Don't: Share mutable state between parallel agents
❌ Don't: Create dependencies between parallel tasks
❌ Don't: Reuse agent instances across parallel executions
2. Intelligent Resource Management
Calculate optimal workers: Use Python's recommended formula min(32, (os.process_cpu_count() or 1) + 4) for I/O-bound agent operations
Implement appropriate timeouts: Benchmark your specific agent tasks to determine realistic timeout values, then add a buffer for variability
Monitor system resources: Track memory and API rate limits to prevent resource exhaustion
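On the timeout point, concurrent.futures.wait accepts a timeout argument; here is a small sketch of enforcing a time budget (the 0.5-second "team" and 0.1-second budget are placeholder values):

```python
import concurrent.futures
import time

def slow_team() -> str:
    time.sleep(0.5)  # stands in for a long-running agent team
    return "done"

with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    future = executor.submit(slow_team)
    # Stop waiting after 0.1s; the task keeps running, but control returns
    # so the caller can log the overrun and decide what to do next.
    done, not_done = concurrent.futures.wait([future], timeout=0.1)
```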
3. Comprehensive Error Handling Strategy
Graceful degradation: Continue processing other tasks when some fail
Detailed logging: Log start times, completion status, and error details for each parallel task
4. Performance Monitoring and Optimization
Whatever monitoring tool or logging system you're using, these are valuable metrics to track for parallel agent execution:
Key metrics to track:
Total execution time: Compare sequential vs parallel performance to measure improvement
Individual task duration: Identify bottleneck tasks that may need optimization or additional resources
Resource utilization: Monitor CPU, memory, and API rate limits to prevent exhaustion
Success/failure rates: Track reliability trends over time to identify patterns in task failures
Parallel efficiency: Measure how well you're utilizing available resources compared to sequential execution
API quota consumption: Track LLM API usage across parallel tasks to avoid rate limiting
Implementation Summary
Parallel agent execution with AG2 transforms research and analysis applications from sequential batch operations into concurrent workflows. The patterns outlined in this guide provide a production-ready foundation for scaling multi-agent systems.
Recommended implementation approach:
Start simple: Begin with basic ThreadPoolExecutor patterns for independent tasks
Add sophistication gradually: Implement dynamic allocation as system complexity grows
Monitor extensively: Use comprehensive metrics to identify optimization opportunities
Design for failures: Assume tasks will fail and build resilience from the ground up
Expected outcomes when properly implemented:
Significant reduction in processing time for multi-task workflows
Improved user experience through concurrent execution
Better resource utilization with intelligent task allocation
Higher system reliability through graceful error handling
By applying proper resource management patterns and designing for independence, you can build systems that scale horizontally while maintaining the collaborative intelligence that makes multi-agent systems effective.
Parallel execution fundamentally changes how users interact with multi-agent systems. Instead of waiting for sequential processing, users get concurrent analysis across multiple specialized domains. Start with these patterns, measure your results, and iterate based on your specific use case requirements.