Due diligence with AG2 and TinyFish
Due diligence on a company — the kind an investor, acquirer, or partner does before writing a check — takes 20 to 40 hours of manual research. You're digging into founders, funding rounds, press sentiment, tech stack, financials, and social signals. Each domain has its own sources, its own quirks, and its own set of pages that won't even load without a real browser.
A single AI agent can't do this well. It loses detail when juggling six research domains at once. It can't scrape JavaScript-rendered pages like Crunchbase or LinkedIn. And it works sequentially when most of these tasks are completely independent.
The answer isn't a bigger model. It's a team of agents — each one a specialist, working in parallel, coordinated by a simple pipeline. This is exactly what AG2 was built for.
The AG2 Framework
Our open-source AG2 framework is for building systems where multiple AI agents collaborate. Instead of one monolithic prompt trying to do everything, you define specialized agents, give them tools, and let them work together.
Due diligence naturally splits into independent research domains — founders, funding history, press coverage, tech stack, financials, social signals. Each domain has its own sources and its own questions.
AG2 lets you mirror that structure in code: one agent per domain, all running in parallel. Each research domain gets its own agent with a focused prompt. A funding specialist doesn't need to think about press sentiment. A tech stack analyst doesn't care about investor names. Smaller, focused tasks produce better results than one agent trying to be an expert in everything. The cross-referencing and iterative follow-ups that a human team does naturally are harder — we'll talk about where AG2's orchestration can help with that later in the post.
The Pipeline at a Glance
The system runs five stages. Here's the high-level flow:
Pipeline at a glance: Company URL to Seed Crawler to Parallel Specialists to Validator to Synthesis to Q&A
The seed crawl must finish first — it discovers the company name, team pages, press pages, and job listings that the specialists need. After that, six specialists fan out in parallel. Once they all finish, a validator checks for gaps, then a synthesis agent writes the final report.
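That ordering can be sketched in plain Python with asyncio. This is a toy analogue of the pipeline, not the actual AG2 code: `seed_crawl` and `run_specialist` are illustrative placeholders standing in for the real agents and their TinyFish calls.

```python
import asyncio

SPECIALISTS = ["founders", "investors", "press", "financials", "tech_stack", "social"]

async def seed_crawl(url: str) -> dict:
    # Placeholder: the real seed crawler agent uses TinyFish to discover
    # the company name, team pages, press pages, and job listings.
    return {"company": "ExampleCo", "url": url, "urls": []}

async def run_specialist(domain: str, seed: dict) -> dict:
    # Placeholder: each specialist scrapes its own sources.
    return {"domain": domain, "data": {}}

async def run_pipeline(url: str) -> dict:
    seed = await seed_crawl(url)               # stage 1 must finish first
    results = await asyncio.gather(            # then six specialists fan out
        *(run_specialist(d, seed) for d in SPECIALISTS)
    )
    return {"seed": seed, "results": list(results)}

report = asyncio.run(run_pipeline("https://example.com"))
```

The one `await` before the `gather` call is the whole dependency structure: everything downstream of the seed crawl is free to run concurrently.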
The Fan-Out Pattern
This is where AG2 really shines. Six independent agents run simultaneously, each scraping different sources and returning structured data:
Fan-out pattern: Seed Crawler to parallel specialists to Validator, Synthesis, and Q&A
Each specialist is an AG2 agent with access to TinyFish — a browser-as-a-service API that handles JavaScript rendering, navigation, and data extraction. The agent decides which URLs to scrape and what to look for. TinyFish does the heavy lifting of actually rendering and reading the pages.
How Each Agent Works
Every specialist follows the same pattern:
Agent orchestration: Orchestrator sends task to AG2 Agent Pair, which calls TinyFish for scraping and returns structured JSON
The orchestrator sends a task. The agent decides what to scrape (guided by its system prompt), calls TinyFish one or more times, and returns structured JSON. AG2 handles the conversation loop and tool execution — you don't need to manage that yourself.
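A minimal sketch of that contract, with `tinyfish_scrape` as a stand-in for the real TinyFish tool call (the function names and JSON fields here are illustrative, not the actual API):

```python
import json

def tinyfish_scrape(url: str, goal: str) -> dict:
    # Stand-in for the TinyFish call: render the page in a real
    # browser and extract data matching the stated goal.
    return {"url": url, "extracted": {"goal": goal}}

def run_funding_specialist(seed: dict) -> str:
    # The agent, guided by its system prompt, decides which URLs
    # to scrape and calls the tool one or more times...
    pages = [tinyfish_scrape(u, "find funding rounds") for u in seed["urls"]]
    # ...then hands structured JSON back to the orchestrator.
    return json.dumps({"domain": "funding", "sources": [p["url"] for p in pages]})
```

The key property is the return type: every specialist emits structured JSON, so the validator and synthesis stages can consume all six outputs uniformly.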
What Each Stage Does
1. Seed Crawler
Takes a company URL and builds an initial profile: company name, description, team page URLs, press page URLs, job listings, and funding mentions. Everything downstream depends on this context.
2. Six Parallel Specialists
Each specialist gets the seed profile and focuses on one research domain:
| Specialist | What it researches | Example sources |
|---|---|---|
| Founders & Team | Founder backgrounds, executive team, headcount | LinkedIn, company team page |
| Investors & Funding | Funding rounds, investors, valuations | Crunchbase, company site |
| Press Coverage | Media mentions, sentiment analysis | TechCrunch, news sites |
| Financials | Revenue, market cap, key metrics | SEC filings, financial databases |
| Tech Stack | Frontend, backend, infrastructure | Job postings, BuiltWith, GitHub |
| Social Signals | Social media presence, community size | Twitter/X, LinkedIn, GitHub |
All six run concurrently. If one fails — a scrape times out, an API errors — the pipeline catches the exception, records it, and continues. The other five specialists still contribute to the final report.
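One way to get that failure isolation with asyncio is `gather(..., return_exceptions=True)`, which collects exceptions instead of cancelling the siblings. A sketch, with `flaky` standing in for a specialist whose scrape times out:

```python
import asyncio

async def flaky(domain: str) -> dict:
    # Simulate one specialist failing mid-run.
    if domain == "press":
        raise TimeoutError("scrape timed out")
    return {"domain": domain}

async def main():
    domains = ["founders", "investors", "press"]
    # return_exceptions=True: a failure becomes a result, not a crash.
    results = await asyncio.gather(*(flaky(d) for d in domains),
                                   return_exceptions=True)
    ok = [r for r in results if not isinstance(r, BaseException)]
    failed = [d for d, r in zip(domains, results) if isinstance(r, BaseException)]
    return ok, failed

ok, failed = asyncio.run(main())
```

The surviving results still flow to the validator, and the failed domains are recorded so the final report can call out the gap.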
3. Validator
A tool-free agent that reviews all collected data for:
- Contradictions between sources
- Missing critical fields that should have been found
- Low-confidence data that needs a second look
The validator doesn't try to fix problems; it surfaces them. This keeps the pipeline honest and gives downstream consumers a clear picture of what's trustworthy and what isn't.
4. Synthesis
A "senior analyst" agent reads everything — all specialist outputs plus validation notes — and writes a structured markdown report. It doesn't paper over gaps; it calls them out. For example, a report might note missing founder details, limited press coverage, or no disclosed financials.
5. Interactive Q&A
After the report is generated, you can ask follow-up questions. The Q&A agent has a tool to read individual files from the output directory, so it loads data on demand rather than stuffing everything into context.
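The file-reading tool can be as simple as the sketch below (the function name and error shape are assumptions for illustration, not the actual implementation):

```python
import json
from pathlib import Path

def read_output_file(output_dir: str, name: str) -> str:
    # Tool exposed to the Q&A agent: load one artifact on demand
    # instead of stuffing every file into the model's context.
    path = Path(output_dir) / name
    if not path.is_file():
        return json.dumps({"error": f"{name} not found in {output_dir}"})
    return path.read_text()
```

Because the agent chooses which file to read per question, context stays small even as the output directory grows across domains.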
The Full Pipeline Flow
Here's the complete end-to-end picture showing how data flows through the system:
- Discovery — Company URL → Seed Crawler + TinyFish → Company profile + URLs
- Parallel Research — Founders, Investors, Press, Financials, Tech Stack, and Social all run concurrently, each producing structured JSON
- Quality Check — Validator flags contradictions, gaps, and low-confidence data
- Report — Synthesis agent writes a structured due diligence brief
- Explore — Interactive Q&A lets you ask follow-up questions about the report
Integration
AG2 handles the multi-agent orchestration — defining specialists, registering tools, and managing the conversation loop. TinyFish handles the web — rendering JavaScript-heavy pages, navigating dynamic content, and extracting structured data from sites that would otherwise require a full browser.
The integration between them is lightweight. You register TinyFish as a tool once, and any agent in the system can call it. AG2's tool registration model means the scraping capability is shared across all specialists without duplicating configuration. Each agent decides independently when and how to use it based on its own task.
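The register-once, call-anywhere idea can be illustrated with a toy tool registry. This is an analogue of AG2's tool registration model, not its actual API; `scrape` is a placeholder for the TinyFish call:

```python
TOOL_REGISTRY: dict = {}

def register_tool(name: str):
    # Register once; every agent in the system can now call it.
    def wrap(fn):
        TOOL_REGISTRY[name] = fn
        return fn
    return wrap

@register_tool("scrape")
def scrape(url: str, goal: str) -> dict:
    # Stand-in for the TinyFish browser-as-a-service call.
    return {"url": url, "goal": goal}

class Specialist:
    """Agents share the registry but decide independently when to use it."""

    def __init__(self, domain: str):
        self.domain = domain

    def use(self, tool: str, **kwargs) -> dict:
        return TOOL_REGISTRY[tool](**kwargs)
```

Swapping the scraping backend means replacing one registered function; none of the specialists change.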
This separation of concerns is deliberate. The orchestration layer doesn't know or care how pages get scraped. The scraping layer doesn't know what research domain it's serving. That clean boundary makes it easy to swap either side — use a different scraping service, or plug the same TinyFish tool into a completely different AG2 pipeline.
To see how this all fits together, check out the full source code.
Where to Take It Next
The most obvious improvement is closing the loop between the Validator and the specialists. Right now, when the Validator flags a gap — say, missing founder data — that just shows up as a caveat in the final report. But there's no reason the orchestrator couldn't route that gap back to the Founders agent and say "try again, here's a more specific goal." You'd get a tighter feedback loop and fewer holes in the output.
Along the same lines, you could use the Validator's overall confidence score to decide whether synthesis should even run yet. If confidence comes back "low," re-run the weakest specialists before moving on. A simple threshold check is all it takes.
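The gate could look like this. The field names (`confidence`, `per_domain`) and the 0.6 cutoff are assumptions for illustration; the real Validator output may differ:

```python
CONFIDENCE_THRESHOLD = 0.6  # illustrative cutoff, tune to taste

def next_step(validation: dict) -> str:
    # Below the threshold: re-run the weakest specialist before
    # spending tokens on synthesis.
    if validation["confidence"] < CONFIDENCE_THRESHOLD:
        weakest = min(validation["per_domain"], key=validation["per_domain"].get)
        return f"rerun:{weakest}"
    return "synthesize"
```

The orchestrator just loops on this check: rerun, re-validate, and only synthesize once confidence clears the bar (with a retry cap so a stubbornly unscrapable domain can't loop forever).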
The more interesting direction is AG2's Group Chat pattern. Right now the six specialists are independent — they don't talk to each other. But in practice, research domains overlap. The Press agent might stumble across a funding announcement that the Investors agent would love to know about. Group Chat would let agents hand off discoveries to each other mid-run, which starts to feel more like how a real research team operates.
There's also the question of memory. Every pipeline run starts from scratch today. If you added a vector store or document database, the Q&A agent could reference prior reports, track how a company's headcount or funding has changed over time, and answer questions that span multiple runs.
Finally, adding new research domains is trivial. The specialist list is just a Python data structure: one dictionary per domain with a system prompt, a message template, and an output spec. Want to add regulatory filings, patent searches, or customer reviews? Add one more entry. The orchestrator doesn't change at all.
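For a regulatory-filings specialist, the new entry might look like this. The exact field names are assumptions based on the description above (system prompt, message template, output spec), not the repo's literal schema:

```python
SPECIALISTS = [
    # ...the six existing domain entries...
    {
        "name": "regulatory",
        "system_prompt": "You research regulatory filings for a company. "
                         "Prefer primary sources and cite the filing URL.",
        "message_template": "Find regulatory filings for {company} ({url}).",
        "output_spec": {"filings": "list", "agencies": "list"},
    },
]
```

Because the orchestrator iterates over this list to spawn agents, the new domain is automatically included in the fan-out, the validation pass, and the final report.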
Learn More
- AG2 documentation: framework reference for multi-agent orchestration
- TinyFish documentation: browser-as-a-service API for JS-rendered scraping
- Source code: the full due diligence example
- Deep Web Research with AG2 and GPT Researcher: a related AG2 pipeline using GPT Researcher
