Due diligence with AG2 and TinyFish
Due diligence on a company — the kind an investor, acquirer, or partner does before writing a check — takes 20 to 40 hours of manual research. You're digging into founders, funding rounds, press sentiment, tech stack, financials, and social signals. Each domain has its own sources, its own quirks, and its own set of pages that won't even load without a real browser.
A single AI agent can't do this well. It loses detail when juggling six research domains at once. It can't scrape JavaScript-rendered pages like Crunchbase or LinkedIn. And it works sequentially when most of these tasks are completely independent.
The answer isn't a bigger model. It's a team of agents — each one a specialist, working in parallel, coordinated by a simple pipeline. This is exactly what AG2 was built for.
The AG2 Framework
Our open-source AG2 framework is for building systems where multiple AI agents collaborate. Instead of one monolithic prompt trying to do everything, you define specialized agents, give them tools, and let them work together.
Due diligence naturally splits into independent research domains — founders, funding history, press coverage, tech stack, financials, social signals. Each domain has its own sources and its own questions.
AG2 lets you mirror that structure in code: one agent per domain, all running in parallel. Each research domain gets its own agent with a focused prompt. A funding specialist doesn't need to think about press sentiment. A tech stack analyst doesn't care about investor names. Smaller, focused tasks produce better results than one agent trying to be an expert in everything. The cross-referencing and iterative follow-ups that a human team does naturally are harder — we'll talk about where AG2's orchestration can help with that later in the post.
The Pipeline at a Glance
The system runs five stages. Here's the high-level flow:
Pipeline at a glance: Company URL to Seed Crawler to Parallel Specialists to Validator to Synthesis to Q&A
The seed crawl must finish first — it discovers the company name, team pages, press pages, and job listings that the specialists need. After that, six specialists fan out in parallel. Once they all finish, a validator checks for gaps, then a synthesis agent writes the final report.
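That ordering can be sketched in plain Python with asyncio. This is a toy analogue of the pipeline, not the actual AG2 code: `seed_crawl` and `run_specialist` are illustrative placeholders standing in for the real agents and their TinyFish calls.

```python
import asyncio

SPECIALISTS = ["founders", "investors", "press", "financials", "tech_stack", "social"]

async def seed_crawl(url: str) -> dict:
    # Placeholder: the real seed crawler agent uses TinyFish to discover
    # the company name, team pages, press pages, and job listings.
    return {"company": "ExampleCo", "url": url, "urls": []}

async def run_specialist(domain: str, seed: dict) -> dict:
    # Placeholder: each specialist scrapes its own sources.
    return {"domain": domain, "data": {}}

async def run_pipeline(url: str) -> dict:
    seed = await seed_crawl(url)               # stage 1 must finish first
    results = await asyncio.gather(            # then six specialists fan out
        *(run_specialist(d, seed) for d in SPECIALISTS)
    )
    return {"seed": seed, "results": list(results)}

report = asyncio.run(run_pipeline("https://example.com"))
```

The one `await` before the `gather` call is the whole dependency structure: everything downstream of the seed crawl is free to run concurrently.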
The Fan-Out Pattern
This is where AG2 really shines. Six independent agents run simultaneously, each scraping different sources and returning structured data:
Fan-out pattern: Seed Crawler to parallel specialists to Validator, Synthesis, and Q&A
Each specialist is an AG2 agent with access to TinyFish — a browser-as-a-service API that handles JavaScript rendering, navigation, and data extraction. The agent decides which URLs to scrape and what to look for. TinyFish does the heavy lifting of actually rendering and reading the pages.
How Each Agent Works
Every specialist follows the same pattern:
Agent orchestration: Orchestrator sends task to AG2 Agent Pair, which calls TinyFish for scraping and returns structured JSON
The orchestrator sends a task. The agent decides what to scrape (guided by its system prompt), calls TinyFish one or more times, and returns structured JSON. AG2 handles the conversation loop and tool execution — you don't need to manage that yourself.
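A minimal sketch of that contract, with `tinyfish_scrape` as a stand-in for the real TinyFish tool call (the function names and JSON fields here are illustrative, not the actual API):

```python
import json

def tinyfish_scrape(url: str, goal: str) -> dict:
    # Stand-in for the TinyFish call: render the page in a real
    # browser and extract data matching the stated goal.
    return {"url": url, "extracted": {"goal": goal}}

def run_funding_specialist(seed: dict) -> str:
    # The agent, guided by its system prompt, decides which URLs
    # to scrape and calls the tool one or more times...
    pages = [tinyfish_scrape(u, "find funding rounds") for u in seed["urls"]]
    # ...then hands structured JSON back to the orchestrator.
    return json.dumps({"domain": "funding", "sources": [p["url"] for p in pages]})
```

The key property is the return type: every specialist emits structured JSON, so the validator and synthesis stages can consume all six outputs uniformly.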
What Each Stage Does
1. Seed Crawler
Takes a company URL and builds an initial profile: company name, description, team page URLs, press page URLs, job listings, and funding mentions. Everything downstream depends on this context.
2. Six Parallel Specialists
Each specialist gets the seed profile and focuses on one research domain:
| Specialist | What it researches | Example sources |
|---|---|---|
| Founders & Team | Founder backgrounds, executive team, headcount | LinkedIn, company team page |
| Investors & Funding | Funding rounds, investors, valuations | Crunchbase, company site |
| Press Coverage | Media mentions, sentiment analysis | TechCrunch, news sites |
| Financials | Revenue, market cap, key metrics | SEC filings, financial databases |
| Tech Stack | Frontend, backend, infrastructure | Job postings, BuiltWith, GitHub |
| Social Signals | Social media presence, community size | Twitter/X, LinkedIn, GitHub |
All six run concurrently. If one fails — a scrape times out, an API errors — the pipeline catches the exception, records it, and continues. The other five specialists still contribute to the final report.
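One way to get that failure isolation with asyncio is `gather(..., return_exceptions=True)`, which collects exceptions instead of cancelling the siblings. A sketch, with `flaky` standing in for a specialist whose scrape times out:

```python
import asyncio

async def flaky(domain: str) -> dict:
    # Simulate one specialist failing mid-run.
    if domain == "press":
        raise TimeoutError("scrape timed out")
    return {"domain": domain}

async def main():
    domains = ["founders", "investors", "press"]
    # return_exceptions=True: a failure becomes a result, not a crash.
    results = await asyncio.gather(*(flaky(d) for d in domains),
                                   return_exceptions=True)
    ok = [r for r in results if not isinstance(r, BaseException)]
    failed = [d for d, r in zip(domains, results) if isinstance(r, BaseException)]
    return ok, failed

ok, failed = asyncio.run(main())
```

The surviving results still flow to the validator, and the failed domains are recorded so the final report can call out the gap.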
3. Validator
A tool-free agent that reviews all collected data for:
- Contradictions between sources
- Missing critical fields that should have been found
- Low-confidence data that needs a second look
The validator doesn't try to fix problems; it surfaces them. This keeps the pipeline honest and gives downstream consumers a clear picture of what's trustworthy and what isn't.
4. Synthesis
A "senior analyst" agent reads everything — all specialist outputs plus validation notes — and writes a structured markdown report. It doesn't paper over gaps; it calls them out. For example, a report might note missing founder details, limited press coverage, or no disclosed financials.
5. Interactive Q&A
After the report is generated, you can ask follow-up questions. The Q&A agent has a tool to read individual files from the output directory, so it loads data on demand rather than stuffing everything into context.
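The file-reading tool can be as simple as the sketch below (the function name and error shape are assumptions for illustration, not the actual implementation):

```python
import json
from pathlib import Path

def read_output_file(output_dir: str, name: str) -> str:
    # Tool exposed to the Q&A agent: load one artifact on demand
    # instead of stuffing every file into the model's context.
    path = Path(output_dir) / name
    if not path.is_file():
        return json.dumps({"error": f"{name} not found in {output_dir}"})
    return path.read_text()
```

Because the agent chooses which file to read per question, context stays small even as the output directory grows across domains.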
The Full Pipeline Flow
Here's the complete end-to-end picture showing how data flows through the system:
- Discovery — Company URL → Seed Crawler + TinyFish → Company profile + URLs
- Parallel Research — Founders, Investors, Press, Financials, Tech Stack, and Social all run concurrently, each producing structured JSON
- Quality Check — Validator flags contradictions, gaps, and low-confidence data
- Report — Synthesis agent writes a structured due diligence brief
- Explore — Interactive Q&A lets you ask follow-up questions about the report
Integration
AG2 handles the multi-agent orchestration — defining specialists, registering tools, and managing the conversation loop. TinyFish handles the web — rendering JavaScript-heavy pages, navigating dynamic content, and extracting structured data from sites that would otherwise require a full browser.
The integration between them is lightweight. You register TinyFish as a tool once, and any agent in the system can call it. AG2's tool registration model means the scraping capability is shared across all specialists without duplicating configuration. Each agent decides independently when and how to use it based on its own task.
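The register-once, call-anywhere idea can be illustrated with a toy tool registry. This is an analogue of AG2's tool registration model, not its actual API; `scrape` is a placeholder for the TinyFish call:

```python
TOOL_REGISTRY: dict = {}

def register_tool(name: str):
    # Register once; every agent in the system can now call it.
    def wrap(fn):
        TOOL_REGISTRY[name] = fn
        return fn
    return wrap

@register_tool("scrape")
def scrape(url: str, goal: str) -> dict:
    # Stand-in for the TinyFish browser-as-a-service call.
    return {"url": url, "goal": goal}

class Specialist:
    """Agents share the registry but decide independently when to use it."""

    def __init__(self, domain: str):
        self.domain = domain

    def use(self, tool: str, **kwargs) -> dict:
        return TOOL_REGISTRY[tool](**kwargs)
```

Swapping the scraping backend means replacing one registered function; none of the specialists change.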
This separation of concerns is deliberate. The orchestration layer doesn't know or care how pages get scraped. The scraping layer doesn't know what research domain it's serving. That clean boundary makes it easy to swap either side — use a different scraping service, or plug the same TinyFish tool into a completely different AG2 pipeline.
To see how this all fits together, check out the full source code.
Where to Take It Next
The most obvious improvement is closing the loop between the Validator and the specialists. Right now, when the Validator flags a gap — say, missing founder data — that just shows up as a caveat in the final report. But there's no reason the orchestrator couldn't route that gap back to the Founders agent and say "try again, here's a more specific goal." You'd get a tighter feedback loop and fewer holes in the output.
Along the same lines, you could use the Validator's overall confidence score to decide whether synthesis should even run yet. If confidence comes back "low," re-run the weakest specialists before moving on. A simple threshold check is all it takes.
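The gate could look like this. The field names (`confidence`, `per_domain`) and the 0.6 cutoff are assumptions for illustration; the real Validator output may differ:

```python
CONFIDENCE_THRESHOLD = 0.6  # illustrative cutoff, tune to taste

def next_step(validation: dict) -> str:
    # Below the threshold: re-run the weakest specialist before
    # spending tokens on synthesis.
    if validation["confidence"] < CONFIDENCE_THRESHOLD:
        weakest = min(validation["per_domain"], key=validation["per_domain"].get)
        return f"rerun:{weakest}"
    return "synthesize"
```

The orchestrator just loops on this check: rerun, re-validate, and only synthesize once confidence clears the bar (with a retry cap so a stubbornly unscrapable domain can't loop forever).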
The more interesting direction is AG2's Group Chat pattern. Right now the six specialists are independent — they don't talk to each other. But in practice, research domains overlap. The Press agent might stumble across a funding announcement that the Investors agent would love to know about. Group Chat would let agents hand off discoveries to each other mid-run, which starts to feel more like how a real research team operates.
There's also the question of memory. Every pipeline run starts from scratch today. If you added a vector store or document database, the Q&A agent could reference prior reports, track how a company's headcount or funding has changed over time, and answer questions that span multiple runs.
Finally, adding new research domains is trivial. The specialist list is just a Python data structure: one dictionary per domain with a system prompt, a message template, and an output spec. Want to add regulatory filings, patent searches, or customer reviews? Add one more entry. The orchestrator doesn't change at all.
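For a regulatory-filings specialist, the new entry might look like this. The exact field names are assumptions based on the description above (system prompt, message template, output spec), not the repo's literal schema:

```python
SPECIALISTS = [
    # ...the six existing domain entries...
    {
        "name": "regulatory",
        "system_prompt": "You research regulatory filings for a company. "
                         "Prefer primary sources and cite the filing URL.",
        "message_template": "Find regulatory filings for {company} ({url}).",
        "output_spec": {"filings": "list", "agencies": "list"},
    },
]
```

Because the orchestrator iterates over this list to spawn agents, the new domain is automatically included in the fan-out, the validation pass, and the final report.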
Learn More
- AG2 documentation: framework reference for multi-agent orchestration
- TinyFish documentation: browser-as-a-service API for JS-rendered scraping
- Source code: the full due diligence example
- Deep Web Research with AG2 and GPT Researcher: a related AG2 pipeline using GPT Researcher
