That's the idea behind Sakana Fugu, the freshly launched multi-agent AI system from Tokyo-based startup Sakana AI. Released June 22, 2026. Already turning heads.
Key Takeaways
- Sakana Fugu is a multi-agent orchestration system behaves like a single model through one OpenAI-compatible API
- Two variants exist: Fugu for low-latency everyday tasks, and Fugu Ultra for maximum quality on complex multi-step problems
- It's grounded in two ICLR 2026 papers , TRINITY and Conductor , on learned model orchestration
- Fugu Ultra benchmarks on par with Anthropic's Fable 5 and Mythos Preview, even though neither is in its agent pool
- Designed to reduce vendor lock-in, which matters more than most developers realize
- Around 500 beta testers report strong gains on code review, research automation, and cybersecurity analysis
- Pricing starts at $200/month for subscriptions, with usage-based billing for larger workloads
What Is Sakana Fugu?
Fugu is not a model in the traditional sense. It's a multi-agent system that presents itself as a single model. You send a request to one endpoint. Fugu decides internally how to handle it , whether that means answering directly, delegating to specialized agents, or assembling a whole team of models to collaborate on the task.
From a developer's perspective, you just call an API. What happens behind the scenes is considerably more interesting.
Fugu is itself a language model, trained to call other LLMs from a swappable pool, including in some cases copies of itself. Depending on the complexity and nature of the request, it routes, delegates, checks, and synthesizes across multiple agents. All of that coordination is invisible to the user. You get a response. That's it.
The project comes from Sakana AI, co-founded by Llion Jones and David Ha, former research director at Google Brain. These aren't newcomers.
How It Works: TRINITY and Conductor
The technical foundations here are two research papers accepted at ICLR 2026. Both tackle the same core question . How do you teach a system to orchestrate other models, rather than hand-designing the workflow yourself?
TRINITY: Roles, Not Rules
TRINITY uses a lightweight evolved coordinator to manage multiple LLMs across several turns of a conversation or task. Each participating model gets assigned one of three roles: Thinker, Worker, or Verifier.
This role-based delegation adapts based on the task type . Coding, math, reasoning, knowledge retrieval , and the system learns which combinations work best rather than being told by a human engineer. No hand-holding.And rigid playbook.
Conductor: Reinforcement Learning for Coordination
Conductor takes a different angle. It's trained with reinforcement learning to discover natural-language coordination strategies on its own. It designs how agents communicate, what prompts they receive, how information flows between them . All without a human prescribing the workflow.
The result, according to Sakana's research, is diverse pools of LLMs outperform individual workers on challenging reasoning benchmarks. Not because any single model got smarter. Because the coordination itself adds capability.
Fugu vs. Fugu Ultra
Both variants run through the same OpenAI-compatible API endpoint.
Fugu is built for everyday use. Low latency, strong general performance, natural fit for coding and code review, responsive chatbot services, and teams with data privacy constraints (you can exclude specific providers from the agent pool).
Fugu Ultra goes deeper. Larger pool of expert agents, optimized for maximum answer quality on hard high-stakes problems. Early users have deployed it for Kaggle competition workflows, scientific paper reproduction, cybersecurity analysis, and patent investigations.
One beta tester , a software developer , reported Fugu Ultra caught more than 20 issues during a code review where other tools flagged only about 3. That's not a marginal improvement.And's a different category of output entirely.
Benchmark Results
Here's where things get genuinely impressive. Sakana AI's published benchmarks show Fugu Ultra performing shoulder-to-shoulder with Anthropic's Fable 5 and Mythos Preview across coding, reasoning, science, and agentic benchmarks.
The important caveat: neither Fable 5 nor Mythos Preview is part of Fugu's agent pool, since they aren't publicly accessible. Fugu achieves these results using only publicly available frontier models. If Anthropic's models were ever included, scores would likely climb higher still.
Sakana also claims Fugu beat Gemini 3.1 Pro, Opus 4.8, and GPT-5.5 in internal tests on automated research, mechanical design, and financial forecasting tasks.
Worth knowing: Sakana AI's prior orchestration system, ALE-Agent, had already placed 21st out of 1,000 human experts in a coding competition. Fugu is the productized evolution of that research direction.
The Vendor Lock-In Problem
This is arguably the most underappreciated angle of Fugu's launch. The AI infrastructure landscape right now is fragile in ways most developers don't think about until it's too late.
Export controls, regulatory shifts, foreign policy decisions , any of these can cut off access to a model overnight. Sakana AI puts it plainly in their announcement:
"For an organization or a nation, relying on a single company's APIs for critical infrastructure, finance, or governance is a material vulnerability. This risk is no longer a hypothetical possibility, but a reality."
Fugu's swappable model pool is a direct architectural response to this. If one provider restricts access, the system reroutes to other models without the user changing anything in their integration. That kind of resilience is genuinely valuable for enterprises running critical workloads.
But let's be honest about what it isn't. If several top providers restrict access simultaneously, Fugu's options shrink too. An orchestrator adds resilience. It doesn't guarantee sovereignty. As a hedge against single-point-of-failure risk, it's a reasonable bet. Just not a complete solution.
If you want context on what "frontier performance" actually looks like day-to-day, our post on Claude vs GPT: An Honest Comparison for Developers is worth reading alongside this.
Pricing and Access
Fugu is live now.
Subscription plans start at $200/month for daily use, with usage-based billing available for larger or more variable workloads. Both variants are accessible through a single API endpoint, and the OpenAI-compatible integration means you can drop it into most existing toolchains without rewriting your API calls. Swap the endpoint.Plus the model name. Done.
One thing worth knowing: Fugu is not yet available in the EU/EEA while Sakana works toward GDPR compliance. If you're operating in region, you'll need to wait a bit longer.
Is Sakana AI Worth Watching?
Yes . And not just because of Fugu.
Sakana AI's broader research philosophy is what makes it interesting. The company is built around the idea powerful AI doesn't have to come from a single monolithic model. They draw inspiration from natural systems . Swarm behavior, evolutionary processes, collective intelligence . And apply those principles to how AI systems are designed and coordinated.
That's a genuinely different bet than what most labs are making. Most of the industry is racing to build bigger single models. Sakana is asking whether smarter coordination of smaller specialized ones might be the better answer.
Given Fugu Ultra is already matching models that cost orders of magnitude more to train and run, the early evidence is at least interesting enough to take seriously.
For more context on the open-source model ecosystem Fugu draws from, check out our breakdown of GLM-5.2: ZAI's Open Source Model Built for Agents, one of the publicly available models these kinds of systems can leverage.
Should You Try Fugu?
Honestly, it depends on what you're building.
Simple, single-turn queries . Summarization, quick Q&A, basic generation — the overhead of multi-agent orchestration probably isn't worth it. A single fast model will serve you better.
But complex multi-step workflows? Long code reviews, automated research pipelines, security analysis, competitive intelligence gathering — Fugu Ultra is worth a serious look. The beta results suggest it genuinely outperforms single-model approaches on exactly these kinds of tasks.And $200/month subscription isn't cheap for individual developers. For teams running production workloads where answer quality directly affects outcomes, it's a reasonable price point to evaluate. And since the API is OpenAI-compatible, the switching cost to try it is low. If you're already calling the OpenAI API in your app, you can test Fugu with minimal code changes.
Sakana Fugu is one of the more technically interesting AI releases of 2026. It's not trying to out-train GPT-5 or Fable 5.Yet's asking a different question entirely — what can a system of coordinated specialized models do a single model can't?
The early answer, backed by ICLR 2026 research and real beta results, is quite a lot. Matching frontier models without being one. Catching more bugs in code review. Handling messy long-running tasks break single-model pipelines.
To try it yourself, head to sakana.ai/fugu. And if you've already been experimenting with multi-agent setups, i'd genuinely love to hear how Fugu compares to your current stack — drop a comment below.
Sources
- Sakana AI — Official Fugu Announcement. Https.//sakana.ai/fugu/
- VentureBeat — "No Claude Fable 5? No problem. Sakana achieves frontier performance with new Fugu multi-model auto synthesis system". Https.//venturebeat.com/orchestration/no-claude-fable-5-no-problem-sakana-achieves-frontier-performance-with-new-fugu-multi-model-auto-synthesis-system
- The Decoder — "Sakana AI's Fugu orchestrates multiple LLMs to match Anthropic's Fable and Mythos benchmarks". Https.//the-decoder.com/sakana-ais-fugu-orchestrates-multiple-llms-to-match-anthropics-fable-and-mythos-benchmarks/
- MarkTechPost — "Sakana AI Launches Sakana Fugu. An Orchestration Model That Routes Tasks Across a Swappable Pool of Frontier LLMs". Https.//www.marktechpost.com/2026/06/22/sakana-ai-launches-sakana-fugu-an-orchestration-model-that-routes-tasks-across-a-swappable-pool-of-frontier-llms/
- AI News — "Mitigating vendor lock-in with Sakana AI Fugu multi-agent models". Https.//www.artificialintelligence-news.com/news/mitigating-vendor-lock-in-sakana-ai-fugu-multi-agent-models/
- Reddit r/LLMDevs — "Fable 5 has been beaten by Sakana.ai's Fugu in some tasks". Https.//www.reddit.com/r/LLMDevs/comments/1uca8e3/fable_5_has_been_beaten_by_sakanaais_fugu_in_some/
- Hacker News — Sakana Fugu discussion thread. Https.//news.ycombinator.com/item?id=48624782
- Sakana AI on X — Technical explanation thread. Https.//x.com/SakanaAILabs/status/2068862344684581023
- TestingCatalog — "Sakana AI releases Fugu Ultra system to rival top AI labs". Https.//www.testingcatalog.com/sakana-ai-releases-fugu-ultra-system-to-rival-top-ai-labs/
- PANews — "Japan's AI Dark Horse Emerges. How a 7B Small Model Challenges Fable and Mythos": https://www.panewslab.com/en/articles/019eef07-5781-73cd-8d49-3f6918641f98