Hermes Agent · MoA Reference

Stack multiple LLMs.
Let the best synthesize.

MoA (Mixture of Agents) is a stacked-LLM architecture: several proposer models answer the same question independently and in parallel, then a single aggregator reads every answer and synthesizes the best one. In Hermes Agent, you can wire it up three ways: a shell script, delegate_task, or Kanban.

Figure: Mixture of Agents, with 12 specialized agents surrounding a central aggregator
v1.0.0 · 2026-07-03 19:16:56
TL;DR

MoA in 30 seconds

The core idea: send the same question to multiple LLMs in parallel, then have a stronger model merge the best parts of their answers into a single final answer.


    Query ─┬→ Claude Sonnet ─┐
           ├→ GPT-4o ────────┤──→ Aggregator (strongest model) → Final answer
           ├→ Gemini Pro ────┤
           └→ DeepSeek ──────┘
            

Why it works

  • Each model has blind spots; multiple models cover for one another
  • The aggregator picks out and merges the best parts
  • Best for complex reasoning, research analysis, and tasks that need multiple perspectives
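
The fan-out and fan-in shape can be sketched in a few lines of Python. This is a minimal illustration, not Hermes code: `ask` is a hypothetical stand-in for a real LLM call, and the model names are placeholders.

```python
import asyncio


async def ask(model: str, prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; it only labels the prompt."""
    await asyncio.sleep(0)  # yield to the event loop, as a network call would
    return f"[{model}] {prompt}"


async def moa(question: str, proposers: list[str], aggregator: str) -> str:
    # Fan out: every proposer gets the same question, concurrently.
    answers = await asyncio.gather(*(ask(m, question) for m in proposers))
    # Fan in: the aggregator sees the original question plus every answer.
    joined = "\n---\n".join(answers)
    return await ask(aggregator, f"Question: {question}\nAnswers:\n{joined}")


final = asyncio.run(moa("What is MoA?", ["model-a", "model-b"], "model-agg"))
print(final)
```

Swapping `ask` for a real client call is the only change needed to move from the stub to live models; the gather-then-aggregate structure stays the same.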

Three approaches

  1. Shell script: hermes chat -q -m <model>, calling several models in parallel
  2. delegate_task: Python that spawns subagents, one per model
  3. Kanban / cron: most robust, suited to production
Core concepts

Why MoA can beat a single model

Three principles: diversity, complementarity, and the wisdom of the crowd.

① Diversity

Different architectures, training data, and alignment methods produce genuinely different reasoning paths.

② Complementarity

Where model A gets stuck, model B is often strong. One model's strengths cover another's weaknesses.

③ Aggregation

The strongest model acts as aggregator: it reads every answer and keeps the most useful parts of each.

④ Trade-offs

Roughly N+1× the cost, plus added latency. Worth it for hard tasks, overkill for simple Q&A.
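
The N+1 figure counts calls. A rough token estimate makes the trade-off more concrete. The sketch below assumes every answer is about the same length and ignores the aggregator's instruction overhead; the function name and the example numbers are illustrative only.

```python
def moa_token_estimate(n_proposers: int, prompt_tokens: int, answer_tokens: int) -> dict[str, int]:
    """Rough token totals for one MoA round (illustrative, not a billing formula)."""
    # Each proposer reads the prompt and writes one answer.
    proposer_in = n_proposers * prompt_tokens
    proposer_out = n_proposers * answer_tokens
    # The aggregator reads the prompt plus every answer, then writes one answer.
    aggregator_in = prompt_tokens + n_proposers * answer_tokens
    aggregator_out = answer_tokens
    return {
        "calls": n_proposers + 1,
        "input": proposer_in + aggregator_in,
        "output": proposer_out + aggregator_out,
    }


estimate = moa_token_estimate(n_proposers=4, prompt_tokens=100, answer_tokens=500)
print(estimate)  # {'calls': 5, 'input': 2500, 'output': 2500}
```

A single call with the same numbers would use 100 input and 500 output tokens, so output tokens scale by N+1 while input tokens grow faster, because the aggregator rereads every proposer answer.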

  • ~65%: AlpacaEval 2.0 length-controlled win rate reported in the original Together MoA paper
  • 3–5: sweet-spot number of proposers
  • N+1×: relative cost
  • Slowest proposer + aggregator: end-to-end latency

Pipeline

How data flows through an MoA

A single question fans out to parallel proposers, then converges on one synthesized answer from the aggregator.

Input

  • User: asks the question

Proposer layer (parallel)

  • Claude Sonnet: anthropic/claude-sonnet-4
  • GPT-4o: openai/gpt-4o
  • Gemini Pro: google/gemini-2.5-pro
  • DeepSeek: deepseek/deepseek-chat

Output

  • Aggregator: anthropic/claude-sonnet-4
  • Final answer: the synthesized output
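
One way to hold this pipeline in code is a small immutable record plus a helper that builds the aggregator's input. `MoAPipeline` and `aggregator_prompt` are hypothetical names for illustration; only the model IDs come from the pipeline above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoAPipeline:
    """Hypothetical container for the proposer layer and the aggregator."""
    proposers: tuple[str, ...]
    aggregator: str

    def aggregator_prompt(self, question: str, answers: list[str]) -> str:
        """Build the text the aggregator receives: all answers, then the question."""
        if len(answers) != len(self.proposers):
            raise ValueError("expected exactly one answer per proposer")
        labelled = [
            f"Response {i} ({model}):\n{answer}"
            for i, (model, answer) in enumerate(zip(self.proposers, answers), start=1)
        ]
        return "\n\n---\n\n".join(labelled) + f"\n\nOriginal question: {question}"


pipeline = MoAPipeline(
    proposers=("anthropic/claude-sonnet-4", "openai/gpt-4o",
               "google/gemini-2.5-pro", "deepseek/deepseek-chat"),
    aggregator="anthropic/claude-sonnet-4",
)
prompt = pipeline.aggregator_prompt("What is MoA?", ["a1", "a2", "a3", "a4"])
print(prompt)
```

Labelling each response with its source model is a design choice rather than a requirement; it lets the aggregator weigh answers by origin, at the risk of biasing it toward familiar names.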

Implementations

Three ways to build MoA in Hermes

From fastest to most durable: pick the one that fits your workload.

moa.sh
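#!/bin/bash
# ~/bin/moa.sh: Mixture of Agents pipeline
set -euo pipefail

PROMPT="${1:?usage: moa.sh PROMPT}"
PROPOSERS=("anthropic/claude-sonnet-4" "openai/gpt-4o" "google/gemini-2.5-pro")
AGGREGATOR="anthropic/claude-sonnet-4"

# Fresh scratch directory per run, so answers from earlier runs never leak in
WORKDIR="$(mktemp -d)"
trap 'rm -rf "$WORKDIR"' EXIT

# Step 1: collect proposer responses in parallel (one background job each)
for i in "${!PROPOSERS[@]}"; do
  hermes chat -q "$PROMPT" -m "${PROPOSERS[$i]}" -Q > "$WORKDIR/$i.txt" &
done
wait  # block until every proposer has finished

# Join the responses in proposer order, separated by "---"
RESPONSES=""
for i in "${!PROPOSERS[@]}"; do
  RESPONSES+="$(cat "$WORKDIR/$i.txt")"$'\n---\n'
done

# Step 2: aggregator synthesizes
AGG_PROMPT="You are aggregating multiple LLM responses.
Here are the responses:

$RESPONSES

Original question: $PROMPT

Synthesize a single, high-quality answer that takes the best from each:"
hermes chat -q "$AGG_PROMPT" -m "$AGGREGATOR"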
moa.py
from hermes_tools import delegate_task


def moa(prompt: str, proposers: list[str], aggregator: str) -> str:
    """Run one MoA round: fan out to the proposers, then aggregate."""
    # Fan out to N proposers in parallel; each task names its model in the context
    tasks = [
        {"goal": prompt,
         "context": f"Reply directly and concisely. Model: {m}."}
        for m in proposers
    ]
    results = delegate_task(tasks=tasks)
    combined = "\n\n---\n\n".join(r["summary"] for r in results)

    # Aggregate: submitted as a batch of one, on the assumption that the result
    # then has the same shape as above (a list of dicts with a "summary" key)
    synth = f"Synthesize the best answer from {len(results)} responses to: {prompt}\n\n{combined}"
    final = delegate_task(tasks=[
        {"goal": synth,
         "context": f"You are the aggregator. Model: {aggregator}."}
    ])
    return final[0]["summary"]


if __name__ == "__main__":
    print(moa(
        prompt="Explain quantum entanglement to a high-schooler",
        proposers=["anthropic/claude-sonnet-4", "openai/gpt-4o", "google/gemini-2.5-pro"],
        aggregator="anthropic/claude-sonnet-4",
    ))
terminal · bash
# 1. One profile per proposer model
hermes profile create sonnet  --model anthropic/claude-sonnet-4
hermes profile create gpt     --model openai/gpt-4o
hermes profile create gemini  --model google/gemini-2.5-pro

# 2. Initialize a Kanban board for the MoA workflow
hermes kanban init moa-board

# 3. Create one task per proposer; assign a profile
hermes kanban create --board moa-board \
    --title "propose: claude" --profile sonnet
hermes kanban create --board moa-board \
    --title "propose: gpt"    --profile gpt
hermes kanban create --board moa-board \
    --title "propose: gemini" --profile gemini

# 4. Aggregate once all proposers report done
hermes kanban create --board moa-board \
    --title "aggregate: final answer" --profile sonnet \
    --depends-on "propose: claude,propose: gpt,propose: gemini"

How to pick your models

Pick proposers for diversity, not raw strength: mixing vendors (Anthropic / OpenAI / Google / DeepSeek / local) works best. The aggregator should be the strongest model you can call, because its job is to separate signal from noise.
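
A quick sanity check for vendor diversity could look like the sketch below. It assumes model IDs follow the `vendor/model-name` pattern used on this page, and the rule "at most one proposer per vendor" is an illustrative heuristic, not a Hermes requirement.

```python
from collections import Counter


def vendor_counts(models: list[str]) -> Counter:
    """Count models per vendor, assuming IDs look like 'vendor/model-name'."""
    return Counter(model.split("/", 1)[0] for model in models)


def is_diverse(models: list[str]) -> bool:
    """True when no vendor supplies more than one proposer (an illustrative rule)."""
    return all(count == 1 for count in vendor_counts(models).values())


mixed = ["anthropic/claude-sonnet-4", "openai/gpt-4o",
         "google/gemini-2.5-pro", "deepseek/deepseek-chat"]
repeated = ["anthropic/claude-sonnet-4", "anthropic/claude-sonnet-4", "openai/gpt-4o"]
print(is_diverse(mixed), is_diverse(repeated))
```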


FAQ

Common questions

How is MoA different from simple majority voting across models?
Majority voting only works for tasks with a single correct answer, such as classification or multiple-choice questions. MoA is generative: the aggregator reads each response in full and selects the best parts based on meaning. For open-ended Q&A, writing, and research analysis, MoA usually does clearly better than voting.
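
To see why voting breaks down on open-ended output, consider a minimal majority-vote helper (a generic sketch, not part of Hermes). It works when answers are short labels, but free-text answers almost never match exactly, so no answer wins more than one vote.

```python
from collections import Counter


def majority_vote(answers: list[str]) -> tuple[str, int]:
    """Return the most common answer and how many models gave it."""
    return Counter(answers).most_common(1)[0]


# Multiple-choice labels: a clear winner exists.
label_winner = majority_vote(["B", "B", "C"])

# Free-text answers: every string is unique, so the "winner" has a single vote.
free_text = [
    "Entanglement links two particles.",
    "Two particles share one quantum state.",
    "Measuring one particle tells you about the other.",
]
text_winner = majority_vote(free_text)
print(label_winner, text_winner[1])
```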
How many proposers should I use?
The original Together MoA paper used 3 layers of 6 models each. In practice, 3–5 heterogeneous models is the sweet spot. With fewer than 3, there is too little diversity; with more than 6, cost climbs steeply while quality gains diminish.
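
A layered variant can be sketched as follows. This is one reading of the layered design, not the paper's implementation: each layer's proposers see the previous layer's answers, and a final aggregator call closes the round. The `ask` callback signature and the stub are assumptions, and calls within a layer run sequentially here for simplicity.

```python
from typing import Callable

Ask = Callable[[str, str], str]  # (model, prompt) -> answer; hypothetical signature


def layered_moa(question: str, layers: list[list[str]], aggregator: str, ask: Ask) -> str:
    """Run proposer layers in order; each layer sees the previous layer's answers."""
    previous: list[str] = []
    for models in layers:
        prompt = question
        if previous:
            prompt += "\n\nEarlier responses:\n" + "\n---\n".join(previous)
        previous = [ask(model, prompt) for model in models]
    final_prompt = question + "\n\nResponses to synthesize:\n" + "\n---\n".join(previous)
    return ask(aggregator, final_prompt)


calls: list[str] = []


def fake_ask(model: str, prompt: str) -> str:
    """Stub that records which model was called instead of contacting an LLM."""
    calls.append(model)
    return f"{model} says hi"


answer = layered_moa("Q?", [["a", "b"], ["c", "d"]], "agg", fake_ask)
print(calls, answer)
```

Each extra layer adds one full set of proposer calls, which is why deeper stacks get expensive quickly.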
When the aggregator is already frontier-tier, does MoA still help?
The marginal benefit shrinks. If calling the aggregator directly already gives you a 95%-good answer, MoA might add only 2–3%. MoA's real value is surfacing diverse perspectives, not polishing an answer that is already good. Its relative advantage is largest on tasks that need creativity, debate between viewpoints, or cross-domain synthesis.
What's the latency impact?
With parallel execution, total latency ≈ slowest proposer + aggregator, not the sum of all N proposers. In testing, an MoA with 4 proposers and 1 aggregator finished in roughly 8–15 seconds end to end. That is too slow for live chat but a good fit for research and batch analysis.
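
The latency rule is simple arithmetic, sketched here with made-up per-model timings (the numbers are hypothetical, not measurements):

```python
def moa_latency(proposer_secs: list[float], aggregator_secs: float, parallel: bool = True) -> float:
    """Estimate end-to-end latency for one MoA round."""
    # In parallel, the fan-out costs only as much as the slowest proposer;
    # run sequentially, the proposer latencies add up.
    fan_out = max(proposer_secs) if parallel else sum(proposer_secs)
    return fan_out + aggregator_secs


timings = [3.0, 5.0, 4.0, 6.0]  # hypothetical seconds per proposer
parallel_total = moa_latency(timings, aggregator_secs=4.0)
sequential_total = moa_latency(timings, aggregator_secs=4.0, parallel=False)
print(parallel_total, sequential_total)  # 10.0 22.0
```

The gap between the two totals grows with every proposer added, which is why running the proposer layer concurrently matters more than tuning any single call.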
Can I use local models (e.g. Ollama)?
Yes, and it is recommended. Mixing a strong cloud model with fast local models balances quality, cost, and privacy. In Hermes, set a custom base_url in config.yaml to point at Ollama or any other OpenAI-compatible local endpoint.