MoA (Mixture of Agents) is a stacked-LLM architecture: multiple proposer models answer the same question independently and in parallel, then one aggregator reads all of their answers and synthesizes the best one. In Hermes Agent, you can wire it up three ways: a shell script, delegate_task, or Kanban.
The core idea: call multiple LLMs on the same question in parallel, then have a stronger model merge the best parts into a single final answer.
Query ─┬→ Claude Sonnet ──┐
       ├→ GPT-4o ────────┤──→ Aggregator (strongest model) → Final answer
       ├→ Gemini Pro ────┤
       └→ DeepSeek ──────┘
hermes chat -q -m <model>: parallel model calls
Three principles: diversity, complementarity, and the wisdom of the crowd.
Different architectures, training data, and alignment methods produce genuinely different reasoning paths.
Where model A gets stuck, model B often excels, so one model's strengths cover another's weaknesses.
A strong aggregator reads every answer and selects the most useful parts of each.
The trade-off is roughly N+1 times the cost, plus added latency. That is worth it for hard tasks and overkill for simple Q&A.
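As a rough illustration of the N+1 estimate, the sketch below counts model calls and estimates wall-clock latency when the proposers run in parallel. The function names and the latency figures are hypothetical, chosen only for this example. Real cost also depends on token counts, since the aggregator's input contains every proposer response.

```python
def moa_call_count(num_proposers: int) -> int:
    """Total model calls: one per proposer plus one aggregator call."""
    return num_proposers + 1


def moa_latency(proposer_latencies: list[float], aggregator_latency: float) -> float:
    """Wall-clock latency when proposers run in parallel.

    The aggregator cannot start until the slowest proposer has finished,
    so the total is max(proposer latencies) + aggregator latency.
    """
    return max(proposer_latencies) + aggregator_latency


if __name__ == "__main__":
    # Hypothetical latencies in seconds, for illustration only.
    print(moa_call_count(3))
    print(moa_latency([4.0, 6.5, 5.0], 8.0))
```

If the proposers ran one after another instead, the latency would be the sum of all proposer latencies plus the aggregator's, which is why the parallel fan-out matters.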
A single question fans out to parallel proposers, then converges on one synthesized answer.
From fastest to most durable: pick the one that fits your workload.
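The fan-out and convergence can be sketched in a few lines of Python. This is a minimal illustration and not Hermes code: `fan_out`, `converge`, and the stub callables are hypothetical names, and the stubs stand in for real model calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Proposer = Callable[[str], str]


def fan_out(question: str, proposers: dict[str, Proposer]) -> dict[str, str]:
    """Send the same question to every proposer concurrently."""
    with ThreadPoolExecutor(max_workers=max(1, len(proposers))) as pool:
        futures = {name: pool.submit(fn, question) for name, fn in proposers.items()}
        # result() blocks until that proposer has answered
        return {name: fut.result() for name, fut in futures.items()}


def converge(question: str, answers: dict[str, str],
             aggregator: Callable[[str], str]) -> str:
    """Hand all proposer answers to a single aggregator call."""
    joined = "\n---\n".join(f"[{name}] {text}" for name, text in answers.items())
    return aggregator(f"Question: {question}\n\nResponses:\n{joined}")


if __name__ == "__main__":
    # Stub callables stand in for real model calls.
    stubs = {
        "model-a": lambda q: f"A says: {q.upper()}",
        "model-b": lambda q: f"B says: {q.lower()}",
    }
    answers = fan_out("What is MoA?", stubs)
    print(converge("What is MoA?", answers, lambda prompt: prompt))
```

Threads suit this shape because each proposer call spends its time waiting on I/O, so the calls overlap rather than queue.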
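#!/bin/bash
# ~/bin/moa.sh: Mixture of Agents pipeline
set -euo pipefail
PROMPT="${1:?usage: moa.sh QUESTION}"
PROPOSERS=("anthropic/claude-sonnet-4" "openai/gpt-4o" "google/gemini-2.5-pro")
AGGREGATOR="anthropic/claude-sonnet-4"
# Fresh scratch directory per run, so output from earlier runs never leaks in
WORKDIR="$(mktemp -d)"
trap 'rm -rf "$WORKDIR"' EXIT
# Step 1: collect proposer responses in parallel (one background job per model)
for i in "${!PROPOSERS[@]}"; do
  hermes chat -q "$PROMPT" -m "${PROPOSERS[$i]}" -Q > "$WORKDIR/$i.txt" &
done
wait  # block until every proposer has finished
# Join the responses in proposer order, each followed by a --- separator
for i in "${!PROPOSERS[@]}"; do
  cat "$WORKDIR/$i.txt"
  echo "---"
done > "$WORKDIR/proposers.txt"
# Step 2: aggregator synthesizes
AGG_PROMPT="You are aggregating multiple LLM responses.
Here are the responses:
$(cat "$WORKDIR/proposers.txt")
Original question: $PROMPT
Synthesize a single, high-quality answer that takes the best from each:"
hermes chat -q "$AGG_PROMPT" -m "$AGGREGATOR"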
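For readers who prefer to drive the CLI from Python, step 2 of the script can be sketched as two pure helpers plus a thin subprocess wrapper. The helper names are hypothetical. The only CLI flags used are the ones that appear in the script (`-q`, `-m`, `-Q`), and the source does not state what `-Q` means, so it is passed through as an opaque switch.

```python
import subprocess


def build_aggregator_prompt(question: str, responses: list[str]) -> str:
    """Mirror the script's AGG_PROMPT: responses first, then the question."""
    body = "".join(f"{r.strip()}\n---\n" for r in responses)
    return (
        "You are aggregating multiple LLM responses.\n"
        "Here are the responses:\n"
        f"{body}"
        f"Original question: {question}\n"
        "Synthesize a single, high-quality answer that takes the best from each:"
    )


def hermes_chat_argv(prompt: str, model: str, with_q_flag: bool = False) -> list[str]:
    """Build the argv used in the script; a list avoids shell quoting issues."""
    argv = ["hermes", "chat", "-q", prompt, "-m", model]
    if with_q_flag:
        argv.append("-Q")  # the script passes -Q on proposer calls only
    return argv


def run_hermes(prompt: str, model: str, with_q_flag: bool = False) -> str:
    """Run the CLI and return its stdout (requires hermes on PATH)."""
    done = subprocess.run(
        hermes_chat_argv(prompt, model, with_q_flag),
        capture_output=True, text=True, check=True,
    )
    return done.stdout
```

Keeping the prompt builder free of I/O makes it easy to inspect exactly what the aggregator will see before spending any tokens.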
from hermes_tools import delegate_task


def moa(prompt: str, proposers: list[str], aggregator: str) -> str:
    # Fan out to N proposers in parallel
    tasks = [
        {"goal": prompt,
         "context": f"Reply directly and concisely. Model: {m}."}
        for m in proposers
    ]
    results = delegate_task(tasks=tasks)
    # Each result is expected to expose a "summary" field
    combined = "\n\n---\n\n".join(r["summary"] for r in results)
    # Aggregate: the goal carries the synthesis prompt, the context names the aggregator
    synth = f"Synthesize the best answer from {len(results)} responses to: {prompt}\n\n{combined}"
    return delegate_task(goal=synth,
                         context=f"You are the aggregator. Model: {aggregator}.")


if __name__ == "__main__":
    print(moa(
        prompt="Explain quantum entanglement to a high-schooler",
        proposers=["anthropic/claude-sonnet-4", "openai/gpt-4o", "google/gemini-2.5-pro"],
        aggregator="anthropic/claude-sonnet-4",
    ))
# 1. One profile per proposer model
hermes profile create sonnet --model anthropic/claude-sonnet-4
hermes profile create gpt --model openai/gpt-4o
hermes profile create gemini --model google/gemini-2.5-pro
# 2. Initialize a Kanban board for the MoA workflow
hermes kanban init moa-board
# 3. Create one task per proposer; assign a profile
hermes kanban create --board moa-board \
  --title "propose: claude" --profile sonnet
hermes kanban create --board moa-board \
  --title "propose: gpt" --profile gpt
hermes kanban create --board moa-board \
  --title "propose: gemini" --profile gemini
# 4. Aggregate once all proposers report done
hermes kanban create --board moa-board \
  --title "aggregate: final answer" --profile sonnet \
  --depends-on "propose: claude,propose: gpt,propose: gemini"
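The dependency gate in step 4 can be illustrated with a small, self-contained sketch: a task becomes runnable only when everything it depends on is done. This is not how Hermes implements its board. The `ready_tasks` function and the `pending`/`done` status strings are assumptions made for the example, and the task titles are the ones used in the commands.

```python
def ready_tasks(status: dict[str, str], depends_on: dict[str, list[str]]) -> list[str]:
    """Return pending tasks whose dependencies are all done."""
    return [
        task
        for task, state in status.items()
        if state == "pending"
        and all(status.get(dep) == "done" for dep in depends_on.get(task, []))
    ]


if __name__ == "__main__":
    deps = {
        "aggregate: final answer": ["propose: claude", "propose: gpt", "propose: gemini"],
    }
    status = {
        "propose: claude": "done",
        "propose: gpt": "done",
        "propose: gemini": "pending",
        "aggregate: final answer": "pending",
    }
    print(ready_tasks(status, deps))  # the aggregate task is still blocked

    status["propose: gemini"] = "done"
    print(ready_tasks(status, deps))  # now only the aggregate task is runnable
```

The practical benefit of expressing the aggregator as a dependent task is durability: if one proposer fails or is retried, the aggregation simply waits instead of running on partial input.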
Pick proposers for diversity, not raw strength: mixing vendors (Anthropic / OpenAI / Google / DeepSeek / local) works best. The aggregator should be the strongest model available to you, because its job is to separate signal from noise.
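One simple way to apply the "diversity over strength" rule is to allow at most one proposer per vendor. The sketch below assumes model IDs follow the `vendor/model` pattern used in the examples. `pick_diverse` is a hypothetical helper, and `anthropic/another-model` is a made-up ID included only to show the filtering.

```python
def pick_diverse(models: list[str], limit: int) -> list[str]:
    """Pick up to `limit` models, at most one per vendor, keeping input order."""
    seen: set[str] = set()
    picked: list[str] = []
    for model in models:
        if len(picked) >= limit:
            break
        vendor = model.split("/", 1)[0]  # text before the first slash
        if vendor in seen:
            continue  # this vendor already has a proposer
        seen.add(vendor)
        picked.append(model)
    return picked


if __name__ == "__main__":
    candidates = [
        "anthropic/claude-sonnet-4",
        "anthropic/another-model",  # hypothetical ID, skipped as a duplicate vendor
        "openai/gpt-4o",
        "google/gemini-2.5-pro",
    ]
    print(pick_diverse(candidates, limit=3))
```

Vendor is only a proxy for diversity; two models from one vendor can still differ meaningfully, so treat this as a starting heuristic.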
Local models can be mixed in, and doing so is recommended: combine a strong cloud model with fast local ones to balance quality, cost, and privacy. In Hermes, set a custom base_url in config.yaml to point at Ollama or any other OpenAI-compatible local endpoint.