W&B Self-Improving Agents Hackathon

openclaw-trace

A recursive self-improvement loop for AI agents.
Evidence-grounded. Measurable. Compounding.

🐝 Traced with W&B Weave

The Problem

AI agents make mistakes. They frustrate users. They miss opportunities.

But most agents can't learn from their own work.

What if they could?

The Key Insight

Every agent session is a training signal.

Errors, user friction, missed opportunities, moments of delight:
they're all grounded evidence we can mine, cluster, and act on.

The Loop

Session Traces → Mine Signals → Rollup/Cluster → Tickets → Research Briefs → Experiments → Fixes

Then measure the delta. Repeat.
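
One way to "measure the delta" is to compare signals per session before and after a fix. This is a minimal sketch, not the project's actual metric: the function names and the sample counts are made up for illustration, and only the `kind` field comes from the deck.

```python
from collections import Counter

def signal_rates(signals, n_sessions):
    """Signals per session, keyed by signal kind."""
    counts = Counter(s["kind"] for s in signals)
    return {kind: n / n_sessions for kind, n in counts.items()}

def delta(before, after):
    """Per-kind change in rate (after minus before); negative means fewer signals."""
    kinds = set(before) | set(after)
    return {k: after.get(k, 0.0) - before.get(k, 0.0) for k in kinds}

# Made-up numbers: 4 sessions before the fix, 4 sessions after.
before = signal_rates([{"kind": "error"}] * 6 + [{"kind": "user_frustration"}] * 2, n_sessions=4)
after = signal_rates([{"kind": "error"}] * 2 + [{"kind": "user_frustration"}] * 2, n_sessions=4)
change = delta(before, after)
```

A negative value for a kind such as `error` would suggest the fix helped; a value near zero would suggest it made no measurable difference.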

Architecture

Session JSONL 📄 traces
→ mine-signals (LLM + heuristics): error, user_frustration, proactive_opp, user_delight
→ rollup: cluster + rank fingerprints
→ research briefs (actor-critic, evidence-first): Claude drafts, Codex critiques
→ experiments + verified fixes (🐝 Weave, ⬑ Redis)
→ measure Δ
recursive improvement loop
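
The rollup step ("cluster + rank fingerprints") might look roughly like this. The fingerprint rule here (lowercase, mask digits, collapse whitespace) is an assumption chosen for illustration, not the project's actual method, and the sample signals are invented.

```python
import re
from collections import defaultdict

def fingerprint(summary):
    """Hypothetical fingerprint: lowercase, digits masked, whitespace collapsed."""
    text = re.sub(r"\d+", "<n>", summary.lower())
    return re.sub(r"\s+", " ", text).strip()

def rollup(signals):
    """Group signals by (kind, fingerprint) and rank clusters by size, largest first."""
    clusters = defaultdict(list)
    for s in signals:
        clusters[(s["kind"], fingerprint(s["summary"]))].append(s)
    return sorted(clusters.items(), key=lambda kv: len(kv[1]), reverse=True)

# Invented signals: two near-duplicate errors and one friction signal.
signals = [
    {"kind": "error", "summary": "Tool call timed out after 30 s"},
    {"kind": "error", "summary": "Tool call timed out after 45 s"},
    {"kind": "user_frustration", "summary": "User repeated the same request"},
]
ranked = rollup(signals)
```

The two timeout errors share a fingerprint and land in the top-ranked cluster, which is the kind of grouping that would justify a single ticket.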

What We Mine

🔴

Errors

Tool failures, exceptions, non-zero exits. Exact quotes from traces.

😀

User Friction

Frustration signals, confusion, repeated clarifications.

💡

Improvement Suggestions

Ideas surfaced during sessions, explicit or implicit.

🚀

Proactive Opportunities

Things the agent could have done but didn't think to.

✨

User Delight

Moments to create magic. Surprise and polish.

🧪

Experiment Ideas

Hypotheses worth testing. Ablations. Benchmarks.
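
For the simplest category above, errors from non-zero exits, a heuristic miner could be as small as the sketch below. The event fields `exit_code` and `output` are assumptions about the trace format, and the two-event trace is made up.

```python
import json

def mine_errors(jsonl_text):
    """Return one error signal per event whose exit code is non-zero."""
    events = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    signals = []
    for i, event in enumerate(events):
        code = event.get("exit_code")  # hypothetical field name
        if code is None or code == 0:
            continue
        signals.append({
            "kind": "error",
            "summary": f"Tool exited with code {code}",
            # Quote the event's own output so the signal stays grounded.
            "evidence": [{"event_i": i, "quote": event.get("output", "")}],
        })
    return signals

# Made-up trace: one successful tool call, one failure.
trace = "\n".join([
    json.dumps({"type": "tool_result", "exit_code": 0, "output": "ok"}),
    json.dumps({"type": "tool_result", "exit_code": 2, "output": "No such file or directory"}),
])
found = mine_errors(trace)
```

Softer categories such as friction or delight would likely need the LLM pass rather than a rule like this.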

Evidence-First

Every claim links back to exact quotes from real sessions.
No hallucinated improvements.

# Every signal includes grounded evidence
{
  "kind": "error",
  "summary": "Phorge deadline format rejected",
  "evidence": [{
    "event_i": 42,
    "quote": "Invalid value for deadline: expected epoch integer"
  }]
}
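
A check like the following could enforce that rule: a signal passes only if each quote appears verbatim in the event it cites. This is a sketch under assumptions. The `text` field on events is hypothetical, and the event list is fabricated to match the example signal above.

```python
def is_grounded(signal, events):
    """True only if every evidence quote appears verbatim in the event it cites."""
    evidence = signal.get("evidence", [])
    if not evidence:
        return False  # no evidence at all counts as ungrounded
    for item in evidence:
        i = item["event_i"]
        if not 0 <= i < len(events):
            return False  # cites an event that does not exist
        if item["quote"] not in events[i].get("text", ""):  # "text" is a hypothetical field
            return False
    return True

# Fabricated session: 42 empty events, then the one the signal cites.
events = [{"text": ""} for _ in range(42)]
events.append({"text": "Error: Invalid value for deadline: expected epoch integer"})

real = {
    "kind": "error",
    "summary": "Phorge deadline format rejected",
    "evidence": [{"event_i": 42, "quote": "Invalid value for deadline: expected epoch integer"}],
}
invented = {
    "kind": "error",
    "summary": "Phorge deadline format rejected",
    "evidence": [{"event_i": 42, "quote": "Deadline must be ISO 8601"}],
}
```

Here `is_grounded(real, events)` holds, while the `invented` signal fails because its quote never occurs in event 42.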

Actor-Critic Research Briefs

Claude drafts each brief. Codex critiques it. Every claim has to cite evidence.

The result: briefs you can actually trust.
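
An actor-critic loop of this kind can be sketched with two plain functions standing in for the models. Everything here is illustrative: `write_brief`, the toy drafter and critic, and the claim format are assumptions, and no real model API is called.

```python
def write_brief(ticket, draft, critique, max_rounds=3):
    """Alternate drafting and critique until the critic raises no objections."""
    brief = draft(ticket, [])
    for _ in range(max_rounds):
        objections = critique(ticket, brief)
        if not objections:
            return brief, True
        brief = draft(ticket, objections)
    return brief, False  # last revision was never approved

def toy_draft(ticket, objections):
    # The first pass includes an unsupported claim; revisions drop flagged claims.
    claims = [
        {"text": "Deadline must be an epoch integer", "evidence": ticket["evidence"]},
        {"text": "Users want dark mode", "evidence": []},
    ]
    return [c for c in claims if c["text"] not in objections]

def toy_critique(ticket, brief):
    # Object to any claim that cites no evidence.
    return [c["text"] for c in brief if not c["evidence"]]

ticket = {"evidence": [{"event_i": 42, "quote": "Invalid value for deadline: expected epoch integer"}]}
brief, accepted = write_brief(ticket, toy_draft, toy_critique)
```

In this toy run the critic rejects the unsupported claim in round one, and the revised brief with only the evidence-backed claim is accepted in round two.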

Experiments: Weave + Redis

🐝 Weave traces
⬑ Redis state
🐝 Full observability into experiment outcomes
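
Experiment state could be kept as one hash per experiment. The sketch below uses an in-memory stand-in so it runs without a server; it mimics only the `hset(name, mapping=...)` and `hgetall(name)` calls that a redis-py client also provides. The key scheme and field names are assumptions, and Weave tracing is left out to keep the example dependency-free.

```python
import json

class InMemoryHashStore:
    """Stand-in offering the two hash methods used below."""
    def __init__(self):
        self._data = {}

    def hset(self, name, mapping):
        self._data.setdefault(name, {}).update(mapping)

    def hgetall(self, name):
        return dict(self._data.get(name, {}))

def record_outcome(store, experiment_id, baseline, treatment):
    """Save baseline, treatment, and their difference under one hash key."""
    key = f"experiment:{experiment_id}"  # hypothetical key scheme
    store.hset(key, mapping={
        "baseline": json.dumps(baseline),
        "treatment": json.dumps(treatment),
        "delta": json.dumps(treatment - baseline),
    })
    return key

store = InMemoryHashStore()
key = record_outcome(store, "exp-1", baseline=1.5, treatment=0.5)
saved = store.hgetall(key)
```

With a real Redis client the same calls should work, though values come back as bytes unless the client is created with `decode_responses=True`.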

What We've Built

100+ sessions mined
7 signal types
∞ loop iterations

The Vision

Many agents. Shared, sanitized rollups.
Evidence-backed improvements that propagate.

A distributed R&D engine,
where the best fixes spread because they work.

openclaw-trace

Recursive self-improvement, grounded in evidence.

🐝 Traced with W&B Weave

github.com/phantastic-ai/openclaw-trace
