AutoResearch is a pattern for autonomous software optimization: track a single branch head, evaluate candidate commits against it, and advance the head only when a run improves a user-defined metric. Its shape (experiment, run, result, then keep / discard / crash) follows Andrej Karpathy's framing of this problem as a research direction.
Status: in development. The ECS-native lifecycle components are implemented. The loop controller that advances the branch head is not yet in this repo.
## What's Implemented
`archetype.experiments` models the lifecycle state as ordinary Components, so runs become entities in an archetype world: forkable, time-travelable, and queryable with the same tools as any other simulation state.
| Component | Role |
|---|---|
| `Experiment` | The setup for a family of runs: repo, branch, metadata. No scoring fields. |
| `Run` | A single attempt: one VM, one agent, one task, one commit. Mirrors archetype-runner's record shape. |
| `Result` | Opaque eval envelope attached to a `Run`. User code decides the metric. |
| `BranchHead` | Persisted "current best commit" for an experiment, advanced by the user's loop. |
The library deliberately does not define what "better" means. `Result.outputs_json` and `BranchHead.descriptor_json` are free-form: a scalar metric, a Pareto point, an LLM judge verdict, a pytest report, a tournament record. The library persists; the user's eval code scores.
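Because the envelope is opaque, "better" is just a function the user supplies. The sketch below, assuming each payload is a JSON object of lower-is-better numeric fields, shows two interchangeable comparators over the same stored data. The helper names and the `latency_ms` / `memory_mb` fields are hypothetical and not part of `archetype.experiments`.

```python
import json
from typing import Callable

# A comparator answers one question: does the candidate beat the incumbent?
# Both arguments are the free-form JSON strings the library stores untouched.
Comparator = Callable[[str, str], bool]


def scalar_lower_is_better(key: str) -> Comparator:
    """Compare a single numeric field, e.g. latency or loss (hypothetical field names)."""
    def better(candidate_json: str, incumbent_json: str) -> bool:
        return json.loads(candidate_json)[key] < json.loads(incumbent_json)[key]
    return better


def pareto_dominates(keys: list[str]) -> Comparator:
    """Candidate wins only if it is no worse on every key and strictly better on at
    least one (all keys treated as lower-is-better)."""
    def better(candidate_json: str, incumbent_json: str) -> bool:
        cand, inc = json.loads(candidate_json), json.loads(incumbent_json)
        no_worse = all(cand[k] <= inc[k] for k in keys)
        strictly_better = any(cand[k] < inc[k] for k in keys)
        return no_worse and strictly_better
    return better


incumbent = json.dumps({"latency_ms": 120.0, "memory_mb": 512.0})
candidate = json.dumps({"latency_ms": 95.0, "memory_mb": 540.0})

print(scalar_lower_is_better("latency_ms")(candidate, incumbent))  # True
print(pareto_dominates(["latency_ms", "memory_mb"])(candidate, incumbent))  # False
```

The same pair of payloads counts as an improvement under one rule and not under the other, which is exactly the decision the library leaves to user code.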
## Ingesting from archetype-runner
archetype-runner is a separate tool that executes coding agents in VMs and records agent runs to SQLite. Its records can be ingested into an archetype world row-for-row:
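```python
from archetype.experiments import ingest_runner_state, load_runner_state_db

async def ingest(world_id, container) -> None:
    rows = load_runner_state_db("/path/to/runner/state.db")  # read the runner's SQLite registry
    await ingest_runner_state(world_id, rows, container)  # write records into the world row-for-row
```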
After ingestion, runs are queryable as entities in the world: filter by `experiment_name`, join with `Result`, time-travel to a historical snapshot, or fork the world to explore "what if run X had won instead."
## What's In Development
The loop orchestration itself. A controller that:
- reads the current `BranchHead` for an experiment
- launches a bounded run against that commit
- waits for the user-defined evaluator to emit a `Result`
- compares the result to the incumbent using user-defined logic
- updates the `BranchHead` on improvement, otherwise leaves it unchanged
This is deliberately the last piece. The primitives are intentionally scoring-agnostic so the loop can be built against any comparison — scalar, Pareto, LLM judge, tournament, human vote — without the components having to encode a preference.
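The controller steps listed above can be sketched as a single iteration function. This is a minimal sketch, assuming in-memory stand-ins for `BranchHead` and `Result` and user-supplied callables for launching a run, evaluating it, and comparing envelopes. `step`, `launch_run`, `evaluate`, `is_better`, and every dataclass field other than `outputs_json` and `descriptor_json` are hypothetical; this is not the planned controller's API. A `None` result models a crash, so the three outcomes match the keep / discard / crash shape.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


# Hypothetical in-memory stand-ins; the real components live in archetype.experiments.
@dataclass
class BranchHead:
    experiment_name: str
    commit: str
    descriptor_json: str  # the incumbent's score envelope, free-form


@dataclass
class Result:
    commit: str
    outputs_json: str  # opaque eval envelope produced by the user's evaluator


def step(
    head: BranchHead,
    launch_run: Callable[[str], str],              # base commit -> candidate commit
    evaluate: Callable[[str], Optional[Result]],   # candidate -> Result, or None on crash
    is_better: Callable[[str, str], bool],         # (candidate_json, incumbent_json) -> bool
) -> tuple[BranchHead, str]:
    """Run one loop iteration; return the (possibly advanced) head and the outcome."""
    candidate_commit = launch_run(head.commit)  # bounded run against the current head
    result = evaluate(candidate_commit)
    if result is None:
        return head, "crash"  # nothing to compare; the head is left untouched
    if is_better(result.outputs_json, head.descriptor_json):
        return BranchHead(head.experiment_name, result.commit, result.outputs_json), "keep"
    return head, "discard"


# Usage with fake runs: c1 improves, c2 regresses, c3 crashes (None).
scores = {"c1": 0.80, "c2": 0.70, "c3": None}
commits = iter(["c1", "c2", "c3"])


def launch_run(base: str) -> str:
    return next(commits)  # fake runner: ignores the base commit


def evaluate(commit: str) -> Optional[Result]:
    score = scores[commit]
    return None if score is None else Result(commit, json.dumps({"score": score}))


def higher_score(cand: str, inc: str) -> bool:
    return json.loads(cand)["score"] > json.loads(inc)["score"]


head = BranchHead("speedup", "c0", json.dumps({"score": 0.75}))
for _ in range(3):
    head, outcome = step(head, launch_run, evaluate, higher_score)
    print(head.commit, outcome)
# c1 keep
# c1 discard
# c1 crash
```

Keeping the comparator as a parameter is what lets the same loop run against a scalar, a Pareto rule, or an LLM judge without the state types changing.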
## Why ECS-Native
Modeling experiments as ECS state means:
- Forking — replay what would have happened if a different run had advanced the head
- Time-travel queries — inspect an experiment's state at any historical tick
- Concurrency — process many experiments as separate archetypes on the same engine
- Audit — every state transition is an appended row, not a mutation
Experiment state gets the same operational properties as any other simulation in Archetype, without a parallel storage layer.
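Forking, time-travel, and audit all follow from storing state as appended rows rather than mutating it in place. A toy sketch, assuming a single in-memory list of rows stamped with a tick; `World`, `Row`, `as_of`, and `fork` are hypothetical names and do not describe archetype's actual storage engine.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Row:
    tick: int
    entity: str
    component: str
    value: str


@dataclass
class World:
    rows: list[Row] = field(default_factory=list)

    def append(self, tick: int, entity: str, component: str, value: str) -> None:
        # Audit: a state change is a new row; existing rows are never modified.
        self.rows.append(Row(tick, entity, component, value))

    def as_of(self, tick: int, entity: str, component: str) -> Optional[str]:
        # Time travel: the value with the highest tick at or before `tick`.
        matches = [r for r in self.rows
                   if r.tick <= tick and r.entity == entity and r.component == component]
        return max(matches, key=lambda r: r.tick).value if matches else None

    def fork(self, tick: int) -> "World":
        # Fork: a new world sharing history up to `tick`, free to diverge afterwards.
        return World([r for r in self.rows if r.tick <= tick])


w = World()
w.append(1, "exp:speedup", "BranchHead", "c1")
w.append(5, "exp:speedup", "BranchHead", "c4")
print(w.as_of(3, "exp:speedup", "BranchHead"))  # c1

what_if = w.fork(3)
what_if.append(5, "exp:speedup", "BranchHead", "c2")  # counterfactual: a different run won
print(w.as_of(9, "exp:speedup", "BranchHead"), what_if.as_of(9, "exp:speedup", "BranchHead"))  # c4 c2
```

Because history is never rewritten, the original world and the counterfactual fork coexist, and either can be queried at any tick.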
## References
- Andrej Karpathy's framing of autonomous software optimization and branch-frontier agent workflows
- `src/archetype/experiments/`: the current component implementations
- `archetype-runner`: the agent-in-VM runner whose registry feeds this schema