Rust core plan
Document type: Implementation plan.
Scope: crates/, src/archetype/core/aio/, and the Python adapter layer that
will bridge them.
This plan describes the migration from the current Python async core prototype
to a Rust core engine built on arrow-rs, tokio, the Arrow C Data Interface,
and an append-only Parquet store. It is a hierarchical task network (HTN) style
plan: each phase decomposes into ordered tasks, explicit products, acceptance
checks, and dependencies.
Current Definition of Done
The next feature is not "rewrite Archetype in Rust." The feature is:
A Rust Arrow/Parquet tick kernel, callable from the existing Python runtime boundary, passing core parity tests against the Python async world, with benchmarks showing reduced per-tick overhead without taking over app, runtime, auth, audit, or broker semantics.
Rust Owns
- Arrow schema composition after Python supplies component schemas.
- Archetype table descriptors after Python supplies table names.
- Spawn, despawn, update, component migration, and tick materialization.
- Active live snapshots below the service layer.
- Processor scheduling for native processors.
- Local append-only Parquet storage for the benchmark/control backend.
- Arrow C Data Interface import/export and ABI diagnostics.
Python Owns
- ArchetypeRuntime, service container, API, CLI, and docs examples.
- RBAC, quotas, audit emission, command broker ordering, and command policy.
- Python component classes and table-name hashing.
- Python processors, hooks, resources, and object lifetimes.
- Daft/Iceberg/LanceDB production paths until a later storage boundary is explicitly accepted.
Non-Goals for This Feature
- No Rust ownership of auth, audit, command routing, or runtime handles.
- No Iceberg catalog/transaction implementation.
- No LanceDB implementation.
- No DataFusion query planner unless parity proves simple Parquet scans are the bottleneck.
- No cuTile-rs integration inside the engine. GPU work is a processor backend experiment, not world-state ownership.
- No PyO3 requirement. Arrow C Data remains the native boundary unless packaging forces a separate decision.
0. Constraints
The following decisions are fixed for this migration:
- The async Python implementation is the behavioral reference.
- A component is an Arrow schema. Rust does not model Python component classes as the canonical component identity.
- Rust does not own table-name hashing. Table names are supplied by the caller, preserving the current Python naming policy until a separate policy change is made.
- Data crosses the native boundary through the Arrow C Data Interface.
- Lance storage is out of scope for the first Rust engine. The first native store is append-only Parquet.
- Python remains the runtime, API, CLI, auth, audit, and beginner-facing surface.
- cuTile-rs is not part of the core engine plan. It remains a possible future processor accelerator for dense numeric component columns.
Daft's native extension authoring guide is the reference for the boundary shape:
use ArrowSchema and ArrowArray as the stable ABI, and convert into
arrow-rs types inside Rust.
1. Target Architecture
Python runtime / app / API / CLI
|
| Arrow C Data Interface
v
crates/archetype-ffi
|
v
crates/archetype-core
- Arrow schema composition
- world state machine
- mutation buffers
- tick materialization
- query/update/store traits
|
v
crates/archetype-parquet
- append-only local Parquet store
The Python service layer continues to enforce governance and audit. Rust owns the engine invariants below the service layer.
2. HTN Root Task
Task: make Rust the canonical implementation of Archetype's core engine semantics without breaking the Python public surface.
The root decomposes into these phases:
- Preserve and name current async semantics.
- Add the Rust workspace and core crates.
- Implement Arrow-native schema and world primitives.
- Implement append-only Parquet storage.
- Add the Arrow C Data Interface boundary.
- Bridge Python to Rust behind the existing async core API.
- Migrate runtime/service paths incrementally.
- Retire duplicated Python core semantics.
- Explore optional native processor acceleration.
2.1 Execution HTN
The executable task network is intentionally narrower than the long-term migration. Each leaf has a concrete artifact and can run when its dependencies are satisfied.
R0 Define done for Rust backend
├── R0.1 Record ownership boundary
├── R0.2 Record non-goals
└── R0.3 Record crate split
R1 Harden native processor ABI
├── R1.1 Add ABI error diagnostics
├── R1.2 Add Python adapter error surfacing
├── R1.3 Add negative Arrow C tests
└── R1.4 Document build/load path
R2 Move movement into shared Rust kernel
├── R2.1 Add movement processor module in archetype-core
├── R2.2 Reuse it from archetype-ffi
├── R2.3 Reuse it from archetype-bench
└── R2.4 Keep benchmark output stable
R3 Establish parity harness
├── R3.1 Mark Python async tests that define engine semantics
├── R3.2 Add Rust parity tests for materialization edge cases
├── R3.3 Add native-mode Python tests through the service boundary
└── R3.4 Keep native mode opt-in
R4 Benchmark the hot loop
├── R4.1 Validate final DataFrames match by backend
├── R4.2 Split timing into plan/query/process/persist phases
├── R4.3 Compare Daft live-read, Daft cached-read, and Rust Parquet
└── R4.4 Publish reproducible JSON fixtures
R5 Decide storage graduation
├── R5.1 Stay local Parquet if tick-loop overhead is already solved
├── R5.2 Add object_store only for remote/local abstraction
├── R5.3 Add DataFusion only for Rust-owned query planning
└── R5.4 Add Iceberg only for Rust-owned table transactions
Parallel Leaves
- R0 can run immediately and is documentation-only.
- R1.3 can run in parallel with R2 because adapter tests define behavior without changing the Rust kernel shape.
- R2.1 through R2.3 are sequential inside one Rust ownership lane.
- R3 depends on R1 and R2 because parity tests need the hardened ABI and shared movement kernel.
- R4 depends on R3.1 for correctness checks, but timing collection can evolve in parallel after the benchmark schema is stable.
- R5 is a decision gate only. It must not add crates before R4 shows the actual bottleneck.
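The dependencies listed here can be written down as a small graph and grouped into waves whose members may run in parallel. The sketch is illustrative only: it uses Python's standard graphlib module, models node-level dependencies rather than individual leaves such as R1.3 or R3.1, and treats R5 as gated on R4. The names DEPENDS_ON and execution_waves are hypothetical.

```python
from graphlib import TopologicalSorter

# Node-level dependencies as stated in the text. Leaf-level parallelism
# (for example R1.3 alongside R2) is deliberately not modeled.
DEPENDS_ON = {
    "R0": set(),
    "R1": set(),
    "R2": set(),
    "R3": {"R1", "R2"},
    "R4": {"R3"},
    "R5": {"R4"},
}


def execution_waves(depends_on):
    """Group nodes into waves; nodes in the same wave have no mutual dependency."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

With the table above, execution_waves(DEPENDS_ON) puts R0, R1, and R2 in the first wave, followed by R3, R4, and R5 one at a time.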
Crate Admission Rules
The current crates stay minimal:
| Crate | Responsibility |
|---|---|
| archetype-core | Engine invariants, Arrow schemas, materialization, processor traits |
| archetype-parquet | Local append-only Parquet Store implementation |
| archetype-ffi | C ABI and Arrow C Data Interface |
| archetype-bench | Benchmark binaries only |
New crates require a concrete boundary:
| Candidate crate | Admission trigger |
|---|---|
| archetype-object-store | Local filesystem is no longer enough for storage tests or deployment |
| archetype-datafusion | Rust must own query planning, expressions, or predicate pushdown |
| archetype-iceberg | Rust must own Iceberg commits, catalogs, snapshots, or transactions |
| archetype-pyo3 | C ABI is insufficient for packaging or lifecycle management |
Dependencies should not leak upward into archetype-core; the core crate should
stay free of storage engine, Python, and catalog policy.
Execution Status
| Node | Status | Notes |
|---|---|---|
| R0 | Done | The ownership boundary, non-goals, crate split, and definition of done are recorded in this document. |
| R1 | Done for movement ABI | The FFI boundary reports thread-local last errors. The dedicated movement ABI now treats missing required columns as errors. |
| R2 | Done | Movement is implemented once in archetype-core and reused from FFI and benchmark paths. |
| R3 | Kernel done, service bridge pending | Rust core parity tests cover spawn, despawn, reserved IDs, metadata, live snapshots, filters, and component-table migration. Python service native-mode parity remains pending because the runtime adapter is not implemented yet. |
| R4 | Done for movement envelope | Movement benchmark records correctness for both backends. Rust reports setup, read-prior, materialize, process, append, live-snapshot, profiled tick, query, and total phases; Python reports the existing setup/run/query phases. |
| R5 | Done as decision gate | No new storage/query crates are admitted yet. Local Parquet remains the control backend until benchmarks prove a storage/planner bottleneck. |
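The correctness fields recorded under R4 can be checked before a timing is trusted. The following is a rough sketch under stated assumptions: the movement model (x advances by vx once per tick) is assumed, and the report field names final_rows and final_x_sum are hypothetical, not the benchmark's actual JSON schema.

```python
import math


def expected_final_x_sum(x0, vx, ticks):
    """Assumed movement model: every entity's x advances by its vx once per tick."""
    return sum(x0) + ticks * sum(vx)


def timing_is_admissible(report, expected_rows, expected_x_sum, rel_tol=1e-9):
    """Accept a timing report only when its correctness fields match the model.

    `final_rows` and `final_x_sum` are hypothetical field names.
    """
    return report["final_rows"] == expected_rows and math.isclose(
        report["final_x_sum"], expected_x_sum, rel_tol=rel_tol
    )
```

A report that fails this check would be discarded regardless of how fast the run was.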
Decisions Made During Execution
- Dedicated processor ABI functions are strict. arct_movement_process validates its required columns before scheduling through NativeSystem; missing columns are caller errors, not silent skips.
- Generic system scheduling remains permissive. A processor registered with NativeSystem still skips batches that do not contain its required columns, matching the archetype-subset scheduling model.
- ABI diagnostics are thread-local strings exposed through arct_last_error_message(). The primary ABI still returns integer status codes so C callers stay simple.
- Benchmarks now carry correctness fields. Timing claims are not considered valid unless the final row count and final position sums match the expected movement model.
- Rust world profiling belongs in the core executor, not in benchmark-only wrappers. step_profiled() keeps step() behavior intact while exposing the phase timings needed to analyze the hot loop.
- Component migration is represented as a caller-supplied new table plus caller-supplied component batch. Rust preserves entity IDs and materializes the old table tombstone and new table active row; Python still owns deciding table names and component schemas.
- No object_store, DataFusion, Iceberg, PyO3, or cuTile-rs crate was added. The current bottleneck must be demonstrated before widening dependencies.
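The difference between the strict dedicated ABI and permissive generic scheduling can be illustrated with plain Python dictionaries standing in for Arrow batches. This is a sketch of the two policies only; run_strict, run_permissive, MissingColumnsError, and the toy move processor are hypothetical names, not the arct_movement_process or NativeSystem implementations.

```python
class MissingColumnsError(ValueError):
    """Raised by the strict path when a batch lacks required columns."""


def missing_columns(batch, required):
    """Return the required column names absent from a name -> values batch."""
    return [name for name in required if name not in batch]


def run_strict(batch, required, process):
    """Dedicated-ABI policy: a missing required column is a caller error."""
    missing = missing_columns(batch, required)
    if missing:
        raise MissingColumnsError(f"missing required columns: {missing}")
    return process(batch)


def run_permissive(batches, required, process):
    """Generic scheduling policy: skip batches that lack required columns."""
    return [process(b) for b in batches if not missing_columns(b, required)]


def move(batch):
    """Toy processor (assumed model): add vx to x row by row."""
    out = dict(batch)
    out["x"] = [x + vx for x, vx in zip(batch["x"], batch["vx"])]
    return out
```

Given one batch with x and vx and another with only x, run_permissive processes the first and silently skips the second, while run_strict raises on the second.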
3. Phase 1: Preserve Current Async Semantics
Intent: prevent the rewrite from changing behavior accidentally.
Tasks
- Inventory the async core contracts from src/archetype/core/aio/.
- Add contract tests for any behavior currently covered only implicitly.
- Mark sync/async divergences as migration risks, not Rust requirements.
- Decide which current behaviors are bugs before porting them.
Required Contracts
- AsyncWorld.step() lifecycle: pre-tick hook, query previous tick, materialize mutations, execute processors, persist, increment tick, post-tick hook.
- active_signatures is the union of registered entity signatures, pending spawn tables, and pending despawn tables.
- Spawn materialization deduplicates same-entity rows with last-write-wins.
- Despawn materialization marks previous rows inactive; it does not delete rows.
- update_entity overlays values without changing archetype/table identity.
- add_components and remove_components migrate an entity between tables by appending an inactive row to the old table and an active row to the new table.
- Store writes are append-only.
- Persistence failures must become observable. The current Python updater logs and returns a stamped DataFrame; Rust must return an error.
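The migration contract (an inactive row appended to the old table, an active row appended to the new one) can be sketched with plain Python lists standing in for Arrow tables. This illustrates the contract, not the AsyncWorld implementation: migrate_entity and the row layout are hypothetical, and world_id and run_id are omitted for brevity.

```python
def migrate_entity(tables, entity_id, old_table, new_table, new_values, tick):
    """Move an entity between archetype tables without rewriting history.

    `tables` maps a caller-supplied table name to an append-only list of rows.
    """
    history = [r for r in tables.get(old_table, []) if r["entity_id"] == entity_id]
    if not history:
        raise KeyError(f"entity {entity_id} not found in {old_table}")
    latest = max(history, key=lambda r: r["tick"])
    if not latest["is_active"]:
        raise KeyError(f"entity {entity_id} is not active in {old_table}")

    # Tombstone: the latest values, stamped inactive at this tick.
    # Earlier rows stay in place; nothing is deleted or rewritten.
    tables[old_table].append({**latest, "tick": tick, "is_active": False})

    # Active row in the destination table; the entity ID is preserved.
    tables.setdefault(new_table, []).append(
        {**new_values, "entity_id": entity_id, "tick": tick, "is_active": True}
    )
```

After a migration the old table holds both the original active row and the new tombstone, so earlier ticks remain queryable.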
Acceptance
- A contract matrix exists mapping Python tests to Rust test cases.
- Known semantic divergences are documented with a decision: preserve, fix, or defer.
- The Phase-6 entry gate suite crates/archetype-core/tests/migration_gate.rs passes. That suite is the normative contract matrix: each test cites its Python counterpart, and gaps are marked #[ignore] with rationale.
4. Phase 2: Rust Workspace
Intent: introduce Rust without changing Python behavior.
Tasks
- Add a root Cargo.toml workspace.
- Add crates/archetype-core.
- Add crates/archetype-parquet.
- Add crates/archetype-ffi.
- Add cargo test --workspace to local validation docs and eventually CI.
Crate Boundaries
| Crate | Responsibility |
|---|---|
| archetype-core | Arrow schemas, world state, mutation materialization, traits |
| archetype-parquet | Append-only local Parquet Store implementation |
| archetype-ffi | C ABI over Arrow C Data Interface |
Acceptance
- cargo test --workspace passes.
- Python tests still pass without importing Rust.
5. Phase 3: Arrow-Native Core
Intent: move the actual engine semantics into Rust.
Tasks
- Define base columns: world_id, run_id, entity_id, tick, is_active.
- Define component schemas as caller-provided Arrow schemas.
- Define archetype table descriptors as caller-provided table name plus Arrow schema.
- Validate no missing required base columns after composition.
- Implement WorldState.
- Implement MutationBuffer.
- Implement spawn queueing from Arrow batches.
- Implement despawn queueing.
- Implement materialization over prior tick Arrow batches.
- Return explicit errors for schema mismatch, missing columns, invalid types, and failed appends.
Non-Goals
- No table-name hashing.
- No Python class lookup.
- No LanceDB.
- No Daft processor execution in Rust.
Acceptance
- Rust tests cover spawn, duplicate spawn overwrite, despawn, empty prior batch, and metadata stamping.
- Rust materialization returns Arrow RecordBatch values that Python can ingest through Arrow C.
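The materialization cases named in the acceptance list (spawn, duplicate spawn overwrite, despawn, empty prior batch, metadata stamping) can be sketched over plain row dictionaries. This is a minimal model of the intended semantics, not the Rust kernel: materialize is a hypothetical name, rows stand in for Arrow batches, and dropping entities that are spawned and despawned in the same tick follows the same-tick spawn-cancel contract.

```python
def materialize(prior_rows, spawns, despawn_ids, world_id, run_id, tick):
    """Produce the rows to append for `tick` from the prior tick plus mutations."""
    # Last-write-wins per entity. Re-assigning an existing dict key keeps its
    # original insertion position, so output order is first-seen.
    spawned = {}
    for row in spawns:
        spawned[row["entity_id"]] = row

    out = []
    for row in prior_rows:
        # Assumption: tombstones from earlier ticks are not carried forward.
        if row["is_active"]:
            out.append({**row, "is_active": row["entity_id"] not in despawn_ids})
    for entity_id, row in spawned.items():
        if entity_id in despawn_ids:
            continue  # same-tick spawn-cancel: no tombstone, no active row
        out.append({**row, "is_active": True})

    # Stamp world, run, and tick metadata on every row that will be persisted.
    return [{**row, "world_id": world_id, "run_id": run_id, "tick": tick} for row in out]
```

Despawned entities appear once more as an inactive row at the current tick, which is what lets an active-only read exclude them without deleting history.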
6. Phase 4: Append-Only Parquet Store
Intent: provide a simple native durable backend for the Rust core.
Layout
<root>/<namespace>/<table_name>/part-<uuid>.parquet
Tasks
- Implement Store::append.
- Implement Store::read_table.
- Implement filter application for world_id, run_id, tick, entity_id, and active_only.
- Keep writes append-only; never rewrite existing files.
- Defer predicate pushdown until correctness is stable.
Acceptance
- Appending twice produces two files.
- Reading a table returns the concatenation of all parts.
- Filters match the Python async store semantics.
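The three acceptance checks can be exercised against a small stand-in store. The sketch below keeps the directory layout from this phase but writes JSON payloads with a .json suffix instead of Parquet, so it runs on the Python standard library alone. AppendOnlyStore and its method signatures are hypothetical and are not the Store trait from archetype-parquet.

```python
import json
import uuid
from pathlib import Path


class AppendOnlyStore:
    """Stand-in for the Parquet store: JSON payloads, same directory layout."""

    def __init__(self, root, namespace):
        self.base = Path(root) / namespace

    def append(self, table_name, rows):
        """Write `rows` as a brand-new part file; existing files are never touched."""
        table_dir = self.base / table_name
        table_dir.mkdir(parents=True, exist_ok=True)
        part = table_dir / f"part-{uuid.uuid4()}.json"
        # Mode "x" fails if the file already exists, so a part is never overwritten.
        with part.open("x", encoding="utf-8") as handle:
            json.dump(rows, handle)
        return part

    def read_table(self, table_name, active_only=False, **equals):
        """Concatenate every part, then filter in memory (no predicate pushdown).

        `equals` holds column == value filters such as tick=1 or entity_id=7.
        """
        rows = []
        for part in sorted((self.base / table_name).glob("part-*.json")):
            rows.extend(json.loads(part.read_text(encoding="utf-8")))
        return [
            row for row in rows
            if all(row.get(col) == val for col, val in equals.items())
            and (not active_only or row["is_active"])
        ]
```

Because part names are random UUIDs, the concatenation order across parts is not meaningful; callers that need ordering would sort on tick and entity_id.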
7. Phase 5: Arrow C Data Interface
Intent: make the native boundary stable and Daft-compatible.
Tasks
- In archetype-ffi, define exported C ABI functions over ArrowSchema and ArrowArray.
- Convert ArrowSchema/ArrowArray to arrow-rs SchemaRef and RecordBatch.
- Convert returned Rust RecordBatch values back to ArrowArray.
- Define ownership rules for every pointer crossing the boundary.
- Add a memory-safety test harness.
Phase 5's split-step ownership, error-code, panic, and partial-failure contract
is specified in docs/design/split-step-ffi.md.
Acceptance
- A caller can pass an Arrow batch into Rust and receive an Arrow batch back without Python object serialization.
- FFI functions never panic across the boundary.
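For a caller that reaches the C ABI without PyO3, the two structs can be mirrored with Python's standard ctypes module. The field order follows the published Arrow C Data Interface specification, in which a released struct is marked by a NULL release callback. is_released is a hypothetical helper, and nothing here reflects the actual archetype-ffi function signatures.

```python
import ctypes


class ArrowSchema(ctypes.Structure):
    """ctypes mirror of `struct ArrowSchema` from the Arrow C Data Interface."""


class ArrowArray(ctypes.Structure):
    """ctypes mirror of `struct ArrowArray` from the Arrow C Data Interface."""


# Fields are assigned after class creation because both structs are self-referential.
ArrowSchema._fields_ = [
    ("format", ctypes.c_char_p),
    ("name", ctypes.c_char_p),
    ("metadata", ctypes.c_void_p),  # binary blob, not a NUL-terminated string
    ("flags", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowSchema))),
    ("dictionary", ctypes.POINTER(ArrowSchema)),
    ("release", ctypes.CFUNCTYPE(None, ctypes.POINTER(ArrowSchema))),
    ("private_data", ctypes.c_void_p),
]
ArrowArray._fields_ = [
    ("length", ctypes.c_int64),
    ("null_count", ctypes.c_int64),
    ("offset", ctypes.c_int64),
    ("n_buffers", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("buffers", ctypes.POINTER(ctypes.c_void_p)),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowArray))),
    ("dictionary", ctypes.POINTER(ArrowArray)),
    ("release", ctypes.CFUNCTYPE(None, ctypes.POINTER(ArrowArray))),
    ("private_data", ctypes.c_void_p),
]


def is_released(struct):
    """A struct whose release callback is NULL is released (or was never filled)."""
    return not struct.release
```

An ownership rule can then be stated in terms of this flag: whoever holds a struct with a non-NULL release callback is responsible for calling it exactly once.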
8. Phase 6: Python Adapter
Intent: route existing Python async core operations through Rust while preserving Python APIs.
The normative ABI for this phase is specified in
docs/design/split-step-ffi.md: Rust owns
materialization, stamping, persistence, and live snapshots; Python processors
execute between arct_step_begin and arct_step_commit.
Tasks
- Add an internal adapter under src/archetype/core/native/.
- Convert Daft/PyArrow batches to Arrow C.
- Call archetype-ffi.
- Convert returned Arrow C data back into PyArrow/Daft.
- Gate native engine usage behind an explicit feature flag or config setting.
Phase 6's adapter shape, native fallback behavior, and parity/benchmark gate are
specified in docs/design/split-step-ffi.md.
Phase-6 Entry Gate
Before any Phase-6 Python-adapter work begins, the migration-gate suite
crates/archetype-core/tests/migration_gate.rs must pass in full. The suite
encodes every behavioral contract the Python async reference pins:
| Contract | Test | Python counterpart |
|---|---|---|
| x₀ raw at spawn tick / f next tick (first spawn) | contract1a_x0_raw_at_spawn_tick_f_applied_next_tick | test_initial_conditions_persist_at_spawn_tick |
| x₀ raw at spawn tick / f next tick (mid-run) | contract1b_mid_run_spawn_lands_raw_older_entities_keep_advancing | test_mid_run_spawn_persists_initial_conditions |
| Same-tick spawn-cancel: no tombstone, no active row | contract2_same_tick_spawn_cancel_no_tombstone_no_active_row | AsyncWorld.remove_entity same-tick cancel semantics |
| Spawn order deterministic (first-seen, last-write-wins) | contract3_spawn_order_deterministic_first_seen_last_write_wins | test_duplicate_spawn_same_entity_overwrites |
| Despawn tombstone at current tick; active_only excludes it | contract4_despawn_tombstone_at_current_tick_active_only_excludes_it | despawn_marks_prior_row_inactive / live_snapshot_after_despawn_excludes_inactive_entities |
| Update/overlay semantics | contract5_update_overlay_semantics_gap (#[ignore], not yet implemented) | AsyncWorld.update_entity |
| Metadata stamping (world/run/tick) on every row | contract6_metadata_stamped_on_every_persisted_row | async_world_persists_world_run_and_tick_metadata |
The archetype-bench crate also includes a mid_run_spawn scenario that
exercises the tick-zero-correct spawn ordering for entities arriving at
different ticks. Its correctness output ("correct": true) must hold before
Phase-6 work proceeds.
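The "x₀ raw at spawn tick / f next tick" contracts can be read as a small rule: rows that already existed have the processor applied, and rows spawned this tick are persisted exactly as supplied. The sketch below models that rule with plain dictionaries. step and move are hypothetical names, and the movement function (x advances by vx) is an assumed stand-in for f.

```python
def step(prior, spawns, f):
    """Return the rows persisted for the next tick.

    Rows that already existed have `f` applied. Rows spawned this tick land
    raw and only see `f` on the following tick.
    """
    advanced = [f(row) for row in prior]
    return advanced + [dict(row) for row in spawns]


def move(row):
    """Assumed processor f: advance x by vx."""
    return {**row, "x": row["x"] + row["vx"]}
```

For an entity spawned mid-run, older entities keep advancing at that tick while the new entity's initial conditions are stored unchanged, which is the behavior the mid_run_spawn scenario is described as checking.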
Acceptance
- The Phase-6 entry gate suite (crates/archetype-core/tests/migration_gate.rs) passes: cargo test --package archetype-core --test migration_gate is green.
- cargo run --package archetype-bench --bin mid_run_spawn reports "correct": true.
- Existing async world tests pass with native mode off.
- A narrow native-mode test passes for spawn/materialize/update.
- Failures surface as Python exceptions, not logged-only errors.
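The last acceptance item pairs naturally with the integer status codes and thread-local error text described for the ABI. The sketch below shows one possible adapter-side pattern, using in-process stand-ins so it runs without the native library. The status codes, NativeEngineError, checked, and both fake_ functions are hypothetical; only the idea of a status code plus a per-thread message comes from this plan.

```python
import threading

OK, ERR_MISSING_COLUMN = 0, 1  # hypothetical status codes

_last_error = threading.local()


class NativeEngineError(RuntimeError):
    """Raised by the adapter when a native call reports a nonzero status."""

    def __init__(self, status, message):
        super().__init__(f"native call failed (status {status}): {message}")
        self.status = status


def fake_movement_process(columns):
    """Stand-in for a native entry point: returns a status, never raises."""
    missing = [c for c in ("x", "vx") if c not in columns]
    if missing:
        _last_error.message = f"missing required columns: {missing}"
        return ERR_MISSING_COLUMN
    _last_error.message = ""
    return OK


def fake_last_error_message():
    """Stand-in for arct_last_error_message(): per-thread diagnostic text."""
    return getattr(_last_error, "message", "")


def checked(native_fn, *args):
    """Call a status-returning function and raise instead of logging on failure."""
    status = native_fn(*args)
    if status != OK:
        raise NativeEngineError(status, fake_last_error_message())
```

Reading the message on the same thread that made the failing call matters here, because the diagnostic is stored per thread.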
9. Phase 7: Incremental Migration
Intent: migrate behavior in dependency order.
Order
- Schema composition and metadata validation.
- Spawn/despawn materialization.
- Update materialization.
- Add/remove component migration.
- Query filtering.
- Parquet append/read.
- Runtime integration.
Each step must leave the Python public surface stable.
Acceptance
- At each migration point, Python and Rust contract tests both pass.
- The app/runtime layers still call through CommandService.
10. Phase 8: Retirement
Intent: remove duplicated Python semantics only after Rust parity is proven.
Tasks
- Replace Python async core state-machine logic with native calls.
- Keep Python processors and hooks as adapter-level features.
- Remove or freeze sync-core duplication.
- Keep Python fallback behind a temporary compatibility switch until release.
Acceptance
- No semantic drift remains between sync and async implementations because there is only one native engine path.
- Rust is the normative core implementation.
11. Phase 9: Optional Processor Acceleration
Intent: leave room for GPU work without contaminating core storage/state.
Candidate Work
- Native Rust processors over Arrow arrays.
- Dense tensor component views for numeric columns.
- cuTile-rs experiments for fused numeric processors.
Rule
GPU acceleration must be an optional processor backend. It must not own world state, append-only history, table naming, or governance.
12. First Milestone
The first implementation milestone was:
Given a caller-supplied table name, an Arrow schema, a prior tick Arrow batch, and queued spawn/despawn mutations, Rust produces the next tick Arrow batch and can append/read it through an append-only Parquet store.
This proves the kernel before touching Python runtime behavior.
The next implementation milestone is:
Given a single Arrow batch from Python, Rust executes the shared movement processor through the Arrow C Data Interface with stable errors, then the same processor is reused by the Rust benchmark world.
Acceptance:
- cargo test --workspace passes.
- cargo clippy --workspace --all-targets -- -D warnings passes.
- uv run pytest tests/core/test_native_movement_adapter.py tests/bench/test_movement_compare.py passes.
- Missing columns, wrong dtypes, multi-batch tables, missing libraries, and ABI mismatch produce explicit Python failures.
- The benchmark JSON schema stays stable.