Rust core plan

Document type: Implementation plan. Scope: crates/, src/archetype/core/aio/, and the Python adapter layer that will bridge them.

This plan describes the migration from the current Python async core prototype to a Rust core engine built on arrow-rs, tokio, the Arrow C Data Interface, and an append-only Parquet store. It is structured as a hierarchical task network (HTN): each phase decomposes into ordered tasks, explicit products, acceptance checks, and dependencies.

Current Definition of Done

The next feature is not "rewrite Archetype in Rust." The feature is:

A Rust Arrow/Parquet tick kernel that is callable from the existing Python runtime boundary, passes core parity tests against the Python async world, and shows reduced per-tick overhead in benchmarks, without taking over app, runtime, auth, audit, or broker semantics.

Rust Owns

Arrow schema composition after Python supplies component schemas.
Archetype table descriptors after Python supplies table names.
Spawn, despawn, update, component migration, and tick materialization.
Active live snapshots below the service layer.
Processor scheduling for native processors.
Local append-only Parquet storage for the benchmark/control backend.
Arrow C Data Interface import/export and ABI diagnostics.

Python Owns

ArchetypeRuntime, service container, API, CLI, and docs examples.
RBAC, quotas, audit emission, command broker ordering, and command policy.
Python component classes and table-name hashing.
Python processors, hooks, resources, and object lifetimes.
Daft/Iceberg/LanceDB production paths until a later storage boundary is explicitly accepted.

Non-Goals for This Feature

No Rust ownership of auth, audit, command routing, or runtime handles.
No Iceberg catalog/transaction implementation.
No LanceDB implementation.
No DataFusion query planner unless parity proves simple Parquet scans are the bottleneck.
No cuTile-rs integration inside the engine. GPU work is a processor backend experiment, not world-state ownership.
No PyO3 requirement. Arrow C Data remains the native boundary unless packaging forces a separate decision.

0. Constraints

The following decisions are fixed for this migration:

The async Python implementation is the behavioral reference.
A component is an Arrow schema. Rust does not model Python component classes as the canonical component identity.
Rust does not own table-name hashing. Table names are supplied by the caller, preserving the current Python naming policy until a separate policy change is made.
Data crosses the native boundary through the Arrow C Data Interface.
Lance storage is out of scope for the first Rust engine. The first native store is append-only Parquet.
Python remains the runtime, API, CLI, auth, audit, and beginner-facing surface.
cuTile-rs is not part of the core engine plan. It remains a possible future processor accelerator for dense numeric component columns.

Daft's native extension authoring guide is the reference for the boundary shape: use ArrowSchema and ArrowArray as the stable ABI, and convert into arrow-rs types inside Rust.

1. Target Architecture

Python runtime / app / API / CLI
        |
        | Arrow C Data Interface
        v
crates/archetype-ffi
        |
        v
crates/archetype-core
  - Arrow schema composition
  - world state machine
  - mutation buffers
  - tick materialization
  - query/update/store traits
        |
        v
crates/archetype-parquet
  - append-only local Parquet store

The Python service layer continues to enforce governance and audit. Rust owns the engine invariants below the service layer.

2. HTN Root Task

Task: make Rust the canonical implementation of Archetype's core engine semantics without breaking the Python public surface.

The root decomposes into these phases:

Preserve and name current async semantics.
Add the Rust workspace and core crates.
Implement Arrow-native schema and world primitives.
Implement append-only Parquet storage.
Add the Arrow C Data Interface boundary.
Bridge Python to Rust behind the existing async core API.
Migrate runtime/service paths incrementally.
Retire duplicated Python core semantics.
Explore optional native processor acceleration.

2.1 Execution HTN

The executable task network is intentionally narrower than the long-term migration. Each leaf has a concrete artifact and can run when its dependencies are satisfied.

R0 Define done for Rust backend
├── R0.1 Record ownership boundary
├── R0.2 Record non-goals
└── R0.3 Record crate split

R1 Harden native processor ABI
├── R1.1 Add ABI error diagnostics
├── R1.2 Add Python adapter error surfacing
├── R1.3 Add negative Arrow C tests
└── R1.4 Document build/load path

R2 Move movement into shared Rust kernel
├── R2.1 Add movement processor module in archetype-core
├── R2.2 Reuse it from archetype-ffi
├── R2.3 Reuse it from archetype-bench
└── R2.4 Keep benchmark output stable

R3 Establish parity harness
├── R3.1 Mark Python async tests that define engine semantics
├── R3.2 Add Rust parity tests for materialization edge cases
├── R3.3 Add native-mode Python tests through the service boundary
└── R3.4 Keep native mode opt-in

R4 Benchmark the hot loop
├── R4.1 Validate final DataFrames match by backend
├── R4.2 Split timing into plan/query/process/persist phases
├── R4.3 Compare Daft live-read, Daft cached-read, and Rust Parquet
└── R4.4 Publish reproducible JSON fixtures

R5 Decide storage graduation
├── R5.1 Stay local Parquet if tick-loop overhead is already solved
├── R5.2 Add object_store only for remote/local abstraction
├── R5.3 Add DataFusion only for Rust-owned query planning
└── R5.4 Add Iceberg only for Rust-owned table transactions

Parallel Leaves

R0 can run immediately and is documentation-only.
R1.3 can run in parallel with R2 because adapter tests define behavior without changing the Rust kernel shape.
R2.1 through R2.3 are sequential inside one Rust ownership lane.
R3 depends on R1 and R2 because parity tests need the hardened ABI and shared movement kernel.
R4 depends on R3.1 for correctness checks, but timing collection can evolve in parallel after the benchmark schema is stable.
R5 is a decision gate only. It must not add crates before R4 shows the actual bottleneck.
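
The dependencies above can be checked mechanically. The following is a minimal sketch, assuming Python 3.9+ for `graphlib` and modeling only the phase-level edges stated in this section. Treating R1 and R2 as having no prerequisites is an assumption, the R5-after-R4 edge encodes the decision-gate rule, and leaf-level parallelism (such as R1.3 alongside R2) is not modeled.

```python
from graphlib import TopologicalSorter

# Phase-level dependencies: node -> set of prerequisites.
# Assumption: R1 and R2 have no prerequisites of their own.
DEPENDENCIES = {
    "R0": set(),
    "R1": set(),
    "R2": set(),
    "R3": {"R1", "R2"},
    "R4": {"R3"},
    "R5": {"R4"},
}


def execution_waves(deps):
    """Group nodes into waves; nodes within one wave may run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the plan is cyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(DEPENDENCIES)
print(waves)
```

Each inner list is a set of nodes whose prerequisites are all complete, so a cycle introduced by a future edit to the plan would fail at `prepare()` instead of going unnoticed.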

Crate Admission Rules

The current crates stay minimal:

| Crate | Responsibility |
| --- | --- |
| `archetype-core` | Engine invariants, Arrow schemas, materialization, processor traits |
| `archetype-parquet` | Local append-only Parquet `Store` implementation |
| `archetype-ffi` | C ABI and Arrow C Data Interface |
| `archetype-bench` | Benchmark binaries only |

New crates require a concrete boundary:

| Candidate crate | Admission trigger |
| --- | --- |
| `archetype-object-store` | Local filesystem is no longer enough for storage tests or deployment |
| `archetype-datafusion` | Rust must own query planning, expressions, or predicate pushdown |
| `archetype-iceberg` | Rust must own Iceberg commits, catalogs, snapshots, or transactions |
| `archetype-pyo3` | C ABI is insufficient for packaging or lifecycle management |

Dependencies should not leak upward into archetype-core; the core crate should stay free of storage engine, Python, and catalog policy.

Execution Status

| Node | Status | Notes |
| --- | --- | --- |
| `R0` | Done | The ownership boundary, non-goals, crate split, and definition of done are recorded in this document. |
| `R1` | Done for movement ABI | The FFI boundary reports thread-local last errors. The dedicated movement ABI now treats missing required columns as errors. |
| `R2` | Done | Movement is implemented once in `archetype-core` and reused from FFI and benchmark paths. |
| `R3` | Kernel done, service bridge pending | Rust core parity tests cover spawn, despawn, reserved IDs, metadata, live snapshots, filters, and component-table migration. Python service native-mode parity remains pending because the runtime adapter is not implemented yet. |
| `R4` | Done for movement envelope | Movement benchmark records correctness for both backends. Rust reports setup, read-prior, materialize, process, append, live-snapshot, profiled tick, query, and total phases; Python reports the existing setup/run/query phases. |
| `R5` | Done as decision gate | No new storage/query crates are admitted yet. Local Parquet remains the control backend until benchmarks prove a storage/planner bottleneck. |

Decisions Made During Execution

Dedicated processor ABI functions are strict. arct_movement_process validates its required columns before scheduling through NativeSystem; missing columns are caller errors, not silent skips.
Generic system scheduling remains permissive. A processor registered with NativeSystem still skips batches that do not contain its required columns, matching the archetype-subset scheduling model.
ABI diagnostics are thread-local strings exposed through arct_last_error_message(). The primary ABI still returns integer status codes so C callers stay simple.
Benchmarks now carry correctness fields. Timing claims are not considered valid unless the final row count and final position sums match the expected movement model.
Rust world profiling belongs in the core executor, not in benchmark-only wrappers. step_profiled() keeps step() behavior intact while exposing the phase timings needed to analyze the hot loop.
Component migration is represented as a caller-supplied new table plus caller-supplied component batch. Rust preserves entity IDs and materializes the old table tombstone and new table active row; Python still owns deciding table names and component schemas.
No object_store, DataFusion, Iceberg, PyO3, or cuTile-rs crate was added. The current bottleneck must be demonstrated before widening dependencies.
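
The difference between the strict dedicated ABI and the permissive generic scheduler can be illustrated with a small model. This is a sketch only: a batch is modeled as a dict of column name to list, and `process_strict`, `schedule_permissive`, `move`, and the `x`/`vx` column names are hypothetical stand-ins, not the signatures of `arct_movement_process` or `NativeSystem`.

```python
class MissingColumnsError(ValueError):
    """Raised by the strict path when required columns are absent."""


def missing_columns(batch, required):
    """Return the required column names that the batch lacks, sorted."""
    return sorted(set(required) - set(batch))


def process_strict(batch, required, kernel):
    """Dedicated-ABI policy: missing columns are a caller error."""
    missing = missing_columns(batch, required)
    if missing:
        raise MissingColumnsError(f"missing required columns: {missing}")
    return kernel(batch)


def schedule_permissive(batches, required, kernel):
    """Generic-scheduler policy: skip batches outside the processor's subset."""
    results = []
    for batch in batches:
        if missing_columns(batch, required):
            results.append(batch)  # passed through untouched
        else:
            results.append(kernel(batch))
    return results


def move(batch):
    """Hypothetical movement kernel: advance x by vx for every row."""
    out = dict(batch)
    out["x"] = [x + vx for x, vx in zip(batch["x"], batch["vx"])]
    return out


moving = {"entity_id": [1, 2], "x": [0.0, 1.0], "vx": [1.0, 2.0]}
static = {"entity_id": [3], "x": [5.0]}  # no vx column
scheduled = schedule_permissive([moving, static], ("x", "vx"), move)
```

Under these assumptions the scheduler leaves `static` unchanged, while calling `process_strict(static, ("x", "vx"), move)` raises `MissingColumnsError`.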

3. Phase 1: Preserve Current Async Semantics

Intent: prevent the rewrite from changing behavior accidentally.

Tasks

Inventory the async core contracts from src/archetype/core/aio/.
Add contract tests for any behavior currently covered only implicitly.
Mark sync/async divergences as migration risks, not Rust requirements.
Decide which current behaviors are bugs before porting them.

Required Contracts

AsyncWorld.step() lifecycle: pre-tick hook, query previous tick, materialize mutations, execute processors, persist, increment tick, post-tick hook.
active_signatures is the union of registered entity signatures, pending spawn tables, and pending despawn tables.
Spawn materialization deduplicates same-entity rows with last-write-wins.
Despawn materialization marks previous rows inactive; it does not delete rows.
update_entity overlays values without changing archetype/table identity.
add_components and remove_components migrate an entity between tables by appending an inactive row to the old table and an active row to the new table.
Store writes are append-only.
Persistence failures must become observable. The current Python updater logs the failure and still returns a stamped DataFrame; Rust must return an error instead.
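
The spawn and despawn contracts above can be sketched in a few lines. This is an illustrative model, not the reference implementation: rows are plain dicts instead of Arrow batches, `materialize` and `dedupe_spawns` are hypothetical names, and the same-tick spawn-cancel rule is taken from the Phase-6 entry gate table later in this document.

```python
def dedupe_spawns(spawn_rows):
    """Last-write-wins per entity_id, keeping first-seen order."""
    latest = {}
    for row in spawn_rows:
        # Re-assigning an existing key keeps its original position in the
        # dict, so order is first-seen while the value is the last write.
        latest[row["entity_id"]] = row
    return list(latest.values())


def materialize(prior_rows, spawn_rows, despawn_ids, tick):
    """Build the rows appended for `tick`; prior rows are never mutated."""
    out = []
    for row in prior_rows:
        if not row["is_active"]:
            continue  # tombstoned at an earlier tick
        carried = dict(row, tick=tick)
        if row["entity_id"] in despawn_ids:
            carried["is_active"] = False  # tombstone, not a delete
        out.append(carried)
    for row in dedupe_spawns(spawn_rows):
        if row["entity_id"] in despawn_ids:
            continue  # same-tick spawn-cancel: no tombstone, no active row
        out.append(dict(row, tick=tick, is_active=True))
    return out


prior = [
    {"entity_id": 1, "tick": 0, "is_active": True, "x": 0.0},
    {"entity_id": 2, "tick": 0, "is_active": True, "x": 1.0},
]
spawns = [
    {"entity_id": 3, "x": 1.0},
    {"entity_id": 4, "x": 2.0},
    {"entity_id": 3, "x": 9.0},  # overwrites the first spawn of entity 3
]
rows = materialize(prior, spawns, despawn_ids={2}, tick=1)
```

In this example entity 2 is carried forward as an inactive row at tick 1, entity 3 keeps its first-seen position but takes the later value, and the `prior` list is left unchanged, which mirrors the append-only rule.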

Acceptance

A contract matrix exists mapping Python tests to Rust test cases.
Known semantic divergences are documented with a decision: preserve, fix, or defer.
The Phase-6 entry gate suite crates/archetype-core/tests/migration_gate.rs passes. That suite is the normative contract matrix: each test cites its Python counterpart, and gaps are marked #[ignore] with a rationale.

4. Phase 2: Rust Workspace

Intent: introduce Rust without changing Python behavior.

Tasks

Add a root Cargo.toml workspace.
Add crates/archetype-core.
Add crates/archetype-parquet.
Add crates/archetype-ffi.
Add cargo test --workspace to the local validation docs and, eventually, to CI.

Crate Boundaries

| Crate | Responsibility |
| --- | --- |
| `archetype-core` | Arrow schemas, world state, mutation materialization, traits |
| `archetype-parquet` | Append-only local Parquet `Store` implementation |
| `archetype-ffi` | C ABI over Arrow C Data Interface |

Acceptance

cargo test --workspace passes.
Python tests still pass without importing Rust.

5. Phase 3: Arrow-Native Core

Intent: move the actual engine semantics into Rust.

Tasks

Define base columns: world_id, run_id, entity_id, tick, is_active.
Define component schemas as caller-provided Arrow schemas.
Define archetype table descriptors as caller-provided table name plus Arrow schema.
Validate that no required base column is missing after composition.
Implement WorldState.
Implement MutationBuffer.
Implement spawn queueing from Arrow batches.
Implement despawn queueing.
Implement materialization over prior tick Arrow batches.
Return explicit errors for schema mismatch, missing columns, invalid types, and failed appends.
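
Schema composition and its validation step can be sketched as follows. This is a model under stated assumptions: a schema is a list of `(name, type_name)` pairs instead of an Arrow schema, the type names given to the base columns are guesses, and allowing a repeated field when its type agrees is a design assumption, not a documented rule.

```python
BASE_COLUMNS = ("world_id", "run_id", "entity_id", "tick", "is_active")

# Assumed types for illustration only; the plan does not specify them.
BASE_FIELDS = [
    ("world_id", "utf8"),
    ("run_id", "utf8"),
    ("entity_id", "int64"),
    ("tick", "int64"),
    ("is_active", "bool"),
]


class SchemaError(ValueError):
    """Explicit error for conflicting or incomplete schemas."""


def compose_schema(base_fields, component_schemas):
    """Merge base fields with caller-provided component schemas."""
    composed = {}
    for schema in [base_fields, *component_schemas]:
        for name, type_name in schema:
            existing = composed.setdefault(name, type_name)
            if existing != type_name:
                raise SchemaError(
                    f"field {name!r} declared as {existing} and {type_name}"
                )
    missing = [name for name in BASE_COLUMNS if name not in composed]
    if missing:
        raise SchemaError(f"missing required base columns: {missing}")
    return list(composed.items())


position = [("x", "float64"), ("y", "float64")]
velocity = [("vx", "float64"), ("vy", "float64")]
schema = compose_schema(BASE_FIELDS, [position, velocity])
```

The composed schema lists the base columns first, followed by component fields in the order the caller supplied them.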

Non-Goals

No table-name hashing.
No Python class lookup.
No LanceDB.
No Daft processor execution in Rust.

Acceptance

Rust tests cover spawn, duplicate spawn overwrite, despawn, empty prior batch, and metadata stamping.
Rust materialization returns Arrow RecordBatch values that Python can ingest through Arrow C.

6. Phase 4: Append-Only Parquet Store

Intent: provide a simple native durable backend for the Rust core.

Layout

<root>/<namespace>/<table_name>/part-<uuid>.parquet

Tasks

Implement Store::append.
Implement Store::read_table.
Implement filter application for world_id, run_id, tick, entity_id, and active_only.
Keep writes append-only; never rewrite existing files.
Defer predicate pushdown until correctness is stable.
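
Filter application without pushdown amounts to reading every part and then discarding rows. The sketch below assumes equality filters only and uses hypothetical names (`ReadFilter`, `read_table`); the real Python async store may support richer predicates, which this model does not attempt to reproduce.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReadFilter:
    """Hypothetical filter spec; None means "do not filter on this column"."""

    world_id: Optional[str] = None
    run_id: Optional[str] = None
    tick: Optional[int] = None
    entity_id: Optional[int] = None
    active_only: bool = False

    def matches(self, row):
        for column in ("world_id", "run_id", "tick", "entity_id"):
            wanted = getattr(self, column)
            if wanted is not None and row[column] != wanted:
                return False
        return row["is_active"] or not self.active_only


def read_table(rows, flt):
    """Apply the filter after all parts are read (no predicate pushdown)."""
    return [row for row in rows if flt.matches(row)]


table = [
    {"world_id": "w", "run_id": "r", "tick": 0, "entity_id": 1, "is_active": True},
    {"world_id": "w", "run_id": "r", "tick": 1, "entity_id": 1, "is_active": True},
    {"world_id": "w", "run_id": "r", "tick": 1, "entity_id": 2, "is_active": False},
]
live_at_tick_1 = read_table(table, ReadFilter(tick=1, active_only=True))
```

Keeping the filter as a plain value object makes it straightforward to translate into pushed-down predicates later, once correctness is stable.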

Acceptance

Appending twice produces two files.
Reading a table returns the concatenation of all parts.
Filters match the Python async store semantics.
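
The layout and the first two acceptance checks can be modeled with the standard library alone. This sketch covers the directory layout and the never-rewrite rule only: payloads are opaque bytes standing in for Parquet data, and `AppendOnlyStore` is a hypothetical name, not the `Store` trait in `archetype-parquet`.

```python
import tempfile
import uuid
from pathlib import Path


class AppendOnlyStore:
    """Layout sketch: <root>/<namespace>/<table_name>/part-<uuid>.parquet."""

    def __init__(self, root, namespace):
        self.base = Path(root) / namespace

    def append(self, table_name, payload):
        table_dir = self.base / table_name
        table_dir.mkdir(parents=True, exist_ok=True)
        part = table_dir / f"part-{uuid.uuid4()}.parquet"
        # Mode "xb" fails if the file already exists, so a part is never
        # rewritten even in the unlikely event of a name collision.
        with open(part, "xb") as handle:
            handle.write(payload)
        return part

    def read_table(self, table_name):
        table_dir = self.base / table_name
        if not table_dir.is_dir():
            return []
        parts = sorted(table_dir.glob("part-*.parquet"))
        return [part.read_bytes() for part in parts]


with tempfile.TemporaryDirectory() as root:
    store = AppendOnlyStore(root, "bench")
    store.append("position", b"tick-0")
    store.append("position", b"tick-1")
    parts = store.read_table("position")
```

Because part names are random UUIDs, the order in which parts are read is deterministic but not chronological; in this model, ordering by tick has to come from the `tick` column, not from file names.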

7. Phase 5: Arrow C Data Interface

Intent: make the native boundary stable and Daft-compatible.

Tasks

In archetype-ffi, define exported C ABI functions over ArrowSchema and ArrowArray.
Convert ArrowSchema/ArrowArray to arrow-rs SchemaRef and RecordBatch.
Convert returned Rust RecordBatch values back to ArrowArray.
Define ownership rules for every pointer crossing the boundary.
Add a memory-safety test harness.

Phase 5's split-step ownership, error-code, panic, and partial-failure contract is specified in docs/design/split-step-ffi.md.

Acceptance

A caller can pass an Arrow batch into Rust and receive an Arrow batch back without Python object serialization.
FFI functions never panic across the boundary.

8. Phase 6: Python Adapter

Intent: route existing Python async core operations through Rust while preserving Python APIs.

The normative ABI for this phase is specified in docs/design/split-step-ffi.md: Rust owns materialization, stamping, persistence, and live snapshots; Python processors execute between arct_step_begin and arct_step_commit.
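
The split-step shape can be pictured with an in-memory stand-in. This is a sketch of the control flow only: `InMemoryEngine`, `step_begin`, and `step_commit` are hypothetical Python names loosely mirroring `arct_step_begin` and `arct_step_commit`, the real signatures and failure contract live in docs/design/split-step-ffi.md, and spawn ordering is not modeled here.

```python
class InMemoryEngine:
    """Stand-in for the native engine; not the real ABI."""

    def __init__(self, live_rows):
        self.tick = 0
        self.history = []  # append-only list of (tick, rows)
        self.live = list(live_rows)

    def step_begin(self):
        # Engine side: materialize and stamp rows for the tick being built.
        return [dict(row, tick=self.tick) for row in self.live]

    def step_commit(self, rows):
        # Engine side: persist, refresh the live snapshot, advance the tick.
        self.history.append((self.tick, rows))
        self.live = rows
        self.tick += 1


def step(engine, processors):
    """Run Python processors between the begin and commit calls."""
    rows = engine.step_begin()
    for processor in processors:
        rows = processor(rows)
    engine.step_commit(rows)
    return rows


def advance(rows):
    """Hypothetical movement processor: x += vx for every row."""
    return [dict(row, x=row["x"] + row["vx"]) for row in rows]


engine = InMemoryEngine([{"entity_id": 1, "x": 0.0, "vx": 1.0}])
step(engine, [advance])
step(engine, [advance])
```

The point of the split is that Python never stamps, persists, or advances the tick itself; it only transforms the rows handed to it between the two calls.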

Tasks

Add an internal adapter under src/archetype/core/native/.
Convert Daft/PyArrow batches to Arrow C.
Call archetype-ffi.
Convert returned Arrow C data back into PyArrow/Daft.
Gate native engine usage behind an explicit feature flag or config setting.

Phase 6's adapter shape, native fallback behavior, and parity/benchmark gate are specified in docs/design/split-step-ffi.md.

Phase-6 Entry Gate

Before any Phase-6 Python-adapter work begins, the migration-gate suite crates/archetype-core/tests/migration_gate.rs must pass in full. The suite encodes every behavioral contract that the Python async reference pins; a contract not yet implemented in Rust appears as an #[ignore] test rather than a failure:

| Contract | Test | Python counterpart |
| --- | --- | --- |
| x₀ raw at spawn tick / f next tick (first spawn) | `contract1a_x0_raw_at_spawn_tick_f_applied_next_tick` | `test_initial_conditions_persist_at_spawn_tick` |
| x₀ raw at spawn tick / f next tick (mid-run) | `contract1b_mid_run_spawn_lands_raw_older_entities_keep_advancing` | `test_mid_run_spawn_persists_initial_conditions` |
| Same-tick spawn-cancel: no tombstone, no active row | `contract2_same_tick_spawn_cancel_no_tombstone_no_active_row` | `AsyncWorld.remove_entity` same-tick cancel semantics |
| Spawn order deterministic (first-seen, last-write-wins) | `contract3_spawn_order_deterministic_first_seen_last_write_wins` | `test_duplicate_spawn_same_entity_overwrites` |
| Despawn tombstone at current tick; active_only excludes it | `contract4_despawn_tombstone_at_current_tick_active_only_excludes_it` | `despawn_marks_prior_row_inactive` / `live_snapshot_after_despawn_excludes_inactive_entities` |
| Update/overlay semantics | `contract5_update_overlay_semantics_gap` (`#[ignore]`, not yet implemented) | `AsyncWorld.update_entity` |
| Metadata stamping (world/run/tick) on every row | `contract6_metadata_stamped_on_every_persisted_row` | `async_world_persists_world_run_and_tick_metadata` |

The archetype-bench crate also includes a mid_run_spawn scenario that exercises the tick-zero-correct spawn ordering for entities arriving at different ticks. Its correctness output ("correct": true) must hold before Phase-6 work proceeds.
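
A worked example of what such a correctness check could compute is shown below. It assumes the movement function f adds `vx` to `x` once per tick and is first applied on the tick after spawn, which is one reading of the "x₀ raw at spawn tick / f next tick" contract; `expected_x`, `check_final_state`, and every report key other than `"correct"` are hypothetical.

```python
def expected_x(x0, vx, spawn_tick, tick):
    """Position at `tick`, assuming f adds vx once per tick after spawn."""
    if tick < spawn_tick:
        raise ValueError("entity does not exist before its spawn tick")
    return x0 + vx * (tick - spawn_tick)


def check_final_state(final_rows, entities, final_tick):
    """Compare row count and position sum against the expected model."""
    expected_sum = sum(
        expected_x(e["x0"], e["vx"], e["spawn_tick"], final_tick)
        for e in entities
    )
    actual_sum = sum(row["x"] for row in final_rows)
    rows_match = len(final_rows) == len(entities)
    sums_match = abs(actual_sum - expected_sum) < 1e-9
    return {
        "expected_rows": len(entities),
        "actual_rows": len(final_rows),
        "expected_x_sum": expected_sum,
        "actual_x_sum": actual_sum,
        "correct": rows_match and sums_match,
    }


entities = [
    {"x0": 0.0, "vx": 1.0, "spawn_tick": 0},   # advanced 5 times by tick 5
    {"x0": 10.0, "vx": 2.0, "spawn_tick": 3},  # advanced 2 times by tick 5
]
final_rows = [{"entity_id": 1, "x": 5.0}, {"entity_id": 2, "x": 14.0}]
report = check_final_state(final_rows, entities, final_tick=5)
```

With these inputs the expected positions are 5.0 and 14.0. An engine that applied f on the spawn tick itself would report 16.0 for the second entity and fail the sum check.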

Acceptance

The Phase-6 entry gate suite (crates/archetype-core/tests/migration_gate.rs) passes: cargo test --package archetype-core --test migration_gate is green.
cargo run --package archetype-bench --bin mid_run_spawn reports "correct": true.
Existing async world tests pass with native mode off.
A narrow native-mode test passes for spawn/materialize/update.
Failures surface as Python exceptions, not logged-only errors.
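
The last acceptance item can be met with a thin status check in the adapter. The following is a minimal sketch, assuming that status 0 means success and that the adapter holds a callable bound to `arct_last_error_message()`; `NativeError` and `check_status` are hypothetical names, and the actual status-code table is defined by the FFI crate, not here.

```python
class NativeError(RuntimeError):
    """Raised when a native call returns a nonzero status code."""

    def __init__(self, status, message):
        super().__init__(f"native call failed (status {status}): {message}")
        self.status = status


def check_status(status, last_error_message):
    """Turn an integer status into a Python exception.

    `last_error_message` is a zero-argument callable standing in for a
    binding to arct_last_error_message(). Because the diagnostic is
    thread-local, it must be read on the same thread as the failing call.
    """
    if status == 0:  # assumption: 0 means success
        return
    message = last_error_message() or "no diagnostic available"
    raise NativeError(status, message)


# Usage with a fake diagnostic source in place of the real library.
try:
    check_status(2, lambda: "missing required column: vx")
except NativeError as exc:
    surfaced = str(exc)
```

Routing every native call through one helper like this keeps failures from degrading into log lines, which is the behavior this phase is meant to replace.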

9. Phase 7: Incremental Migration

Intent: migrate behavior in dependency order.

Order

Schema composition and metadata validation.
Spawn/despawn materialization.
Update materialization.
Add/remove component migration.
Query filtering.
Parquet append/read.
Runtime integration.

Each step must leave the Python public surface stable.

Acceptance

At each migration point, Python and Rust contract tests both pass.
The app/runtime layers still call through CommandService.

10. Phase 8: Retirement

Intent: remove duplicated Python semantics only after Rust parity is proven.

Tasks

Replace Python async core state-machine logic with native calls.
Keep Python processors and hooks as adapter-level features.
Remove or freeze sync-core duplication.
Keep Python fallback behind a temporary compatibility switch until release.

Acceptance

No semantic drift remains between sync and async implementations because there is only one native engine path.
Rust is the normative core implementation.

11. Phase 9: Optional Processor Acceleration

Intent: leave room for GPU work without contaminating core storage/state.

Candidate Work

Native Rust processors over Arrow arrays.
Dense tensor component views for numeric columns.
cuTile-rs experiments for fused numeric processors.

Rule

GPU acceleration must be an optional processor backend. It must not own world state, append-only history, table naming, or governance.

12. First Milestone

The first implementation milestone was:

Given a caller-supplied table name, an Arrow schema, a prior tick Arrow batch, and queued spawn/despawn mutations, Rust produces the next tick Arrow batch and can append/read it through an append-only Parquet store.

This proves the kernel before touching Python runtime behavior.

The next implementation milestone is:

Given a single Arrow batch from Python, Rust executes the shared movement processor through the Arrow C Data Interface with stable errors, then the same processor is reused by the Rust benchmark world.

Acceptance:

cargo test --workspace passes.
cargo clippy --workspace --all-targets -- -D warnings passes.
uv run pytest tests/core/test_native_movement_adapter.py tests/bench/test_movement_compare.py passes.
Missing columns, wrong dtypes, multi-batch tables, missing libraries, and ABI mismatch produce explicit Python failures.
The benchmark JSON schema stays stable.