soma — a fractal language for verified distributed systems

A fractal cell language where the agent's lifecycle is a state machine the compiler proves terminates. State machines are proofs. Infrastructure is a constraint. Pipelines are pipes.

cell agent + state machine + set_budget — protocol verified at compile time, LLM loop bounded at runtime

$ curl -fsSL https://soma-lang.dev/install.sh | sh
cell Fund {
    memory {
        positions: Map<String, Position> [persistent, consistent]   // → SQLite. Automatically.
        prices:    Map<String, Float>    [ephemeral, ttl(30s)]     // → in-memory, auto-expire
    }

    state regime {
        initial: neutral
        neutral → risk_on
        neutral → risk_off
        * → crisis
        crisis → neutral
    }

    every 1h { rebalance() }

    on rebalance() {
        let portfolio = universe()
            |> map(s => score(s))
            |> filter(s => s.composite > 50)
            |> sort_by("composite", "desc")
            |> top(15)

        for stock in portfolio { execute(stock) }
    }

    on request(method: String, path: String, body: String) {
        match path {
            "/"         → html(dashboard())
            "/api/risk" → risk_report()
            _           → response(404, map("error", "not found"))
        }
    }
}

An agent whose lifecycle is formally verified

Every agent framework has the same problem: agents get stuck in loops, call tools in the wrong order, and never terminate. Soma is the only language where the agent's lifecycle is a state machine the compiler can prove terminates — combined with a hard set_budget token cap, every think() call is bounded by construction. Verification covers the protocol; set_budget covers the LLM loop.

The agent — 30 lines

cell agent Researcher {
    face {
        signal research(topic: String) -> Map
        tool search(q: String) -> String
        tool summarize(text: String) -> String
    }

    state workflow {
        initial: idle
        idle -> researching
        researching -> analyzing
        analyzing -> done
        * -> failed
    }

    on research(topic: String) {
        set_budget(5000)                          // hard token cap
        transition("task", "researching")
        let data = think("Research: {topic}")

        transition("task", "analyzing")
        let summary = think("Synthesize: {data}")

        transition("task", "done")
        map("summary", summary, "tokens", tokens_used())
    }
}

The proof — 0 lines of test code

$ soma verify agent.cell

State machine 'workflow': 5 states, initial 'idle'
  States: [analyzing, done, failed, idle, researching]

   5 states, 7 transitions
   all states reachable from 'idle'
   terminal states: [failed]
   no deadlocks
   liveness: every state can eventually reach a terminal state
   wildcard transitions: * -> [failed]

  Temporal properties for 'workflow':
   deadlock_free
   eventually(state in [done, failed])
   after('researching', state in [analyzing, failed])
   after('analyzing', state in [done, failed])

Temporal: 4 passed, 0 failed

// The compiler PROVED this lifecycle
// always terminates. set_budget(N) caps
// the LLM loop inside it. Together: a
// bounded, terminating agent.

cell agent — an AI agent is a cell.
think() — calls any OpenAI-compatible LLM and runs the tool-calling loop with retries.
tool — declares what the LLM is allowed to call (handler-backed).
state — the lifecycle protocol, model-checked at compile time.

The proof is over the lifecycle, not over the LLM's tokens; set_budget bounds the latter.
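Outside Soma, the bounded loop can be sketched in a few lines of Python. This is an illustrative stand-in for what set_budget plus think() guarantee together — every name below is invented, not the runtime's actual code:

```python
# Sketch: a tool-calling loop bounded two ways — a hard token budget
# (like set_budget(5000)) and a round limit. All names are illustrative.
class BudgetExceeded(Exception):
    pass

def think(prompt, llm, tools, budget, max_rounds=8):
    used = 0
    for _ in range(max_rounds):            # bounded: no unbounded retry loop
        reply, tokens = llm(prompt)
        used += tokens
        if used > budget:                  # hard cap on cumulative tokens
            raise BudgetExceeded(f"{used} > {budget} tokens")
        if reply.get("tool"):              # model requested a tool call
            result = tools[reply["tool"]](*reply["args"])
            prompt = f"{prompt}\nTool result: {result}"
            continue
        return reply["text"], used         # final answer, within budget
    raise BudgetExceeded("round limit reached")
```

With a stub LLM that calls one tool and then answers, the loop terminates in two rounds; lower the budget below the second round's cost and it raises instead of looping.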

Multi-agent pipelines with verified handoffs

cell agent Pipeline {
    state pipeline {
        initial: idle
        idle -> researching -> writing -> reviewing -> done
        * -> failed
    }

    on run(topic: String) {
        transition("job", "researching")
        let facts = think("Research '{topic}'. List 5 key facts.")

        transition("job", "writing")
        let article = think("Write an article from: {facts}")

        transition("job", "reviewing")
        let review = think("Review for accuracy: {article}")

        transition("job", "done")
        map("article", article, "review", review)
    }
}

// soma verify PROVES, on the state machine:
//   - every reachable state has a path to done or failed (eventually)
//   - no deadlocks
// What it does NOT prove: that the handler body calls transition() in
// the right order. That's still on you (and on `soma check`, which
// catches missing handlers, contradictory properties, signal mismatches).

                                            | LangChain | CrewAI | AutoGen | Soma
Lifecycle proven to terminate               | No        | No     | No      | Yes (CTL on the state machine)
Illegal tool order rejected at compile time | No        | No     | No      | State machine + face contract
Token budget enforced as a hard cap         | No        | No     | No      | Yes (set_budget)
Verified handoffs                           | No        | No     | No      | Signal types + state machines
Agent memory persistence                    | Plugin    | Plugin | Plugin  | Built-in: [persistent]
Distribution                                | No        | No     | No      | soma serve --join

The problem

A quantitative fund today needs Python + Pandas + Redis + PostgreSQL + Kafka + Flask + React + Celery + Airflow + Terraform. Ten tools, five languages, three teams. Fifty thousand lines of glue code. The specification lives in Confluence. The code lives somewhere else. They drift apart.

An AI agent can write a function. Maybe a file. But it cannot write a system — services that coordinate, maintain state, handle failures, and evolve. The gap is not intelligence. The gap is that programming languages separate intent from infrastructure, specification from implementation, contract from code.

Soma closes that gap.

Thesis I — after Lamport

The specification is the program

In Soma, you don't write specs then implement them. The face contract is the API. The state machine is the protocol. The memory properties are the infrastructure requirements. The compiler checks all three. The same artifact that describes what the system should do is the artifact that runs.

Everyone thinks they mean the same thing by "specification" until they try to make it precise.

— Leslie Lamport, Turing Lecture 2014
Thesis II — after Hellerstein

Infrastructure is a constraint, not a configuration

[persistent, consistent] resolves to SQLite. [ephemeral] resolves to in-memory. The program never names a database. It declares what it needs. The runtime resolves how. The provider protocol is extensible — custom backends can be added as .cell files.

memory {
    orders:   Map<String, Order>   [persistent, consistent]     // → SQLite
    cache:    Map<String, Price>   [ephemeral, local]          // → in-memory HashMap
    audit:    Log<Event>           [persistent]               // → SQLite (WAL mode)
}
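The resolution step behaves like a small constraint solver over property sets. A Python sketch under that assumption — the BACKENDS and CONTRADICTIONS tables below are invented stand-ins for the provider protocol, not its actual shape:

```python
# Sketch: resolve a property set to a backend. Tables are illustrative.
CONTRADICTIONS = [{"persistent", "ephemeral"}]
BACKENDS = [
    ({"persistent"}, "sqlite"),            # [persistent, ...] -> SQLite
    ({"ephemeral"},  "memory"),            # [ephemeral, ...]  -> in-memory
]

def resolve(props):
    props = set(props)
    for pair in CONTRADICTIONS:            # reject impossible combinations
        if pair <= props:
            raise ValueError(f"contradictory properties: {sorted(pair)}")
    for required, backend in BACKENDS:     # first backend whose match set fits
        if required <= props:
            return backend
    raise ValueError(f"no backend matches {sorted(props)}")
```

resolve({"persistent", "consistent"}) yields "sqlite"; resolve({"persistent", "ephemeral"}) is rejected before any backend is consulted — the same shape as the compiler's contradiction check.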

The future of data management lies in the declarative specification of what is wanted, not the procedural specification of how to get it.

— Joseph Hellerstein, The Declarative Imperative, 2010
Thesis III — after Kay

Agents build systems, not functions

Soma's cell is fractal. The same structure — face, memory, state, handlers — works at every scale. A function is a cell. A service is a cell. An AI agent doesn't need to learn "how to deploy a distributed system." It needs to learn one thing: the cell. Independent cells communicate via the TCP signal bus, configured in soma.toml.

// Two independent programs, connected via soma.toml

// exchange/app.cell
cell Exchange {
    on order(data: Map) { place_order(data) }
    every 500ms { signal trade(match_orders()) }
}

// trader/app.cell
cell Trader {
    every 3s { signal order(quote) }
    on trade(fill: Map) { record_fill(fill) }
}

// trader/soma.toml: [peers] exchange = "localhost:8082"

The best way to predict the future is to invent it.

— Alan Kay, Turing Award 2003

Contracts are enforced

The face section is not documentation. It is a machine-checked contract. The compiler verifies it. Break the contract — the program does not compile.

cell API {
    face {
        signal create(name: String) -> Map
        signal delete(id: String)            // ← declared but no handler
        promise all_persistent
    }
    on create(name: String) { return map("name", name) }
}
error: face contract: signal 'delete' declared in cell 'API' has no handler
→ Every signal in the face MUST have a matching on handler.
→ Param counts are verified. Contradictory properties are rejected.
→ Descriptive promises generate warnings. Structural promises are enforced.

This is what separates Soma from every other dynamic language. The face is a structural contract: it declares what signals the cell handles, and the compiler verifies the cell implements them. Signal existence, parameter counts, property contradictions, and structural promises are all checked at compile time. Return types and runtime invariants are not yet verified.
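Structurally, the face check is a set comparison plus an arity check. A Python sketch of that idea — the data shapes (signal name to parameter count) are invented for illustration:

```python
# Sketch: every declared signal needs a handler with a matching arity.
def check_face(face, handlers):
    """face and handlers map signal name -> parameter count."""
    errors = []
    for name, arity in face.items():
        if name not in handlers:
            errors.append(f"signal '{name}' declared but has no handler")
        elif handlers[name] != arity:
            errors.append(
                f"signal '{name}': face declares {arity} param(s), "
                f"handler takes {handlers[name]}")
    return errors

# The failing API cell above: 'delete' is declared, never handled.
face = {"create": 1, "delete": 1}
handlers = {"create": 1}
```

check_face(face, handlers) reports exactly the missing-handler error the compiler shows.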

Formal verification

soma verify is a model checker. It exhaustively explores every reachable state in your state machine and proves temporal properties. Not testing — proving.

The code — app.cell

state order {
    initial: pending
    pending → validated
    validated → sent
    sent → filled
    sent → rejected
    filled → settled
    * → cancelled
}

The spec — soma.toml

[verify]
deadlock_free = true
eventually = ["settled", "cancelled"]
never = ["invalid"]

[verify.after.sent]
eventually = ["filled", "rejected"]

[verify.after.pending]
eventually = ["validated", "cancelled"]
$ soma verify app.cell

State machine 'order': 7 states, initial 'pending'
  States: [cancelled, filled, pending, rejected, sent, settled, validated]

   7 states, 11 transitions
   all states reachable from 'pending'
   terminal states: [cancelled]
   no deadlocks
   liveness: every state can eventually reach a terminal state
   validated -> sent has no guard (consider adding a guard condition)
   wildcard transitions: * -> [cancelled]

  Temporal properties for 'order':
   deadlock_free — no deadlocks in any reachable state
   eventually(state in [settled, cancelled])
   never(state == 'invalid')
   after('pending', state in [validated, cancelled])
   after('sent', state in [filled, rejected]) — after reaching 'sent', predicate can be avoided
    counter-example: pending → validated → sent → cancelled
    → The wildcard * → cancelled bypasses the expected filled/rejected

The model checker found a real bug: after sent, the spec requires the order to reach filled or rejected. But the wildcard * → cancelled allows skipping directly to cancelled — violating the spec. The counter-example shows the exact execution path.

This is an explicit-state CTL model checker over the state-machine graph. It exhaustively enumerates reachable states — not data values, not guard expressions. For state machines with a few dozen states, verification runs in milliseconds and is complete over that abstraction. The spec lives in soma.toml, the implementation lives in .cell, and the proof is one command: soma verify. What it cannot prove: anything that depends on guard predicates over runtime data, or anything inside a handler body that doesn't show up as a transition.
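The after('sent', …) counter-example above falls out of a short reachability search. A Python sketch over the 'order' graph — function names are illustrative, and cycle handling is elided since this graph is acyclic once target states are removed:

```python
from collections import deque

# The 'order' machine from app.cell; the wildcard * -> cancelled becomes
# an explicit edge from every other state.
edges = {
    "pending":   ["validated"],
    "validated": ["sent"],
    "sent":      ["filled", "rejected"],
    "filled":    ["settled"],
    "rejected":  [],
    "settled":   [],
    "cancelled": [],
}
for s in edges:
    if s != "cancelled":
        edges[s].append("cancelled")

def counter_example(start, target):
    """Refute eventually(target) from `start`: BFS restricted to non-target
    states; a reachable terminal outside `target` is a path that never hits
    the target set."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        state = path[-1]
        if not edges[state] and state not in target:
            return path                    # dead end that avoided the target
        for nxt in edges[state]:
            if nxt not in target and nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

counter_example("sent", {"filled", "rejected"}) finds the wildcard escape to cancelled; counter_example("pending", {"settled", "cancelled"}) finds nothing, which is why the top-level eventually property passes.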

Property                   | Meaning                           | CTL equivalent
deadlock_free              | No reachable state has zero exits | AG(EX true)
eventually = ["X"]         | All paths reach X                 | AF(X)
never = ["X"]              | X is never reached                | AG(¬X)
after.S.eventually = ["X"] | After S, all paths reach X        | AG(S → AF(X))

Intellectual foundations

Soma is not invented from nothing. It stands on decades of research in programming language theory, distributed systems, and formal methods.

The Actor Model

Cells are actors with contracts

Hewitt's Actor model (1973) introduced the idea that computation is message passing between autonomous agents. Soma cells are actors: they have private state (memory), respond to messages (signals), and can spawn children (interior). What Soma adds: typed contracts on the messages (face), and declarative infrastructure on the state (properties).

One Actor can send messages, create new Actors, and determine how to handle the next message it receives.

— Carl Hewitt, Peter Bishop, Richard Steiger. A Universal Modular ACTOR Formalism. 1973
Specification as Program

TLA+ made executable

Lamport showed that distributed systems need formal specifications. But specifications lived in separate documents, drifting from code. In Soma, the state machine IS the protocol. The face contract IS the API. The promise IS the invariant. There is no drift because there is no separation.

If you're thinking without writing, you only think you're thinking.

— Leslie Lamport, Turing Award 2013
Declarative Infrastructure

What, not how

Codd's relational model (1970) proved that declaring what you want beats specifying how to get it. SQL replaced procedural data access. Soma applies the same principle to infrastructure: [persistent, consistent] declares the intent. The runtime resolves the backend — today, that means SQLite for [persistent] and an in-memory HashMap for [ephemeral]. The provider protocol is extensible: new backends are cell backend declarations under stdlib/, so a Redis or Postgres backend is added the same way SQLite was.

Future users of large data banks must be protected from having to know how the data is organized in the machine.

— E.F. Codd, A Relational Model of Data for Large Shared Data Banks. 1970
Self-Growing Systems

The language extends itself

Kay's vision for Smalltalk was a system where everything — including the language itself — could be modified from within. In Soma, properties, checkers, and backends are defined as .cell files, not compiler code. Add a new property? Write a cell. Add a new storage backend? Write a cell. The compiler reads them and enforces them.

The best way to predict the future is to invent it.

— Alan Kay, Turing Award 2003
Pipe Composition

Data flows, not data structures

Thompson and McIlroy's Unix pipes (1973) showed that complex programs are built by composing simple ones. Soma's |> operator is the same idea applied to data: filter |> map |> sort |> top. Each stage is a pure transform. The pipeline reads like a sentence.

Write programs that do one thing and do it well. Write programs to work together.

— Doug McIlroy, Unix Philosophy. 1978

Errors that help

Inspired by Elm and Rust. Every error shows the source line, points to the exact expression, and suggests a fix.

error: expected '=', found number '10'
  --> app.cell:3:15
  |
3 |         let x 10
  |               ^

error: cannot add String and Int: hello + 5
  --> app.cell:7:17
  |
7 |         let r = "hello" + 5
  |                 ^

error: undefined function: lenght (did you mean 'len'?)
  --> app.cell:4:9

error: face contract: signal 'delete' declared in cell 'API' has no handler
  → The face is a contract. Break it, and the compiler tells you.

Auto-repair with soma fix

The compiler doesn't just report errors — it fixes them. soma fix reads the errors, computes repairs, and writes them back to the source file.

$ soma fix app.cell
   removed contradictory property: ephemeral on slot 'items'
   added handler: on delete(id: String) { ... }
   added handler: on update(id: String, data: Map) { ... }
   3 fix(es) applied, re-checking...
   All checks passed.

Lint for anti-patterns

$ soma lint app.cell
   line 7: redundant to_json() — storage auto-serializes maps
   line 11: unchecked .get() — may return ()
   line 15: consider using match instead of if-chain (3 branches)

Built for AI agents

Soma is designed around the generate → fix → check → verify → serve loop. Every tool outputs structured JSON (--json). soma fix repairs trivial errors agents commonly produce: missing handlers referenced from face, contradictory property combinations, common builtin typos. Storage auto-serializes maps and lists. The language is intentionally regular so the compiler can give specific repairs.

soma fix

Auto-repairs missing handlers, contradictory properties. One command.

soma lint

Catches redundant to_json, unchecked .get(), if-chains that should be match.

soma describe

Rich JSON: handler signatures, memory schema, state machines, face contracts.

Pattern matching that writes itself

Map destructuring, string prefix, guard clauses, range patterns, or-patterns — all composable. Agents generate correct HTTP routing on the first try.

on request(method: String, path: String, body: String) {
    let req = map("method", method, "path", path)
    match req {
        {method: "GET", path: "/"}                     -> home()
        {method: "GET", path: "/api/" + resource}      -> list(resource)
        {method: "POST", path: "/api/" + resource}     -> create(resource, body)
        {method: "DELETE", path: "/api/" + resource}   -> delete(resource)
        {method} if method == "OPTIONS"              -> cors()
        _                                            -> response(404, map("error", "not found"))
    }
}

Auto-serialize storage

No more to_json / from_json. Store maps directly. Read them back as maps.

users.set("alice", map("name", "Alice", "score", 95))
let user = users.get("alice")
print(user.name)     // "Alice" — auto-deserialized

Design by contract

on withdraw(balance: Int, amount: Int) {
    let result = balance - amount
    ensure result >= 0     // postcondition — fails if violated
    result                 // implicit return
}

memory {
    accounts: Map<String, String> [persistent]
    invariant _slot_len <= 10000   // checked on every .set()
}
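The invariant line amounts to a re-check after every write. A Python sketch of that contract — the Slot class and rollback behavior are invented for illustration, not Soma's storage API:

```python
# Sketch: a store that re-checks its declared invariant after each write
# and rolls the write back on violation (like invariant _slot_len <= 10000).
class Slot:
    def __init__(self, invariant):
        self.data = {}
        self.invariant = invariant

    def set(self, key, val):
        old = self.data.get(key)
        self.data[key] = val
        if not self.invariant(self.data):
            if old is None:
                del self.data[key]          # roll back a fresh insert
            else:
                self.data[key] = old        # restore the previous value
            raise ValueError(f"invariant violated by set({key!r})")
```

The violating write never becomes visible: the set() that would break the invariant fails, and the slot keeps its previous contents.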

What one Soma program replaces

Tool                  | Role            | In Soma
Airflow / Cron        | Scheduling      | every 30s { ... }
Pandas (in-memory)    | Data pipelines  | |> filter() |> map() |> sort_by() |> agg() — in-memory only; no out-of-core / cluster query engine
SQLite / DB config    | Persistence     | [persistent, consistent] -> SQLite, auto-resolved
In-memory cache       | Fast storage    | [ephemeral, local] -> HashMap
Express / Flask       | API server      | on request() { match path { ... } }
React                 | Dashboard       | html(""" ... """)
Celery                | Background jobs | every 1h { rebalance() }
TLA+ (a CTL fragment) | Verification    | soma verify + soma.toml [verify] — covers deadlock_free, eventually, never, and conditional after.S.eventually; not full TLA+
ESLint / Clippy       | Linting         | soma lint — catches anti-patterns
Copilot fixes         | Auto-repair     | soma fix — writes missing handlers

Traditional stack

10 tools
5 languages
50,000 lines of glue
Specification: Confluence
Implementation: somewhere else

Soma

1 language
1 file (or a few)
1 command: soma serve
~300 lines
The spec IS the program

The language

Soma has one structure: the cell. Face (contract), memory (state), state machines (protocol), signal handlers (behavior). Everything is a cell, from a function to a datacenter.

Match expressions

on request(method: String, path: String, body: String) {
    let req = map("method", method, "path", path)
    match req {
        {method: "GET", path: "/"}                  → html(dashboard())
        {method: "GET", path: "/api/" + resource}   → list(resource)
        {method: "POST", path: "/api/" + resource}  → create(resource, body)
        _                                         → response(404, map("error", "not found"))
    }
}

// Also: guard clauses, range patterns, or-patterns, if-expressions
let grade = match score {
    90..100            → "A"
    80..89             → "B"
    n if n >= 70       → "C"
    _                  → "F"
}
let x = if cond { a } else { b }   // if/match are expressions

Lambdas + higher-order pipes

let portfolio = stocks
    |> filter(s => s.market_cap > 1e9)
    |> map(s => score(s))
    |> sort_by("alpha", "desc")
    |> top(15)

let names = users |> map(u => u.name) |> join(", ")
let active = users |> filter(u => u.status == "active")
let found = users |> find(u => u.id == target)
let valid = orders |> all(o => o.total > 0)

Multi-line strings

let html = """
<div class="card">
    <h2>{stock.ticker}</h2>
    <p class="price">${stock.price}</p>
    <span class="change {cls}">{stock.momentum}%</span>
</div>
"""

State machines

state order {
    initial: pending
    pending → validated { guard { risk_check_passed } }
    validated → sent
    sent → filled
    sent → rejected
    filled → settled
    * → cancelled
}

// Runtime enforces: you cannot reach 'sent' without passing 'validated'.
// The auditor reads the .cell file and sees the policy.
// The policy IS the code.
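What "runtime enforces" means here can be sketched as an edge lookup. A Python sketch of the 'order' machine above — guards are elided and the Machine class is invented for illustration:

```python
# Sketch: transition() consults the declared edges and rejects illegal
# moves. A real guard would be evaluated before the move is allowed.
EDGES = {
    ("pending", "validated"), ("validated", "sent"),
    ("sent", "filled"), ("sent", "rejected"), ("filled", "settled"),
}

class Machine:
    def __init__(self, initial="pending"):
        self.state = initial

    def transition(self, to):
        legal = (self.state, to) in EDGES or to == "cancelled"  # * -> cancelled
        if not legal:
            raise ValueError(f"illegal transition: {self.state} -> {to}")
        self.state = to
```

A fresh machine accepts pending → validated but rejects pending → sent: you cannot reach 'sent' without passing 'validated', exactly as the comment states.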

Declarative storage

memory {
    data:     Map<String, String> [persistent, consistent]     // → SQLite
    cache:    Map<String, String> [ephemeral, local]          // → in-memory
    sessions: Map<String, String> [ephemeral, ttl(30min)]    // → auto-expire
}

// The compiler verifies: [persistent, ephemeral] → contradiction.
// Properties are defined in .cell files. The language grows itself.

Types & variables

let x = 42                             // Int (64-bit)
// BigInt — declare in face/handler signatures, e.g. signal compute(n: BigInt)
// Promotion is automatic when an Int operation overflows.
let price = 3.14                       // Float (64-bit)
let sci = 1.5e3                        // Scientific: 1500.0
let name = "world"                     // String with {interpolation}
let user = User { name: "Alice", age: 30 }  // Typed record
let dur = 5s                            // Duration → 5000ms
let nothing = ()                       // Unit — null equivalent

for i in range(0, 10) { /* ... */ }   // range, break, continue
while running { if done { break } }   // loops

// Conversion: to_int("abc") returns () (NOT 0). Always null-check user input.
// Integer division promotes to Float when non-exact: 7 / 2 == 3.5
// Use floor(7 / 2) for truncating division.

Cell kinds

The cell keyword takes a kind modifier. Each kind is the same five-section structure (face, memory, state, scale, handlers); the kind tells the compiler what role the cell plays.

Kind                  | Purpose
cell Foo { }          | Regular cell — functions, services, web apps
cell agent Foo { }    | Agent cell — unlocks think, set_budget, tool declarations, and the agent verification story
cell property Foo { } | Define a new memory property (see stdlib/durability.cell)
cell backend Foo { }  | Define a storage backend implementation
cell type Foo<T> { }  | Define a custom type
cell checker Foo { }  | A custom validation rule the checker will run
cell builtin Foo { }  | FFI bridge to a Rust builtin (stdlib only)
cell test Foo { }     | Test cell — rules { assert … }, run with soma test

Self-growing language

// Properties, checkers, and backends are .cell files, not compiler code.

cell property geo_replicated {
    rules {
        implies [persistent, consistent]
        contradicts [ephemeral, local]
    }
}

cell backend redis {
    rules {
        matches [ephemeral, ttl]
        native "redis"
    }
}

// Add geo_replicated to your memory — the compiler now checks it.
// Define a redis backend — the runtime can resolve [ephemeral, ttl] to it.
// No compiler changes needed. The language grew.
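One way such rules could be evaluated — a Python sketch that mirrors the geo_replicated declaration above, assuming contradicts means the listed properties may not co-occur with it; the checker itself is invented:

```python
# Sketch: `implies` expands a property set, `contradicts` rejects
# co-occurrence. The rule table mirrors cell property geo_replicated.
RULES = {
    "geo_replicated": {
        "implies": {"persistent", "consistent"},
        "contradicts": {"ephemeral", "local"},
    },
}

def expand(props):
    props = set(props)
    for prop in sorted(props):                 # snapshot of declared props
        rule = RULES.get(prop, {})
        clash = rule.get("contradicts", set()) & props
        if clash:
            raise ValueError(f"'{prop}' contradicts {sorted(clash)}")
        props |= rule.get("implies", set())    # pull in implied properties
    return props
```

expand({"geo_replicated"}) yields the full implied set; pair it with ephemeral and the check fails — the compiler "grew" a rule it never had hard-coded.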

Composition: interior and runtime

Cells compose by nesting. A parent cell can declare interior children and wire their signals together explicitly.

cell System {
    interior {
        cell Worker { /* … */ }
        cell Cache  { /* … */ }
    }
    runtime {
        start Worker                              // bring up the child
        connect Worker.done -> Cache              // wire signals
        emit initialize()                         // fire startup signal
    }
}

This is how a system grows from one cell. soma describe dumps the full interior graph as JSON.

Three execution backends

Backend                  | How to invoke                                                  | When to use
Tree-walking interpreter | soma run file.cell (default)                                   | Development; the reference semantics. All other backends must agree with it.
Bytecode VM              | soma run file.cell --jit                                       | Compiled-once-per-process speedup; better for hot loops without committing to native.
Native (Rust cdylib)     | [native] handler annotation, or soma build for a Rust skeleton | Tight numeric loops; whole-loop BigInt dispatches to GMP via rug. Single BigInt ops cross FFI and are slower — the speedup is shaped, not uniform.

Data pipelines

Soma replaces Pandas for in-memory analytics. Same operations, fraction of the code. Every pipeline is composable via |>.

Pipeline operators

Operation                 | Syntax
map(list, lambda)         | data |> map(s => s.name)
filter(list, lambda)      | data |> filter(s => s.score > 50)
find(list, lambda)        | data |> find(s => s.id == target)
any / all / count         | data |> any(s => s.active)
filter_by(field, op, val) | data |> filter_by("price", ">", 100)
sort_by(field, dir)       | data |> sort_by("score", "desc")
top(n) / bottom(n)        | data |> top(10)
agg(group, "col:func"...) | data |> agg("sector", "vol:sum", "price:avg")
group_by(field)           | data |> group_by("region")
join(left, right, key)    | orders |> join(prices, "ticker")
with(key, val)            | record |> with("score", 95)
describe(field)           | data |> describe("price") → {sum, avg, min, max, count}
flatten / zip / reverse   | nested |> flatten()
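The agg row deserves one concrete reading: group rows by a key, then fold named columns. A Python sketch with plain dicts standing in for Soma maps — only sum and avg are implemented, and the output column naming is an assumption:

```python
# Sketch: agg("sector", "vol:sum", "price:avg") — group by 'sector',
# then fold 'vol' with sum and 'price' with average.
from collections import defaultdict

FOLDS = {"sum": sum, "avg": lambda vals: sum(vals) / len(vals)}

def agg(rows, group, *specs):
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[group]].append(row)
    out = []
    for key, members in buckets.items():
        result = {group: key}
        for spec in specs:                          # e.g. "vol:sum"
            col, func = spec.split(":")
            result[f"{col}_{func}"] = FOLDS[func]([m[col] for m in members])
        out.append(result)
    return out
```

Three trades across two sectors collapse to two summary rows, one per group, each carrying the folded columns.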

Real example: portfolio construction

let portfolio = universe()
    |> filter(s => s.market_cap > 1e9)
    |> map(s => score(s))
    |> sort_by("composite", "desc")
    |> top(15)
    |> map(s => {
        let weight = clamp(s.composite * exposure / total, 0, 15)
        s |> with("weight", weight)
    })

When one handler is the bottleneck

You write Soma for the architecture: cells, state machines, persistence, web server. Then one handler does a million iterations and takes seconds. In every other language, you rewrite it in C++ or Rust. In Soma, you add one word: [native]. The compiler emits a Rust cdylib, caches it under .soma_cache/native/, and the handler runs as native code. Same file, same syntax. On a tight integer loop, the speedup over the tree-walking interpreter is in the 100–300× range; numerically heavy workloads with whole BigInt loops dispatch to GMP via rug and run within a small constant factor of hand-written Rust.

The code — hot.cell

cell P {
    on hot(n: Int) [native] {
        let s = 0
        let i = 0
        while i < n {
            s = s + i
            i += 1
        }
        return s
    }
    on slow(n: Int) {
        let s = 0
        let i = 0
        while i < n {
            s = s + i
            i += 1
        }
        return s
    }
}

Same handler, two backends

$ soma run hot.cell --signal slow 10000000
# interpreted: ~1,250 ms

$ soma run hot.cell --signal hot 10000000
[native] compiling 1 handler(s) for cell 'P'...
[native] compiled → .soma_cache/native/...
# native:      ~3–5 ms

# Speedup on this loop: ~300×.
# Reproduce: bench/compare.sh — Soma vs Python
# side-by-side on 90+ math/CS challenges.
Mode                               | Sum 0..10⁷ (single run) | vs interpreted
Tree-walking interpreter           | ~1,250 ms               | 1×
[native] (Rust cdylib, sequential) | ~3–5 ms                 | ~300×

Measured on an Apple M-series with the bundled [native] codegen. Numbers vary with workload — this is a tight integer loop, the best case for the JIT path. The reproducible benchmark suite under bench/compare.sh runs 90+ math/CS challenges Soma-vs-Python with both wall-time and inner timings; [compute.parallel] for cross-handler thread fan-out is wired up but the headline parallel numbers are workload-specific — check the suite for your shape.

The code is the same in all three rows. [native] compiles to machine code. Adding [compute.parallel] in soma.toml splits the work across cores automatically. No threads, no mutexes, no async/await, no rewrite. One word in the code, two lines in the config.

This is the fourth axis of the Soma property system:

Axis         | Code         | Configuration       | Resolved to
Storage      | [persistent] | soma.toml [storage] | SQLite
Transport    | signal / on  | soma.toml [peers]   | TCP bus
Verification | state { }    | soma.toml [verify]  | Model checker
Compute      | [native]     | soma.toml [compute] | LLVM + threads

Four axes. Same pattern. The code declares what. The configuration declares how. The compiler resolves.

Text processing via Rust builtins

let files = read_files("data", 10000)
let counts = files |> word_count()

read_files and word_count are Rust builtins exposed through the interpreter, so the work happens at native speed even without [native]. The same shape applies to other heavy collection operations: the interpreter dispatches to a Rust implementation under the hood, and the pipe just composes them. Numeric hot paths still use [native]; everything else stays in the interpreter where readability and the rest of the cell model live.
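For intuition, the word_count stage is roughly this in Python. The real builtin is Rust, and its exact tokenization rule is not specified here — the regex below is an assumption:

```python
# Sketch: fold a list of documents into one word-frequency table.
import re
from collections import Counter

def word_count(texts):
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts
```

Two small documents produce one merged Counter; the pipe in the original just threads the file list through this fold.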

What if I use [native] wrong?

on bad(name: String) [native] {
    print("hello {name}")
}

error: handler 'bad' is marked [native] but uses unsupported
       parameter type 'String' for 'name'
  → [native] handlers can only use Int, Float, Bool

The compiler catches it before any code runs. [native] is for numeric hot paths — the interpreter handles everything else. Same file, same language: the interpreter orchestrates, [native] computes.

Web applications

soma serve app.cell — threaded HTTP server, SQLite, CORS, auto-routing, HTML templates. Zero framework. Storage auto-serializes — no to_json/from_json needed.

cell App {
    memory { tasks: Map<String, String> [persistent, consistent] }

    on request(method: String, path: String, body: String) {
        let req = map("method", method, "path", path)
        match req {
            {method: "GET", path: "/"}    → html(dashboard())
            {method: "GET", path: "/api"}  → tasks.values     // auto-deserialized
            {method: "POST", path: "/api"} → {
                let data = from_json(body)
                let id = to_string(next_id())
                tasks.set(id, data)                // auto-serialized
                map("id", id)
            }
            _                           → response(404, map("error", "not found"))
        }
    }

    on dashboard() {
        let rows = tasks.values
            |> map(t => """<tr><td>{t.name}</td><td>{t.status}</td></tr>""")
            |> join("")
        """<html><body><h1>Tasks</h1><table>{rows}</table></body></html>"""
    }
}

// $ soma serve app.cell
// listening on http://localhost:8080

Inter-process signals

Independent Soma programs communicate via signals. No WebSocket code. No HTTP polling. No serialization. signal to send, on to receive. Peers declared in soma.toml. Zero transport code in your program.

Exchange — app.cell

cell Exchange {
    on order(data: Map) {
        place_order(data)
    }

    every 500ms {
        let fill = match_orders()
        signal trade(fill)
    }
}

Trader — app.cell

cell Trader {
    every 3s {
        signal order(data)
    }

    on trade(fill: Map) {
        record_fill(fill)
    }
}

// soma.toml
// [peers]
// exchange = "localhost:8082"

The exchange emits signal trade(fill). The trader's on trade(fill) fires automatically. Two independent processes. No shared code. No ws_connect, no http_get, no to_json. The signal is the transport.

Peers are declared in soma.toml, not in code. Change the address — zero code changes. Replace TCP with Kafka — zero code changes. The program contains logic. The configuration contains topology.

// soma.toml — topology is configuration, not code
[package]
name = "trader"

[peers]
exchange = "localhost:8082"    // auto-connects on soma serve startup
risk = "localhost:7082"        // add more peers — no code changes

Three transports, one keyword

signal trade(fill) delivers to all listeners: browsers via SSE, programs via TCP bus, WebSocket clients. Same keyword, runtime resolves the transport.

signal trade(fill)     // one emit → three transports:
                       //   → SSE: browser EventSource('/stream')
                       //   → TCP bus: connected peers (soma.toml)
                       //   → WS: WebSocket clients on :8081

Fractal Distribution

Same code. One machine or twenty thousand. The only difference: --join.

A cell declares what it needs. The runtime figures out how.

cell PricingEngine {

    memory {
        trades: Map<String, String> [persistent, consistent]
        cache:  Map<String, String> [ephemeral, local]
    }

    scale {
        replicas: 50
        shard: trades
        consistency: strong
        tolerance: 2
        cpu: 4
        memory: "8Gi"
    }

    on book_trade(data: Map) {
        trades.set(data.id, data)            // auto-serialized, replicated to all nodes
    }

    on price(data: Map) {
        return cache.get(data.symbol)         // node-local, microseconds
    }
}

Run it alone:

$ soma serve app.cell -p 8080

Scale it to a cluster:

$ soma serve app.cell -p 8080                          # node 1 (leader)
$ soma serve app.cell -p 8081 --join localhost:8082     # node 2
$ soma serve app.cell -p 8083 --join localhost:8082     # node 3

The code doesn't change. Not one line. The scale { } section declares the distribution contract. The runtime honors it:

shard: trades
    Slot is the routing key for the consistent hash ring (FNV, 128 vnodes/node). The compiler checks that the slot is non-[ephemeral]; the current runtime distributes via signal-bus broadcast rather than true partitioning.

[ephemeral, local]
    Memory stays node-local. Fast path. Never touches the network.

consistency: strong
    Compile-time contract: linearizable reads/writes, CP under partition. Quorum is computed and CAP contradictions are rejected. Runtime status: best-effort — the current cluster broadcasts writes over the signal bus and does not yet run a consensus protocol (Raft is on the roadmap).

consistency: eventual
    Write locally, propagate in background over the signal bus. Reads may be stale. This is what the runtime actually delivers today, regardless of the declared mode.

tolerance: 2
    Declared failure budget. The compiler rejects values larger than N − quorum; the runtime detects peer loss via a 15s heartbeat timeout.

cpu: 4, memory: "8Gi"
    Resource hints per instance, surfaced to deployment targets (Dockerfile, fly.toml).
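The consistent hash ring behind shard: can be sketched in a few lines. This is a minimal illustration assuming FNV-1a and 128 virtual nodes per physical node, as described above; the class and method names are hypothetical, not the actual runtime implementation.

```python
# Minimal consistent-hash-ring sketch: FNV-1a hashing, 128 vnodes per
# node. Illustrative names only; not the real Soma runtime internals.
import bisect

def fnv1a(data: bytes) -> int:
    # 64-bit FNV-1a: xor each byte, then multiply by the FNV prime
    h = 0xcbf29ce484222325
    for b in data:
        h ^= b
        h = (h * 0x100000001b3) & 0xFFFFFFFFFFFFFFFF
    return h

class Ring:
    def __init__(self, nodes, vnodes=128):
        # each physical node contributes `vnodes` points on the ring
        self._points = sorted(
            (fnv1a(f"{n}#{i}".encode()), n)
            for n in nodes for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._points]

    def route(self, key: str) -> str:
        # the first vnode clockwise from the key's hash owns the key
        i = bisect.bisect(self._hashes, fnv1a(key.encode())) % len(self._points)
        return self._points[i][1]

ring = Ring(["node1:8080", "node2:8081", "node3:8083"])
owner = ring.route("TRADE-42")   # deterministic: same key, same node
```

The payoff of vnodes is stability: removing one node only remaps the keys that node owned; every other key keeps its owner.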

Honesty note. The scale block is a verified contract, not yet a hardened consensus runtime. soma verify proves your declared distribution is internally consistent (no strong on an [ephemeral] slot, no impossible tolerance, quorum math checks out). What the runtime currently enforces is replication-via-broadcast on the signal bus. Treat strong as a specification the compiler checks you meant, not as a Raft-level guarantee.

Signals are (the current) replication

There is no special replication protocol. When trades.set(key, val) executes on node 1, it broadcasts an EVENT on the signal bus. Every connected node receives it and applies it locally. The same bus that carries inter-cell signals carries data replication. The infrastructure is the language — with the trade-off that today this gives you eventual replication regardless of the declared consistency: mode. A real Raft/Paxos backend will plug into the same protocol.
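The broadcast-and-apply scheme above can be modeled in a few lines of Python. This is a conceptual sketch, not the runtime: Bus and Node are invented names, and the in-process loop stands in for the TCP signal bus.

```python
# Sketch of replication-via-broadcast: a .set() on one node emits an
# EVENT on the (simulated) signal bus; every peer applies it locally.
class Bus:
    def __init__(self):
        self.nodes = []

    def broadcast(self, event):
        for n in self.nodes:
            n.apply(event)

class Node:
    def __init__(self, bus):
        self.trades = {}          # local copy of the [persistent] slot
        self.bus = bus
        bus.nodes.append(self)

    def set(self, key, val):
        # write path: publish the mutation to every node, including self
        self.bus.broadcast(("EVENT", "trades.set", key, val))

    def apply(self, event):
        _, _, key, val = event
        self.trades[key] = val    # applied locally, no consensus round

bus = Bus()
n1, n2, n3 = Node(bus), Node(bus), Node(bus)
n1.set("T1", {"symbol": "AAPL", "qty": 100})
# every node now holds the write; ordering across concurrent writers is
# not coordinated, which is why this is eventual, not linearizable
```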

The verifier knows about distribution

soma verify proves distributed properties at compile time, before deployment:

$ soma verify app.cell

scale 'PricingEngine':
   replicas: 50 instances declared
   shard 'trades' is [persistent] — eligible for distribution
   consistency: strong — declared linearizable
   CAP: CP mode — consistent + partition-tolerant (declared)
   quorum: 26/50 nodes needed
   tolerance: 2 ≤ (50 − 26) = 24, accepted
   memory 'cache' is [ephemeral, local] — not distributed (fast path)
   runtime: strong is currently delivered via signal-bus broadcast
       (no consensus protocol yet) — treat as a contract, not a guarantee
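The quorum and tolerance arithmetic in the output above is simple majority math; here it is as plain Python (function names are illustrative, not verifier API):

```python
# Majority quorum for N replicas, and the tolerance bound the compiler
# enforces: tolerance must not exceed N - quorum.
def quorum(n: int) -> int:
    return n // 2 + 1

def tolerance_ok(n: int, tolerance: int) -> bool:
    return tolerance <= n - quorum(n)

assert quorum(50) == 26            # matches "quorum: 26/50"
assert tolerance_ok(50, 2)         # 2 <= 50 - 26 = 24, accepted
assert not tolerance_ok(50, 25)    # would be rejected at compile time
```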

Contradictions are caught at compile time:

memory { data: Map [ephemeral, local] }
scale  { shard: data, consistency: strong }
// → error: shard 'data' uses [ephemeral] but scale declares
//          consistency: strong — contradictory

Why this is different

                          Kubernetes      Erlang/OTP   Akka             Soma
Distribution model        External YAML   In the VM    In the library   In the language
Verified before deploy    No              No           No               Yes (CTL + CAP)
Consistency declared      No              No           No               Per memory slot
Same code local/cluster   No              Almost       Almost           Yes, zero changes

Read the paper: Scale as a Type: Verified Distribution in a Fractal Cell Language

Built with Soma

Distributed Pricing Engine

50-node trade booking, option pricing, greek reconciliation. Strong consistency, verified quorum. --join to scale.

Distributed Job Queue

A mini Celery. Submit jobs, work-stealing scheduler, retry logic. Verified: every job completes or expires. 120 lines.

Distributed Chat

Multi-room messaging across nodes. Eventual consistency. Messages replicated via signal bus. 90 lines.

Mini Kubernetes

Cells are pods. State machines are lifecycles. The scheduler is a controller loop. Verified deadlock-free. 200 lines.

Exchange + Trader

Order book, matching engine, market maker bot. Two programs communicating via signal bus. Real-time fills.

100+ Examples

From hello world to HTMX apps to Monte Carlo pricing. Every example parses, checks, and runs.

CLI reference

soma run file.cell [args]                  Execute the entry handler in the tree-walking interpreter
soma run file.cell --signal name [args]    Execute a specific named handler
soma run file.cell --jit [args]            Execute via the bytecode VM (faster startup for hot loops)
soma build file.cell [-o out.rs]           Generate a Rust skeleton from a cell (native codegen frontend)
soma serve file.cell                       HTTP :8080, WS :8081, Bus :8082
soma serve file.cell --join host:port      Join a cluster
soma serve file.cell --watch               Hot reload on file change
soma check file.cell [--json]              Verify contracts & properties
soma fix file.cell                         Auto-repair: adds missing handlers, fixes properties
soma lint file.cell [--json]               Anti-patterns: redundant to_json, unchecked .get(), if-chains
soma verify file.cell [--json]             State machine + distribution proofs
soma describe file.cell                    Rich JSON: handlers, memory, state machines, face, scheduled
soma test file.cell                        Run test cells
soma init [name]                           Create a project
soma add pkg --git url                     Add a dependency
soma install                               Install dependencies
soma props                                 List registered properties + backends
soma repl                                  Interactive evaluator
soma ast file.cell                         Dump the parsed AST as JSON
soma tokens file.cell                      Dump the lexer token stream
soma env                                   Show stdlib path, cache dir, and resolved config
soma deploy file.cell --target fly|cloudflare|aws
                                           Generate deployment scaffolding (Dockerfile / fly.toml / etc.)
                                           and shell out to the cloud CLI — you still need
                                           flyctl/wrangler auth set up

Architecture

Source (.cell)
    → Lexer → Parser → AST
        → Checker (contracts, properties, scale coherence)
        → Fixer (auto-repair: missing handlers, bad properties)
        → Linter (anti-patterns: redundant to_json, if-chains)
        → Verifier (state machines, temporal logic, CAP analysis)
        → Describe (rich JSON: handlers, memory, state, face)
        → Interpreter (soma run)
            → Auto-serialize storage (maps → JSON → maps)
            → Memory invariants (checked on every .set())
            → Ensure postconditions (checked on handler exit)
        → Native codegen ([native] → Rust → .dylib via cached cdylib build)
    → Registry (stdlib/*.cell — properties, backends, builtins)
    → Runtime
        → Storage (SQLite | Memory | auto-serialize)
        → HTTP server (soma serve)
        → Cluster (--join → hash ring → signal replication)
        → SSE + WebSocket + TCP signal bus

Agent cheat sheet

Everything an AI agent needs to write Soma. Copy-paste ready. See llms.txt for the full machine-readable reference.

Hello world

// app.cell
cell App {
    on run() { print("hello soma") }
}
// $ soma run app.cell

Web app (5 lines)

cell App {
    memory { items: Map<String, String> [persistent] }
    on request(method: String, path: String, body: String) {
        let req = map("method", method, "path", path)
        match req {
            {method: "GET", path: "/"}                 -> items.values
            {method: "POST", path: "/api/" + resource}  -> items.set(resource, from_json(body))
            _                                         -> response(404, map("error", "not found"))
        }
    }
}
// $ soma serve app.cell

Verified agent (10 lines)

cell agent Bot {
    face { tool search(q: String) -> String "Search the web" }
    state w { initial: idle  idle -> thinking -> done  * -> failed }
    on search(q: String) { "Results for: {q}" }
    on run(topic: String) {
        set_budget(5000)
        transition("w", "thinking")
        let answer = think("Research: {topic}")
        transition("w", "done")
        answer
    }
}
// $ soma verify bot.cell          ← PROVES it terminates
// $ SOMA_LLM_MOCK=echo soma run bot.cell "AI"  ← works offline
// $ SOMA_LLM_KEY=sk-... soma run bot.cell "AI"  ← real LLM
//
// Ollama (local, free):
// $ export SOMA_LLM_KEY=ollama
// $ export SOMA_LLM_URL=http://localhost:11434/v1/chat/completions
// $ export SOMA_LLM_MODEL=gemma3:12b
// $ soma run bot.cell "AI"

Multi-agent coordination

cell agent Researcher {
    state w { initial: idle  idle -> done  * -> failed }
    on research(topic: String) {
        transition("w", "done")
        let r = think("Research: {topic}")
        emit findings(map("topic", topic, "data", r))  // -> Writer receives
    }
}
cell agent Writer {
    state w { initial: idle  idle -> done  * -> failed }
    on findings(data: Map) {
        transition("w", "done")
        print(think("Write about: {data.topic}"))
    }
}
// emit dispatches to all cells with matching handler
// delegate("Writer", "findings", data) for direct calls
// gather(items, "Worker", "process") for fan-out
// broadcast("alert", data) for all agents
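The dispatch semantics in those comments can be modeled roughly in Python: emit fans out to every cell with a matching handler, while delegate targets one named cell. Cell, Registry, and the on_ prefix are invented for this sketch; this is not the runtime implementation.

```python
# Rough model of emit/delegate dispatch: handlers are discovered by
# name, emit fans out, delegate calls one cell directly.
class Cell:
    def handlers(self):
        # map handler name -> bound method, for every on_* method
        return {name[3:]: getattr(self, name)
                for name in dir(self) if name.startswith("on_")}

class Registry:
    def __init__(self, *cells):
        self.cells = cells

    def emit(self, signal, data):
        # every cell with a matching handler receives the signal
        return [c.handlers()[signal](data)
                for c in self.cells if signal in c.handlers()]

    def delegate(self, cell_name, signal, data):
        # direct call to one named cell
        target = next(c for c in self.cells
                      if type(c).__name__ == cell_name)
        return target.handlers()[signal](data)

class Writer(Cell):
    def on_findings(self, data):
        return f"Write about: {data['topic']}"

reg = Registry(Writer())
results = reg.emit("findings", {"topic": "AI"})   # fan-out dispatch
```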

Syntax cheat sheet

// Variables
let x = 42                // Int
x += 10                   // compound: += -= *= /=
let s = "hi {x}"          // String interpolation
let m = map("a", 1)       // Map (not {a: 1})
let l = list(1, 2, 3)     // List (not [1,2,3])
let n = ()                // null (not null/nil)

// Control flow (if/match are expressions)
let y = if x > 0 { "pos" } else { "neg" }
let z = match x { 0..10 -> "small"  _ -> "big" }

// Pipes
data |> filter(x => x.score > 50) |> sort_by("score") |> top(10)

// Error handling
let v = try { risky() }?       // ? propagates error
ensure balance >= 0            // postcondition
let val = x ?? 0               // null coalesce

// Storage (auto-serializes)
data.set("k", map("a", 1))  // stores map directly
let v = data.get("k")         // returns map, not string

// Agent builtins
think("prompt")               // LLM + tool loop
think_json("prompt")          // returns Map
delegate("Cell", "sig", args) // cross-agent call
set_budget(5000)              // token cap
trace()                      // execution log
approve("action")            // human gate

Match patterns (all composable)

match value {
    "literal"                    -> expr
    "a" || "b"                   -> expr          // or-pattern
    name                         -> use(name)     // variable binding
    "/api/" + rest               -> api(rest)     // string prefix
    {method: "GET", path}        -> get(path)     // map destructure
    0..100                       -> "small"       // range
    n if n > 0                   -> "positive"   // guard
    ()                           -> "null"        // unit
    _                            -> "default"     // wildcard
}

Don't / Do

Wrong                        Right
function foo() {}            on foo() {}
null                         ()
[1, 2, 3]                    list(1, 2, 3)
{key: val}                   map("key", val)
items.set(k, to_json(m))     items.set(k, m)
from_json(items.get(k))      items.get(k)
import x                     use lib::x
console.log(x)               print(x)