soma

A language where infrastructure is a constraint, compliance is a state machine, and pipelines are pipes.

the specification is the program

$ curl -fsSL https://soma-lang.dev/install.sh | sh

cell Fund {
    memory {
        positions: Map<String, Position> [persistent, consistent]   // → SQLite. Automatically.
        prices:    Map<String, Float>    [ephemeral, ttl(30s)]     // → in-memory, auto-expire
    }

    state regime {
        initial: neutral
        neutral → risk_on
        neutral → risk_off
        * → crisis
        crisis → neutral
    }

    every 1h { rebalance() }

    on rebalance() {
        let portfolio = universe()
            |> map(s => score(s))
            |> filter(s => s.composite > 50)
            |> sort_by("composite", "desc")
            |> top(15)

        for stock in portfolio { execute(stock) }
    }

    on request(method: String, path: String, body: String) {
        match path {
            "/"         → html(dashboard())
            "/api/risk" → risk_report()
            _           → response(404, map("error", "not found"))
        }
    }
}

The problem

A quantitative fund today needs Python + Pandas + Redis + PostgreSQL + Kafka + Flask + React + Celery + Airflow + Terraform. Ten tools, five languages, three teams. Fifty thousand lines of glue code. The specification lives in Confluence. The code lives somewhere else. They drift apart.

An AI agent can write a function. Maybe a file. But it cannot write a system — services that coordinate, maintain state, handle failures, and evolve. The gap is not intelligence. The gap is that programming languages separate intent from infrastructure, specification from implementation, contract from code.

Soma closes that gap.

Thesis I — after Lamport

The specification is the program

In Soma, you don't write specs then implement them. The face contract is the API. The state machine is the protocol. The memory properties are the infrastructure requirements. The compiler checks all three. The same artifact that describes what the system should do is the artifact that runs.

Everyone thinks they mean the same thing by "specification" until they try to make it precise.

— Leslie Lamport, Turing Lecture 2014

Thesis II — after Hellerstein

Infrastructure is a constraint, not a configuration

[persistent, consistent] resolves to SQLite. [ephemeral] resolves to in-memory. The program never names a database. It declares what it needs. The runtime resolves how. The provider protocol is extensible — custom backends can be added as .cell files.

memory {
    orders:   Map<String, Order>   [persistent, consistent]     // → SQLite
    cache:    Map<String, Price>   [ephemeral, local]          // → in-memory HashMap
    audit:    Log<Event>           [persistent]               // → SQLite (WAL mode)
}
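
One way to picture how a runtime could map declared properties to a backend is a first-match rule table. The sketch below is written in Python rather than Soma, and the `RULES` table, the `resolve` function, and the backend names are illustrative assumptions, not Soma's actual resolver.

```python
# Hypothetical rule table: each entry maps a set of required properties
# to a backend name. Order matters: the first rule whose properties are
# all present in the declaration wins.
RULES = [
    (frozenset({"persistent", "consistent"}), "sqlite"),
    (frozenset({"persistent"}), "sqlite-wal"),
    (frozenset({"ephemeral"}), "memory"),
]


def resolve(properties):
    """Return the backend name for a declared property set."""
    declared = frozenset(properties)
    for required, backend in RULES:
        if required <= declared:
            return backend
    raise LookupError(f"no backend satisfies {sorted(declared)}")


print(resolve(["persistent", "consistent"]))  # sqlite
print(resolve(["ephemeral", "local"]))        # memory
print(resolve(["persistent"]))                # sqlite-wal
```

Because the program only names properties, swapping a backend in a scheme like this means editing the rule table, not the code that declares the memory.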

The future of data management lies in the declarative specification of what is wanted, not the procedural specification of how to get it.

— Joseph Hellerstein, The Declarative Imperative, 2010

Thesis III — after Kay

Agents build systems, not functions

Soma's cell is fractal. The same structure — face, memory, state, handlers — works at every scale. A function is a cell. A service is a cell. An AI agent doesn't need to learn "how to deploy a distributed system." It needs to learn one thing: the cell. Independent cells communicate via the TCP signal bus, configured in soma.toml.

// Two independent programs, connected via soma.toml

// exchange/app.cell
cell Exchange {
    on order(data: Map) { place_order(data) }
    every 500ms { signal trade(match_orders()) }
}

// trader/app.cell
cell Trader {
    every 3s { signal order(quote) }
    on trade(fill: Map) { record_fill(fill) }
}

// trader/soma.toml: [peers] exchange = "localhost:8082"

The best way to predict the future is to invent it.

— Alan Kay, Turing Award 2003

Contracts are enforced

The face section is not documentation. It is a machine-checked contract. The compiler verifies it. Break the contract — the program does not compile.

cell API {
    face {
        signal create(name: String) -> Map
        signal delete(id: String)            // ← declared but no handler
        promise all_persistent
    }
    on create(name: String) { return map("name", name) }
}

error: face contract: signal 'delete' declared in cell 'API' has no handler
→ Every signal in the face MUST have a matching on handler.
→ Param counts are verified. Contradictory properties are rejected.
→ Descriptive promises generate warnings. Structural promises are enforced.

This is what separates Soma from most dynamic languages. The face is a structural contract: it declares what signals the cell handles, and the compiler verifies the cell implements them. Signal existence, parameter counts, property contradictions, and structural promises are all checked at compile time. Return types and runtime invariants are not yet verified.
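
The two structural checks described above (every declared signal has a handler, and parameter counts agree) can be sketched in a few lines. This is a Python illustration, not the Soma compiler: the `Signature` type and `check_face` function are hypothetical, and the wording of the parameter-count message is invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    name: str
    params: tuple  # parameter names only; types are ignored in this sketch


def check_face(cell, face, handlers):
    """Return a list of error strings; an empty list means the contract holds."""
    by_name = {h.name: h for h in handlers}
    errors = []
    for sig in face:
        handler = by_name.get(sig.name)
        if handler is None:
            errors.append(
                f"face contract: signal '{sig.name}' declared in cell "
                f"'{cell}' has no handler"
            )
        elif len(handler.params) != len(sig.params):
            # Hypothetical message format for a parameter-count mismatch.
            errors.append(
                f"face contract: signal '{sig.name}' expects "
                f"{len(sig.params)} parameter(s), handler takes "
                f"{len(handler.params)}"
            )
    return errors


face = [Signature("create", ("name",)), Signature("delete", ("id",))]
handlers = [Signature("create", ("name",))]
for message in check_face("API", face, handlers):
    print("error:", message)
```

Run against the `API` cell above, the sketch reports the missing `delete` handler and nothing else.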

Formal verification

soma verify is a model checker. It exhaustively explores every reachable state in your state machine and proves temporal properties. Not testing — proving.

The code — app.cell

state order {
    initial: pending
    pending → validated
    validated → sent
    sent → filled
    sent → rejected
    filled → settled
    * → cancelled
}

The spec — soma.toml

[verify]
deadlock_free = true
eventually = ["settled", "cancelled"]
never = ["invalid"]

[verify.after.sent]
eventually = ["filled", "rejected"]

[verify.after.pending]
eventually = ["validated", "cancelled"]

$ soma verify app.cell

State machine 'order': 7 states, initial 'pending'

   7 states, 11 transitions
   all states reachable from 'pending'
   no deadlocks
   liveness: every state can eventually reach a terminal state

  Temporal properties:
   deadlock_free
   eventually(settled or cancelled)
   never(invalid)
   after('pending', validated or cancelled)
   after('sent', filled or rejected)
    counter-example: pending → validated → sent → cancelled
    → The wildcard * → cancelled bypasses the expected filled/rejected

The model checker found a real bug: after sent, the spec requires the order to reach filled or rejected. But the wildcard * → cancelled allows skipping directly to cancelled — violating the spec. The counter-example shows the exact execution path.

This is bounded model checking inspired by CTL temporal logic. It exhaustively explores finite state machine graphs — not data values or guard expressions. For small state machines (5–15 states), the verification is instant and complete. The spec lives in soma.toml, the implementation lives in .cell, and the proof is one command: soma verify.

Property                     Meaning                             CTL equivalent
deadlock_free                No reachable state has zero exits   AG(EX true)
eventually = ["X"]           All paths reach X                   AF(X)
never = ["X"]                X is never reached                  AG(¬X)
after.S.eventually = ["X"]   After S, all paths reach X          AG(S → AF(X))
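
To make the `after.S.eventually` row concrete, here is a minimal Python sketch of that one check on the `order` state machine. It is not the code behind `soma verify`: the function names are hypothetical, the wildcard is assumed to expand to every state except the target itself, and AF is computed as a plain least fixpoint over the finite graph.

```python
from collections import deque

STATES = ["pending", "validated", "sent", "filled", "rejected", "settled", "cancelled"]
EXPLICIT = [("pending", "validated"), ("validated", "sent"), ("sent", "filled"),
            ("sent", "rejected"), ("filled", "settled")]
WILDCARD_TARGETS = ["cancelled"]  # from the "* → cancelled" rule


def build_graph():
    graph = {s: [] for s in STATES}
    for src, dst in EXPLICIT:
        graph[src].append(dst)
    # Assumption: "*" means every state other than the target itself.
    for dst in WILDCARD_TARGETS:
        for src in STATES:
            if src != dst and dst not in graph[src]:
                graph[src].append(dst)
    return graph


def af(graph, targets):
    """States from which every path is forced to reach `targets`."""
    forced = set(targets)
    changed = True
    while changed:
        changed = False
        for state, succs in graph.items():
            if state not in forced and succs and all(n in forced for n in succs):
                forced.add(state)
                changed = True
    return forced


def shortest_path(graph, start, goal):
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in graph[state]:
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


def check_after(graph, initial, trigger, targets):
    """Check AG(trigger → AF(targets)); return None or a counter-example path."""
    prefix = shortest_path(graph, initial, trigger)
    if prefix is None:
        return None  # trigger unreachable, so the property holds vacuously
    forced = af(graph, targets)
    if trigger in forced:
        return None
    path, seen, state = list(prefix), {trigger}, trigger
    while True:
        escapes = [n for n in graph[state] if n not in forced]
        if not escapes or escapes[0] in seen:
            break  # dead end or cycle: the targets are never reached
        state = escapes[0]
        seen.add(state)
        path.append(state)
    return path


graph = build_graph()
print(sum(len(v) for v in graph.values()), "transitions")
print(check_after(graph, "pending", "pending", ["validated", "cancelled"]))
print(" → ".join(check_after(graph, "pending", "sent", ["filled", "rejected"])))
```

Under these assumptions the sketch counts 11 transitions, accepts the `pending` property, and returns the same counter-example as the output above: pending → validated → sent → cancelled.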

Intellectual foundations

Soma is not invented from nothing. It stands on decades of research in programming language theory, distributed systems, and formal methods.

The Actor Model

Cells are actors with contracts

Hewitt's Actor model (1973) introduced the idea that computation is message passing between autonomous agents. Soma cells are actors: they have private state (memory), respond to messages (signals), and can spawn children (interior). What Soma adds: typed contracts on the messages (face), and declarative infrastructure on the state (properties).

One Actor can send messages, create new Actors, and determine how to handle the next message it receives.

— Carl Hewitt, Peter Bishop, Richard Steiger. A Universal Modular ACTOR Formalism. 1973

Specification as Program

TLA+ made executable

Lamport showed that distributed systems need formal specifications. But specifications lived in separate documents, drifting from code. In Soma, the state machine IS the protocol. The face contract IS the API. The promise IS the invariant. There is no drift because there is no separation.

If you're thinking without writing, you only think you're thinking.

— Leslie Lamport, Turing Award 2013

Declarative Infrastructure

What, not how

Codd's relational model (1970) proved that declaring what you want beats specifying how to get it. SQL replaced procedural data access. Soma applies the same principle to infrastructure: [persistent, consistent] declares the intent. The runtime resolves SQLite, DynamoDB, or Firestore. Change the provider — zero code changes.

Future users of large data banks must be protected from having to know how the data is organized in the machine.

— E.F. Codd, A Relational Model of Data for Large Shared Data Banks. 1970

Self-Growing Systems

The language extends itself

Kay's vision for Smalltalk was a system where everything — including the language itself — could be modified from within. In Soma, properties, checkers, and backends are defined as .cell files, not compiler code. Add a new property? Write a cell. Add a new storage backend? Write a cell. The compiler reads them and enforces them.

The best way to predict the future is to invent it.

— Alan Kay, Turing Award 2003

Pipe Composition

Data flows, not data structures

Thompson and McIlroy's Unix pipes (1973) showed that complex programs are built by composing simple ones. Soma's |> operator is the same idea applied to data: filter |> map |> sort |> top. Each stage is a pure transform. The pipeline reads like a sentence.

Write programs that do one thing and do it well. Write programs to work together.

— Doug McIlroy, Unix Philosophy. 1978

Errors that help

Inspired by Elm and Rust. Errors show the source line, point to the offending expression, and suggest a fix where one is known.

error: expected '=', found number '10'
  --> app.cell:3:15
  |
3 |         let x 10
  |               ^

error: cannot add String and Int: hello + 5
  --> app.cell:7:17
  |
7 |         let r = "hello" + 5
  |                 ^

error: undefined function: langth (did you mean 'length'?)
  --> app.cell:4:9

error: face contract: signal 'delete' declared in cell 'API' has no handler
  → The face is a contract. Break it, and the compiler tells you.
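
A "did you mean" hint like the one in the `langth` error can be produced with fuzzy string matching. The Python sketch below uses the standard library's `difflib.get_close_matches`; the list of known function names and the `undefined_function_error` helper are assumptions for illustration, and Soma's own suggestion logic may work differently.

```python
import difflib

# Hypothetical set of names visible at the call site.
KNOWN_FUNCTIONS = ["length", "filter", "map", "sort_by", "top"]


def undefined_function_error(name, known=KNOWN_FUNCTIONS):
    """Build an error message, appending the closest known name if one is close enough."""
    message = f"error: undefined function: {name}"
    close = difflib.get_close_matches(name, known, n=1, cutoff=0.6)
    if close:
        message += f" (did you mean '{close[0]}'?)"
    return message


print(undefined_function_error("langth"))
# error: undefined function: langth (did you mean 'length'?)
```

The `cutoff` keeps unrelated names from being suggested, so a wildly wrong identifier gets the plain error with no hint.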

What one Soma program replaces

Tool                 Role              In Soma
Airflow / Cron       Scheduling        every 30s { ... }
Spark / Pandas       Data pipelines    |> filter() |> map() |> sort_by()
SQLite / DB config   Persistence       [persistent, consistent] → SQLite auto-resolved
In-memory cache      Fast storage      [ephemeral, local] → HashMap
Express / Flask      API server        on request() { match path { ... } }
React                Dashboard         html(""" ... """)
Celery               Background jobs   every 1h { rebalance() }
TLA+ (basic)         Verification      soma verify + soma.toml [verify]

Traditional stack

8 tools
4 languages
50,000 lines of glue
Specification: Confluence
Implementation: somewhere else

Soma

1 language
1 file (or a few)
1 command: soma serve
~300 lines
The spec IS the program

The language

Soma has one structure: the cell. Face (contract), memory (state), state machines (protocol), signal handlers (behavior). Everything is a cell, from a function to a datacenter.

Match expressions

on request(method: String, path: String, body: String) {
    match path {
        "/"              → html(dashboard())
        "/api/portfolio" → get_portfolio()
        "/api/risk"      → risk_report()
        _                → response(404, map("error", "not found"))
    }
}

Lambdas + higher-order pipes

let portfolio = stocks
    |> filter(s => s.market_cap > 1e9)
    |> map(s => score(s))
    |> sort_by("alpha", "desc")
    |> top(15)

let names = users |> map(u => u.name) |> join(", ")
let active = users |> filter(u => u.status == "active")
let found = users |> find(u => u.id == target)
let valid = orders |> all(o => o.total > 0)

Multi-line strings

let html = """
<div class="card">
    <h2>{stock.ticker}</h2>
    <p class="price">${stock.price}</p>
    <span class="change {cls}">{stock.momentum}%</span>
</div>
"""

State machines

state order {
    initial: pending
    pending → validated { guard { risk_check_passed } }
    validated → sent
    sent → filled
    sent → rejected
    filled → settled
    * → cancelled
}

// Runtime enforces: you cannot reach 'sent' without passing 'validated'.
// The auditor reads the .cell file and sees the policy.
// The policy IS the code.
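
Runtime enforcement of a declared transition table, including a guard, can be pictured as follows. This is a Python sketch under assumptions: the `OrderMachine` class, the `IllegalTransition` exception, and passing guard results as keyword arguments are all hypothetical, not how the Soma runtime is implemented.

```python
class IllegalTransition(Exception):
    pass


class OrderMachine:
    # (source, target) -> optional guard name; "*" matches any source state.
    TRANSITIONS = {
        ("pending", "validated"): "risk_check_passed",
        ("validated", "sent"): None,
        ("sent", "filled"): None,
        ("sent", "rejected"): None,
        ("filled", "settled"): None,
        ("*", "cancelled"): None,
    }

    def __init__(self):
        self.state = "pending"

    def transition(self, target, **guards):
        key = (self.state, target)
        if key not in self.TRANSITIONS:
            key = ("*", target)  # fall back to the wildcard rule
        if key not in self.TRANSITIONS:
            raise IllegalTransition(f"{self.state} → {target} is not declared")
        guard = self.TRANSITIONS[key]
        if guard is not None and not guards.get(guard, False):
            raise IllegalTransition(f"guard '{guard}' failed for {self.state} → {target}")
        self.state = target


order = OrderMachine()
try:
    order.transition("sent")  # skipping 'validated' is rejected
except IllegalTransition as exc:
    print(exc)
order.transition("validated", risk_check_passed=True)
order.transition("sent")
print(order.state)
```

The table is the only place the policy lives, which is the point the comments above make: an auditor reads the transitions, not the handlers.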

Declarative storage

memory {
    data:     Map<String, String> [persistent, consistent]     // → SQLite
    cache:    Map<String, String> [ephemeral, local]          // → in-memory
    sessions: Map<String, String> [ephemeral, ttl(30min)]    // → auto-expire
}

// The compiler verifies: [persistent, ephemeral] → contradiction.
// Properties are defined in .cell files. The language grows itself.
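
What a `ttl(...)` property asks of an in-memory backend can be sketched as a map with lazy expiry. The Python below is an illustration only: the `TtlMap` class and its injectable clock are assumptions made so the behaviour is easy to demonstrate, not a description of Soma's storage layer.

```python
import time


class TtlMap:
    """In-memory map whose entries expire `ttl_seconds` after being written."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = {}  # key -> (expires_at, value)

    def set(self, key, value):
        self._items[key] = (self._clock() + self._ttl, value)

    def get(self, key, default=None):
        entry = self._items.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # lazy expiry: drop the entry when it is read
            return default
        return value


now = [0.0]  # fake clock so the example does not have to sleep
sessions = TtlMap(ttl_seconds=30 * 60, clock=lambda: now[0])
sessions.set("abc", "alice")
print(sessions.get("abc"))  # alice
now[0] = 30 * 60 + 1
print(sessions.get("abc"))  # None
```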

Types & variables

let x = 42                             // Int
let price = 3.14                       // Float
let sci = 1.5e3                        // Scientific: 1500.0
let name = "world"                     // String with {interpolation}
let user = User { name: "Alice", age: 30 }  // Typed record
let dur = 5s                            // Duration → 5000ms

for i in range(0, 10) { /* ... */ }   // range, break, continue
while running { if done { break } }   // loops

Self-growing language

// Properties, checkers, and backends are .cell files, not compiler code.

cell property geo_replicated {
    rules {
        implies [persistent, consistent]
        contradicts [ephemeral, local]
    }
}

cell backend redis {
    rules {
        matches [ephemeral, ttl]
        native "redis"
    }
}

// Add geo_replicated to your memory — the compiler now checks it.
// Add a redis backend — the runtime now resolves to it.
// No compiler changes needed. The language grew.
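
Checking `implies` and `contradicts` rules amounts to closing the declared set under implication and then looking for conflicting pairs. The Python sketch below assumes that `contradicts [a, b]` means the property conflicts with each listed property individually; the `RULES` dictionary and function names are hypothetical, and only the `geo_replicated` rule and the persistent/ephemeral contradiction come from the text above.

```python
RULES = {
    "geo_replicated": {
        "implies": {"persistent", "consistent"},
        "contradicts": {"ephemeral", "local"},
    },
    "persistent": {"implies": set(), "contradicts": {"ephemeral"}},
    "ephemeral": {"implies": set(), "contradicts": {"persistent"}},
}


def expand(properties):
    """Add every implied property until nothing new appears."""
    result = set(properties)
    pending = list(properties)
    while pending:
        rule = RULES.get(pending.pop(), {})
        for implied in rule.get("implies", ()):
            if implied not in result:
                result.add(implied)
                pending.append(implied)
    return result


def contradictions(properties):
    """Return sorted (property, conflicting property) pairs."""
    full = expand(properties)
    found = set()
    for prop in full:
        for other in RULES.get(prop, {}).get("contradicts", ()):
            if other in full:
                found.add(tuple(sorted((prop, other))))
    return sorted(found)


print(contradictions(["persistent", "consistent"]))  # []
print(contradictions(["persistent", "ephemeral"]))   # [('ephemeral', 'persistent')]
print(contradictions(["geo_replicated", "local"]))   # [('geo_replicated', 'local')]
```

Expanding first matters: `[geo_replicated, ephemeral]` conflicts twice, once directly and once through the implied `persistent`.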

Data pipelines

Soma replaces Pandas for in-memory analytics. Same operations, fraction of the code. Every pipeline is composable via |>.

Pipeline operators

Operation                     Syntax
map(list, lambda)             data |> map(s => s.name)
filter(list, lambda)          data |> filter(s => s.score > 50)
find(list, lambda)            data |> find(s => s.id == target)
any / all / count             data |> any(s => s.active)
filter_by(field, op, val)     data |> filter_by("price", ">", 100)
sort_by(field, dir)           data |> sort_by("score", "desc")
top(n) / bottom(n)            data |> top(10)
agg(group, "col:func"...)     data |> agg("sector", "vol:sum", "price:avg")
group_by(field)               data |> group_by("region")
join(left, right, key)        orders |> join(prices, "ticker")
with(key, val)                record |> with("score", 95)
describe(field)               data |> describe("price") → {sum, avg, min, max, count}
flatten / zip / reverse       nested |> flatten()
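
A pipe chain is left-to-right function application: each stage receives the previous stage's result. The Python sketch below shows that reading with a small `pipe` helper and curried stand-ins for four of the operators. The helper names (`pipe`, `filter_`, `map_`) are illustrative and say nothing about how Soma implements `|>`.

```python
from functools import reduce


def pipe(value, *stages):
    """Feed `value` through each stage in order: pipe(x, f, g) == g(f(x))."""
    return reduce(lambda acc, stage: stage(acc), stages, value)


def filter_(pred):
    return lambda rows: [r for r in rows if pred(r)]


def map_(fn):
    return lambda rows: [fn(r) for r in rows]


def sort_by(field, direction="asc"):
    return lambda rows: sorted(rows, key=lambda r: r[field], reverse=(direction == "desc"))


def top(n):
    return lambda rows: rows[:n]


stocks = [
    {"ticker": "AAA", "score": 72},
    {"ticker": "BBB", "score": 41},
    {"ticker": "CCC", "score": 88},
]

picked = pipe(
    stocks,
    filter_(lambda s: s["score"] > 50),
    sort_by("score", "desc"),
    top(1),
    map_(lambda s: s["ticker"]),
)
print(picked)  # ['CCC']
```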

Real example: portfolio construction

let portfolio = universe()
    |> filter(s => s.market_cap > 1e9)
    |> map(s => score(s))
    |> sort_by("composite", "desc")
    |> top(15)
    |> map(s => {
        let weight = clamp(s.composite * exposure / total, 0, 15)
        s |> with("weight", weight)
    })

When one handler is the bottleneck

You write Soma for the architecture: cells, state machines, persistence, web server. Then one handler does 50 million iterations and takes 8 seconds. In most languages at this level, the usual fix is to rewrite that handler in C++ or Rust. In Soma, you add one word: [native]. It compiles to machine code. Same file. Same syntax. In the benchmark below that is 45x faster, and 275x faster once parallel execution is enabled in soma.toml.

The code — app.cell

on simulate(n: Int) [native] {
    return range(0, n)
        |> map(i => compute_hit(i))
        |> reduce(0.0, p => p.acc + p.val)
        / n * 4.0
}

The config — soma.toml

# Sequential: 177ms for 50M paths
# Just add:

[compute]
backend = "threads"
threads = 8

[compute.parallel]
handlers = ["simulate"]

# Now: 29ms for 50M paths

Mode                            50M Monte Carlo paths   vs interpreted
Interpreted                     ~8,000ms                1x
[native]                        177ms                   45x faster
[native] + parallel (8 cores)   29ms                    275x faster

The code is the same in all three rows. [native] compiles to machine code. Adding [compute.parallel] in soma.toml splits the work across cores automatically. No thread code, no mutexes, no async/await, no rewrite. One word in the code, a few lines in the config.
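
The split-and-combine shape behind a parallel `simulate` can be sketched in Python. Two assumptions are made here: that `compute_hit` is a Monte Carlo hit test for estimating pi (suggested by the `/ n * 4.0`, but not stated), and that work is divided into equal chunks with one seeded generator each. CPython threads will not reproduce the speedups in the table; the sketch only shows how the work divides and recombines.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def count_hits(args):
    """Count random points in the unit square that fall inside the quarter circle."""
    seed, n = args
    rng = random.Random(seed)  # one generator per chunk keeps runs reproducible
    hits = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def simulate(n, workers=8):
    """Estimate pi by splitting n samples into `workers` chunks."""
    base, extra = divmod(n, workers)
    # Spread any remainder over the first `extra` chunks so sizes sum to n.
    chunks = [(seed, base + (1 if seed < extra else 0)) for seed in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        total_hits = sum(pool.map(count_hits, chunks))
    return total_hits / n * 4.0


print(simulate(200_000))
```

The handler body stays a single function; only the chunking and the final sum know that the work was divided.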

This is the fourth axis of the Soma property system:

Axis           Code           Configuration         Resolved to
Storage        [persistent]   soma.toml [storage]   SQLite
Transport      signal / on    soma.toml [peers]     TCP bus
Verification   state { }      soma.toml [verify]    Model checker
Compute        [native]       soma.toml [compute]   LLVM + threads

Four axes. Same pattern. The code declares what. The configuration declares how. The compiler resolves.

Web applications

soma serve app.cell — threaded HTTP server, SQLite, CORS, auto-routing, HTML templates. Zero framework.

cell App {
    memory { tasks: Map<String, String> [persistent, consistent] }

    on request(method: String, path: String, body: String) {
        match path {
            "/"     → html(dashboard())
            "/api"  → tasks.values() |> map(s => from_json(s))
            _       → response(404, map("error", "not found"))
        }
    }

    on dashboard() {
        let rows = tasks.values()
            |> map(s => from_json(s))
            |> map(t => """<tr><td>{t.name}</td><td>{t.status}</td></tr>""")
            |> join("")
        return """
        <html>
            <body>
                <h1>Tasks</h1>
                <table>{rows}</table>
            </body>
        </html>
        """
    }
}

// $ soma serve app.cell
// listening on http://localhost:8080

Inter-process signals

Independent Soma programs communicate via signals. No WebSocket code. No HTTP polling. No serialization. signal to send, on to receive. Peers declared in soma.toml. Zero transport code in your program.

Exchange — app.cell

cell Exchange {
    on order(data: Map) {
        place_order(data)
    }

    every 500ms {
        let fill = match_orders()
        signal trade(fill)
    }
}

Trader — app.cell

cell Trader {
    every 3s {
        signal order(data)
    }

    on trade(fill: Map) {
        record_fill(fill)
    }
}

// soma.toml
// [peers]
// exchange = "localhost:8082"

The exchange emits signal trade(fill). The trader's on trade(fill) fires automatically. Two independent processes. No shared code. No ws_connect, no http_get, no to_json. The signal is the transport.

Peers are declared in soma.toml, not in code. Change the address — zero code changes. Replace TCP with Kafka — zero code changes. The program contains logic. The configuration contains topology.

# soma.toml: topology is configuration, not code
[package]
name = "trader"

[peers]
exchange = "localhost:8082"    # auto-connects on soma serve startup
risk = "localhost:7082"        # add more peers with no code changes

Three transports, one keyword

signal trade(fill) delivers to all listeners: browsers via SSE, programs via TCP bus, WebSocket clients. Same keyword, runtime resolves the transport.

signal trade(fill)     // one emit → three transports:
                       //   → SSE: browser EventSource('/stream')
                       //   → TCP bus: connected peers (soma.toml)
                       //   → WS: WebSocket clients on :8081

Built with Soma

Exchange + Trader

Order book, matching engine, market maker bot. Two independent programs communicating via signal bus. Real-time fills.

Quant Fund

Multi-regime adaptive momentum. Factor scoring, regime state machine, auto-rebalancing. 300 lines.

Semaphore Agent

2-intersection traffic controller. State machines, night mode, emergency, live SSE dashboard. 4 files.

Trading Desk

20-stock universe, screener, ranker, state machine orders, real-time dashboard. 413 lines.

Car Rental

Multi-file booking system. Vehicle inventory, reservations, HTML pages. 5 files.

100 Examples

From hello world to HTMX apps. Every example parses, checks, and runs. Trains models.

CLI reference

soma run file.cell [args]      Execute a signal
soma run file.cell --jit       Bytecode VM (faster)
soma serve file.cell           HTTP :8080, WS :8081, Bus :8082
soma serve file.cell --watch   Hot reload
soma check file.cell           Verify contracts & properties
soma test file.cell            Run test cells
soma init [name]               Create project
soma add pkg --git url         Add dependency
soma install                   Install deps
soma props                     List properties + backends
soma repl                      Interactive evaluator

Architecture

Source (.cell)
    → Lexer → Parser → AST
        → Checker (properties, signals, promises)
        → Interpreter (soma run)
        → Bytecode VM (soma run --jit)
    → Registry (stdlib/*.cell — properties, backends, builtins)
    → Runtime
        → Storage (SQLite | Memory | HTTP Provider)
        → HTTP server (soma serve)
        → SSE streaming (real-time browser push)
        → WebSocket server (bidirectional)
        → TCP signal bus (inter-process signals)