# Soma Language — Complete Reference for AI Agents

> Soma is a fractal cell language for verified distributed systems and AI agents.
> An agent's lifecycle is a state machine that the compiler proves terminates;
> `set_budget(N)` caps the LLM loop running inside it. Verification covers the
> protocol; `set_budget` covers the LLM. The toolchain auto-fixes errors
> (`soma fix`), catches anti-patterns (`soma lint`), and auto-serializes storage
> values. There are three execution backends: the tree-walking interpreter
> (default; the reference semantics), the bytecode VM (`--jit`), and native Rust
> codegen (the `[native]` handler annotation).

## Agent Workflow

```
1. Generate code      → write .cell file
2. Auto-fix errors    → soma fix app.cell
3. Lint               → soma lint app.cell
4. Check contracts    → soma check app.cell --json
5. Verify behavior    → soma verify app.cell --json (PROVES termination)
6. Serve              → soma serve app.cell -p 8080
```

## Quick Syntax Rules

- No semicolons. Newlines separate statements.
- No `function`/`def`. Use `on handler_name(params) { }`. Handlers don't take return-type annotations (`-> Int` is invalid on `on`); they DO take them on `signal` declarations inside `face { }`.
- No `null`. Use `()` for null/unit. `to_int("abc")` returns `()`, NOT `0`.
- No `[1,2,3]`. Use `list(1, 2, 3)`.
- No `{key: val}`. Use `map("key", val, "key2", val2)` — must have an even arg count.
- Strings: `"hello {name}"` (interpolation with `{}`); `"""raw"""` for triple-quoted raw strings.
- `if`/`match` are expressions: `let x = if cond { a } else { b }`. The `else` is required when used as an expression.
- `return` inside a `for` exits the entire handler, not just the loop. Use a flag + `break`.
- `match` is an expression — don't write `return match ...`, just write the match.
- Last expression in handler is the return value (implicit return).
- Storage auto-serializes: no `to_json`/`from_json` needed when storing maps/lists. Use them only when the caller explicitly wants a string.
- Slot **properties** (`.keys`, `.values`, `.len`, `.entries`, `.all`) take NO parens. Slot **methods** (`.get(k)`, `.set(k, v)`, `.delete(k)`, `.has(k)`) take parens.
- `7 / 2 == 3.5` (Float promotion). Use `floor(7 / 2)` for truncating division.
- Private handlers start with `_`: `on _helper()` is not exposed as an HTTP route.
- Use `len`, not `length`. Use `distinct`, not `unique`. Use `push`, not `append_to`. Use `nth`, not `at`.
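
A minimal sketch that exercises several of these rules in one handler. The cell and signal names (`Basics`, `stats`) are hypothetical; every builtin used appears elsewhere in this reference.

```soma
cell Basics {
    face {
        // Return types go on the signal, never on the `on` handler
        signal stats(a: Int, b: Int) -> Map
    }

    on stats(a: Int, b: Int) {
        let pair = list(a, b)                        // not [a, b]
        let larger = if a > b { a } else { b }       // else required in expression position
        let ratio = a / b                            // Float promotion: 7 / 2 == 3.5
        let whole = floor(a / b)                     // truncating division
        // Last expression is the return value; no `return` needed
        map("count", pair.len, "larger", larger, "ratio", ratio, "whole", whole)
    }
}
```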

## Cell Kinds

The `cell` keyword takes a kind modifier. Every kind shares the same five-section structure (face, memory, state, scale, handlers); the kind tells the compiler what role the cell plays.

| Kind                       | Purpose                                                     |
|----------------------------|-------------------------------------------------------------|
| `cell Foo { }`             | Regular cell — functions, services, web apps                |
| `cell agent Foo { }`       | Agent cell — unlocks `think`, `set_budget`, `tool` decls    |
| `cell property Foo { }`    | Define a memory property (see `stdlib/durability.cell`)     |
| `cell backend Foo { }`     | Define a storage backend implementation                     |
| `cell type Foo<T> { }`     | Define a custom type                                        |
| `cell checker Foo { }`     | Custom validation rule run by the checker                   |
| `cell builtin Foo { }`     | FFI bridge to a Rust builtin (stdlib only)                  |
| `cell test Foo { }`        | Test cell — `rules { assert ... }`, run with `soma test`    |
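
For example, a test cell can pin down rules from Quick Syntax Rules. This is a sketch: the cell name `SyntaxRules` is hypothetical, it follows the `rules { assert ... }` shape given in the table, and one assertion per line is assumed. Run it with `soma test file.cell`.

```soma
cell test SyntaxRules {
    rules {
        assert 7 / 2 == 3.5                 // division promotes to Float
        assert floor(7 / 2) == 3            // floor truncates
        assert to_int("abc") == ()          // a failed parse is unit, not 0
        assert to_int("42") == 42
    }
}
```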

## Composition: interior + runtime

```soma
cell System {
    interior {
        cell Worker { /* ... */ }
        cell Cache  { /* ... */ }
    }
    runtime {
        start Worker                       // bring up the child
        connect Worker.done -> Cache       // wire signals
        emit initialize()                  // fire startup signal
    }
}
```

`soma describe` dumps the full interior graph as JSON.

## Execution Backends

| Backend                  | How to invoke                                          | When to use                                                  |
|--------------------------|--------------------------------------------------------|--------------------------------------------------------------|
| Tree-walking interpreter | `soma run file.cell` (default)                         | Development. Reference semantics — other backends agree with it. |
| Bytecode VM              | `soma run file.cell --jit`                             | Hot loops, where the one-time compile cost is amortized, without going native. |
| Native (Rust `cdylib`)   | `[native]` handler annotation, or `soma build`         | Tight numeric loops. A whole BigInt loop dispatches to GMP via the `rug` crate; individual BigInt ops each cross the FFI boundary and are slower. |

## Verified AI Agent — the specialty

```soma
cell agent Researcher {
    face {
        signal research(topic: String) -> Map
        tool search(query: String) -> String "Search the web"
        tool summarize(text: String) -> String "Summarize findings"
    }

    memory {
        findings: Map<String, String> [persistent]
    }

    state workflow {
        initial: idle
        idle -> researching
        researching -> analyzing
        analyzing -> done
        * -> failed
    }

    // Tool implementations — LLM calls these automatically
    on search(query: String) {
        http_get("https://api.search.com?q={query}")
    }

    on summarize(text: String) {
        think("Summarize concisely: {text}")
    }

    on research(topic: String) {
        set_budget(5000)                    // hard token cap
        transition("t", "researching")

        // think() reads tool declarations, sends to LLM as function-calling tools
        // LLM can call search() and summarize() — auto-dispatched to handlers above
        // Loops until final answer. Retries on rate limits.
        let facts = think("Research '{topic}' thoroughly. Use search tool.")

        transition("t", "analyzing")
        // Multi-turn: this think() shares conversation context with the previous one
        let summary = think("Synthesize your findings into 3 key insights.")

        transition("t", "done")
        findings.set(topic, map("summary", summary, "facts", facts))
        map("status", "done", "summary", summary, "tokens", tokens_used())
    }
}

// soma verify PROVES: every path reaches done or failed. No infinite loops.
```

## Agent Builtins

| Builtin                        | Description                                        |
|--------------------------------|----------------------------------------------------|
| `think(prompt)`                | Call LLM with tool-calling loop                    |
| `think_json(prompt)`           | Call LLM, return structured Map                    |
| `delegate(cell, signal, args)` | Call another agent's handler                       |
| `set_budget(max_tokens)`       | Hard cap on LLM token spend                        |
| `tokens_used()`                | Tokens consumed so far                             |
| `tokens_remaining()`           | Budget remaining (-1 = unlimited)                  |
| `remember(key, value)`         | Persistent agent memory                            |
| `recall(key)`                  | Recall from agent memory                           |
| `approve(action)`              | Human-in-the-loop gate                             |
| `trace()`                      | Full execution log (think/tool/transition events)  |
| `clear_context()`              | Reset multi-turn conversation                      |
| `clear_trace()`                | Reset trace log                                    |
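
A minimal sketch combining several of these builtins: it caches LLM verdicts with `remember`/`recall` so a repeated question skips the model. The cell `Triage` and signal `classify` are hypothetical, `recall` is assumed to return `()` when nothing was stored, and the keys inside the Map returned by `think_json` depend on the prompt and model, so the sketch does not read any of them.

```soma
cell agent Triage {
    face {
        signal classify(ticket: String) -> Map
    }

    on classify(ticket: String) {
        set_budget(2000)                         // hard token cap for this call
        let cached = recall(ticket)              // assumed () when nothing was stored
        if cached != () {
            map("verdict", cached, "cached", true)
        } else {
            let verdict = think_json("Classify as bug, feature or question: {ticket}")
            remember(ticket, verdict)            // persistent agent memory
            map("verdict", verdict, "cached", false, "remaining", tokens_remaining())
        }
    }
}
```

Running it with `SOMA_LLM_MOCK=echo` keeps it offline, though what `think_json` returns under the mock is not specified here.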

Configuration via environment variables (these always override the `[agent]` section of `soma.toml`):
- `SOMA_LLM_KEY` or `OPENAI_API_KEY` — API key (required for hosted providers)
- `SOMA_LLM_URL` — endpoint (default: OpenAI; for Ollama use `http://localhost:11434/v1/chat/completions`)
- `SOMA_LLM_MODEL` — model name
- `SOMA_LLM_RETRIES` — retry count (exponential backoff)
- `SOMA_LLM_MOCK=echo` — offline testing; `think()` returns the prompt verbatim. Use this in tests and CI.

For Anthropic (Claude) configuration in `soma.toml`:

```toml
[agent]
provider = "anthropic"
model    = "claude-opus-4-6"     # use the latest Claude 4.6 model IDs
key      = "${ANTHROPIC_API_KEY}" # ${VAR} expands env vars — never inline raw keys
```

## Cell Structure

```soma
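cell AppName {
    face {
        signal create(payload: Map) -> Map
        promise "all items are tracked"
    }
    memory {
        items: Map<String, String> [persistent, consistent]
        invariant _slot_len <= 10000
    }
    state workflow {
        initial: draft
        draft -> active
        active -> completed
        * -> cancelled
    }
    every 30s { _cleanup() }
    after 5s { _initialize() }

    on create(payload: Map) {
        let id = to_string(next_id())
        items.set(id, payload)            // auto-serialized
        ensure items.len > 0
        map("id", id, "status", "ok")
    }

    on request(method: String, path: String, body: String) {
        let req = map("method", method, "path", path)
        match req {
            {method: "GET", path: "/"}                     -> html(_home())
            {method: "GET", path: "/api/" + resource}      -> _list_items(resource)
            {method: "POST", path: "/api/" + resource}     -> create(map("resource", resource, "body", body))
            {method: "DELETE", path: "/api/" + resource}   -> _delete_item(resource)
            _ -> response(404, map("error", "not found"))
        }
    }

    // Private helpers (leading `_`): callable from handlers, not exposed as routes.
    // Named so they don't shadow builtins like `list`.
    on _home() { "<h1>AppName</h1>" }

    on _list_items(resource: String) {
        map("resource", resource, "items", items.values)
    }

    on _delete_item(resource: String) {
        items.delete(resource)
        map("deleted", resource)
    }

    on _cleanup() { print("cleanup tick") }
    on _initialize() { print("AppName started") }
}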
```

## Match Patterns (all composable)

```soma
match value {
    "literal"                    -> expr
    "a" || "b"                   -> expr          // or-pattern
    name                         -> use(name)     // variable binding
    "/api/" + rest               -> api(rest)     // string prefix
    {method: "GET", path}        -> get(path)     // map destructuring
    {method: "POST", path: "/api/" + r} -> post(r) // nested
    0..17                        -> "minor"       // range pattern
    n if n > 100                 -> "big"         // guard clause
    ()                           -> "null"        // unit/null
    _                            -> "default"     // wildcard
}
```
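
As a sketch of these arms in a real handler: the `match` is the handler's last expression, so its value is returned directly. The cell `Ages` and signal `bucket` are hypothetical; arms are assumed to be tried top to bottom, and whether `0..17` includes 17 is not specified here.

```soma
cell Ages {
    face {
        signal bucket(age: Int) -> String
    }

    on bucket(age: Int) {
        match age {
            n if n < 0        -> "invalid"        // guard before the range
            0..17             -> "minor"          // range pattern
            n if n > 120      -> "implausible"
            n if n >= 65      -> "senior"
            _                 -> "adult"          // wildcard
        }
    }
}
```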

## Types

| Type      | Example                              | Notes                                                   |
|-----------|--------------------------------------|---------------------------------------------------------|
| Int       | `42`, `-1`                           | 64-bit                                                  |
| BigInt    | `signal compute(n: BigInt)`          | Arbitrary precision; declared in face / handler params  |
| Float     | `3.14`, `1.5e3`                      | 64-bit. `7 / 2 == 3.5` (use `floor` for truncation)     |
| String    | `"hello {name}"`, `"""raw"""`        | `{}` interpolation; triple-quote is raw                 |
| Bool      | `true`, `false`                      |                                                         |
| List      | `list(1, 2, 3)`                      | Ordered, length via `.len`                              |
| Map       | `map("key", val, "key2", val2)`      | MUST have even arg count                                |
| Unit      | `()`                                 | Null equivalent                                         |
| Duration  | `5s`, `1min`, `500ms`, `1h`, `2years`| Converts to ms internally                               |
| Record    | `User { name: "Alice", age: 30 }`    | Typed map with `_type` field                            |

`to_int("abc")` returns `()`, NOT `0`. Always null-check the result of `to_int`/`to_float` on user input.
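
A minimal sketch of that null check at an HTTP boundary. The cell `Profile` and signal `set_age` are hypothetical; `response(status, map)` is used as in the Cell Structure example.

```soma
cell Profile {
    on set_age(raw: String) {
        let age = to_int(raw)                      // () for "abc", never 0
        if age == () {
            response(400, map("error", "age must be an integer", "got", raw))
        } else {
            map("age", age)
        }
    }
}
```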

## Storage (auto-serializes)

```soma
data.set("key", map("name", "Alice", "score", 95))  // auto-serialized
let user = data.get("key")                            // auto-deserialized
print(user.name)                                      // "Alice"
data.delete("key")
data.keys              // list of keys (no parens — these are properties)
data.values            // list of values
data.len               // count
data.has("key")        // bool — methods do take parens
```

Storage rules:
- `slot.get(k)` returns `()` for missing keys, NOT an error. Always check `if raw == ()` before calling `from_json` on raw strings.
- Storage auto-serializes Maps and Lists. `slot.set(k, some_map)` stores it directly; `slot.get(k)` returns the structured value. Don't wrap in `to_json` first — the lint flags it.
- Properties (`.keys`, `.values`, `.len`, `.entries`, `.all`) take NO parentheses. Methods (`.get(k)`, `.set(k, v)`, `.delete(k)`, `.has(k)`) take parentheses.
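
These rules combine in the usual read-modify-write pattern. A sketch assuming a hypothetical `Scores` cell; the slot declaration mirrors the other examples in this reference, and the Map value is stored directly without `to_json`.

```soma
cell Scores {
    memory {
        scores: Map<String, String> [persistent]
    }

    on add_points(player: String, points: Int) {
        let prev = scores.get(player)              // () for a missing key, not an error
        let total = if prev == () { points } else { prev.score + points }
        scores.set(player, map("score", total))    // stored as a Map, no to_json
        map("player", player, "score", total, "players", scores.len)
    }
}
```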

## Pipes

```soma
data |> filter(x => x.score > 50)
data |> map(x => x.name)
data |> sort_by("score", "desc")
data |> top(10)
data |> reduce(0, p => p.acc + p.val)
data |> group_by("dept")
data |> with("new_field", value)
```
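
Pipes chain left to right, so a small leaderboard query reads as a sequence of steps. A sketch assuming a hypothetical `players` slot whose values are Maps with `name` and `score` fields; the `reduce` lambda follows the `p.acc`/`p.val` shape shown above.

```soma
cell Leaderboard {
    memory {
        players: Map<String, String> [persistent]
    }

    on top_names() {
        let ranked = players.values |> filter(p => p.score > 50) |> sort_by("score", "desc")
        ranked |> top(3) |> map(p => p.name)       // at most three names
    }

    on total_score() {
        // p.acc is the running total, p.val the current element
        players.values |> reduce(0, p => p.acc + p.val.score)
    }
}
```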

## Error Handling

```soma
let result = try { risky() }
if result.error != () { handle_error() }

let value = try { risky() }?    // ? propagates errors
ensure balance >= 0              // postcondition
```
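
A sketch of the explicit-check style around an outbound call. The cell `Upstream` and signal `fetch_status` are hypothetical; beyond `error` being `()` on success, the shape of `result` is not specified here, so the success branch passes it through unchanged.

```soma
cell Upstream {
    on fetch_status(url: String) {
        let result = try { http_get(url) }
        if result.error != () {
            // Turn the failure into an HTTP error instead of failing the handler
            response(502, map("error", "upstream call failed", "url", url))
        } else {
            result
        }
    }
}
```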

## CLI Commands

| Command                                 | Description                                                          |
|-----------------------------------------|----------------------------------------------------------------------|
| `soma run file.cell [args]`             | Execute the entry handler in the tree-walking interpreter            |
| `soma run file.cell --signal name [args]`| Execute a specific named handler                                    |
| `soma run file.cell --jit [args]`       | Execute via the bytecode VM                                          |
| `soma serve file.cell [-p port]`        | HTTP server (HTTP `:port`, WS `:port+1`, TCP bus `:port+2`)          |
| `soma serve file.cell --watch`          | Hot reload on `.cell` change                                         |
| `soma serve file.cell --join host:port` | Join an existing cluster via that seed's bus port                    |
| `soma check file.cell [--json]`         | Contract + property + signal checking                                |
| `soma fix file.cell`                    | Auto-repair (missing handlers, contradictory properties, typos)      |
| `soma lint file.cell [--json]`          | Anti-pattern checks                                                  |
| `soma verify file.cell [--json]`        | Prove state machine + distribution properties (CTL)                  |
| `soma describe file.cell`               | Rich JSON: handlers, memory, state, face, scheduled tasks, scale     |
| `soma test file.cell`                   | Run test cells (`cell test Foo { rules { assert ... } }`)            |
| `soma build file.cell [-o out.rs]`      | Generate Rust skeleton (native codegen frontend)                     |
| `soma init [name]`                      | Create project (`soma.toml`, `main.cell`, `.soma_env/`)              |
| `soma add pkg [--git URL] [--path DIR]` | Add dependency to `[dependencies]`                                   |
| `soma install`                          | Install dependencies                                                 |
| `soma props`                            | List registered properties + backends                                |
| `soma repl`                             | Interactive evaluator                                                |
| `soma ast file.cell`                    | Dump AST as JSON                                                     |
| `soma tokens file.cell`                 | Dump lexer token stream                                              |
| `soma env`                              | Show stdlib path, cache dir, resolved config                         |

## Links

- GitHub: https://github.com/soma-dev-lang/soma
- Agent guide: https://github.com/soma-dev-lang/soma/blob/main/AGENT.md
- Language reference: https://github.com/soma-dev-lang/soma/blob/main/SOMA_REFERENCE.md
- Spec: https://github.com/soma-dev-lang/soma/blob/main/SOMA_SPEC.md
- Paper: https://soma-lang.dev/paper