A single flag (--join) transitions a standalone process into a cluster node, with automatic data replication via the existing signal bus.
Distributed systems are built in layers. The application layer (business logic) sits atop an infrastructure layer (Kubernetes, etcd, Raft, gRPC). The programmer writes code in one model and deploys it through an entirely different one — YAML manifests, Dockerfiles, Helm charts. The two artifacts have no formal relationship. A change in the application may silently violate assumptions of the infrastructure, and the mismatch is only discovered at runtime.
We propose eliminating this separation. In Soma, the application and its distribution requirements are expressed in the same language, the same file, the same model. The compiler verifies their consistency.
A Soma cell is a self-contained unit with five components:
```
cell PricingEngine {
    face   { signal book_trade(data: Map) -> Map }              // contract
    memory { trades: Map [persistent, consistent] }             // state
    state  { queued -> confirmed -> executed -> settled }       // lifecycle
    scale  { replicas: 50, shard: trades, consistency: strong } // distribution
    on book_trade(data: Map) { trades.set(data.id, data) }      // behavior
}
```
The same structure describes a function (a cell with handlers), a service (a cell with HTTP endpoints), a database (a cell with persistent memory), and a cluster (a cell with a scale section). This uniformity is what we mean by fractal: the model is self-similar at every scale.
Memory slots carry properties that function as distribution types:
| Declaration | Semantics |
|---|---|
| `[persistent, consistent]` | Durable, linearizable. Backed by SQLite locally, replicated via consensus in cluster mode. |
| `[ephemeral, local]` | Node-local, in-memory. Never touches the network. Microsecond access. |
| `[persistent]` + `eventual` | Durable, eventually consistent. Write locally, propagate asynchronously. |
These are not configuration — they are checked at compile time. The compiler rejects contradictions:
```
memory { data: Map [ephemeral] }
scale  { shard: data, consistency: strong }
// Error: shard 'data' uses [ephemeral] but scale declares consistency: strong
```
This is analogous to a type error: the programmer declared conflicting intentions, and the compiler caught it before any code ran.
The scale section declares distribution requirements:
```
scale {
    replicas: 50        // number of instances
    shard: trades       // which memory slot to distribute
    consistency: strong // strong | causal | eventual
    tolerance: 2        // survives N node failures
    cpu: 4              // resources per instance
    memory: "8Gi"
}
```
The compiler reads this and verifies:

- no `[ephemeral]` slot is combined with `consistency: strong`
- no `[local]` slot is used as a shard
- `strong` + `tolerance > 0` implies CP mode (reduced availability under partition)
- `[ephemeral, local]` slots are confirmed as node-local fast paths

These checks happen at compile time. The programmer knows, before deploying to 50 machines, that the system's distributed properties are internally consistent.
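These declaration checks amount to a few structural rules over the memory and scale sections. A minimal sketch in Python, assuming a hypothetical dictionary encoding of the two sections (this is not Soma's actual compiler internals):

```python
# Sketch of the compile-time declaration-consistency checks.
# The dict encoding of memory slots and the scale section is hypothetical.

def check_distribution(memory: dict, scale: dict) -> list:
    """Return error messages for contradictory memory/scale declarations."""
    errors = []
    shard = scale.get("shard")
    consistency = scale.get("consistency", "eventual")
    for slot, props in memory.items():
        # [ephemeral] data cannot back a strongly consistent shard
        if slot == shard and "ephemeral" in props and consistency == "strong":
            errors.append(f"shard '{slot}' uses [ephemeral] but scale "
                          f"declares consistency: strong")
        # [local] data never touches the network, so it cannot be sharded
        if slot == shard and "local" in props:
            errors.append(f"shard '{slot}' is [local] and cannot be distributed")
    return errors
```

Run against the contradictory example above, `check_distribution({"data": ["ephemeral"]}, {"shard": "data", "consistency": "strong"})` produces the same kind of error the compiler reports.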
Soma includes a CTL model checker that verifies temporal properties on state machines. Properties are declared in soma.toml:
```toml
[verify]
deadlock_free = true
eventually = ["settled", "cancelled"]

[verify.after.executed]
never = ["cancelled"]
eventually = ["settled", "failed"]
```
The checker exhaustively explores all paths in the state machine graph and produces counter-examples for violations:
```
✓ deadlock_free — no deadlocks in any reachable state
✓ after('executed', state ≠ 'cancelled') — verified
✗ eventually(settled) — counter-example: queued → running → failed → queued → … (cycle)
```
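The two properties above reduce to reachability and a fixpoint over the state machine graph. A minimal sketch, assuming a dict-of-successors graph encoding (an illustration, not Soma's internal representation):

```python
# Sketch of exhaustive CTL-style checks over a state machine graph.
# Graph encoding (dict of successor lists) is hypothetical.

def reachable(graph, start):
    """All states reachable from `start` (depth-first)."""
    seen, stack = set(), [start]
    while stack:
        s = stack.pop()
        if s not in seen:
            seen.add(s)
            stack.extend(graph.get(s, []))
    return seen

def eventually(graph, start, targets):
    """AF(targets): every path from `start` eventually reaches a target."""
    states = reachable(graph, start)
    good = {s for s in states if s in targets}
    changed = True
    while changed:
        changed = False
        for s in states - good:
            succs = graph.get(s, [])
            # a state is good when it has successors and all of them are good
            if succs and all(t in good for t in succs):
                good.add(s)
                changed = True
    return start in good

def never_after(graph, pivot, forbidden):
    """after(pivot): `forbidden` is unreachable once `pivot` is entered."""
    return forbidden not in reachable(graph, pivot)

# The failing property from the report: the queued -> running -> failed
# cycle can avoid 'settled' forever, so AF(settled) does not hold.
jobs = {"queued": ["running"], "running": ["failed", "settled"],
        "failed": ["queued"], "settled": []}
```

With this graph, `eventually(jobs, "queued", {"settled"})` is false, matching the counter-example cycle in the report.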
The same verification framework extends to distribution. The scale section is modeled as an implicit state machine (single → distributed → partitioned → recovering) and its properties are checked alongside the application's state machines:
```
State machine 'PricingEngine/scale':
  ✓ replicas: 50 instances declared
  ✓ tolerance: survives 2 node failures (of 50 replicas)
  ✓ quorum: 26/50 nodes needed — tolerates 24 failures
  ✓ CAP: CP mode — consistent + partition-tolerant
  ✓ memory 'cache' is node-local — not distributed (fast path)
  ✓ scheduler runs on leader node only
```
These are not runtime assertions — they are compile-time proofs about the system's behavior under distribution.
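The quorum line in the report follows from standard majority arithmetic; a sketch (function names are illustrative):

```python
# Majority-quorum arithmetic behind the scale report above.

def quorum(replicas: int) -> int:
    """Smallest majority of `replicas` nodes."""
    return replicas // 2 + 1

def survivable_failures(replicas: int) -> int:
    """Failures the cluster survives while a majority quorum remains."""
    return replicas - quorum(replicas)

# 50 replicas -> quorum of 26, tolerates 24 failures, so the declared
# tolerance: 2 is comfortably within bounds.
```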
A standalone Soma process becomes a cluster node with a single flag:
```shell
$ soma serve app.cell -p 8080                        # standalone
$ soma serve app.cell -p 8081 --join localhost:8082  # cluster node
```
When `--join` is specified, the joining node announces itself to the cluster with a `CLUSTER JOIN` message carrying its node ID, and scheduled `every` blocks run only on the leader node.

There is no separate replication protocol. When `trades.set(key, val)` executes:
```
Node 1: trades.set("T001", data)
          ↓
        1. Write to local SQLite
        2. Broadcast: EVENT _cluster_set {"slot":"trades","key":"T001","value":"..."}
          ↓ signal bus (TCP)
Node 2: receives EVENT → applies set to local SQLite
Node 3: receives EVENT → applies set to local SQLite
```
The signal bus — the same mechanism that carries inter-cell signals — carries data replication. This is the fractal principle in action: the same mechanism at every scale.
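The write path above can be sketched with in-memory stand-ins for SQLite and the TCP bus (the `_cluster_set` event name comes from the trace; the `Bus`/`Node` classes are illustrative, not the Soma runtime):

```python
# Toy sketch of replication over a shared signal bus.
# In-memory dicts stand in for SQLite; a method call stands in for TCP.
import json

class Bus:
    def __init__(self):
        self.nodes = []
    def broadcast(self, sender, event, payload):
        # deliver the event to every node except the sender
        for node in self.nodes:
            if node is not sender:
                node.on_event(event, payload)

class Node:
    def __init__(self, bus):
        self.store = {}          # stands in for the local SQLite file
        self.bus = bus
        bus.nodes.append(self)
    def set(self, slot, key, value):
        self.store[(slot, key)] = value              # 1. write locally
        self.bus.broadcast(self, "_cluster_set",     # 2. broadcast the event
            json.dumps({"slot": slot, "key": key, "value": value}))
    def on_event(self, event, payload):
        if event == "_cluster_set":                  # peers apply the same write
            msg = json.loads(payload)
            self.store[(msg["slot"], msg["key"])] = msg["value"]

bus = Bus()
n1, n2, n3 = Node(bus), Node(bus), Node(bus)
n1.set("trades", "T001", {"id": "T001"})
```

After the `set`, all three stores hold the same entry; replication and inter-cell signaling share one delivery path.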
Keys are distributed across nodes via a consistent hash ring (FNV-1a, 128 virtual nodes per physical node). When a node joins or leaves, only about K/N keys (for K total keys and N nodes) need to be reassigned.
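A ring of this kind fits in a few lines. The sketch below assumes the 64-bit FNV-1a variant and an arbitrary `"node#vnode"` point-naming scheme, neither of which is confirmed by the text:

```python
# Sketch of a consistent hash ring: 128 virtual nodes per physical node,
# keyed by 64-bit FNV-1a (the 64-bit variant is an assumption).
import bisect

def fnv1a(data: bytes) -> int:
    h = 0xcbf29ce484222325                      # FNV-1a 64-bit offset basis
    for b in data:
        h = ((h ^ b) * 0x100000001b3) & 0xFFFFFFFFFFFFFFFF
    return h

class Ring:
    def __init__(self, nodes, vnodes=128):
        # each physical node contributes `vnodes` points on the ring
        self.ring = sorted(
            (fnv1a(f"{n}#{i}".encode()), n) for n in nodes for i in range(vnodes))
        self.points = [p for p, _ in self.ring]
    def owner(self, key: str) -> str:
        # first ring point clockwise of the key's hash (wrapping at the end)
        i = bisect.bisect(self.points, fnv1a(key.encode())) % len(self.ring)
        return self.ring[i][1]
```

Adding a fourth node to a three-node ring moves roughly a quarter of the keys, which is the K/N property the text relies on.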
For get(key), the runtime checks local storage first. If the key belongs to another node, it sends a request via the bus and waits for a reply (2s timeout). For values(), a fan-out query collects results from all nodes and merges them.
Each node sends heartbeats every 3 seconds via the signal bus. If no heartbeat is received from a node for 15 seconds, it is removed from the hash ring. When a node rejoins (via --join), it sends a _cluster_sync_request and receives the current state as a series of _cluster_set events.
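The failure detector reduces to timestamp bookkeeping with the two intervals from the text (3 s heartbeats, 15 s eviction); the class below is an illustrative sketch, not the runtime's code:

```python
# Sketch of the heartbeat failure detector. Intervals come from the text:
# heartbeats every 3 s, eviction after 15 s of silence.
HEARTBEAT_INTERVAL = 3.0
FAILURE_TIMEOUT = 15.0

class FailureDetector:
    def __init__(self):
        self.last_seen = {}              # node id -> time of last heartbeat

    def heartbeat(self, node, now):
        self.last_seen[node] = now

    def evict_dead(self, now):
        """Drop and return nodes silent for >= FAILURE_TIMEOUT seconds."""
        dead = [n for n, t in self.last_seen.items()
                if now - t >= FAILURE_TIMEOUT]
        for n in dead:
            del self.last_seen[n]        # would also leave the hash ring
        return dead
```

A rejoining node then goes through the `_cluster_sync_request` path described above to catch up on state.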
We implemented three distributed applications:
| Application | Lines | Replicas | Consistency | Verified Properties |
|---|---|---|---|---|
| Pricing Engine | 150 | 50 | strong | trade executed ≠ cancelled, quorum 26/50 |
| Job Queue | 120 | 10 | eventual | every job completes or expires, no deadlock |
| Chat | 90 | 5 | eventual | messages delivered → read → expired |
Each application runs identically on 1 or N nodes. The only change between standalone and cluster mode is the --join flag.
A scripted demo (examples/demo.sh) exercises the full lifecycle.
The model checker runs in < 100ms for state machines with up to 15 states and 30 transitions. Distribution property verification is O(1) — it reads the scale section and memory properties and performs arithmetic on replica counts and quorum thresholds.
`consistency: strong` is declared and verified, but the runtime currently uses eventual replication. A true linearizable implementation requires Raft or Paxos.

Erlang/OTP [Armstrong, 2003] provides the actor model with transparent distribution, but without compile-time verification of distributed properties. A message sent to a remote actor may fail silently; the programmer must handle distribution errors manually.
TLA+/PlusCal [Lamport, 2002] provides formal specification and model checking for distributed systems, but is not an executable language. The specification and the implementation remain separate artifacts with no guaranteed correspondence.
Kubernetes [Burns et al., 2016] provides container orchestration, but the distribution model (YAML manifests) is entirely separate from the application code. There is no compiler that checks consistency between the two.
Akka [Lightbend] provides actors with location transparency in Scala/Java, but consistency guarantees are library-level configuration, not verified at compile time.
Unison [Chiusano & Bjarnason] provides content-addressed code that can be distributed across nodes, but does not include distribution semantics (consistency, sharding, tolerance) in the type system.
Orleans [Bykov et al., 2011] introduces virtual actors with automatic placement, but distribution properties are implicit in the runtime rather than declared and verified in the source.
Soma is, to our knowledge, the first language where distribution requirements are (1) declared in the source code alongside business logic, (2) verified at compile time for consistency, CAP properties, and quorum arithmetic, and (3) executed by the same runtime that handles local computation, via the same signal mechanism.
The separation between application code and infrastructure configuration is an accident of history, not a fundamental necessity. Soma demonstrates that a single model — the cell — can express computation, state management, lifecycle, contracts, and distribution. The compiler verifies that these concerns are mutually consistent. The runtime executes them on one machine or a thousand, with a single flag.
The key insight is that distribution properties — consistency, tolerance, sharding — are types. They constrain the behavior of memory operations the way static types constrain the values of variables. And like type errors, distribution errors should be caught at compile time, not discovered in production at 3 AM.