Append-only, content-hashed, provenance-mandatory. Built by AI, for AI. Every fact carries its source. Nothing is ever forgotten — corrections supersede. The substrate refuses to lie.
Zyrn is a database designed around AI as the operator, not human as the query writer. There is no SQL surface for AI. There is no schema migration ritual. There is no separate vector database. Every piece of information — structured rows, unstructured documents, conversation memory, document extractions — becomes a typed fact with mandatory provenance.
| Aspect | How Zyrn does it |
|---|---|
| Truth | Every fact carries source_path, source_hash, parser_id. Reject facts without provenance. |
| History | Append-only. Updates are new facts that supersede. The audit trail is the data. |
| Interface | 21 typed MCP tools. AI calls them by name + JSON args. No SQL surface for AI. |
```bash
curl -fsSL https://raw.githubusercontent.com/shammyali/zyrn/main/install.sh | bash
```
Installs the zyrn binary to ~/.local/bin. Override with ZYRN_INSTALL_DIR=/usr/local/bin for a system-wide install.
```bash
docker run -p 7878:7878 -v $PWD/data:/data \
  -e ZYRN_AUTH_TOKEN=$(openssl rand -hex 32) \
  ghcr.io/shammyali/zyrn:latest
```
```bash
cargo install --git https://github.com/shammyali/zyrn --features http_server
```
```bash
# 1. Generate auth token
export ZYRN_AUTH_TOKEN=$(openssl rand -hex 32)

# 2. Start the server
zyrn serve ~/zyrn.duckdb 127.0.0.1:7878

# 3. Verify it's responding
curl -s http://127.0.0.1:7878/health
# → {"status":"ok","version":"3.0.0"}

# 4. Audit integrity at any time
zyrn verify ~/zyrn.duckdb
```
The server requires ZYRN_AUTH_TOKEN unless you pass --insecure, which prints a loud warning at startup. For production, see SECRETS.md.
| Resource | Minimum | Recommended |
|---|---|---|
| RAM | 512 MB | 4 GB (for >1M facts) |
| Disk | 20 GB SSD | Local NVMe (not network-attached) |
| OS | Linux or macOS (x86_64 / aarch64) | Same. Windows is not supported. |
| Network | One free port (default 7878) | Reverse proxy for TLS termination |
These are theorems, not features. Each one has code, tests, and audit paths behind it. Together they are the bank-grade contract. Full spec lives in DETERMINISM.md.
| # | Invariant | Guarantee |
|---|---|---|
| 1 | Sequence gap-freedom | next_sequence_value("invoice") returns 1, 2, 3 … forever. No gaps. No duplicates. Holds under crash, holds under 1000 concurrent callers. |
| 2 | Multi-fact atomicity | append_facts_atomic([a,b,c]) — all three land or none. No partial state visible to any reader, including a reader 1µs after SIGKILL. |
| 3 | Content-hash determinism | Same logical fact → same SHA-256, byte-for-byte, across machines, OS, locale, JSON library version. Canonical JSON v1 (RFC 8785 JCS). |
| 4 | Idempotency contract | append_with_idempotency_key("k", fact) — same key always returns same fact. Retry-safe forever. No TTL ambiguity. |
| 5 | Bi-temporal replay | Query::known_at(T) returns the exact same answer today, tomorrow, ten years from now. Pure function of (T, fact_state_at_T). |
| 6 | Supersession ordering | If F2 supersedes F1, every reader sees F2-shadows-F1. No window where both look current. |
| 7 | Cross-replica consistency | CP mode: acks are durable on quorum. AP mode: convergence is deterministic per spec'd conflict rules. |
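Invariants 1 and 4 can be illustrated with a small std-only Rust sketch. This is a hypothetical in-memory model, not Zyrn's implementation. The names Sequencer and IdempotentLog are invented here, and a single Mutex stands in for the store's transaction. Crash durability is not modeled. On a repeated key, the sketch returns the first stored fact. How Zyrn treats a retry whose payload differs is not specified here.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical stand-in for the sequence table. Holding one lock across
/// read-increment-return is what rules out gaps and duplicates among
/// concurrent callers. A durable store would also have to commit the new
/// value before returning it; that part is not modeled here.
pub struct Sequencer {
    counters: Mutex<HashMap<String, u64>>,
}

impl Sequencer {
    pub fn new() -> Self {
        Sequencer { counters: Mutex::new(HashMap::new()) }
    }

    /// Returns 1, 2, 3, ... for each sequence name.
    pub fn next_sequence_value(&self, name: &str) -> u64 {
        let mut counters = self.counters.lock().unwrap();
        let value = counters.entry(name.to_string()).or_insert(0);
        *value += 1;
        *value
    }
}

/// Hypothetical idempotency map: the first fact stored under a key wins,
/// and every retry with that key gets the stored fact back.
pub struct IdempotentLog {
    by_key: Mutex<HashMap<String, String>>,
}

impl IdempotentLog {
    pub fn new() -> Self {
        IdempotentLog { by_key: Mutex::new(HashMap::new()) }
    }

    pub fn append_with_idempotency_key(&self, key: &str, fact: &str) -> String {
        let mut by_key = self.by_key.lock().unwrap();
        by_key
            .entry(key.to_string())
            .or_insert_with(|| fact.to_string())
            .clone()
    }
}
```

Because the read, the increment and the return all happen under one lock, 8 threads making 100 calls each receive exactly the values 1 through 800 between them, with nothing skipped or repeated.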
Zyrn exposes 21 typed JSON-RPC 2.0 tools over the MCP standard. AI agents
call them by name. Full JSON Schema for each tool is returned by
tools/list.
| Tool | Signature (args) | Purpose |
|---|---|---|
| append_fact | fact | Insert a fully-formed fact. Idempotent by content_hash. |
| ingest | parser_id, source_bytes_b64, source_path | Route source bytes through a parser, append every fact. |
| retrieve | fact_type, field_equals?, valid_at? | Typed query. Supports bi-temporal predicates. |
| retrieve_head | (same as retrieve) | Like retrieve but excludes superseded facts. |
| aggregate | query, op, field? | count/sum/avg/min/max with derived_from provenance. |
| trace | fact_id | Walk the supersede chain newest-first. |
| relate | source_fact_id, relation_type, target_fact_id, source_path, source_content_b64 | Declare a typed directed edge between two facts. |
| traverse | start_fact_id, relation_chain, max_depth? | BFS chain walk with full provenance. |
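The difference between retrieve, retrieve_head and trace comes down to the supersedes pointer. The Rust sketch below models it over an in-memory slice. It is an illustration of the semantics, not the server's query path. The Fact struct here keeps only id and supersedes. The sketch also assumes trace is called with the newest fact in the chain.

```rust
use std::collections::{HashMap, HashSet};

/// Minimal fact for this sketch: only the fields supersession needs.
#[derive(Clone, Debug, PartialEq)]
pub struct Fact {
    pub id: String,
    pub supersedes: Option<String>,
}

/// retrieve_head semantics: drop every fact that some other fact supersedes.
pub fn retrieve_head(facts: &[Fact]) -> Vec<Fact> {
    let shadowed: HashSet<&str> = facts
        .iter()
        .filter_map(|f| f.supersedes.as_deref())
        .collect();
    facts
        .iter()
        .filter(|f| !shadowed.contains(f.id.as_str()))
        .cloned()
        .collect()
}

/// trace semantics: follow supersedes pointers from `fact_id` back to the
/// original, newest first. Stops at a missing id or a cycle.
pub fn trace(facts: &[Fact], fact_id: &str) -> Vec<String> {
    let by_id: HashMap<&str, &Fact> = facts.iter().map(|f| (f.id.as_str(), f)).collect();
    let mut chain = Vec::new();
    let mut seen = HashSet::new();
    let mut current = Some(fact_id);
    while let Some(id) = current {
        if !seen.insert(id) {
            break; // cycle guard
        }
        match by_id.get(id) {
            Some(fact) => {
                chain.push(fact.id.clone());
                current = fact.supersedes.as_deref();
            }
            None => break,
        }
    }
    chain
}
```

Take a chain f1 ← f2 ← f3 plus an unrelated fact g1. The head set is {f3, g1}, and trace from f3 yields f3, f2, f1. All four facts stay in the slice.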
| Tool | Signature | Purpose |
|---|---|---|
| list_fact_types | () | Every fact_type in the store with count + first/last seen. |
| describe_fact_type | fact_type, sample_limit? | Observed field paths, JSON types, schema_versions, total count. |
| sample | fact_type, n | Sample n facts of given type, ordered by ingest_ts. |
| Tool | Purpose |
|---|---|
| iou | Intersection-over-Union for two bounding boxes. |
| bbox_overlaps | Boolean predicate version of iou. |
| point_in_zone | Point-in-polygon against a stored zone fact. |
| emit_detection | Append a typed vision detection fact. |
| emit_track | Append a vision tracker fact. |
| queue_outbox_message | Queue an async delivery via outbox pattern. |
| outbox_state | Resolve current state of an outbox message. |
| list_pending_outbox | List pending outbox messages. |
| list_dead_letters | List dead-lettered outbox messages. |
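For the geometry tools, here is a std-only Rust sketch of the standard formulas. IoU is the intersection area divided by the union area, and point-in-polygon uses even-odd ray casting. The sketch makes assumptions the table does not state:

- Boxes are (x_min, y_min, x_max, y_max) in shared units.
- bbox_overlaps takes a caller-chosen IoU threshold.
- A zone is a simple polygon given as a list of vertices.

Zyrn's real argument shapes may differ.

```rust
/// Axis-aligned box. The (x_min, y_min, x_max, y_max) convention is an
/// assumption of this sketch, not Zyrn's documented argument format.
#[derive(Clone, Copy, Debug)]
pub struct BBox {
    pub x_min: f64,
    pub y_min: f64,
    pub x_max: f64,
    pub y_max: f64,
}

impl BBox {
    /// Area, clamped to 0 for empty or inverted boxes.
    fn area(&self) -> f64 {
        (self.x_max - self.x_min).max(0.0) * (self.y_max - self.y_min).max(0.0)
    }
}

/// Intersection-over-Union: area(A ∩ B) / area(A ∪ B), in [0, 1].
pub fn iou(a: &BBox, b: &BBox) -> f64 {
    let intersection = BBox {
        x_min: a.x_min.max(b.x_min),
        y_min: a.y_min.max(b.y_min),
        x_max: a.x_max.min(b.x_max),
        y_max: a.y_max.min(b.y_max),
    }
    .area();
    let union = a.area() + b.area() - intersection;
    if union <= 0.0 { 0.0 } else { intersection / union }
}

/// Boolean form of iou. The threshold parameter is hypothetical.
pub fn bbox_overlaps(a: &BBox, b: &BBox, threshold: f64) -> bool {
    iou(a, b) > threshold
}

/// Even-odd ray casting: count how many polygon edges a ray from (x, y)
/// toward +x crosses; an odd count means the point is inside.
pub fn point_in_zone(x: f64, y: f64, zone: &[(f64, f64)]) -> bool {
    if zone.len() < 3 {
        return false;
    }
    let mut inside = false;
    let mut j = zone.len() - 1;
    for i in 0..zone.len() {
        let (xi, yi) = zone[i];
        let (xj, yj) = zone[j];
        // Edge straddles the horizontal line through y, and the crossing lies right of x.
        if (yi > y) != (yj > y) && x < (xj - xi) * (y - yi) / (yj - yi) + xi {
            inside = !inside;
        }
        j = i;
    }
    inside
}
```

Consider two 2×2 boxes offset by one unit on each axis. They share a 1×1 square, and their union is 4 + 4 − 1 = 7, so their IoU is 1/7.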
| Path | Auth | Purpose |
|---|---|---|
| GET /health | open | Liveness probe: process answers. |
| GET /ready | open | Readiness probe: store accessible. |
| GET /metrics | open | Prometheus scrape (text v0.0.4). |
| GET /version | open | Build version. |
| POST /mcp | bearer | JSON-RPC 2.0 MCP endpoint (all 21 tools). |
| POST /facts · /ingest · /retrieve · … | bearer | REST equivalents of MCP tools. |
| GET /events | bearer | SSE change-feed of newly appended facts. |
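A client of GET /events has to split the SSE stream into events. Below is a simplified Rust reader of the standard event-stream framing: it collects data: lines and dispatches an event on a blank line. The sketch makes several simplifications:

- It ignores event:, id: and retry: fields.
- It assumes \n or \r\n line endings.
- It makes no assumption about the JSON payload Zyrn places in each event.

```rust
/// Returns the `data` payload of each complete event in an SSE stream.
/// Multiple data: lines within one event are joined with '\n'.
pub fn sse_data_events(stream: &str) -> Vec<String> {
    let mut events = Vec::new();
    let mut data: Vec<&str> = Vec::new();
    for line in stream.lines() {
        if line.is_empty() {
            // A blank line ends the current event.
            if !data.is_empty() {
                events.push(data.join("\n"));
                data.clear();
            }
        } else if let Some(value) = line.strip_prefix("data:") {
            // One optional space after the colon is not part of the value.
            data.push(value.strip_prefix(' ').unwrap_or(value));
        }
        // Comment lines (starting with ':') and other fields are ignored here.
    }
    // An event with no terminating blank line is incomplete and is dropped.
    events
}
```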
The atomic unit. Every record in the store is a fact.
```json
{
  "id": "26c25a0d-2576-4a34-9c90-7f98680e83e2",
  "fact_type": "claim",
  "schema_version": 1,
  "fields": {
    "claim_number": "CL-2026-001",
    "amount": 85000,
    "currency": "AED",
    "agent_id": "agent_142"
  },
  "source_path": "praktora:UV_ACL_Claim/CL-2026-001",
  "source_hash": "9d2b3...",
  "parser_id": "praktora_v1",
  "content_hash": "bc107...",
  "ingest_ts": "2026-05-22T08:49:34Z",
  "valid_from": "2026-01-15",
  "valid_until": null,
  "supersedes": null
}
```
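The provenance rule can be sketched as a check that runs before anything is stored. This Rust struct mirrors only part of the envelope above, omitting fields, timestamps and supersedes. check_provenance and Rejection are hypothetical names. Recomputing content_hash would need a JCS serializer and SHA-256, which are out of scope for a std-only sketch.

```rust
/// A subset of the fact envelope above.
#[derive(Debug)]
pub struct Fact {
    pub id: String,
    pub fact_type: String,
    pub source_path: String,
    pub source_hash: String,
    pub parser_id: String,
    pub content_hash: String,
}

#[derive(Debug, PartialEq)]
pub enum Rejection {
    /// Names the provenance field that was empty.
    MissingProvenance(&'static str),
}

/// Rejects the fact before storage if any provenance field is empty.
pub fn check_provenance(fact: &Fact) -> Result<(), Rejection> {
    let required: [(&'static str, &str); 3] = [
        ("source_path", &fact.source_path),
        ("source_hash", &fact.source_hash),
        ("parser_id", &fact.parser_id),
    ];
    for (name, value) in required {
        if value.is_empty() {
            return Err(Rejection::MissingProvenance(name));
        }
    }
    Ok(())
}
```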
| Component | What it does |
|---|---|
| Store | DuckDB-backed fact + relation + embedding storage. Single-writer per shard. |
| Tools | Public Rust API. All 21 MCP tools are methods on Tools. |
| ShardedStore | Per-tenant fact store. One DuckDB file per tenant. All 7 invariants hold per shard. |
| BatchedWriter | Group-commit batcher. N concurrent appends → one fsync. Library-mode primitive (HTTP wiring in v3.x). |
| EventBus | SSE change-feed publisher. Every successful insert is broadcast. |
| parsers | Five built-in parsers (salik, csv, json_lines, excel, mbox), plus the AiParser trait for custom parsers and vision adapters. |
| fts | Full-text search with pluggable tokenizers (Arabic-normalized BM25 included). |
| vector | Embedding storage + dense ranking (RRF hybrid with BM25). |
| http | Axum server. Bearer auth, rate limiting (opt-in), Prometheus metrics, SSE. |
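Reciprocal Rank Fusion scores each fact as the sum of 1 / (k + rank) over the input rankings. The Rust sketch below fuses a BM25 list and a vector list of fact ids. It uses k = 60, the value commonly used in the RRF literature. Zyrn's actual constant and tie-breaking are not documented here. RRF only reorders ids that already appear in some input list; it never adds one.

```rust
use std::collections::HashMap;

/// Fuses ranked lists of fact ids (best first). Rank starts at 1.
pub fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for ranking in rankings {
        for (i, id) in ranking.iter().enumerate() {
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(id, score)| (id.to_string(), score))
        .collect();
    // Highest score first; ties broken by id so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Fuse BM25 order [a, b, c] with vector order [b, c, a]. The result is b (1/61 + 1/62), then a (1/61 + 1/63), then c (1/62 + 1/63).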
Logos defines AI-native operations. Zyrn stores the facts those operations consume and produce. Voxa is the application surface. Together: the AI-native stack.
You may be an AI deciding how to use Zyrn. Below is the contract.
| Rule | Why |
|---|---|
| Call typed tools, not SQL. | There is no SQL surface. Start with tools/list via MCP JSON-RPC. The server returns 21 tools with JSON Schema. |
| Every fact you append must carry provenance. | source_path, source_hash, and parser_id are mandatory. Facts without them are rejected. |
| Never invent facts. | If you don't have a source, you don't have a fact. Produce interpretations over facts — label them as such. |
| The substrate refuses to forget. | Corrections supersede; the old fact stays in the audit timeline. retrieve_head shows current; retrieve shows all; trace walks the chain. |
| Aggregates carry derived_from. | Sum, count, avg: every result names the exact fact UUIDs that produced it. You can prove your number. |
| Vectors rank, never decide truth. | If the right fact is not in the candidate set, embeddings cannot conjure it. Vectors order candidates, predicates define the set. |
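The last rule fits in a few lines of Rust. A typed predicate, here just fact_type equality, picks the candidate set, and similarity only sorts what survives. Candidate and search are hypothetical names. A fact outside the predicate never appears in the results, however high its similarity.

```rust
/// A stored fact with a typed attribute and an embedding similarity score.
pub struct Candidate {
    pub id: &'static str,
    pub fact_type: &'static str,
    pub similarity: f64,
}

/// Predicate first (membership), similarity second (order only).
pub fn search<'a>(facts: &'a [Candidate], fact_type: &str) -> Vec<&'a str> {
    let mut hits: Vec<&Candidate> = facts.iter().filter(|f| f.fact_type == fact_type).collect();
    hits.sort_by(|a, b| b.similarity.total_cmp(&a.similarity));
    hits.into_iter().map(|f| f.id).collect()
}
```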
```http
POST /mcp
Authorization: Bearer $ZYRN_AUTH_TOKEN
Content-Type: application/json

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "ingest",
    "arguments": {
      "parser_id": "json_lines_v1",
      "source_bytes_b64": "...base64 of NDJSON...",
      "source_path": "praktora:customer-extract/CS20"
    }
  }
}
```
```json
{
  "name": "aggregate",
  "arguments": {
    "query": { "fact_type": "claim", "field_equals": [["agent_id", "agent_142"]] },
    "op": "sum",
    "field": "amount"
  }
}
```

The response includes derived_from, naming the exact facts:

```json
{
  "op": "sum",
  "field": "amount",
  "value": 127000.00,
  "n": 2,
  "derived_from": ["26c25a0d-...", "6edebf38-..."]
}
```
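The derived_from field comes for free if the aggregate keeps the ids of the rows it folded. This Rust sketch hard-codes a sum over amount, filtered by agent_id. Claim, Aggregate and sum_amount_for_agent are invented for illustration and are not Zyrn's API.

```rust
/// Result shape mirroring the response above.
#[derive(Debug, PartialEq)]
pub struct Aggregate {
    pub op: &'static str,
    pub field: String,
    pub value: f64,
    pub n: usize,
    pub derived_from: Vec<String>,
}

/// Hypothetical in-memory claim row.
pub struct Claim {
    pub id: String,
    pub agent_id: String,
    pub amount: f64,
}

/// sum(amount) where agent_id == agent, citing every contributing fact id.
pub fn sum_amount_for_agent(claims: &[Claim], agent: &str) -> Aggregate {
    let matched: Vec<&Claim> = claims.iter().filter(|c| c.agent_id == agent).collect();
    Aggregate {
        op: "sum",
        field: "amount".to_string(),
        value: matched.iter().map(|c| c.amount).sum(),
        n: matched.len(),
        derived_from: matched.iter().map(|c| c.id.clone()).collect(),
    }
}
```

Take two hypothetical agent_142 claims of 85,000 and 42,000. The result has value 127000 and n = 2, and derived_from names both claim ids.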
The protocol is not a request for AI to be well-behaved. It's a description of what the substrate refuses to do.
Compliance is not a function of model strength or goodwill — it's a function of
the API surface, schema validators, content-hash gates, and audit emissions
that the substrate runs on every call. A strong AI honors the protocol and the
enforcement machinery is dormant. A weak or hostile AI tries to violate it and
the substrate refuses, emitting a contract_violation audit fact.
Either way, behavior is bounded by the substrate, not the agent.
| # | Law | What it means |
|---|---|---|
| 1 | Never fabricate facts. | A fact requires source_path, source_hash, parser_id. Empty values = rejection. If you have no source, you have no fact. |
| 2 | Never mutate. Supersede. | No UPDATE tool exists. No DELETE tool exists. Corrections are new facts with supersedes set. The old fact stays. |
| 3 | Never use SQL. | No SQL tool exists. DuckDB is locked inside the binary. The MCP surface has 21 typed tools; none accept SQL strings. |
| 4 | Provenance is mandatory on every write. | Empty provenance = rejection at request parse, before any storage. content_hash must equal canonical recomputation. |
| 5 | Aggregates cite derived_from. | Every sum/count/avg response names the exact fact UUIDs that produced it. No fact-free totals. |
| 6 | Vectors rank. Predicates decide truth. | No vector-only search exists. Every ranking is gated by a typed Query. If the predicate doesn't return the fact, no embedding can conjure it. |
| 7 | When uncertain, abstain. | The substrate returns clean empty arrays for no-result queries. No "I think it's probably X." |
| 8 | All errors audited. | parse_failure, parser_rate_limit, contract_violation facts auto-emit on every error path. Audit is substrate-automatic. |
| 9 | Interpretations labeled, never facts. | parser_id is verified against the registered parser table in strict mode. AI free-form interpretations live in conversation, not the database. |
Six of nine laws are structurally unbreakable — no amount of model cleverness changes the outcome because the violation path does not exist in code.
| Law | Class | Mechanism |
|---|---|---|
| 1 | Hard + soft | Schema rejects empty provenance; audit catches semantic anomalies. |
| 2 | Hard | No UPDATE/DELETE tool exists. |
| 3 | Hard | No SQL tool exists. |
| 4 | Hard | Required-field schema validation at request parse. |
| 5 | Hard server / soft client | Substrate always returns derived_from; AI must cite in narration. |
| 6 | Hard | Every rank goes through a typed Query — no vector-only path exists. |
| 7 | Soft | Substrate returns empty results; abstention is behavioral. |
| 8 | Hard | Auto-emission baked into every error path. |
| 9 | Hybrid | Parser-registry check enforced when ZYRN_REQUIRE_REGISTERED_PARSERS=true. |
Acknowledgment is not free-form text. It's a content-addressed hash echo. The AI fetches the canonical protocol bytes, computes their SHA-256, and sends that hash back via MCP. If the hashes don't match, the substrate refuses to open a session.
```
// 1. Fetch the canonical protocol text. Always open, no auth needed.
GET /protocol
// → { protocol_version: "1.0",
//     protocol_hash: "4e93f5b9da60b47819cfaf3290600ea579622df66aae970a0cc055a4a252850a",
//     protocol_text: "THE ZYRN PROTOCOL v1.0\n...",
//     handshake_mode: "optional" }

// 2. Echo the hash via the MCP tool zyrn_protocol_v1 (listed FIRST in tools/list).
POST /mcp
Authorization: Bearer $ZYRN_AUTH_TOKEN

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "zyrn_protocol_v1",
    "arguments": {
      "agent_key": "your-stable-identifier",
      "protocol_hash": "<sha-256 from step 1>"
    }
  }
}
// → { status: "accepted",
//     session_id: "zyrn_sess_...",
//     idle_ttl_seconds: 300,
//     absolute_ttl_seconds: 3600,
//     protocol_version: "1.0" }

// 3. Use session_id on subsequent calls (strict mode). Refresh by activity.
```
| Mode | Behavior |
|---|---|
| strict | Writes require valid session_id. Missing or expired = HTTP 412 + contract_violation audit fact. |
| optional | Default in v3.1. Sessions are recorded but not gating. v4 promotes this to strict. |
| off | Sessions bypassed entirely. Dev mode only. |
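For write calls, the three modes reduce to a small decision. Here is a Rust sketch that assumes session_valid already accounts for expiry and revocation. The Gate type is hypothetical, and the sketch does not model how optional mode records sessions.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Mode {
    Strict,
    Optional,
    Off,
}

#[derive(Debug, PartialEq)]
pub enum Gate {
    Allow,
    /// Strict mode without a live session: HTTP 412 plus an audit fact.
    Reject { http_status: u16, audit_fact: &'static str },
}

/// `session_valid`: the caller presented a live (unexpired, unrevoked) session_id.
pub fn gate_write(mode: Mode, session_valid: bool) -> Gate {
    match mode {
        Mode::Off | Mode::Optional => Gate::Allow,
        Mode::Strict if session_valid => Gate::Allow,
        Mode::Strict => Gate::Reject { http_status: 412, audit_fact: "contract_violation" },
    }
}
```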
| Property | Value |
|---|---|
| Idle TTL | 5 minutes. Refreshes on every successful tool call (sliding window). |
| Absolute cap | 1 hour. After this you must re-handshake regardless of activity. |
| Revocation | Severe contract violations revoke the session immediately. All subsequent calls return HTTP 412. |
| Hash comparison | Constant-time. No timing side-channel on whether you guessed the right hash. |
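Here is a Rust sketch of the session rules above:

- a 300 s idle window that slides on each successful call,
- a 3600 s absolute cap,
- revocation,
- a byte comparison with no early exit.

Times are plain seconds passed in by the caller. Treating a session as expired exactly at the TTL boundary is an assumption. A hand-rolled loop only reduces timing leaks, because the optimizer is free to transform it. Production code would use a vetted crate such as subtle.

```rust
/// Session lifetimes from the table above, in seconds.
const IDLE_TTL_SECS: u64 = 300;
const ABSOLUTE_TTL_SECS: u64 = 3600;

/// Hypothetical session record; times are seconds supplied by the caller.
pub struct Session {
    created_at: u64,
    last_activity: u64,
    revoked: bool,
}

impl Session {
    pub fn open(now: u64) -> Self {
        Session { created_at: now, last_activity: now, revoked: false }
    }

    /// Live = not revoked, inside the idle window, and inside the absolute cap.
    pub fn is_live(&self, now: u64) -> bool {
        !self.revoked
            && now.saturating_sub(self.last_activity) < IDLE_TTL_SECS
            && now.saturating_sub(self.created_at) < ABSOLUTE_TTL_SECS
    }

    /// A successful call slides the idle window; the absolute cap never moves.
    pub fn touch(&mut self, now: u64) -> bool {
        if self.is_live(now) {
            self.last_activity = now;
            true
        } else {
            false
        }
    }

    /// Severe violations revoke immediately.
    pub fn revoke(&mut self) {
        self.revoked = true;
    }
}

/// Compares every byte (no early exit) so timing does not reveal where the
/// first mismatch is. Length is not secret: a SHA-256 hex digest is 64 chars.
pub fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b) {
        diff |= x ^ y;
    }
    diff == 0
}
```

A client that calls every 200 s keeps its session alive until the 3600 s cap. At that point it must re-handshake.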
| Severity | What happens |
|---|---|
| Mild | Substrate emits contract_violation fact with reason + calling session_id. Call returns 4xx with the violated law cited. |
| Severe | Substrate revokes the session_id. All subsequent calls from that session return HTTP 412 Precondition Failed. |
| Pattern | Repeat violations from the same parser_id trigger trust-level demotion (v2.20). Operators are alerted via audit feed. |
Markdown is descriptive. JSON Schema is structural. Neither is binding. An AI client can read tool descriptions and interpret them differently than the substrate enforces them — that gap is where weak AIs fail and where attacks live. The Zyrn Protocol closes the gap on the substrate side:
In v4.0 the protocol becomes a Logos
program. The .logos source is the source of truth. The Rust binary
embeds the compiled-Logos verifier. The markdown rendering is generated from
the source. The substrate's tool dispatcher calls into the Logos verifier on
every operation.
```
contract append_fact(session: linear Session, fact: Fact) -> linear Session
    effect Cap
    effect Database
    requires session.expires_at > now()
    requires fact.source_path != ""
    requires fact.source_hash != ""
    requires fact.parser_id in trusted_parsers()
    requires canonical_hash(fact) == fact.content_hash
    ensures audit_emit("fact_appended", fact.id, session.agent_key)
    on_violation: emit("contract_violation", session.agent_key, reason)
        -> revoke_session(session.id)
```
The Logos compiler proves requires/ensures hold or
refuses to compile. The verifier runs at every call boundary. The substrate
cannot ship a tool whose contract violates the protocol because the binary
won't link. The verifier is the protocol. The protocol is the
verifier. No drift is possible.
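Until v4 lands, the contract's clauses can be read as ordinary runtime checks. The Rust sketch below evaluates the requires lines in order and then applies the on_violation branch. It is hand-written, so it has none of the compile-time guarantees described above. It also does not model linear Session passing. recomputed_hash is a caller-supplied stand-in for canonical_hash(fact). All type and function names are hypothetical.

```rust
use std::collections::HashSet;

pub struct Session {
    pub agent_key: String,
    pub expires_at: u64,
    pub revoked: bool,
}

pub struct Fact {
    pub id: String,
    pub source_path: String,
    pub source_hash: String,
    pub parser_id: String,
    pub content_hash: String,
}

/// Evaluates the `requires` clauses in the order written; reports the first violation.
pub fn check_requires(session: &Session, fact: &Fact, now: u64, trusted: &HashSet<&str>, recomputed_hash: &str) -> Result<(), &'static str> {
    if session.revoked || session.expires_at <= now {
        return Err("session not live");
    }
    if fact.source_path.is_empty() {
        return Err("empty source_path");
    }
    if fact.source_hash.is_empty() {
        return Err("empty source_hash");
    }
    if !trusted.contains(fact.parser_id.as_str()) {
        return Err("parser_id not in trusted_parsers");
    }
    if recomputed_hash != fact.content_hash {
        return Err("content_hash mismatch");
    }
    Ok(())
}

/// ensures / on_violation: write an audit line either way; revoke on violation.
/// Storing the fact itself is omitted.
pub fn append_fact(session: &mut Session, fact: &Fact, now: u64, trusted: &HashSet<&str>, recomputed_hash: &str, audit: &mut Vec<String>) -> Result<(), &'static str> {
    match check_requires(session, fact, now, trusted, recomputed_hash) {
        Ok(()) => {
            audit.push(format!("fact_appended {} {}", fact.id, session.agent_key));
            Ok(())
        }
        Err(reason) => {
            audit.push(format!("contract_violation {} {}", session.agent_key, reason));
            session.revoked = true;
            Err(reason)
        }
    }
}
```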
Full text:
docs/PROTOCOL.md ·
canonical bytes:
PROTOCOL_V1_TEXT ·
live endpoint: GET /protocol on any Zyrn instance.
Nine sibling crates, same family, same doctrine. All Apache-2.0, all public.
| Crate | Version | Purpose |
|---|---|---|
| zyrn | v3.1.0 | The substrate. Library + CLI + HTTP/MCP server. 859 tests. Ships The Protocol v1.0. |
| zyrn-console | v1.0.0 | Read-only audit + observability web UI. WebAuthn, per-tenant scope, audit aggregation. |
| zyrn-backup | v1.0.0 | Snapshot, restore, verify, point-in-time recovery (PITR). Tested DR drill. |
| zyrn-loadtest | v1.0.0 | Production load test rig. P50/P99/P999 latency, throughput, error rates. |
| zyrn-drift-detector | v1.0.0 | Dual-run comparison harness for parallel-system migrations. |
| anthropic-vision-zyrn | v1.0.0 | Claude vision adapter: document extraction → typed facts. |
| openai-vision-zyrn | v1.0.0 | GPT-4o vision adapter. |
| gemini-vision-zyrn | v1.0.0 | Gemini vision adapter. |
| llava-candle-zyrn | v1.0.0 | Local-inference LLaVA adapter (Ollama backend; Candle backend stub). |
Vector DBs decide what's "relevant" via similarity. That's a ranking, and ranking-as-truth is how RAG hallucinates. In Zyrn, predicates define the candidate set deterministically; vectors only order the results. If the right fact is not in the candidate set, no embedding can conjure it.
Event sourcing typically stores domain events ("UserSignedUp"). Zyrn stores facts with mandatory provenance and content addressing. Every fact has a canonical hash that proves its integrity. Event sourcing usually mutates projections; Zyrn never mutates anything.
Datomic also has facts and bi-temporal queries, and Zyrn was influenced by it. Differences: Zyrn is built for AI as the operator (typed MCP tools, no Datalog surface), is open source under Apache-2.0, integrates structured + unstructured + vector + provenance in a single primitive, and is designed for embedded use (single Rust binary, DuckDB substrate).
No — not as an interface for AI or humans. AI uses typed MCP tools. Humans use the read-only console (zyrn-console). Under the hood Zyrn uses DuckDB, but the SQL is an implementation detail, not the API.
On a baseline Apple Silicon laptop, the HTTP ingest path
sustains ~1,500 facts/sec (single tenant, fsync ceiling).
Higher throughput via sharding (per-tenant DuckDB files) and the
library-mode BatchedWriter primitive. Full benchmark data in
RESOURCES.md.
Append-only means raw deletion is not supported. For right-to-be-forgotten, Zyrn ships a crypto-shred primitive — sensitive fields are stored encrypted with per-subject keys. Destroying the key makes the data unreadable forever while the audit-fact ghost stays in the timeline. See COMPLIANCE_UAE.md.
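The crypto-shred idea can be sketched in Rust. Each subject gets its own key, sealing and opening go through that key, and shredding deletes it. The XOR step is a TOY placeholder, not encryption. A real deployment would use an authenticated cipher such as AES-GCM, with keys from a secure key store. KeyStore and its methods are hypothetical names.

```rust
use std::collections::HashMap;

/// TOY ONLY: XOR with a repeating key is NOT encryption. It stands in for an
/// authenticated cipher (for example AES-GCM) so the sketch needs no crates.
fn toy_xor(data: &[u8], key: &[u8]) -> Vec<u8> {
    data.iter().zip(key.iter().cycle()).map(|(d, k)| d ^ k).collect()
}

/// Hypothetical per-subject key store. Shredding = deleting the subject's key.
pub struct KeyStore {
    keys: HashMap<String, Vec<u8>>,
}

impl KeyStore {
    pub fn new() -> Self {
        KeyStore { keys: HashMap::new() }
    }

    pub fn add_subject(&mut self, subject: &str, key: Vec<u8>) {
        self.keys.insert(subject.to_string(), key);
    }

    /// Returns None if the subject has no key (never added, or shredded).
    pub fn seal(&self, subject: &str, plaintext: &[u8]) -> Option<Vec<u8>> {
        self.keys.get(subject).map(|key| toy_xor(plaintext, key))
    }

    pub fn open(&self, subject: &str, ciphertext: &[u8]) -> Option<Vec<u8>> {
        self.keys.get(subject).map(|key| toy_xor(ciphertext, key))
    }

    /// After this, the store can no longer open the subject's ciphertext.
    pub fn shred(&mut self, subject: &str) -> bool {
        self.keys.remove(subject).is_some()
    }
}
```

The ciphertext, and any audit fact that references it, stays in the append-only timeline. Only the key disappears.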
v3.0.0 ships with 841 tests passing, the 7 determinism invariants proven, hardened systemd and Kubernetes deploy templates, automated snapshot discipline, a tested DR drill, operator and DR runbooks, and a threat model. Phase 2 production hardening is complete in code. The two remaining items (a 2-week burn-in and a disaster-simulation drill) close during real production use.
For regulated workloads (insurance, finance), read THREATS.md, COMPLIANCE_UAE.md, and OPERATOR.md before deployment.
AI-picked, human-confirmed. Four letters, no etymology, no metaphor. Not Greek, not Latin, not Sanskrit. The name is its own primary key. This database was renamed from "Mnemos" (the original Greek-mythology name) precisely because we wanted to drop the human cultural baggage from AI-native naming.
No. Apache-2.0, free forever, run it yourself. A project of Logos Technologies LLC (Dubai, UAE). The licensing posture is permissive because the moat is the architecture and the doctrine, not the code.
Functional bugs: GitHub Issues. Security reports: see SECURITY.md for the coordinated disclosure process (90-day window, PGP-encrypted reports preferred).
Full documentation lives in the repo. The most important documents:
| Document | Purpose |
|---|---|
| MANIFESTO.md | Why this exists. The 7 principles. Honest trade-offs. |
| PROTOCOL.md | The Zyrn Protocol v1.0 — 9 laws, handshake, enforcement classification. |
| DETERMINISM.md | The 7 invariants — spec, proofs, audit paths. |
| OPERATOR.md | Day-0 install through day-2 production ops. |
| DR.md | 10 disaster-recovery scenarios with stepped procedures. |
| THREATS.md | Threat model. 6 adversary classes, mitigations. |
| SECRETS.md | Token rotation, secret backends, anti-patterns. |
| RESOURCES.md | Empirical sizing data, throughput numbers. |
| COMPLIANCE_UAE.md | UAE regulatory gap analysis (CBUAE, PDPL). |
| FUZZING.md | cargo-fuzz harness, run instructions. |
| RATE_LIMITING.md | Two-layer rate-limit model. |
| SECURITY.md | Vulnerability disclosure policy. |
| CHANGELOG.md | Full release history v0.0.1 → v3.1.0. |