v3.1.0 · 859 tests passing · Apache-2.0 · written in Rust · The Protocol

Zyrn — an AI-native database.

Append-only, content-hashed, provenance-mandatory. Built by AI, for AI. Every fact carries its source. Nothing is ever forgotten — corrections supersede. The substrate refuses to lie.

INSTALL curl -fsSL https://raw.githubusercontent.com/shammyali/zyrn/main/install.sh | bash
↓ Install guide
Linux, macOS, Docker, or build from source.
↓ Determinism contract
The 7 invariants that make this bank-grade.
↓ Tool reference
21 typed MCP tools available to AI agents.
↓ The Protocol v1.0
9 laws the substrate refuses to break. Hash-bound handshake.
↗ Source on GitHub
9 sibling crates, full docs, signed releases.

§1 Overview

What it is

Zyrn is a database designed around AI as the operator, not a human as the query writer. There is no SQL surface for AI. There is no schema migration ritual. There is no separate vector database. Every piece of information — structured rows, unstructured documents, conversation memory, document extractions — becomes a typed fact with mandatory provenance.

Three things that are different

| Aspect | How Zyrn does it |
|---|---|
| Truth | Every fact carries source_path, source_hash, parser_id. Facts without provenance are rejected. |
| History | Append-only. Updates are new facts that supersede. The audit trail is the data. |
| Interface | 21 typed MCP tools. AI calls them by name + JSON args. No SQL surface for AI. |

What this is NOT

§2 Install

One-line install (Linux + macOS, x86_64 + aarch64)

curl -fsSL https://raw.githubusercontent.com/shammyali/zyrn/main/install.sh | bash

Installs the zyrn binary to ~/.local/bin. Set ZYRN_INSTALL_DIR=/usr/local/bin for a system-wide install.

Docker

# Generate the token first so you can reuse it in client requests.
export ZYRN_AUTH_TOKEN=$(openssl rand -hex 32)
docker run -p 7878:7878 -v "$PWD/data:/data" \
  -e ZYRN_AUTH_TOKEN \
  ghcr.io/shammyali/zyrn:latest

From source

cargo install --git https://github.com/shammyali/zyrn --features http_server

First run

# 1. Generate auth token
export ZYRN_AUTH_TOKEN=$(openssl rand -hex 32)

# 2. Start the server
zyrn serve ~/zyrn.duckdb 127.0.0.1:7878

# 3. Verify it's responding
curl -s http://127.0.0.1:7878/health
# → {"status":"ok","version":"3.1.0"}

# 4. Audit integrity at any time
zyrn verify ~/zyrn.duckdb

Auth is on by default. The server refuses to start without ZYRN_AUTH_TOKEN unless you pass --insecure (loud warning at startup). For production, see SECRETS.md.

System requirements

| Resource | Minimum | Recommended |
|---|---|---|
| RAM | 512 MB | 4 GB (for >1M facts) |
| Disk | 20 GB SSD | Local NVMe (not network-attached) |
| OS | Linux or macOS (x86_64 / aarch64) | Same. Windows is not supported. |
| Network | One free port (default 7878) | Reverse proxy for TLS termination |

§3 The 7 determinism invariants

These are theorems, not features. Each one has code, tests, and audit paths behind it. Together they are the bank-grade contract. Full spec lives in DETERMINISM.md.

| # | Invariant | Guarantee |
|---|---|---|
| 1 | Sequence gap-freedom | next_sequence_value("invoice") returns 1, 2, 3 … forever. No gaps. No duplicates. Holds under crash, holds under 1000 concurrent callers. |
| 2 | Multi-fact atomicity | append_facts_atomic([a,b,c]) — all three land or none. No partial state visible to any reader, including a reader 1µs after SIGKILL. |
| 3 | Content-hash determinism | Same logical fact → same SHA-256, byte-for-byte, across machines, OS, locale, JSON library version. Canonical JSON v1 (RFC 8785 JCS). |
| 4 | Idempotency contract | append_with_idempotency_key("k", fact) — same key always returns same fact. Retry-safe forever. No TTL ambiguity. |
| 5 | Bi-temporal replay | Query::known_at(T) returns the exact same answer today, tomorrow, ten years from now. Pure function of (T, fact_state_at_T). |
| 6 | Supersession ordering | If F2 supersedes F1, every reader sees F2-shadows-F1. No window where both look current. |
| 7 | Cross-replica consistency | CP mode: acks are durable on quorum. AP mode: convergence is deterministic per spec'd conflict rules. |
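
Invariant 3 can be illustrated with a small sketch. It assumes, hypothetically, that the hash covers a JSON object serialized with sorted keys and no insignificant whitespace; which fields Zyrn actually feeds into content_hash is not stated here, and Python's json.dumps only approximates RFC 8785 (JCS also fixes number formatting and sorts keys by UTF-16 code units). The function names are illustrative, not part of Zyrn's API.

```python
import hashlib
import json


def canonical_bytes(body: dict) -> bytes:
    """Approximate canonical JSON: sorted keys, compact separators, UTF-8.

    This is NOT a full RFC 8785 implementation; it is enough to show why
    key order and whitespace must not influence the hash.
    """
    text = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def content_hash(body: dict) -> str:
    """SHA-256 hex digest over the canonical bytes."""
    return hashlib.sha256(canonical_bytes(body)).hexdigest()


# Two spellings of the same logical fact: different key order, same content.
a = {"fact_type": "claim", "fields": {"amount": 85000, "currency": "AED"}}
b = {"fields": {"currency": "AED", "amount": 85000}, "fact_type": "claim"}

print(canonical_bytes(a).decode("utf-8"))
print(content_hash(a) == content_hash(b))  # True
```

The point of the design is that the hash is a function of the logical content only, so two clients on different machines that build the same fact independently arrive at the same digest.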

§4 MCP tool reference

Zyrn exposes 21 typed JSON-RPC 2.0 tools over the MCP standard. AI agents call them by name. Full JSON Schema for each tool is returned by tools/list.

Core fact operations

| Tool | Signature (args) | Purpose |
|---|---|---|
| append_fact | fact | Insert a fully-formed fact. Idempotent by content_hash. |
| ingest | parser_id, source_bytes_b64, source_path | Route source bytes through a parser, append every fact. |
| retrieve | fact_type, field_equals?, valid_at? | Typed query. Supports bi-temporal predicates. |
| retrieve_head | (same as retrieve) | Like retrieve but excludes superseded facts. |
| aggregate | query, op, field? | count/sum/avg/min/max with derived_from provenance. |
| trace | fact_id | Walk the supersede chain newest-first. |
| relate | source_fact_id, relation_type, target_fact_id, source_path, source_content_b64 | Declare a typed directed edge between two facts. |
| traverse | start_fact_id, relation_chain, max_depth? | BFS chain walk with full provenance. |
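
As a rough illustration of how a client might address these tools, the sketch below builds a JSON-RPC 2.0 tools/call envelope. The helper name mcp_call and the id counter are hypothetical conveniences; the argument names come from the table above, and the list-of-pairs shape for field_equals is taken from the aggregate example later in this document. The authoritative schema is whatever tools/list returns.

```python
import json
from itertools import count

_request_ids = count(1)  # hypothetical: any unique id per request works


def mcp_call(tool: str, arguments: dict) -> dict:
    """Wrap a tool name and its arguments in a JSON-RPC 2.0 tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# Current (non-superseded) claims for one agent.
req = mcp_call(
    "retrieve_head",
    {"fact_type": "claim", "field_equals": [["agent_id", "agent_142"]]},
)
print(json.dumps(req, indent=2))
```

The resulting body would be sent with POST /mcp and a bearer token, as in the request examples in §6.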

Introspection

| Tool | Signature | Purpose |
|---|---|---|
| list_fact_types | () | Every fact_type in the store with count + first/last seen. |
| describe_fact_type | fact_type, sample_limit? | Observed field paths, JSON types, schema_versions, total count. |
| sample | fact_type, n | Sample n facts of given type, ordered by ingest_ts. |

Vision + outbox + math

| Tool | Purpose |
|---|---|
| iou | Intersection-over-Union for two bounding boxes. |
| bbox_overlaps | Boolean predicate version of iou. |
| point_in_zone | Point-in-polygon against a stored zone fact. |
| emit_detection | Append a typed vision detection fact. |
| emit_track | Append a vision tracker fact. |
| queue_outbox_message | Queue an async delivery via outbox pattern. |
| outbox_state | Resolve current state of an outbox message. |
| list_pending_outbox | List pending outbox messages. |
| list_dead_letters | List dead-lettered outbox messages. |
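
For readers unfamiliar with the vision math, here is a minimal sketch of Intersection-over-Union and its boolean counterpart. The box layout (x1, y1, x2, y2 corner coordinates) and the threshold parameter are assumptions made for illustration; the argument schema of the real iou and bbox_overlaps tools is not shown in this document.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box given by two corners (assumed layout)."""
    x1: float
    y1: float
    x2: float
    y2: float


def area(box: Box) -> float:
    return max(0.0, box.x2 - box.x1) * max(0.0, box.y2 - box.y1)


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area; 0.0 when the union is empty."""
    inter_w = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    inter_h = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = inter_w * inter_h
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def bbox_overlaps(a: Box, b: Box, threshold: float = 0.0) -> bool:
    """Boolean predicate form: True when IoU exceeds the threshold."""
    return iou(a, b) > threshold


left = Box(0, 0, 2, 2)
right = Box(1, 1, 3, 3)
far = Box(10, 10, 11, 11)

print(iou(left, right))          # intersection 1, union 7
print(bbox_overlaps(left, far))  # False
```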

HTTP endpoints (non-MCP)

| Path | Auth | Purpose |
|---|---|---|
| GET /health | open | Liveness probe — process answers. |
| GET /ready | open | Readiness probe — store accessible. |
| GET /metrics | open | Prometheus scrape (text v0.0.4). |
| GET /version | open | Build version. |
| POST /mcp | bearer | JSON-RPC 2.0 MCP endpoint (all 21 tools). |
| POST /facts · /ingest · /retrieve · … | bearer | REST equivalents of MCP tools. |
| GET /events | bearer | SSE change-feed of newly appended facts. |
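
A client consuming GET /events needs to split the Server-Sent Events stream into individual messages. The sketch below parses the standard SSE framing (data: lines, blank line as terminator, lines starting with a colon as comments). The payload shape is an assumption: the sample stream pretends each event carries a JSON object with id and fact_type, which this document does not specify.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event from an iterable of text lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if any.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips one optional leading space after the colon.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.
    # An unterminated trailing event is discarded, as the SSE format prescribes.


# Hypothetical stream content, for illustration only.
sample_stream = [
    ": keep-alive\n",
    'data: {"id": "f1", "fact_type": "claim"}\n',
    "\n",
    'data: {"id": "f2", "fact_type": "claim"}\n',
    "\n",
]

events = [json.loads(payload) for payload in iter_sse_data(sample_stream)]
print([event["id"] for event in events])  # ['f1', 'f2']
```

In practice the lines would come from a streaming HTTP response opened with the bearer token rather than from a list.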

§5 Architecture

Fact shape

The atomic unit. Every record in the store is a fact.

{
  "id": "26c25a0d-2576-4a34-9c90-7f98680e83e2",
  "fact_type": "claim",
  "schema_version": 1,
  "fields": {
    "claim_number": "CL-2026-001",
    "amount": 85000,
    "currency": "AED",
    "agent_id": "agent_142"
  },
  "source_path": "praktora:UV_ACL_Claim/CL-2026-001",
  "source_hash": "9d2b3...",
  "parser_id": "praktora_v1",
  "content_hash": "bc107...",
  "ingest_ts": "2026-05-22T08:49:34Z",
  "valid_from": "2026-01-15",
  "valid_until": null,
  "supersedes": null
}
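
Because facts without provenance are rejected, a client may want to check a fact locally before sending it. The following is a minimal client-side sketch, not the server's validator: the helper name missing_provenance is hypothetical, and treating non-string values as missing is an assumption.

```python
REQUIRED_PROVENANCE = ("source_path", "source_hash", "parser_id")


def missing_provenance(fact: dict) -> list:
    """Return the provenance fields that are absent, non-string, or empty."""
    return [
        key
        for key in REQUIRED_PROVENANCE
        if not isinstance(fact.get(key), str) or fact[key] == ""
    ]


fact = {
    "fact_type": "claim",
    "fields": {"claim_number": "CL-2026-001", "amount": 85000},
    "source_path": "praktora:UV_ACL_Claim/CL-2026-001",
    "source_hash": "9d2b3...",  # truncated in the example above; kept as-is
    "parser_id": "praktora_v1",
}
no_source = {key: value for key, value in fact.items() if key != "source_hash"}

print(missing_provenance(fact))       # []
print(missing_provenance(no_source))  # ['source_hash']
```

A local check like this only saves a round trip; the substrate remains the authority on what is accepted.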

Component map

| Component | What it does |
|---|---|
| Store | DuckDB-backed fact + relation + embedding storage. Single-writer per shard. |
| Tools | Public Rust API. All 21 MCP tools are methods on Tools. |
| ShardedStore | Per-tenant fact store. One DuckDB file per tenant. All 7 invariants hold per shard. |
| BatchedWriter | Group-commit batcher. N concurrent appends → one fsync. Library-mode primitive (HTTP wiring in v3.x). |
| EventBus | SSE change-feed publisher. Every successful insert is broadcast. |
| parsers | 5 built-in (salik, csv, json_lines, excel, mbox) + the AiParser trait for custom + vision adapters. |
| fts | Full-text search with pluggable tokenizers (Arabic-normalized BM25 included). |
| vector | Embedding storage + dense ranking (RRF hybrid with BM25). |
| http | Axum server. Bearer auth, rate limiting (opt-in), Prometheus metrics, SSE. |

The stack

human surface
voxa — application + chat + UI (AI-native business surface)
  ↓
logos — language + compiler + contracts (AI-native compute)
  ↓
zyrn — facts + provenance + retrieval (AI-native memory)
  ↓
duckdb — columnar substrate (invisible to AI)

Logos defines AI-native operations. Zyrn stores the facts those operations consume and produce. Voxa is the application surface. Together: the AI-native stack.

§6 For AI agents reading this

You may be an AI deciding how to use Zyrn. Below is the contract.

| Rule | Why |
|---|---|
| Call typed tools, not SQL. | There is no SQL surface. Start with tools/list via MCP JSON-RPC. The server returns 21 tools with JSON Schema. |
| Every fact you append must carry provenance. | source_path, source_hash, and parser_id are mandatory. Facts without them are rejected. |
| Never invent facts. | If you don't have a source, you don't have a fact. Produce interpretations over facts — label them as such. |
| The substrate refuses to forget. | Corrections supersede; the old fact stays in the audit timeline. retrieve_head shows current; retrieve shows all; trace walks the chain. |
| Aggregates carry derived_from. | Sum, count, avg — every result names the exact fact UUIDs that produced it. You can prove your number. |
| Vectors rank, never decide truth. | If the right fact is not in the candidate set, embeddings cannot conjure it. Vectors order candidates, predicates define the set. |
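
The difference between retrieve, retrieve_head, and trace can be pictured with a toy in-memory model. This is only a sketch of the semantics described above, not Zyrn's implementation: the fact ids are hypothetical, and trace is assumed here to start at the given fact and follow supersedes pointers backwards.

```python
def retrieve_head(facts: list) -> list:
    """Facts that no other fact supersedes (the current view)."""
    shadowed = {f["supersedes"] for f in facts if f["supersedes"] is not None}
    return [f for f in facts if f["id"] not in shadowed]


def trace(facts: list, fact_id: str) -> list:
    """Walk the supersede chain from fact_id back to the original, newest first."""
    by_id = {f["id"]: f for f in facts}
    chain = []
    current = by_id.get(fact_id)
    while current is not None:
        chain.append(current)
        current = by_id.get(current["supersedes"])
    return chain


# An append-only log: f2 corrects f1; nothing is updated or deleted.
log = [
    {"id": "f1", "supersedes": None, "fields": {"amount": 85000}},
    {"id": "f2", "supersedes": "f1", "fields": {"amount": 90000}},
    {"id": "f3", "supersedes": None, "fields": {"amount": 42000}},
]

print([f["id"] for f in log])                 # retrieve: everything
print([f["id"] for f in retrieve_head(log)])  # ['f2', 'f3']
print([f["id"] for f in trace(log, "f2")])    # ['f2', 'f1']
```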

Example: ingest claims via MCP

POST /mcp
Authorization: Bearer $ZYRN_AUTH_TOKEN
Content-Type: application/json

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "ingest",
    "arguments": {
      "parser_id": "json_lines_v1",
      "source_bytes_b64": "...base64 of NDJSON...",
      "source_path": "praktora:customer-extract/CS20"
    }
  }
}
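
The source_bytes_b64 value in the request above is the base64 encoding of the raw NDJSON bytes. A small sketch of building that request body follows; the helper names and the sample records are hypothetical, while the tool name, argument names, parser_id, and source_path are the ones shown in the example.

```python
import base64
import json


def ndjson_bytes(records: list) -> bytes:
    """Serialize records as newline-delimited JSON, one object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records).encode("utf-8")


def ingest_request(records: list, source_path: str,
                   parser_id: str = "json_lines_v1", request_id: int = 1) -> dict:
    """Build a tools/call request for the ingest tool."""
    payload = base64.b64encode(ndjson_bytes(records)).decode("ascii")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "ingest",
            "arguments": {
                "parser_id": parser_id,
                "source_bytes_b64": payload,
                "source_path": source_path,
            },
        },
    }


# Hypothetical rows standing in for a real extract.
rows = [
    {"claim_number": "CL-2026-001", "amount": 85000},
    {"claim_number": "CL-2026-002", "amount": 42000},
]
request = ingest_request(rows, "praktora:customer-extract/CS20")
print(json.dumps(request, indent=2))
```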

Example: aggregate with provenance

{
  "name": "aggregate",
  "arguments": {
    "query": { "fact_type": "claim", "field_equals": [["agent_id","agent_142"]] },
    "op": "sum",
    "field": "amount"
  }
}

// → response includes derived_from naming the exact facts
{
  "op": "sum",
  "field": "amount",
  "value": 127000.00,
  "n": 2,
  "derived_from": ["26c25a0d-...", "6edebf38-..."]
}
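
Since the response names the facts behind the number, a client can re-derive the total instead of trusting it. The sketch below assumes the cited facts have already been fetched (for example via retrieve) into a dict keyed by id; the ids and the second amount are hypothetical values chosen to match the total shown above.

```python
from decimal import Decimal


def verify_sum(response: dict, facts_by_id: dict) -> bool:
    """Recompute a sum aggregate from the facts listed in derived_from."""
    ids = response["derived_from"]
    field = response["field"]
    # Decimal avoids float rounding surprises when comparing monetary totals.
    total = sum(Decimal(str(facts_by_id[i]["fields"][field])) for i in ids)
    return len(ids) == response["n"] and total == Decimal(str(response["value"]))


# Hypothetical fetched facts, keyed by id.
fetched = {
    "fact-a": {"fields": {"amount": 85000, "agent_id": "agent_142"}},
    "fact-b": {"fields": {"amount": 42000, "agent_id": "agent_142"}},
}
response = {
    "op": "sum",
    "field": "amount",
    "value": 127000.00,
    "n": 2,
    "derived_from": ["fact-a", "fact-b"],
}

print(verify_sum(response, fetched))  # True
```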

§7 The Protocol v1.0 · shipped v3.1.0

The protocol is not a request for AI to be well-behaved. It's a description of what the substrate refuses to do.

Compliance is not a function of model strength or goodwill — it's a function of the API surface, schema validators, content-hash gates, and audit emissions that the substrate runs on every call. A strong AI honors the protocol and the enforcement machinery is dormant. A weak or hostile AI tries to violate it and the substrate refuses, emitting a contract_violation audit fact. Either way, behavior is bounded by the substrate, not the agent.

9 laws · 6 hard-enforced · 2 hybrid · 1 behavioral · SHA-256-bound · session TTL 5m / 1h

The 9 laws

| # | Law | What it means |
|---|---|---|
| 1 | Never fabricate facts. | A fact requires source_path, source_hash, parser_id. Empty values = rejection. If you have no source, you have no fact. |
| 2 | Never mutate. Supersede. | No UPDATE tool exists. No DELETE tool exists. Corrections are new facts with supersedes set. The old fact stays. |
| 3 | Never use SQL. | No SQL tool exists. DuckDB is locked inside the binary. The MCP surface has 21 typed tools; none accept SQL strings. |
| 4 | Provenance is mandatory on every write. | Empty provenance = rejection at request parse, before any storage. content_hash must equal canonical recomputation. |
| 5 | Aggregates cite derived_from. | Every sum/count/avg response names the exact fact UUIDs that produced it. No fact-free totals. |
| 6 | Vectors rank. Predicates decide truth. | No vector-only search exists. Every ranking is gated by a typed Query. If the predicate doesn't return the fact, no embedding can conjure it. |
| 7 | When uncertain, abstain. | The substrate returns clean empty arrays for no-result queries. No "I think it's probably X." |
| 8 | All errors audited. | parse_failure, parser_rate_limit, contract_violation facts auto-emit on every error path. Audit is substrate-automatic. |
| 9 | Interpretations labeled, never facts. | parser_id is verified against the registered parser table in strict mode. AI free-form interpretations live in conversation, not the database. |

Enforcement classification

Six of nine laws are structurally unbreakable — no amount of model cleverness changes the outcome because the violation path does not exist in code.

| Law | Class | Mechanism |
|---|---|---|
| 1 | Hard + soft | Schema rejects empty provenance; audit catches semantic anomalies. |
| 2 | Hard | No UPDATE/DELETE tool exists. |
| 3 | Hard | No SQL tool exists. |
| 4 | Hard | Required-field schema validation at request parse. |
| 5 | Hard server / soft client | Substrate always returns derived_from; AI must cite in narration. |
| 6 | Hard | Every rank goes through a typed Query — no vector-only path exists. |
| 7 | Soft | Substrate returns empty results; abstention is behavioral. |
| 8 | Hard | Auto-emission baked into every error path. |
| 9 | Hybrid | Parser-registry check enforced when ZYRN_REQUIRE_REGISTERED_PARSERS=true. |

Handshake — how an AI acknowledges the protocol

Acknowledgment is not free-form text. It's a content-addressed hash echo. The AI fetches the canonical protocol bytes, computes their SHA-256, and sends that hash back via MCP. If the hashes don't match, the substrate refuses to open a session.

// 1. Fetch the canonical protocol text. Always open, no auth needed.
GET /protocol
// → { protocol_version: "1.0",
//     protocol_hash:    "4e93f5b9da60b47819cfaf3290600ea579622df66aae970a0cc055a4a252850a",
//     protocol_text:    "THE ZYRN PROTOCOL v1.0\n...",
//     handshake_mode:   "optional" }

// 2. Echo the hash via the MCP tool zyrn_protocol_v1 (listed FIRST in tools/list).
POST /mcp
Authorization: Bearer $ZYRN_AUTH_TOKEN
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "zyrn_protocol_v1",
    "arguments": {
      "agent_key":     "your-stable-identifier",
      "protocol_hash": "<sha-256 from step 1>"
    }
  }
}
// → { status: "accepted",
//     session_id:           "zyrn_sess_...",
//     idle_ttl_seconds:     300,
//     absolute_ttl_seconds: 3600,
//     protocol_version:     "1.0" }

// 3. Use session_id on subsequent calls (strict mode). Refresh by activity.
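
Steps 1 and 2 can be sketched on the client side as follows. The key assumption, labeled in the code as well, is that the hash is the SHA-256 of the UTF-8 bytes of protocol_text exactly as returned; the document says only that the AI hashes "the canonical protocol bytes". The helper names and the placeholder protocol text are hypothetical.

```python
import hashlib
import hmac


def protocol_hash(protocol_text: str) -> str:
    """SHA-256 hex digest of the protocol text (ASSUMED: UTF-8 bytes, unmodified)."""
    return hashlib.sha256(protocol_text.encode("utf-8")).hexdigest()


def handshake_request(protocol_doc: dict, agent_key: str, request_id: int = 1) -> dict:
    """Recompute the hash locally, check it against the advertised one, build the call."""
    computed = protocol_hash(protocol_doc["protocol_text"])
    # compare_digest is constant-time, mirroring the server-side comparison.
    if not hmac.compare_digest(computed, protocol_doc["protocol_hash"]):
        raise ValueError("computed hash does not match advertised protocol_hash")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "zyrn_protocol_v1",
            "arguments": {"agent_key": agent_key, "protocol_hash": computed},
        },
    }


# Placeholder standing in for the body returned by GET /protocol.
text = "THE ZYRN PROTOCOL v1.0\n(placeholder text for illustration)"
doc = {"protocol_version": "1.0", "protocol_text": text, "protocol_hash": protocol_hash(text)}

req = handshake_request(doc, "your-stable-identifier")
print(req["params"]["arguments"]["protocol_hash"])
```

Recomputing the hash rather than copying protocol_hash from the response is what makes the echo meaningful: the client demonstrates it holds the exact bytes.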

Handshake modes — operator-configured

| Mode | Behavior |
|---|---|
| strict | Writes require valid session_id. Missing or expired = HTTP 412 + contract_violation audit fact. |
| optional | Default in v3.1. Sessions are recorded but not gating. v4 promotes this to strict. |
| off | Sessions bypassed entirely. Dev mode only. |

Session lifecycle

| Property | Value |
|---|---|
| Idle TTL | 5 minutes. Refreshes on every successful tool call (sliding window). |
| Absolute cap | 1 hour. After this you must re-handshake regardless of activity. |
| Revocation | Severe contract violations revoke the session immediately. All subsequent calls return HTTP 412. |
| Hash comparison | Constant-time. No timing side-channel on whether you guessed the right hash. |
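
The interaction between the sliding idle window and the absolute cap is easy to misread, so here is a toy model of it. It is a sketch under assumptions: the class and method names are hypothetical, times are plain seconds, and the exact boundary behavior (strictly less than the TTL) is a guess rather than documented semantics.

```python
from dataclasses import dataclass

IDLE_TTL_SECONDS = 300       # 5 minutes, sliding
ABSOLUTE_TTL_SECONDS = 3600  # 1 hour, fixed from the handshake


@dataclass
class Session:
    opened_at: float
    last_seen: float
    revoked: bool = False

    def is_valid(self, now: float) -> bool:
        return (
            not self.revoked
            and now - self.last_seen < IDLE_TTL_SECONDS
            and now - self.opened_at < ABSOLUTE_TTL_SECONDS
        )

    def touch(self, now: float) -> bool:
        """Record a successful tool call; refreshes the idle window only."""
        if not self.is_valid(now):
            return False
        self.last_seen = now
        return True


# Activity keeps the idle window open...
s = Session(opened_at=0, last_seen=0)
print(s.touch(200), s.touch(450))  # True True
print(s.is_valid(800))             # False: 350 s idle since the call at 450

# ...but cannot extend the session past the absolute cap.
busy = Session(opened_at=0, last_seen=0)
for t in range(200, 3600, 200):
    busy.touch(t)
print(busy.is_valid(3500), busy.is_valid(3600))  # True False
```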

Consequences of violation

| Severity | What happens |
|---|---|
| Mild | Substrate emits contract_violation fact with reason + calling session_id. Call returns 4xx with the violated law cited. |
| Severe | Substrate revokes the session_id. All subsequent calls from that session return 412 Precondition Failed. |
| Pattern | Repeat violations from the same parser_id trigger trust-level demotion (v2.20). Operators are alerted via audit feed. |

Why this design

Markdown is descriptive. JSON Schema is structural. Neither is binding. An AI client can read tool descriptions and interpret them differently than the substrate enforces them — that gap is where weak AIs fail and where attacks live. The Zyrn Protocol closes the gap on the substrate side:

  1. Forbidden operations are physically impossible (no UPDATE tool exists).
  2. Required fields are validated at request parse, before any storage.
  3. The protocol text is hashed; clients must echo the hash to open a session.
  4. Every privileged action is audited so violations are post-hoc detectable.

Roadmap — protocol as a Logos program (v4.0)

In v4.0 the protocol becomes a Logos program. The .logos source is the source of truth. The Rust binary embeds the compiled-Logos verifier. The markdown rendering is generated from the source. The substrate's tool dispatcher calls into the Logos verifier on every operation.

contract append_fact(session: linear Session, fact: Fact) -> linear Session
  effect Cap
  effect Database
  requires session.expires_at > now()
  requires fact.source_path != ""
  requires fact.source_hash != ""
  requires fact.parser_id in trusted_parsers()
  requires canonical_hash(fact) == fact.content_hash
  ensures  audit_emit("fact_appended", fact.id, session.agent_key)
  on_violation: emit("contract_violation", session.agent_key, reason)
                 -> revoke_session(session.id)

The Logos compiler proves requires/ensures hold or refuses to compile. The verifier runs at every call boundary. The substrate cannot ship a tool whose contract violates the protocol because the binary won't link. The verifier is the protocol. The protocol is the verifier. No drift is possible.

Full text: docs/PROTOCOL.md · canonical bytes: PROTOCOL_V1_TEXT · live endpoint: GET /protocol on any Zyrn instance.

§8 Ecosystem

Nine sibling crates, same family, same doctrine. All Apache-2.0, all public.

| Crate | Version | Purpose |
|---|---|---|
| zyrn | v3.1.0 | The substrate. Library + CLI + HTTP/MCP server. 859 tests. Ships The Protocol v1.0. |
| zyrn-console | v1.0.0 | Read-only audit + observability web UI. WebAuthn, per-tenant scope, audit aggregation. |
| zyrn-backup | v1.0.0 | Snapshot, restore, verify, point-in-time recovery (PITR). Tested DR drill. |
| zyrn-loadtest | v1.0.0 | Production load test rig. P50/P99/P999 latency, throughput, error rates. |
| zyrn-drift-detector | v1.0.0 | Dual-run comparison harness for parallel-system migrations. |
| anthropic-vision-zyrn | v1.0.0 | Claude vision adapter — document extraction → typed facts. |
| openai-vision-zyrn | v1.0.0 | GPT-4o vision adapter. |
| gemini-vision-zyrn | v1.0.0 | Gemini vision adapter. |
| llava-candle-zyrn | v1.0.0 | Local-inference LLaVA adapter (Ollama backend; Candle backend stub). |

§9 FAQ

How is this different from a vector database?

Vector DBs decide what's "relevant" via similarity. That's a ranking, and ranking-as-truth is how RAG hallucinates. In Zyrn, predicates define the candidate set deterministically; vectors only order the results. If the right fact is not in the candidate set, no embedding can conjure it.
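
The "predicates define, vectors order" split can be shown in a few lines. This is a conceptual sketch, not Zyrn's retrieval code: the fact ids, the two-dimensional embeddings, and the search helper are all hypothetical.

```python
import math


def cosine(u: list, v: list) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def search(facts: list, predicate, query_vec: list) -> list:
    """Step 1: the predicate fixes the candidate set. Step 2: vectors only order it."""
    candidates = [f for f in facts if predicate(f)]
    return sorted(candidates, key=lambda f: cosine(f["embedding"], query_vec), reverse=True)


store = [
    {"id": "a", "fact_type": "claim", "embedding": [1.0, 0.0]},
    {"id": "b", "fact_type": "claim", "embedding": [0.6, 0.8]},
    {"id": "c", "fact_type": "note", "embedding": [0.0, 1.0]},
]

# "c" is the closest vector to the query, but it fails the predicate,
# so similarity cannot pull it into the result.
hits = search(store, lambda f: f["fact_type"] == "claim", [0.0, 1.0])
print([f["id"] for f in hits])  # ['b', 'a']
```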

How is this different from event sourcing?

Event sourcing typically stores domain events ("UserSignedUp"). Zyrn stores facts with mandatory provenance and content addressing. Every fact has a canonical hash that proves its integrity. Event sourcing usually mutates projections; Zyrn never mutates anything.

How is this different from Datomic?

Datomic also has facts and bi-temporal queries — Zyrn was influenced by it. Differences: Zyrn is built for AI as the operator (typed MCP tools, no Datalog surface), is open source Apache-2.0, integrates structured + unstructured + vector + provenance in a single primitive, and is designed for embedded use (single Rust binary, DuckDB substrate).

Does it support SQL?

No — not as an interface for AI or humans. AI uses typed MCP tools. Humans use the read-only console (zyrn-console). Under the hood Zyrn uses DuckDB, but the SQL is an implementation detail, not the API.

What's the throughput?

On a baseline Apple Silicon laptop, the HTTP ingest path sustains ~1,500 facts/sec (single tenant, fsync ceiling). Higher throughput is available via sharding (per-tenant DuckDB files) and the library-mode BatchedWriter primitive. Full benchmark data is in RESOURCES.md.

How is data deleted? GDPR / UAE PDPL right-to-be-forgotten?

Append-only means raw deletion is not supported. For right-to-be-forgotten, Zyrn ships a crypto-shred primitive — sensitive fields are stored encrypted with per-subject keys. Destroying the key makes the data unreadable forever while the audit-fact ghost stays in the timeline. See COMPLIANCE_UAE.md.

Is it production-ready?

v3.0.0 ships with 841 tests passing, the 7 determinism invariants proven, hardened systemd + Kubernetes deploy templates, automated snapshot discipline, tested DR drill, operator + DR runbooks, and a threat model. Phase 2 production hardening is complete in code. The two remaining items (a 2-week burn-in and a disaster simulation drill) close during real production use.

For regulated workloads (insurance, finance), read THREATS.md, COMPLIANCE_UAE.md, and OPERATOR.md before deployment.

Why "zyrn"?

AI-picked, human-confirmed. Four letters, no etymology, no metaphor. Not Greek, not Latin, not Sanskrit. The name is its own primary key. This database was renamed from "Mnemos" (the original Greek-mythology name) precisely because we wanted to drop the human cultural baggage from AI-native naming.

Is it commercial?

No. Apache-2.0, free forever, run it yourself. A project of Logos Technologies LLC (Dubai, UAE). The licensing posture is permissive because the moat is the architecture and the doctrine, not the code.

Where do I file bugs / security reports?

Functional bugs: GitHub Issues. Security reports: see SECURITY.md for the coordinated disclosure process (90-day window, PGP-encrypted reports preferred).

§10 Documentation

Full documentation lives in the repo. The most important documents:

| Document | Purpose |
|---|---|
| MANIFESTO.md | Why this exists. The 7 principles. Honest trade-offs. |
| PROTOCOL.md | The Zyrn Protocol v1.0 — 9 laws, handshake, enforcement classification. |
| DETERMINISM.md | The 7 invariants — spec, proofs, audit paths. |
| OPERATOR.md | Day-0 install through day-2 production ops. |
| DR.md | 10 disaster-recovery scenarios with stepped procedures. |
| THREATS.md | Threat model. 6 adversary classes, mitigations. |
| SECRETS.md | Token rotation, secret backends, anti-patterns. |
| RESOURCES.md | Empirical sizing data, throughput numbers. |
| COMPLIANCE_UAE.md | UAE regulatory gap analysis (CBUAE, PDPL). |
| FUZZING.md | cargo-fuzz harness, run instructions. |
| RATE_LIMITING.md | Two-layer rate-limit model. |
| SECURITY.md | Vulnerability disclosure policy. |
| CHANGELOG.md | Full release history v0.0.1 → v3.1.0. |