The four-layer tenant isolation we ship

IsoKron team · 3 min read

Postgres RLS alone isn't enough when service-role connections bypass it. Here are the four independent layers we enforce — and the 109 tests that keep them honest.

  • security
  • multi-tenant
  • postgres

Why one layer is fragile

Multi-tenant SaaS gets tenant isolation wrong all the time. The pattern goes: stand up Postgres, add Row-Level Security policies, ship. RLS is great. It's also one mistake away from leaking everything.

The most common mistake: someone needs a query that doesn't fit the RLS shape, so they reach for the service-role connection that bypasses RLS. They mean to add an explicit WHERE workspace_id = $1 filter. They forget. The query returns rows from every tenant.

Production data leaks have happened this way at companies you've heard of. The defense is not "be more careful." The defense is structure that makes the careless mistake mechanically impossible.

We enforce tenant isolation at four independent layers. Each one, on its own, closes the class of breach it guards against. You'd have to defeat all four to leak data, and the test suite (109 tests and growing) makes sure that doesn't happen quietly.

Layer 1 — Postgres Row-Level Security

Every customer-scoped table has RLS enabled, with policies that read the active tenant from the JWT claim chain. Three SECURITY DEFINER helper functions, clerk_user_id(), clerk_active_org_id(), and clerk_org_role(), extract the user, active organization, and role from the JWT, and the policies check those values against the row's workspace_id. The policies live in packages/db/migrations/0007_rls_policies.sql and are pinned by the schema.

Measured overhead is under 5 ms per query on Postgres 17 (verified, not estimated). Wrapping auth.jwt() in (SELECT ...) lets the planner evaluate the call once per statement and reuse the result, which keeps the function from being called once per row.
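
A minimal sketch of what such a policy could look like, written as a TypeScript helper that emits the SQL for one table. The helper name, the policy name, the example table, and the assumption that clerk_active_org_id() returns a value directly comparable to workspace_id are all illustrative; none of this is taken from the actual migration file.

```typescript
// Hypothetical migration helper: builds the RLS statements for one table.
// Assumes the table has a workspace_id column and that clerk_active_org_id()
// returns the value that column should be compared against.
function rlsPolicySql(table: string): string {
  // Accept only plain identifiers so the table name cannot inject SQL.
  if (!/^[a-z_][a-z0-9_]*$/.test(table)) {
    throw new Error(`unsafe table name: ${table}`);
  }
  return [
    `ALTER TABLE ${table} ENABLE ROW LEVEL SECURITY;`,
    `CREATE POLICY ${table}_tenant_isolation ON ${table}`,
    // The (SELECT ...) wrapper is the optimization described above: the
    // helper is evaluated once per statement instead of once per row.
    `  USING (workspace_id = (SELECT clerk_active_org_id()));`,
  ].join("\n");
}
```

Calling rlsPolicySql("tickets") (a made-up table name) returns the two statements as one string, ready to paste into a migration.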

Layer 2 — Fastify preHandler hook

Every API request passes through a hook that validates the Clerk JWT, resolves the workspace from the route params (or rejects if the JWT doesn't have access to the requested workspace), and writes an audit row for sensitive operations.
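
The decision the hook makes can be sketched as a pure function, under some assumptions: the claim shape, the field names, and the status codes below are hypothetical, and signature verification is assumed to have happened before this step.

```typescript
// Hypothetical claim shape; the real JWT payload may differ.
interface VerifiedClaims {
  userId: string;
  workspaceIds: string[]; // workspaces this token may access
}

type Decision =
  | { allow: true; workspaceId: string }
  | { allow: false; status: 401 | 403; reason: string };

// `claims` is null when the token was missing or failed verification.
function authorizeRequest(
  claims: VerifiedClaims | null,
  routeWorkspaceId: string,
): Decision {
  if (claims === null) {
    return { allow: false, status: 401, reason: "invalid or missing JWT" };
  }
  // Reject when the route asks for a workspace the token cannot access.
  if (!claims.workspaceIds.includes(routeWorkspaceId)) {
    return { allow: false, status: 403, reason: "no access to workspace" };
  }
  return { allow: true, workspaceId: routeWorkspaceId };
}
```

In a Fastify app, a function like this would typically be called from a hook registered with addHook("preHandler", ...), which replies with the status code on a denial and otherwise attaches the resolved workspace to the request.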

This layer also checks an in-memory deny-list cache (TTL 65 minutes) that closes the JWT-replay window after a user is downgraded. Without it, a recently fired admin could keep acting as an admin for up to an hour using the old token.
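
One way such a deny-list could work, as a sketch rather than the shipped implementation: record when a user was downgraded, reject any token issued at or before that moment, and let the entry expire after the TTL. Keying by user id and comparing against the token's issue time are assumptions made for illustration.

```typescript
const DENY_TTL_MS = 65 * 60 * 1000; // 65 minutes, the TTL named above

interface DenyEntry {
  revokedAt: number; // when the user was downgraded (ms since epoch)
  expiresAt: number; // when this entry can be forgotten
}

class DenyList {
  private entries = new Map<string, DenyEntry>();
  private now: () => number;

  // The clock is injectable so expiry can be tested without waiting.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Call when a user is downgraded or removed.
  revoke(userId: string): void {
    const t = this.now();
    this.entries.set(userId, { revokedAt: t, expiresAt: t + DENY_TTL_MS });
  }

  // True when the token predates a live revocation for this user.
  isRevoked(userId: string, tokenIssuedAtMs: number): boolean {
    const entry = this.entries.get(userId);
    if (entry === undefined) return false;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(userId); // expired entries are dropped lazily
      return false;
    }
    return tokenIssuedAtMs <= entry.revokedAt;
  }
}
```

A token minted after the downgrade passes the check, so the user is not locked out; only tokens that predate the downgrade are refused while the entry is live.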

Layer 3 — MCP host scoping

When a customer's fleet connects via MCP, the connection carries a Bearer token. The host extracts the workspace from the token, scopes the connection, and rejects any message claiming a different workspace. Tickets dispatched to a fleet can only ever touch the workspace the fleet authenticated against.
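
The scoping rule can be sketched as follows. The message shape, the token lookup table, and the choice to stamp the connection's workspace onto messages that name none are all assumptions for illustration; the real host and wire format may differ.

```typescript
// Hypothetical message shape: only the fields this sketch needs.
interface FleetMessage {
  method: string;
  workspaceId?: string;
}

class ScopedConnection {
  readonly workspaceId: string;

  constructor(workspaceId: string) {
    this.workspaceId = workspaceId;
  }

  // Reject messages that claim another workspace; stamp the rest with the
  // workspace this connection authenticated against.
  accept(msg: FleetMessage): FleetMessage & { workspaceId: string } {
    if (msg.workspaceId !== undefined && msg.workspaceId !== this.workspaceId) {
      throw new Error("message claims a different workspace");
    }
    return { ...msg, workspaceId: this.workspaceId };
  }
}

// `tokenToWorkspace` stands in for whatever store maps tokens to workspaces.
function openConnection(
  authorizationHeader: string,
  tokenToWorkspace: Map<string, string>,
): ScopedConnection {
  const prefix = "Bearer ";
  const token = authorizationHeader.startsWith(prefix)
    ? authorizationHeader.slice(prefix.length)
    : "";
  const workspaceId = tokenToWorkspace.get(token);
  if (workspaceId === undefined) {
    throw new Error("unknown or missing Bearer token");
  }
  return new ScopedConnection(workspaceId);
}
```

Because the workspace is fixed when the connection opens, nothing a fleet sends later can widen its scope.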

Layer 4 — Worker explicit filter (the most critical layer)

The previous three layers protect customer-facing requests. They do not protect background workers, because workers use a service-role Postgres connection that bypasses RLS by design.

This is the layer where production tenant leaks usually start. Our solution is a wrapper function — serviceQuery(workspaceId, query, params) — that:

  1. Throws at runtime if the query text doesn't include workspace_id
  2. Writes an audit row before executing
  3. Then executes against the service-role connection
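
The three steps above can be sketched like this. Only the serviceQuery(workspaceId, query, params) signature comes from the text; the execute and audit seams, the factory function, and the exact text check are hypothetical stand-ins so the sketch runs without a database. Note that a text check of this kind catches a forgotten filter, not a wrong one.

```typescript
type Row = Record<string, unknown>;

// Hypothetical seams: `execute` stands in for the service-role connection,
// `audit` for whatever writes the audit row.
interface ServiceDeps {
  execute: (query: string, params: unknown[]) => Promise<Row[]>;
  audit: (entry: { workspaceId: string; query: string }) => Promise<void>;
}

// Step 1: refuse any query whose text never mentions workspace_id.
function assertWorkspaceScoped(query: string): void {
  if (!/\bworkspace_id\b/.test(query)) {
    throw new Error("serviceQuery: query text must reference workspace_id");
  }
}

function makeServiceQuery(deps: ServiceDeps) {
  return async function serviceQuery(
    workspaceId: string,
    query: string,
    params: unknown[],
  ): Promise<Row[]> {
    assertWorkspaceScoped(query);             // 1. throw if unscoped
    await deps.audit({ workspaceId, query }); // 2. audit before executing
    return deps.execute(query, params);       // 3. run on the service role
  };
}
```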

It's enforced by:

  • An ESLint rule (no-db-connection-during-network-io) that bans the raw connection from the network paths it shouldn't touch
  • A Semgrep rule that bans cross-credential imports
  • 109 CI tests in tests/tenant-isolation/ that probe every layer
  • Branch protection on main with enforce_admins: true, so even an operator with admin rights cannot bypass the checks under deadline pressure

The test count can only grow. A CI gate enforces that: if a PR drops a test, the gate blocks the merge.
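
A gate of this kind could be as small as the sketch below, which compares the test names on the base branch with those on the PR. The function name, the inputs, and the idea of also reporting which names went missing are assumptions; how the real gate collects its counts is not described here.

```typescript
interface GateResult {
  ok: boolean;       // false blocks the merge
  missing: string[]; // baseline tests no longer present, for the report
}

// Passes only when the PR has at least as many tests as the base branch.
function checkTestGate(baseline: string[], current: string[]): GateResult {
  const present = new Set(current);
  const missing = baseline.filter((name) => !present.has(name));
  return { ok: current.length >= baseline.length, missing };
}
```

A stricter variant would fail whenever missing is non-empty, which would also catch a PR that deletes one test and adds an unrelated one.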

What this actually buys you

Each layer catches a different mistake:

  • Layer 1 catches the developer who forgets to filter and uses a customer-scoped connection
  • Layer 2 catches the JWT-replay attack
  • Layer 3 catches the malicious or misconfigured fleet sending unscoped MCP messages
  • Layer 4 catches the worker code path that uses service-role and forgets the filter

You'd have to defeat all four. A low probability of a quiet leak through this surface, backed by the enforced tests and the structural prevention, is what we built for.

Bottom line

Tenant isolation is the single thing a multi-tenant platform can least afford to get wrong. We invested in four layers and 109 tests because "one layer is enough" holds only until it doesn't. The customer evidence we've accumulated says one layer is not enough.

If you're evaluating IsoKron for production work and want to walk the actual code, docs/architecture-invariants.md in the repo enumerates the invariants and the migration that enforces each. Every layer points to a file path you can read. We didn't want any of this to be marketing-only.
