⚡ Speed & Scale

In-memory engine. Frankfurt. Single-digit ms median.

Dedicated Skryx engine nodes, in-memory indices, 3-tier embedding cache, 2-second engine timeout, two-layer rate limiting, atomic alias swaps for zero-downtime reindex.


Latency · last 60 min

p50: 11 ms
p75: 22 ms
p95: 38 ms
p99: 48 ms

Measured from response_time_ms on every search_event

The stack

Frankfurt. Honest single-region today.

Skryx runs on a Docker Compose stack in Frankfurt (Contabo): the Skryx engine, MySQL 8, Redis 7 capped at 2 GB with LRU eviction, Laravel Horizon for the worker pool, the scheduler container, and Nginx Proxy Manager for SSL. It is single-region, with no multi-region failover yet. We say so up front instead of hiding behind "global CDN" boilerplate.

// docker-compose.yml — the live shape
nginx-proxy   // SSL termination · port 443
mysql         // MySQL 8.0, utf8mb4
redis         // Redis 7-alpine, 2 GB allkeys-lru
typesense     // Skryx engine · in-memory indices
horizon       // queue workers · embeddings, syncs, coach
scheduler     // Laravel scheduler · cron passes
app           // Laravel app — search request path

Latency optimisations

The boring engineering, listed out.

🧠 In-memory engine

No disk seeks on the query path.

The Skryx engine keeps every active index in RAM. Searches don't touch disk during reads; persistence runs in the background. Restart times are seconds, not minutes — and cold-cache penalties don't exist because there is no cold cache.

HNSW vector index for semantic search (sub-linear nearest-neighbour lookup)
In-memory full-text indices for keyword and prefix matching
2-second engine timeout — any query that goes long gets cut, never blocks the worker

// config/typesense.php
[
  "connection_timeout_seconds" => 2,
  "healthcheck_interval_seconds" => 15,
  …
]

// On timeout:
// → request returns 504 with SK-SE-504
// → never blocks the worker thread
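
The timeout behaviour above can be sketched roughly as follows. This is a minimal Python illustration, not the Laravel code: the function name search_with_timeout, the response shape, and the error_code key are assumptions; only the 2-second limit, the 504 status, and the SK-SE-504 code come from the text.

```python
import concurrent.futures
import time

ENGINE_TIMEOUT_SECONDS = 2  # mirrors connection_timeout_seconds in the config above


def search_with_timeout(run_query, timeout=ENGINE_TIMEOUT_SECONDS):
    """Run a query callable and map an overrun to a 504 with SK-SE-504.

    Hypothetical helper: the response dict shape is an assumption.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(run_query)
    try:
        return {"status": 200, "body": future.result(timeout=timeout)}
    except concurrent.futures.TimeoutError:
        # The caller is released immediately. In this sketch the abandoned
        # call keeps running in its own thread; a real client would enforce
        # the limit at the socket level instead.
        return {"status": 504, "error_code": "SK-SE-504"}
    finally:
        pool.shutdown(wait=False)


def slow_query():
    """Stand-in for an engine call that overruns a short timeout."""
    time.sleep(0.3)
    return ["late"]
```

Calling search_with_timeout(slow_query, timeout=0.05) returns the 504 payload without waiting for the slow call to finish.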

🧊 3-tier embedding cache

Cheap queries never pay for AI twice.

Query vectors are cached at three levels: a per-request memo (dedup within one search), the Cache facade backed by Redis with a 24 h TTL, and a persistent embedding_cache table with hit counters. Popular queries return their vector from RAM in microseconds.

Tier 1: in-request memoisation — dedup repeated calls in a single search
Tier 2: Redis (24 h TTL) — first hit cost amortised across thousands of follow-ups
Tier 3: embedding_cache table — warm-starts a brand-new worker, tracks popularity

// embedding_cache · last 24h

Tier 1 hits: 12,408 (in-request)
Tier 2 hits: 142,901 (Redis)
Tier 3 hits: 8,217 (table)
Misses: 3,108
─────────────────
Hit rate 98.1% across 24h
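
The three-tier lookup and the hit-rate arithmetic can be sketched like this. It is a Python illustration under stated assumptions: plain dicts stand in for Redis and the embedding_cache table, the class and method names are hypothetical, and counting a table hit only on Tier 3 reads is a guess at how the hit counters behave.

```python
import time


class EmbeddingCache:
    """Hypothetical three-tier cache: request memo, Redis stand-in, table stand-in."""

    def __init__(self, embed_fn, redis_ttl_seconds=24 * 3600, clock=time.time):
        self.embed_fn = embed_fn          # the expensive embedding call
        self.redis_ttl = redis_ttl_seconds
        self.clock = clock
        self.redis = {}                   # query -> (vector, expires_at)
        self.table = {}                   # query -> {"vector": ..., "hits": n}
        self.stats = {"tier1": 0, "tier2": 0, "tier3": 0, "miss": 0}

    def get(self, query, memo):
        """Return the vector for query; memo is a dict scoped to one request."""
        if query in memo:                               # Tier 1
            self.stats["tier1"] += 1
            return memo[query]
        now = self.clock()
        entry = self.redis.get(query)
        if entry is not None and entry[1] > now:        # Tier 2, not expired
            self.stats["tier2"] += 1
            vector = entry[0]
        elif query in self.table:                       # Tier 3
            self.stats["tier3"] += 1
            row = self.table[query]
            row["hits"] += 1
            vector = row["vector"]
            self.redis[query] = (vector, now + self.redis_ttl)  # re-warm Tier 2
        else:                                           # miss: pay for the embedding once
            self.stats["miss"] += 1
            vector = self.embed_fn(query)
            self.table[query] = {"vector": vector, "hits": 0}
            self.redis[query] = (vector, now + self.redis_ttl)
        memo[query] = vector
        return vector


def hit_rate(stats):
    """Share of lookups served from any tier."""
    hits = stats["tier1"] + stats["tier2"] + stats["tier3"]
    total = hits + stats["miss"]
    return hits / total if total else 0.0
```

With the counters shown above, hits are 12,408 + 142,901 + 8,217 = 163,526 out of 166,634 lookups, which rounds to the 98.1% on the panel.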

🪣 Multi-tenant alias isolation

Stable alias. Versioned collection.
Atomic swap on reindex.

Every index has a stable alias (t_{ref}_{slug}) that all queries hit, and a versioned physical collection (t_{ref}_{slug}_v1, _v2, …) underneath. Reindex builds the new version in parallel and atomically flips the alias — zero-downtime, no inconsistent reads in flight.

Per-tenant prefix prevents cross-tenant collisions and noisy-neighbour effects
Reindexing is an alias swap, not a "drain traffic and rebuild" procedure
Exposed as POST /v1/indexes/{name}/swap-with in the API

// Alias pattern
t_demo_products → t_demo_products_v3 (live)
─────────────────
// During reindex
t_demo_products → t_demo_products_v3 (live)
t_demo_products_v4 → building...
─────────────────
Atomic swap → v4 live, v3 deleted
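
The alias pattern can be modelled in a few lines. This is a toy in-memory Python sketch, not the engine's API: AliasRegistry, alias_name, and the dict-backed storage are hypothetical. The point is that queries only ever resolve the alias, and the flip is a single assignment.

```python
def alias_name(ref, slug):
    """Stable alias in the t_{ref}_{slug} form described above."""
    return f"t_{ref}_{slug}"


class AliasRegistry:
    """Hypothetical stand-in for the engine's alias and collection store."""

    def __init__(self):
        self.aliases = {}       # alias -> physical collection name
        self.collections = {}   # physical collection name -> documents

    def reindex(self, alias, documents):
        """Build the next version next to the live one, then flip the alias."""
        current = self.aliases.get(alias)
        if current is None:
            next_version = 1
        else:
            next_version = int(current.rsplit("_v", 1)[1]) + 1
        new_name = f"{alias}_v{next_version}"
        self.collections[new_name] = list(documents)  # live version still serves reads
        self.aliases[alias] = new_name                # the swap: one assignment
        if current is not None:
            del self.collections[current]             # old version dropped after the flip
        return new_name

    def search(self, alias, predicate):
        """Queries hit the alias, never a versioned name directly."""
        return [d for d in self.collections[self.aliases[alias]] if predicate(d)]
```

Because readers resolve the alias on every query, they see either the old version or the new one, never a half-built collection.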

🛡️ Two-layer rate limiting

IP burst + tenant monthly quota.
Stored in Redis.

Layer one: 60 requests per second per IP, every endpoint, returns 429 with X-RateLimit-* headers. Layer two: per-tenant monthly search quota from the plan (or override). Quota exhaustion returns 429 on hard plans; on overage-enabled plans it bills instead of failing.

Per-IP: 60 req/s soft cap, Laravel RateLimiter, 1 s decay window
Per-tenant: Redis counter keyed quota:tenant:{id}:{YYYY-MM}, ~40 day TTL
Overage billing: overage_per_1k_{currency} in minor units on the plan

{
  "error": "quota_exceeded",
  "current_usage": 100242,
  "quota":         100000,
  "reset_at":      "2026-06-01T00:00:00Z",
  "upgrade_url":   "https://app.skryx.io/billing"
}

// Headers on every response:
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 47
Retry-After: 0
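
The two layers can be sketched as a pair of counters. This Python version is an illustration only: a dict stands in for Redis, TwoLayerLimiter and its return shape are hypothetical, the per-IP error name rate_limited is an assumption, and a fixed one-second window is assumed for the IP layer. The quota key format and the quota_exceeded error come from the text above.

```python
from datetime import datetime, timezone

IP_LIMIT_PER_SECOND = 60  # layer one, per the text above


class TwoLayerLimiter:
    """Hypothetical limiter: per-IP burst window plus per-tenant monthly quota."""

    def __init__(self, ip_limit=IP_LIMIT_PER_SECOND):
        self.ip_limit = ip_limit
        self.counters = {}  # stand-in for Redis counters (TTLs omitted)

    def _incr(self, key):
        self.counters[key] = self.counters.get(key, 0) + 1
        return self.counters[key]

    def check(self, ip, tenant_id, monthly_quota, now, overage_enabled=False):
        # Layer one: fixed 1 s window per IP.
        ip_count = self._incr(f"ip:{ip}:{int(now.timestamp())}")
        if ip_count > self.ip_limit:
            return {"status": 429, "error": "rate_limited", "remaining": 0}
        remaining = self.ip_limit - ip_count

        # Layer two: monthly counter keyed quota:tenant:{id}:{YYYY-MM}.
        usage = self._incr(f"quota:tenant:{tenant_id}:{now:%Y-%m}")
        if usage > monthly_quota and not overage_enabled:
            return {"status": 429, "error": "quota_exceeded",
                    "current_usage": usage, "quota": monthly_quota}
        # On overage-enabled plans the request passes and the excess is billed.
        return {"status": 200, "remaining": remaining,
                "billable_overage": max(0, usage - monthly_quota)}
```

A request rejected by the IP layer never touches the tenant counter, so burst abuse does not eat into a tenant's monthly quota in this sketch.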

Async by default

Anything slow goes through Horizon.

Embedding generation, data-source syncs, AI Coach analysis, synonym catalog scans — all run as queue jobs, never on the request path. Search requests touch only the engine, the embedding caches, and the analytics writer. The slow stuff happens out of band.

30 min: hard timeout on the embedding job · 3 retries with exponential backoff

128: max embedding batch size · per-batch retry, never poisons a whole run

Live: progress persisted per batch · UI polls every 2 s during indexing
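
Batching with per-batch retry can be sketched as below. This is a Python illustration, not the Horizon job: embed_all and chunked are hypothetical names, and the backoff schedule (1 s, 2 s, 4 s) and the reading of "3 retries" as three retries after the first attempt are assumptions. The batch size of 128 comes from the text.

```python
import time

MAX_BATCH_SIZE = 128  # per the text above
MAX_RETRIES = 3       # assumed: retries after the first attempt


def chunked(items, size=MAX_BATCH_SIZE):
    """Split items into consecutive batches of at most size elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def embed_all(texts, embed_batch, base_delay=1.0, sleep=time.sleep):
    """Embed texts batch by batch; a failing batch is retried on its own.

    Returns (vectors, failed_batch_indexes). A batch that exhausts its
    retries is recorded and skipped, so it cannot poison the whole run.
    """
    done, failed = [], []
    for index, batch in enumerate(chunked(texts)):
        for attempt in range(MAX_RETRIES + 1):
            try:
                done.extend(embed_batch(batch))
                break
            except Exception:
                if attempt == MAX_RETRIES:
                    failed.append(index)
                else:
                    sleep(base_delay * 2 ** attempt)  # exponential backoff
    return done, failed
```

Persisting progress after each completed batch, as the page describes, would slot in right after the successful embed_batch call.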

Observability

A real health page. A real latency histogram.

🩺 Platform health

Live probes + last-hour log triage.

The platform health dashboard probes the database, Redis, the search engine, the embedding provider, queue workers, and active embedding jobs in real time. WARNING / ERROR / CRITICAL log entries from the last hour are parsed and grouped by issue family, and per-tenant health badges show who is currently noisy.

6 service probes refreshed every 30 s
Issue families: malformed_sort_by, engine_upsert, embedding_batch, database, billing, …
Per-tenant status: healthy · noisy · degraded · down

// /admin/health · last 60 min

● database ok
● redis ok
● search engine ok
● embeddings ok
● queue workers ok (2 master)
● embedding jobs ok (1 indexing)

📐 Real percentiles

Computed from `response_time_ms` on every event.

No sampling, no estimation. The latency histogram endpoint runs a SQL aggregation over the actual search_events rows and returns p50 / p75 / p95 / p99 plus 50 buckets at 10 ms granularity (0–500 ms) and a tail bucket. The dashboard renders them; you can also call the JSON endpoint directly.

response_time_ms on every search event (wall time)
embed_time_ms tracked separately for vector queries
Achievement system unlocks "Speed Demon" badge for tenants whose 24 h average stays under 20 ms

GET /api/analytics/latency-histogram?period=24h

{
  "p50":  11,
  "p75":  22,
  "p95":  38,
  "p99":  48,
  "mean": 19,
  "buckets": [
    { "from": 0,   "to": 10,  "count": 142019 },
    { "from": 10,  "to": 20,  "count": 98412 },
    // … 50 buckets
  ],
  "tail_500ms_plus": 14
}
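
The shape of that response can be reproduced from raw timings. The Python sketch below is an assumption-laden illustration: the real endpoint runs a SQL aggregation, and the nearest-rank percentile method used here is a guess at one reasonable definition. Bucket width, bucket count, and field names follow the JSON above.

```python
BUCKET_WIDTH_MS = 10
BUCKET_COUNT = 50  # covers 0-500 ms; anything slower lands in the tail


def percentile(sorted_values, pct):
    """Nearest-rank percentile over an ascending list (pct is an integer)."""
    rank = max(1, -(-pct * len(sorted_values) // 100))  # integer ceiling
    return sorted_values[rank - 1]


def latency_histogram(response_times_ms):
    """Build percentiles, mean, fixed-width buckets, and a tail count."""
    values = sorted(response_times_ms)
    if not values:
        raise ValueError("no search events in the selected period")
    buckets = [
        {"from": i * BUCKET_WIDTH_MS, "to": (i + 1) * BUCKET_WIDTH_MS, "count": 0}
        for i in range(BUCKET_COUNT)
    ]
    tail = 0
    for value in values:
        index = int(value // BUCKET_WIDTH_MS)
        if index < BUCKET_COUNT:
            buckets[index]["count"] += 1
        else:
            tail += 1
    return {
        "p50": percentile(values, 50),
        "p75": percentile(values, 75),
        "p95": percentile(values, 95),
        "p99": percentile(values, 99),
        "mean": round(sum(values) / len(values)),
        "buckets": buckets,
        "tail_500ms_plus": tail,
    }
```

Every event contributes to exactly one bucket or to the tail, so the bucket counts plus tail_500ms_plus always equal the number of events aggregated.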

What we don't pretend

A short list of things still on the roadmap.

Single-region (Frankfurt) today: no multi-region replication or failover yet. Latency is measured as end-to-end wall time, not yet broken down per pipeline stage. Hybrid search runs the keyword leg and then the vector leg, one after the other (a parallel-merge mode is on the list). When we ship these, they'll appear here with numbers.

Frankfurt

Single region today · multi-region on the roadmap, no committed date

Data residency, no cross-border transfers, no SCC dance required

Open

We show real numbers from production — see the demo + the live health page
