Speed & Scale

In-memory engine. Frankfurt. Single-digit ms median.

Dedicated Skryx engine nodes, in-memory indices, 3-tier embedding cache, 2-second engine timeout, two-layer rate limiting, atomic alias swaps for zero-downtime reindex.

Starter Growth Scale Enterprise
Latency · last 60 min
p50 11 ms · p75 22 ms · p95 38 ms · p99 48 ms
Measured from response_time_ms on every search_event
The stack

Frankfurt. Honest single-region today.

Skryx runs on a Docker Compose stack in Frankfurt (Contabo): the Skryx engine, MySQL 8, Redis 7 with a 2 GB allkeys-lru cache, Laravel Horizon for the worker pool, the scheduler container, the Laravel app itself, and Nginx Proxy Manager for SSL. It is single-region, with no multi-region failover yet. We say so up front instead of hiding behind "global CDN" boilerplate.

# docker-compose.yml: the live shape (services only)
services:
  nginx-proxy:   # SSL termination · port 443
  mysql:         # MySQL 8.0, utf8mb4
  redis:         # Redis 7-alpine, 2 GB allkeys-lru
  typesense:     # Skryx engine · in-memory indices
  horizon:       # queue workers · embeddings, syncs, coach
  scheduler:     # Laravel scheduler · cron passes
  app:           # Laravel app · search request path
Latency optimisations

The boring engineering, listed out.

🧠 In-memory engine

No disk seeks on the query path.

The Skryx engine keeps every active index in RAM. Reads never touch disk; persistence runs in the background. Restarts take seconds, not minutes, and there is no cold-cache penalty: once an index is loaded, all of it is already resident in memory.

  • HNSW vector index for semantic search (sub-linear nearest-neighbour lookup)
  • In-memory full-text indices for keyword and prefix matching
  • 2-second engine timeout — any query that goes long gets cut, never blocks the worker
<?php
// config/typesense.php
return [
    "connection_timeout_seconds"   => 2,
    "healthcheck_interval_seconds" => 15,
    // …
];

// On timeout:
// → request returns 504 with SK-SE-504
// → never blocks the worker thread
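A minimal sketch of the 2-second cut-off, written in Python rather than the PHP client the stack actually uses. `asyncio.wait_for` cancels the pending engine call once the deadline passes, so the caller answers 504 with SK-SE-504 instead of waiting. The helper name `search_with_deadline` and the `{"error": ...}` body shape are hypothetical; only the 2 s value and the SK-SE-504 code come from the config above.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Tuple

# Mirrors "connection_timeout_seconds" => 2 in config/typesense.php.
ENGINE_TIMEOUT_S = 2.0


async def search_with_deadline(
    engine_call: Callable[[str], Awaitable[Any]],
    query: str,
    timeout_s: float = ENGINE_TIMEOUT_S,
) -> Tuple[int, Dict[str, Any]]:
    """Run one engine query under a hard deadline (hypothetical helper)."""
    try:
        # wait_for cancels the engine coroutine if it is still running at the deadline.
        hits = await asyncio.wait_for(engine_call(query), timeout=timeout_s)
    except asyncio.TimeoutError:
        # The caller is released after at most timeout_s seconds.
        return 504, {"error": "SK-SE-504"}
    return 200, {"hits": hits}


async def _demo() -> None:
    async def fast_engine(q: str) -> list:
        return [f"result for {q}"]

    async def slow_engine(q: str) -> list:
        await asyncio.sleep(10)  # a query that "goes long"
        return []

    print(await search_with_deadline(fast_engine, "shoes"))
    # A short deadline keeps the demo quick; production uses 2 s.
    print(await search_with_deadline(slow_engine, "shoes", timeout_s=0.05))


if __name__ == "__main__":
    asyncio.run(_demo())
```

The design point is that the deadline lives on the caller's side: a slow engine query costs the worker at most `timeout_s`, however long the engine itself would have taken.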
🧊 3-tier embedding cache

Repeat queries never pay for AI twice.

Query vectors are cached at three levels: a per-request memo (dedup within one search), Laravel's Cache facade backed by Redis with a 24 h TTL, and a persistent embedding_cache table with hit counters. Popular queries get their vector back from RAM in microseconds.

  • Tier 1: in-request memoisation — dedup repeated calls in a single search
  • Tier 2: Redis (24 h TTL) — first hit cost amortised across thousands of follow-ups
  • Tier 3: embedding_cache table — warm-starts a brand-new worker, tracks popularity
// embedding_cache · last 24h
Tier 1 hits: 12,408 (in-request)
Tier 2 hits: 142,901 (Redis)
Tier 3 hits: 8,217 (table)
Misses: 3,108
─────────────────
Hit rate 98.1% across 24h
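The 98.1% figure is total hits over total lookups: (12,408 + 142,901 + 8,217) / (163,526 + 3,108) = 163,526 / 166,634 ≈ 0.981. The sketch below walks the same three tiers in order, assuming plain dicts as stand-ins for Redis and the embedding_cache table. The class name, the stats counters, and the choice to back-fill Tier 2 after a Tier 3 hit are illustrative assumptions, not Skryx code.

```python
import time
from typing import Callable, Dict, List, Tuple

Vector = List[float]


class ThreeTierEmbeddingCache:
    """Request memo -> Redis-like TTL store -> persistent table, then the provider.

    Plain dicts stand in for Redis (Tier 2) and the embedding_cache table (Tier 3).
    """

    def __init__(self, embed: Callable[[str], Vector], ttl_s: float = 24 * 3600,
                 clock: Callable[[], float] = time.time) -> None:
        self.embed = embed                                # the paid AI call
        self.ttl_s = ttl_s
        self.clock = clock
        self.redis: Dict[str, Tuple[Vector, float]] = {}  # query -> (vector, expires_at)
        self.table: Dict[str, dict] = {}                  # query -> {"vector", "hits"}
        self.stats = {"tier1": 0, "tier2": 0, "tier3": 0, "miss": 0}

    def get(self, query: str, memo: Dict[str, Vector]) -> Vector:
        # Tier 1: the memo dict lives for one search request only.
        if query in memo:
            self.stats["tier1"] += 1
            return memo[query]
        memo[query] = self._shared_lookup(query)
        return memo[query]

    def _shared_lookup(self, query: str) -> Vector:
        now = self.clock()
        # Tier 2: Redis entry, valid until its TTL expires.
        entry = self.redis.get(query)
        if entry is not None and entry[1] > now:
            self.stats["tier2"] += 1
            return entry[0]
        # Tier 3: persistent row with a popularity counter.
        row = self.table.get(query)
        if row is not None:
            self.stats["tier3"] += 1
            row["hits"] += 1
            vec = row["vector"]
        else:
            # Miss: pay for the embedding once and persist it.
            self.stats["miss"] += 1
            vec = self.embed(query)
            self.table[query] = {"vector": vec, "hits": 0}
        # Back-fill Tier 2 so follow-up requests stay in RAM.
        self.redis[query] = (vec, now + self.ttl_s)
        return vec

    def hit_rate(self) -> float:
        # Same definition as the snapshot: all tier hits over hits + misses.
        hits = self.stats["tier1"] + self.stats["tier2"] + self.stats["tier3"]
        total = hits + self.stats["miss"]
        return hits / total if total else 0.0
```

Each new search request passes a fresh `memo` dict; the shared tiers persist across requests, which is what amortises the first miss.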
🪣 Multi-tenant alias isolation

Stable alias. Versioned collection.
Atomic swap on reindex.

Every index has a stable alias (t_{ref}_{slug}) that all queries hit, and a versioned physical collection (t_{ref}_{slug}_v1, _v2, …) underneath. Reindex builds the new version in parallel and atomically flips the alias — zero-downtime, no inconsistent reads in flight.

  • Per-tenant prefix prevents cross-tenant name collisions and keeps each tenant's documents in separate collections
  • Alias swap is the operation, not "drain traffic and rebuild"
  • Exposed as POST /v1/indexes/{name}/swap-with in the API
// Alias pattern
t_demo_products → t_demo_products_v3
(live)
─────────────────
// During reindex
t_demo_products → t_demo_products_v3
(live)
t_demo_products_v4 → building...
─────────────────
Atomic swap → v4 live, v3 deleted
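The alias flow above can be sketched with an in-memory registry. This assumes the `{alias}_v{n}` naming shown in the pattern; `AliasRegistry` and `reindex` are hypothetical names, and a single dict assignment stands in for the engine's atomic alias update.

```python
from typing import Dict, Iterable, List, Optional


class AliasRegistry:
    """In-memory stand-in for the engine's collections and aliases (hypothetical)."""

    def __init__(self) -> None:
        self.collections: Dict[str, List[dict]] = {}  # physical, versioned
        self.aliases: Dict[str, str] = {}              # stable alias -> collection

    def search(self, alias: str) -> List[dict]:
        # Queries always resolve through the stable alias.
        return self.collections[self.aliases[alias]]


def reindex(reg: AliasRegistry, alias: str, docs: Iterable[dict]) -> str:
    current: Optional[str] = reg.aliases.get(alias)  # e.g. "t_demo_products_v3"
    version = int(current.rsplit("_v", 1)[1]) + 1 if current else 1
    new_name = f"{alias}_v{version}"

    # 1. Build the new version while the alias keeps serving the old one.
    reg.collections[new_name] = list(docs)
    # 2. Flip the alias in one assignment: a reader sees old or new, never a mix.
    reg.aliases[alias] = new_name
    # 3. Delete the old version once nothing points at it.
    if current is not None:
        del reg.collections[current]
    return new_name
```

Because step 2 is the only moment the live view changes, a failed build in step 1 leaves the old version serving untouched.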
🛡️ Two-layer rate limiting

IP burst + tenant monthly quota.
Stored in Redis.

Layer one caps every endpoint at 60 requests per second per IP; requests over the cap get a 429 with X-RateLimit-* headers. Layer two enforces the per-tenant monthly search quota from the plan (or a per-tenant override). Exhausting the quota returns 429 on hard-capped plans; on overage-enabled plans the extra searches are billed instead of failing.

  • Per-IP: 60 req/s soft cap, Laravel RateLimiter, 1 s decay window
  • Per-tenant: Redis counter keyed quota:tenant:{id}:{YYYY-MM}, ~40-day TTL
  • Overage billing: priced per 1,000 searches via the plan's overage_per_1k_{currency} field, in minor units
// 429 body when the monthly quota is exhausted (hard-capped plans)
{
  "error": "quota_exceeded",
  "current_usage": 100242,
  "quota":         100000,
  "reset_at":      "2026-06-01T00:00:00Z",
  "upgrade_url":   "https://app.skryx.io/billing"
}

// Headers on every response:
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 47
Retry-After: 0
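A minimal sketch of the two layers, assuming a dict as a stand-in for Redis and a fixed 1 s window per IP (a simplification of Laravel's RateLimiter decay). The quota key format and the 429 body fields come from the page; the per-IP key format, the `rate_limited` error name, the `Retry-After` value, and `billable_overage` are hypothetical. This sketch increments before comparing (the usual Redis INCR pattern), so a rejected request still counts toward usage; whether Skryx does the same is an assumption.

```python
import time
from datetime import datetime, timezone
from typing import Callable, Dict, Tuple

Response = Tuple[int, dict, Dict[str, str]]


def next_month_start(dt: datetime) -> datetime:
    year, month = (dt.year + 1, 1) if dt.month == 12 else (dt.year, dt.month + 1)
    return datetime(year, month, 1, tzinfo=timezone.utc)


class TwoLayerLimiter:
    """Per-IP burst cap plus per-tenant monthly quota; a dict stands in for Redis."""

    def __init__(self, per_ip_per_s: int = 60, clock: Callable[[], float] = time.time):
        self.per_ip_per_s = per_ip_per_s
        self.clock = clock
        self.store: Dict[str, int] = {}

    def _incr(self, key: str) -> int:
        # Like Redis INCR: start at 0 if missing, then add 1. (TTLs omitted.)
        self.store[key] = self.store.get(key, 0) + 1
        return self.store[key]

    def check(self, ip: str, tenant_id: int, quota: int, overage_enabled: bool) -> Response:
        now = self.clock()
        # Layer 1: fixed 1 s window per IP (hypothetical key format).
        hits = self._incr(f"rl:ip:{ip}:{int(now)}")
        headers = {
            "X-RateLimit-Limit": str(self.per_ip_per_s),
            "X-RateLimit-Remaining": str(max(self.per_ip_per_s - hits, 0)),
        }
        if hits > self.per_ip_per_s:
            headers["Retry-After"] = "1"
            return 429, {"error": "rate_limited"}, headers

        # Layer 2: monthly tenant quota, key format from the page (~40-day TTL in Redis).
        today = datetime.fromtimestamp(now, tz=timezone.utc)
        usage = self._incr(f"quota:tenant:{tenant_id}:{today:%Y-%m}")
        if usage > quota and not overage_enabled:
            return 429, {
                "error": "quota_exceeded",
                "current_usage": usage,
                "quota": quota,
                "reset_at": f"{next_month_start(today):%Y-%m-%dT%H:%M:%SZ}",
            }, headers
        return 200, {"billable_overage": max(usage - quota, 0)}, headers
```

Checking the cheap per-IP layer first means a burst from one client is rejected before it touches the tenant's monthly counter.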
Async by default

Anything slow goes through Horizon.

Embedding generation, data-source syncs, AI Coach analysis, synonym catalog scans — all run as queue jobs, never on the request path. Search requests touch only the engine, the embedding caches, and the analytics writer. The slow stuff happens out of band.

  • 30 min: hard timeout on the embedding job · 3 retries with exponential backoff
  • 128: max embedding batch size · per-batch retry, never poisons a whole run
  • Live: progress persisted per batch · UI polls every 2 s during indexing
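The batching behaviour can be sketched as below, assuming the 3 retries with exponential backoff apply to each 128-item batch (the page attaches them to the job, so per-batch placement is an assumption). The 30 min hard timeout and job-level retries are left to the queue worker here; in Laravel those would typically be a job's `$timeout`, `$tries` and `$backoff` properties. `embed_in_batches`, `embed_batch`, and `save_progress` are hypothetical names.

```python
import time
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]
BATCH_SIZE = 128   # max embedding batch size from the page
MAX_RETRIES = 3    # assumed to apply per batch in this sketch


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], List[Vector]],
    save_progress: Callable[[int, int], None],
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[List[Optional[Vector]], List[int]]:
    """Embed texts in batches; a failing batch is skipped, not the whole run."""
    results: List[Optional[Vector]] = [None] * len(texts)
    failed: List[int] = []
    starts = range(0, len(texts), BATCH_SIZE)
    for done, start in enumerate(starts, 1):
        batch = texts[start:start + BATCH_SIZE]
        for attempt in range(MAX_RETRIES + 1):  # first try + 3 retries
            try:
                vectors = embed_batch(batch)
                if len(vectors) != len(batch):
                    raise ValueError("provider returned the wrong number of vectors")
                results[start:start + len(batch)] = vectors
                break
            except Exception:
                if attempt == MAX_RETRIES:
                    failed.append(start // BATCH_SIZE)  # give up on this batch only
                else:
                    sleep(base_delay_s * 2 ** attempt)  # 1 s, 2 s, 4 s
        save_progress(done, len(starts))  # the state a polling UI would read
    return results, failed
```

Failed batches come back as index numbers with `None` in their result slots, so a follow-up run can retry just those items.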
Observability

A real health page. A real latency histogram.

🩺 Platform health

Live probes + last-hour log triage.

The platform health dashboard probes the database, Redis, the search engine, the embedding provider, queue workers, and active embedding jobs live. It also parses the last hour of WARNING / ERROR / CRITICAL log entries and groups them by issue family, while per-tenant health badges show which tenants are currently noisy.

  • 6 service probes refreshed every 30 s
  • Issue families: malformed_sort_by, engine_upsert, embedding_batch, database, billing, …
  • Per-tenant status: healthy · noisy · degraded · down
// /admin/health · last 60 min
database ok
redis ok
search engine ok
embeddings ok
queue workers ok (2 master)
embedding jobs ok (1 indexing)
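The log-triage step can be sketched as a parse-and-group pass. This assumes Laravel's default Monolog line shape (`[datetime] channel.LEVEL: message context`). The keyword rules that map messages to issue families, the `"tenant_id"` context field, and the level-to-badge mapping (WARNING → noisy, ERROR → degraded, CRITICAL → down) are all hypothetical illustrations, not Skryx's actual rules.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Tuple

# Laravel's default line shape: "[2026-05-20 12:00:00] production.ERROR: message {context}"
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \w+\.(?P<level>WARNING|ERROR|CRITICAL): (?P<msg>.*)$"
)
TENANT = re.compile(r'"tenant_id":\s*(\d+)')  # hypothetical context field

# Hypothetical keyword -> issue-family rules; the first match wins.
FAMILY_RULES = [
    ("sort_by", "malformed_sort_by"),
    ("upsert", "engine_upsert"),
    ("embedding batch", "embedding_batch"),
    ("SQLSTATE", "database"),
    ("billing", "billing"),
]
RANK = {"WARNING": 1, "ERROR": 2, "CRITICAL": 3}
BADGE = {"WARNING": "noisy", "ERROR": "degraded", "CRITICAL": "down"}


def family_of(msg: str) -> str:
    lowered = msg.lower()
    for needle, family in FAMILY_RULES:
        if needle.lower() in lowered:
            return family
    return "other"


def triage(lines: Iterable[str]) -> Tuple[Counter, Dict[int, str]]:
    families: Counter = Counter()
    worst: Dict[int, str] = {}  # tenant_id -> most severe level seen
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # INFO/DEBUG lines and stack-trace continuations
        families[family_of(m["msg"])] += 1
        t = TENANT.search(m["msg"])
        if t:
            tid = int(t.group(1))
            if RANK[m["level"]] > RANK.get(worst.get(tid, ""), 0):
                worst[tid] = m["level"]
    # Tenants absent from the result had no warnings or errors: "healthy".
    return families, {tid: BADGE[lvl] for tid, lvl in worst.items()}
```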
📐 Real percentiles

Computed from response_time_ms on every event.

No sampling, no estimation. The latency histogram endpoint runs a SQL aggregation over the actual search_events rows and returns p50 / p75 / p95 / p99 plus 50 buckets at 10 ms granularity (0–500 ms) and a tail bucket. The dashboard renders them; you can also call the JSON endpoint directly.

  • response_time_ms on every search event (wall time)
  • embed_time_ms tracked separately for vector queries
  • The achievement system unlocks a "Speed Demon" badge for tenants whose 24 h average stays under 20 ms
GET /api/analytics/latency-histogram?period=24h

{
  "p50":  11,
  "p75":  22,
  "p95":  38,
  "p99":  48,
  "mean": 19,
  "buckets": [
    { "from": 0,   "to": 10,  "count": 142019 },
    { "from": 10,  "to": 20,  "count": 98412 },
    // … 50 buckets
  ],
  "tail_500ms_plus": 14
}
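The endpoint computes these values in SQL over search_events; below is an in-memory Python equivalent of the same arithmetic, useful for checking the response shape. It assumes the nearest-rank percentile definition, since the page does not say which interpolation the SQL uses, so results on small samples may differ from the real endpoint. `percentile` and `latency_histogram` are hypothetical names.

```python
import math
from typing import Any, Dict, Sequence


def percentile(sorted_ms: Sequence[int], p: int) -> int:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    rank = max(1, math.ceil(p * len(sorted_ms) / 100))
    return sorted_ms[rank - 1]


def latency_histogram(response_times_ms: Sequence[int]) -> Dict[str, Any]:
    xs = sorted(response_times_ms)
    if not xs:
        raise ValueError("no search events in the period")
    # 50 buckets of 10 ms covering 0-500 ms, plus a tail counter.
    buckets = [{"from": lo, "to": lo + 10, "count": 0} for lo in range(0, 500, 10)]
    tail = 0
    for ms in xs:
        if ms >= 500:
            tail += 1
        else:
            buckets[int(ms // 10)]["count"] += 1
    return {
        "p50": percentile(xs, 50),
        "p75": percentile(xs, 75),
        "p95": percentile(xs, 95),
        "p99": percentile(xs, 99),
        "mean": round(sum(xs) / len(xs)),
        "buckets": buckets,
        "tail_500ms_plus": tail,
    }
```

Because every event is counted, the percentiles are exact for the chosen definition; the only approximation is the 10 ms bucket width in the histogram.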
What we don't pretend

A short list of things still on the roadmap.

Single-region (Frankfurt) today — no multi-region replication or failover yet. Latency is measured end-to-end wall time, not yet broken down per pipeline stage. Hybrid search runs the keyword leg first, then the vector leg sequentially (a parallel-merge mode is on the list). When we ship them, they'll appear here with numbers.

  • Frankfurt: single region today · multi-region on the roadmap, no committed date
  • EU: data residency, no cross-border transfers, no SCC dance required
  • Open: we show real numbers from production; see the demo and the live health page

Try it on your own catalog.

Free tier, no credit card. EU-hosted from day one.