In-memory engine. Frankfurt. Single-digit ms median.
Dedicated Skryx engine nodes, in-memory indices, 3-tier embedding cache, 2-second engine timeout, two-layer rate limiting, atomic alias swaps for zero-downtime reindex.
response_time_ms recorded on every search_event
Frankfurt. Honest single-region today.
Skryx runs on a Docker Compose stack in Frankfurt (Contabo): the Skryx engine, MySQL 8, Redis 7 with a 2 GB LRU, Laravel Horizon for the worker pool, the scheduler container, and Nginx Proxy Manager for SSL. Single-region, with no multi-region failover yet. We say so up front instead of hiding behind "global CDN" boilerplate.
// docker-compose.yml: the live shape
nginx-proxy   // SSL termination · port 443
mysql         // MySQL 8.0, utf8mb4
redis         // Redis 7-alpine, 2 GB allkeys-lru
typesense     // Skryx engine · in-memory indices
horizon       // queue workers · embeddings, syncs, coach
scheduler     // Laravel scheduler · cron passes
app           // Laravel app: search request path
The boring engineering, listed out.
No disk seeks on the query path.
The Skryx engine keeps every active index in RAM. Reads never touch disk; persistence runs in the background. Restarts take seconds, not minutes, and there is no cold-cache penalty because there is no cold cache.
- HNSW vector index for semantic search (sub-linear nearest-neighbour lookup)
- In-memory full-text indices for keyword and prefix matching
- 2-second engine timeout: any query that runs long is cut off and never blocks the worker
// config/typesense.php
[
    "connection_timeout_seconds" => 2,
    "healthcheck_interval_seconds" => 15,
    …
]

// On timeout:
// → request returns 504 with SK-SE-504
// → never blocks the worker thread
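The timeout behaviour above can be sketched in Python with asyncio.wait_for. This is a swapped-in illustration, not the Laravel or engine-client code: search_with_timeout, the two demo queries, and the "engine_timeout" error string are hypothetical. Only the 2-second limit and the SK-SE-504 code come from the config shown. Because wait_for cancels the pending call when the limit is hit, the caller gets a 504 back instead of waiting on the engine.

```python
import asyncio
from typing import Any, Awaitable, Callable

ENGINE_TIMEOUT_S = 2.0  # mirrors "connection_timeout_seconds" => 2


async def search_with_timeout(
    engine_call: Callable[[], Awaitable[Any]],
    timeout_s: float = ENGINE_TIMEOUT_S,
) -> tuple:
    """Run one engine query, cutting it off after timeout_s seconds.

    Returns (http_status, body). On timeout, wait_for cancels the pending
    query, so the caller is released instead of waiting on the engine.
    """
    try:
        hits = await asyncio.wait_for(engine_call(), timeout=timeout_s)
        return 200, {"hits": hits}
    except asyncio.TimeoutError:
        # Hypothetical body shape; SK-SE-504 is the code from the page.
        return 504, {"error": "engine_timeout", "code": "SK-SE-504"}


# Two hypothetical engine calls for demonstration.
async def fast_query():
    return ["doc-1"]


async def slow_query():
    await asyncio.sleep(0.2)
    return ["doc-2"]


if __name__ == "__main__":
    print(asyncio.run(search_with_timeout(fast_query)))
    # (200, {'hits': ['doc-1']})
    print(asyncio.run(search_with_timeout(slow_query, timeout_s=0.05)))
    # (504, {'error': 'engine_timeout', 'code': 'SK-SE-504'})
```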
Cheap queries never pay for AI twice.
Query vectors are cached at three levels: a per-request memo (dedup within one search), the Laravel Cache facade backed by Redis with a 24 h TTL, and a persistent embedding_cache table with hit counters. Popular queries get their vector back from RAM in microseconds.
- Tier 1: in-request memoisation, deduplicating repeated calls within a single search
- Tier 2: Redis (24 h TTL), so the first hit's cost is amortised across thousands of follow-ups
- Tier 3: embedding_cache table, which warm-starts a brand-new worker and tracks popularity
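Here is a minimal Python sketch of the three-tier lookup. It uses in-memory stand-ins: FakeRedis plays Redis (with a per-key TTL) and a plain dict plays the embedding_cache table. The class names and the emb:{query} key format are hypothetical, and a real implementation would more likely hash the query. Bumping the hit counter only when tier 3 serves a hit is also an assumption made for illustration.

```python
import time
from typing import Callable, Dict, List, Optional

Vector = List[float]


class FakeRedis:
    """In-memory stand-in for Redis GET / SET with a TTL (assumption)."""

    def __init__(self) -> None:
        self._data: Dict[str, tuple] = {}

    def get(self, key: str) -> Optional[Vector]:
        entry = self._data.get(key)
        if entry is None or entry[0] <= time.monotonic():
            return None  # missing or expired
        return entry[1]

    def set(self, key: str, value: Vector, ttl_s: float) -> None:
        self._data[key] = (time.monotonic() + ttl_s, value)


class EmbeddingCache:
    """Hypothetical three-tier query-vector cache."""

    TTL_S = 24 * 3600  # tier 2: 24 h

    def __init__(self, embed: Callable[[str], Vector], redis: FakeRedis) -> None:
        self.embed = embed  # the paid embedding-provider call
        self.redis = redis
        self.table: Dict[str, list] = {}  # tier 3 stand-in: query -> [vector, hit_count]

    @staticmethod
    def new_request() -> Dict[str, Vector]:
        return {}  # tier 1 memo lives for exactly one search

    def get(self, query: str, memo: Dict[str, Vector]) -> Vector:
        if query in memo:  # tier 1
            return memo[query]
        key = f"emb:{query}"
        vec = self.redis.get(key)  # tier 2
        if vec is None:
            row = self.table.get(query)  # tier 3
            if row is not None:
                row[1] += 1  # popularity counter
                vec = row[0]
            else:
                vec = self.embed(query)  # full miss: pay for AI once
                self.table[query] = [vec, 0]
            self.redis.set(key, vec, self.TTL_S)  # re-warm tier 2
        memo[query] = vec
        return vec
```

In a single search, repeated lookups never leave the memo. Later searches hit Redis. Only when Redis has lost the key (eviction or a flush) does the table serve the vector and bump its counter.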
Stable alias. Versioned collection.
Atomic swap on reindex.
Every index has a stable alias (t_{ref}_{slug}) that all queries hit, and a versioned physical collection (t_{ref}_{slug}_v1, _v2, …) underneath. A reindex builds the new version alongside the live one, then atomically flips the alias: zero downtime, and no inconsistent reads in flight.
- Per-tenant prefix prevents cross-tenant collisions and noisy-neighbour effects
- Alias swap is the operation, not "drain traffic and rebuild"
- Exposed as POST /v1/indexes/{name}/swap within the API
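The alias-swap pattern fits in a few lines of Python. The t_{ref}_{slug} and _vN naming follows the page. Everything else is a hypothetical sketch: the AliasedIndex class, the dict storage, the substring search, and dropping the old version right after the swap. It also assumes one reindex runs at a time. The point is that the alias is repointed in a single step under a lock, so a reader resolves either the old or the new collection, never a half-built one.

```python
import threading


class AliasedIndex:
    """Stable alias over versioned physical collections (dict-backed sketch)."""

    def __init__(self, ref: str, slug: str) -> None:
        self.alias = f"t_{ref}_{slug}"          # what every query targets
        self.collections: dict = {}             # physical name -> documents
        self.aliases: dict = {}                 # alias -> physical name
        self.version = 0
        self._lock = threading.Lock()

    def reindex(self, documents: list) -> str:
        """Build the next version off to the side, then flip the alias atomically.

        Assumes a single reindex at a time; concurrent reindexes would need
        their own coordination.
        """
        self.version += 1
        target = f"{self.alias}_v{self.version}"
        self.collections[target] = list(documents)  # live queries still use the old version
        with self._lock:
            previous = self.aliases.get(self.alias)
            self.aliases[self.alias] = target       # the swap: one assignment under the lock
            if previous is not None:
                # Readers already holding the old list keep it; new readers see target.
                del self.collections[previous]
        return target

    def search(self, term: str) -> list:
        with self._lock:
            docs = self.collections[self.aliases[self.alias]]  # resolve the alias once
        return [d for d in docs if term in d["title"]]
```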
IP burst + tenant monthly quota.
Stored in Redis.
Layer one: 60 requests per second per IP on every endpoint; requests over the limit get a 429 with X-RateLimit-* headers. Layer two: a per-tenant monthly search quota from the plan (or an override). Exhausting the quota returns 429 on hard plans; on overage-enabled plans the extra searches are billed instead of failing.
- Per-IP: 60 req/s soft cap, Laravel RateLimiter, 1 s decay window
- Per-tenant: Redis counter keyed quota:tenant:{id}:{YYYY-MM}, ~40-day TTL
- Overage billing: overage_per_1k_{currency} in minor units on the plan
// 429 body when the monthly quota is exhausted on a hard plan:
{
"error": "quota_exceeded",
"current_usage": 100242,
"quota": 100000,
"reset_at": "2026-06-01T00:00:00Z",
"upgrade_url": "https://app.skryx.io/billing"
}
// Headers on every response:
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 47
Retry-After: 0
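Here is a sketch of the two layers. It assumes a dict in place of Redis and a fixed one-second window for the IP layer. Laravel's RateLimiter starts its decay window at the first hit, so exact timing differs slightly. The 60 req/s limit, the quota:tenant:{id}:{YYYY-MM} key and the quota_exceeded body fields come from the page. TwoLayerLimiter, the too_many_requests error string and the Retry-After value are hypothetical. Overage billing is out of scope here: overage-enabled plans simply keep getting 200.

```python
import time
from datetime import datetime, timezone
from typing import Optional


class TwoLayerLimiter:
    """Per-IP burst limit plus per-tenant monthly quota (hypothetical sketch).

    Assumption: a plain dict stands in for Redis. A real setup would use
    INCR plus EXPIRE (1 s for IP windows, ~40 days for quota keys).
    """

    IP_LIMIT = 60    # requests per IP per window
    IP_WINDOW_S = 1  # 1 s decay window

    def __init__(self) -> None:
        self.counters: dict = {}

    def _incr(self, key: str) -> int:
        self.counters[key] = self.counters.get(key, 0) + 1
        return self.counters[key]

    def check(self, ip: str, tenant_id: int, quota: int,
              overage_enabled: bool, now: Optional[float] = None):
        """Return (status, headers, body) for one incoming search request."""
        now = time.time() if now is None else now

        # Layer 1: per-IP burst limit on a fixed one-second window.
        window = int(now // self.IP_WINDOW_S)
        ip_hits = self._incr(f"ip:{ip}:{window}")
        headers = {
            "X-RateLimit-Limit": self.IP_LIMIT,
            "X-RateLimit-Remaining": max(self.IP_LIMIT - ip_hits, 0),
        }
        if ip_hits > self.IP_LIMIT:
            headers["Retry-After"] = self.IP_WINDOW_S
            return 429, headers, {"error": "too_many_requests"}

        # Layer 2: per-tenant monthly quota, keyed quota:tenant:{id}:{YYYY-MM}.
        ts = datetime.fromtimestamp(now, tz=timezone.utc)
        used = self._incr(f"quota:tenant:{tenant_id}:{ts:%Y-%m}")
        if used > quota and not overage_enabled:
            reset = datetime(ts.year + ts.month // 12, ts.month % 12 + 1, 1,
                             tzinfo=timezone.utc)  # first day of next month
            return 429, headers, {
                "error": "quota_exceeded",
                "current_usage": used,
                "quota": quota,
                "reset_at": reset.strftime("%Y-%m-%dT%H:%M:%SZ"),
            }
        # Overage-enabled plans keep serving; billing happens elsewhere.
        return 200, headers, {}
```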
Anything slow goes through Horizon.
Embedding generation, data-source syncs, AI Coach analysis, synonym catalog scans — all run as queue jobs, never on the request path. Search requests touch only the engine, the embedding caches, and the analytics writer. The slow stuff happens out of band.
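This is the out-of-band pattern in miniature. Python's queue.Queue and a worker thread stand in for Horizon's Redis-backed workers. handle_document_upsert and the embed_document job name are hypothetical. The point is that the request handler only enqueues and returns, and the slow work happens on the worker.

```python
import queue
import threading

jobs: queue.Queue = queue.Queue()
processed: list = []


def worker() -> None:
    """Stand-in for a Horizon worker: takes jobs off the queue and does the slow work."""
    while True:
        kind, payload = jobs.get()
        try:
            # A real worker would call the embedding provider or run a sync here.
            processed.append((kind, payload))
        finally:
            jobs.task_done()


def handle_document_upsert(doc_id: str) -> dict:
    """Request path: enqueue the slow job and return immediately."""
    jobs.put(("embed_document", doc_id))
    return {"status": "queued", "doc_id": doc_id}


threading.Thread(target=worker, daemon=True).start()
```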
A real health page. A real latency histogram.
Live probes + last-hour log triage.
The platform health dashboard probes database, Redis, search engine, embedding provider, queue workers, and active embedding jobs in real time. The last hour of WARNING / ERROR / CRITICAL log entries is parsed, grouped by issue family, and per-tenant health badges show who is currently noisy.
- 6 service probes refreshed every 30 s
- Issue families: malformed_sort_by, engine_upsert, embedding_batch, database, billing, …
- Per-tenant status: healthy · noisy · degraded · down
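Here is a rough sketch of the last-hour triage over Laravel-style log lines ([YYYY-MM-DD HH:MM:SS] env.LEVEL: message). The family names match the list above. The keyword rules that assign them, the tenant=NN tag and the status thresholds in tenant_status are all hypothetical, because the real classifier is not described here.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime, timedelta

LINE = re.compile(r"^\[(?P<ts>[\d\- :]+)\] \w+\.(?P<level>[A-Z]+): (?P<msg>.*)$")
TRIAGE_LEVELS = {"WARNING", "ERROR", "CRITICAL"}

# Hypothetical keyword -> issue-family rules, checked in order.
FAMILY_RULES = [
    ("sort_by", "malformed_sort_by"),
    ("upsert", "engine_upsert"),
    ("embedding", "embedding_batch"),
    ("SQLSTATE", "database"),
    ("invoice", "billing"),
]


def family_of(message: str) -> str:
    for needle, family in FAMILY_RULES:
        if needle in message:
            return family
    return "other"


def triage(lines, now: datetime):
    """Group last-hour WARNING/ERROR/CRITICAL lines by family and by tenant."""
    cutoff = now - timedelta(hours=1)
    families = Counter()
    per_tenant = defaultdict(Counter)
    for line in lines:
        m = LINE.match(line)
        if not m or m["level"] not in TRIAGE_LEVELS:
            continue
        if datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S") < cutoff:
            continue  # older than the last hour
        families[family_of(m["msg"])] += 1
        tenant = re.search(r"tenant=(\d+)", m["msg"])  # hypothetical tenant tag
        if tenant:
            per_tenant[tenant[1]][m["level"]] += 1
    return families, per_tenant


def tenant_status(levels: Counter) -> str:
    """Hypothetical thresholds mapping a tenant's log levels to a badge."""
    if levels["CRITICAL"]:
        return "down"
    if levels["ERROR"] >= 5:
        return "degraded"
    if levels["WARNING"] or levels["ERROR"]:
        return "noisy"
    return "healthy"
```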
Computed from response_time_ms on every event.
No sampling, no estimation. The latency histogram endpoint runs a SQL aggregation over the actual search_events rows and returns p50 / p75 / p95 / p99, plus 50 buckets at 10 ms granularity (0–500 ms) and a tail bucket. The dashboard renders them; you can also call the JSON endpoint directly.
- response_time_ms on every search event (wall time)
- embed_time_ms tracked separately for vector queries
- Achievement system unlocks a "Speed Demon" badge for tenants whose 24 h average stays under 20 ms
GET /api/analytics/latency-histogram?period=24h
{
"p50": 11,
"p75": 22,
"p95": 38,
"p99": 48,
"mean": 19,
"buckets": [
{ "from": 0, "to": 10, "count": 142019 },
{ "from": 10, "to": 20, "count": 98412 },
// … 50 buckets
],
"tail_500ms_plus": 14
}
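The same statistics can be computed from raw response_time_ms values. This Python sketch assumes the nearest-rank percentile definition and half-open buckets [from, to), which match the from/to pairs in the sample response. The endpoint's SQL may use a different percentile method, so exact values could differ slightly. latency_histogram is a hypothetical helper, not the endpoint's code.

```python
from typing import Sequence

PERCENTILES = (50, 75, 95, 99)


def nearest_rank(sorted_ms: Sequence[int], pct: int) -> int:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    n = len(sorted_ms)
    rank = max(1, (pct * n + 99) // 100)  # integer ceil(pct * n / 100)
    return sorted_ms[rank - 1]


def latency_histogram(response_times_ms: Sequence[int]) -> dict:
    if not response_times_ms:
        raise ValueError("no search events in period")
    data = sorted(response_times_ms)
    # 50 half-open buckets [0,10), [10,20), ..., [490,500) plus one tail bucket.
    buckets = [{"from": lo, "to": lo + 10, "count": 0} for lo in range(0, 500, 10)]
    tail = 0
    for ms in data:
        if ms >= 500:
            tail += 1
        else:
            buckets[int(ms) // 10]["count"] += 1
    result = {f"p{p}": nearest_rank(data, p) for p in PERCENTILES}
    result["mean"] = round(sum(data) / len(data))
    result["buckets"] = buckets
    result["tail_500ms_plus"] = tail
    return result
```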
A short list of things still on the roadmap.
Single-region (Frankfurt) today — no multi-region replication or failover yet. Latency is measured end-to-end wall time, not yet broken down per pipeline stage. Hybrid search runs the keyword leg first, then the vector leg sequentially (a parallel-merge mode is on the list). When we ship them, they'll appear here with numbers.