The schema you declare when you create an index is the single biggest lever you have on search quality. Most "the engine isn't finding my products" complaints trace back to a schema that didn't anticipate how customers actually type.
This page collects patterns we recommend after watching hundreds of catalogues go live.
# Start with what customers actually search
Before you write a schema, capture two things:
- The five most common queries. Get them from your current search logs, Google Analytics site-search, or by asking customer support what people ask for.
- The single most-confused product type: the one customers describe five different ways. Its fields need the most care.
Then design the schema around making those queries instant — not around your database tables.
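If your search logs can be exported as a plain list of query strings, a few lines of Python are enough to surface the top queries. This is a minimal sketch: the `top_queries` helper and the sample log are made up for illustration, and the normalization (lowercasing, collapsing whitespace) is an assumption about what counts as "the same query".

```python
from collections import Counter


def top_queries(raw_queries, n=5):
    """Return the n most frequent queries after light normalization."""
    # Lowercase and collapse runs of whitespace so "BK99651 " and "bk99651" count together.
    normalized = (" ".join(q.lower().split()) for q in raw_queries)
    counts = Counter(q for q in normalized if q)  # skip empty searches
    return counts.most_common(n)


# Hypothetical log export, one entry per search.
log = ["Alternator Logan", "alternator  logan", "alternator logan", "BK99651", "bk99651 ", ""]
print(top_queries(log, n=2))  # [('alternator logan', 3), ('bk99651', 2)]
```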
# E-commerce product (recommended)
A real-world Skryx schema for an automotive parts retailer. Notice how multiple SKU variants and a layered taxonomy are explicit fields rather than free-text descriptions:
{
"id": "1234",
"title": "Alternator 12V Dacia Logan",
"sku": "BK99651",
"sku_alternative": "DIS99651",
"sku_no_prefix": "99651",
"sku_no_punctuation": "BK99651",
"search_keywords": "alternator dacia logan 12v generator",
"price": 285.50,
"image_url": "https://cdn.example.com/products/1234.jpg",
"stock": 1,
"category": "Electrice",
"subcategory": "Alternatoare",
"product_type": "Alternatori 12V",
"is_xl": 0,
"brand": "Breckner",
"loyalty_points": 28
}
The matching schema declaration:
{
"name": "products",
"schema": {
"fields": [
{ "name": "title", "type": "string" },
{ "name": "sku", "type": "string", "facet": false },
{ "name": "sku_alternative", "type": "string", "optional": true },
{ "name": "sku_no_prefix", "type": "string", "optional": true },
{ "name": "sku_no_punctuation", "type": "string", "optional": true },
{ "name": "search_keywords", "type": "string", "optional": true },
{ "name": "price", "type": "float", "facet": true },
{ "name": "image_url", "type": "string", "optional": true, "index": false },
{ "name": "stock", "type": "int32", "facet": true },
{ "name": "category", "type": "string", "facet": true },
{ "name": "subcategory", "type": "string", "facet": true },
{ "name": "product_type", "type": "string", "facet": true },
{ "name": "is_xl", "type": "int32", "facet": true },
{ "name": "brand", "type": "string", "facet": true },
{ "name": "loyalty_points", "type": "int32", "facet": false }
],
"default_sorting_field": "stock"
}
}
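Fields declared without "optional": true are expected on every record, so it can help to check a batch against the declaration before sending it. The sketch below is a client-side illustration only: `missing_required_fields` is a hypothetical helper, the field list is a shortened copy of the declaration above, and the second record is invented to show a failure.

```python
def missing_required_fields(record, fields):
    """Names of non-optional schema fields that the record does not contain."""
    required = [f["name"] for f in fields if not f.get("optional", False)]
    return [name for name in required if name not in record]


# Shortened copy of the schema declaration.
fields = [
    {"name": "title", "type": "string"},
    {"name": "sku", "type": "string", "facet": False},
    {"name": "sku_alternative", "type": "string", "optional": True},
    {"name": "price", "type": "float", "facet": True},
]

batch = [
    {"id": "1234", "title": "Alternator 12V Dacia Logan", "sku": "BK99651", "price": 285.50},
    {"id": "1235", "title": "Alternator 12V", "sku": "BK99652"},  # invented record, no price
]

# Map record id -> missing fields, keeping only the records with problems.
problems = {r["id"]: missing_required_fields(r, fields) for r in batch}
problems = {rid: names for rid, names in problems.items() if names}
print(problems)  # {'1235': ['price']}
```

Once the batch comes back clean, it is ready to ingest into the products index.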
Search it with field-weighted query_by:
{
"q": "BK99651",
"query_by": "sku,sku_alternative,sku_no_prefix,sku_no_punctuation,title,search_keywords",
"query_by_weights": "10,10,8,8,5,3"
}
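query_by and query_by_weights are two parallel comma-separated lists, so they are easy to knock out of alignment when edited by hand. One way to keep them in step is to build both from a single ordered mapping. The `weighted_query` helper below is hypothetical and only assembles the parameters shown above; it does not call the engine.

```python
def weighted_query(q, weights):
    """Build search parameters from an ordered {field: weight} mapping."""
    if not weights:
        raise ValueError("at least one field is required")
    return {
        "q": q,
        # dicts preserve insertion order, so fields and weights stay aligned.
        "query_by": ",".join(weights),
        "query_by_weights": ",".join(str(w) for w in weights.values()),
    }


params = weighted_query("BK99651", {
    "sku": 10,
    "sku_alternative": 10,
    "sku_no_prefix": 8,
    "sku_no_punctuation": 8,
    "title": 5,
    "search_keywords": 3,
})
print(params["query_by_weights"])  # 10,10,8,8,5,3
```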
# Why multiple SKU variants
Customers type the SKU different ways. They type the version printed on the
part (BK99651), the alternative supplier code from a competitor catalogue
(DIS99651), the bare number (99651), or the punctuated form
(BK-99651). One field per variant lets each one match with a high weight
without polluting the title with codes.
sku_no_prefix and sku_no_punctuation are usually computed at index time
from sku — strip the letters, strip the dashes — but stored as fields so
the engine can match them with a single lookup.
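A minimal sketch of that index-time derivation, assuming the prefix is a run of leading letters and that "punctuation" means anything that is not a letter or digit. Your SKU format may need different rules, and `sku_variants` is a hypothetical helper name.

```python
import re


def sku_variants(sku):
    """Derive the helper SKU fields from the canonical sku."""
    # Drop dashes, dots, spaces and any other non-alphanumeric characters.
    no_punctuation = re.sub(r"[^0-9A-Za-z]", "", sku)
    # Drop the leading letter prefix, keeping the bare number.
    no_prefix = re.sub(r"^[A-Za-z]+", "", no_punctuation)
    return {"sku_no_punctuation": no_punctuation, "sku_no_prefix": no_prefix}


print(sku_variants("BK-99651"))
# {'sku_no_punctuation': 'BK99651', 'sku_no_prefix': '99651'}
```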
# Why a search_keywords field
Free-form text you control. Use it for:
- Common misspellings of the title (already handled by typo tolerance, but this gives you per-product overrides).
- Romanian + English duplicates when your customers speak both ("alternator generator alternateur").
- Trade names, generation numbers, vehicle compatibility codes that don't belong in the title.
Keep search_keywords weighted lower than title so the title is what
mostly drives ranking. The keyword field is a safety net.
# Why three levels of taxonomy
Splitting category / subcategory / product_type instead of one field gives
you:
- Facets at the right granularity. Customers filter by category=Electrice, drill into subcategory=Alternatoare, and may further pick product_type=Alternatori 12V.
- Boost rules per level. "Boost in-stock items in category:Electrice" is one rule; "pin specific products in product_type:Alternatori 12V" is another.
- Cleaner analytics. Top zero-result queries grouped by category tell you which taxonomy branch needs catalog work.
# Custom boolean / flag fields
Model anything you want to filter/boost on as a typed field, not free text in the title:
- is_xl: large items that need different shipping pricing.
- is_featured, is_new, is_bestseller: promotion flags.
- stock: an int, not a boolean, so you can do filter_by: stock:>0.
Skryx accepts bool, but int32 for stock is more useful because it doubles
as a sort key for "show in-stock first then by price".
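To make the intended ordering concrete, here it is emulated locally in Python on a few made-up records. This only illustrates the result you are after; the engine applies the filter and sort server-side, and its sort parameter syntax is not shown here.

```python
# Made-up records with the two fields that matter for this ordering.
products = [
    {"id": "a", "stock": 0, "price": 100.0},
    {"id": "b", "stock": 3, "price": 285.5},
    {"id": "c", "stock": 1, "price": 120.0},
]

# Equivalent of filter_by: stock:>0
in_stock = [p for p in products if p["stock"] > 0]

# "In-stock first, then by price": out-of-stock sorts last, price breaks ties.
ordered = sorted(products, key=lambda p: (p["stock"] <= 0, p["price"]))

print([p["id"] for p in in_stock])  # ['b', 'c']
print([p["id"] for p in ordered])   # ['c', 'b', 'a']
```

A boolean in_stock flag could drive the same two-bucket ordering, but the integer also lets you filter on thresholds such as stock:>0.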
# Image URL — store but don't index
image_url is shown in your front-end but never searched. Add "index": false so it doesn't bloat the search index. Same for any field that's
display-only — last_modified, internal notes, supplier_part_number
codes the customer never sees.
# Content / docs schema
Quite different from products. The query is usually a question or partial phrase; relevance hinges on title + first paragraph.
{
"fields": [
{ "name": "title", "type": "string" },
{ "name": "subtitle", "type": "string", "optional": true },
{ "name": "body", "type": "string" },
{ "name": "breadcrumb", "type": "string[]", "optional": true },
{ "name": "author", "type": "string", "facet": true },
{ "name": "section", "type": "string", "facet": true },
{ "name": "tags", "type": "string[]", "facet": true },
{ "name": "published_at", "type": "int64", "facet": true }
],
"default_sorting_field": "published_at"
}
Weight the query heavily toward the title and subtitle:
{ "q": "synonyms", "query_by": "title,subtitle,body", "query_by_weights": "8,4,1" }
Chunk long documents (articles, manuals) into paragraph-sized records so highlights are useful — one match window per result, not one giant truncated block.
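One possible way to chunk, assuming paragraphs are separated by blank lines and that every chunk should carry the parent's other fields (title, section, tags and so on). The `chunk_document` helper and the "parent-id plus index" id scheme are illustrative choices, not something the engine requires.

```python
import re


def chunk_document(doc):
    """Split doc["body"] on blank lines into one record per paragraph."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", doc["body"]) if p.strip()]
    # Each chunk copies the parent's fields and replaces only id and body.
    return [
        {**doc, "id": f'{doc["id"]}-{i}', "body": paragraph}
        for i, paragraph in enumerate(paragraphs)
    ]


# Hypothetical article with two paragraphs.
article = {
    "id": "guide-synonyms",
    "title": "Synonyms",
    "section": "Guides",
    "body": "Synonyms map one term to another.\n\nDefine them per index.\n",
}

for chunk in chunk_document(article):
    print(chunk["id"], "|", chunk["body"])
```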
# SaaS records (per-customer search)
Whatever the unit of search is in your product — jobs, candidates, tickets, messages — pin the customer-id on every record:
{
"fields": [
{ "name": "tenant_id", "type": "int64", "facet": true },
{ "name": "title", "type": "string" },
{ "name": "body", "type": "string" },
{ "name": "owner", "type": "string", "facet": true },
{ "name": "status", "type": "string", "facet": true },
{ "name": "created_at", "type": "int64", "facet": true }
]
}
Then filter every query with filter_by: "tenant_id:42". See
Multi-tenancy for the trust model.
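Because a single forgotten filter leaks another customer's records, it may be worth routing every query through one helper that adds the tenant clause. The sketch below makes several assumptions: `tenant_scoped` is a hypothetical helper, the tenant id is taken from your authenticated session rather than from the request, and additional filter clauses are combined with `&&` and parentheses, which you should check against the filter syntax reference.

```python
def tenant_scoped(params, tenant_id):
    """Return a copy of params whose filter_by always pins tenant_id."""
    # Reject strings and bools so a malformed id cannot end up in the filter.
    if not isinstance(tenant_id, int) or isinstance(tenant_id, bool):
        raise TypeError("tenant_id must be an int")
    tenant_filter = f"tenant_id:{tenant_id}"
    extra = params.get("filter_by")
    scoped = dict(params)  # leave the caller's dict untouched
    # Assumption: clauses combine with '&&'; the tenant clause always comes first.
    scoped["filter_by"] = f"{tenant_filter} && ({extra})" if extra else tenant_filter
    return scoped


query = {"q": "invoice", "query_by": "title,body", "filter_by": "status:open"}
print(tenant_scoped(query, 42)["filter_by"])  # tenant_id:42 && (status:open)
```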
# Rules of thumb
- Make a field for everything customers filter on. Filtering on a declared field is cheap at query time; computing the value on the fly isn't.
- Don't put codes in the title. They hurt human-readability and don't help ranking — give them their own SKU-variant fields instead.
- Don't change types after going live. Adding optional fields is free (PATCH /settings); renaming a field requires a re-index. The zero-downtime swap pattern is built for exactly this.
- Use optional: true on fields you might not have on every record. Without it, the missing-field error breaks ingest for the whole batch.
- Mark display-only fields "index": false. Saves memory, speeds search.