Introduction: Why Search Is Harder Than It Looks
Type a few letters into Google or Amazon, and results appear instantly: ranked, relevant, and fresh.
Behind that magic lies a chain of complex engineering ideas:
- Debouncing makes typing feel smooth
- Ranking algorithms decide what appears first
- Elasticsearch retrieves relevant results at scale
- Change Data Capture (CDC) keeps everything up-to-date in real time
In this article, we'll break each piece down, step by step, and end with a small, working example showing how to connect them all together.
1. Debouncing: The First Layer of Performance
When a user types in a search box, every keystroke can trigger an API call, wasting bandwidth and hammering your backend.
Example: typing "apple" fires 5 requests:
a → ap → app → appl → apple
That's inefficient. Debouncing solves this by waiting until the user pauses typing before running the search.
How Debouncing Works
Wait X milliseconds after the last keystroke before executing the function.
If another keystroke happens before the delay ends, the timer resets.
Example (JavaScript)
function debounce(func, delay) {
  let timeout;
  return (...args) => {
    // Cancel the pending call; only the last call within `delay` ms runs
    clearTimeout(timeout);
    timeout = setTimeout(() => func(...args), delay);
  };
}
// Usage
const handleSearch = debounce((query) => {
  // Encode the query so spaces and symbols survive the URL
  fetch(`/api/search?q=${encodeURIComponent(query)}`)
    .then((res) => res.json())
    .then((data) => console.log(data))
    .catch((err) => console.error("Search failed:", err));
}, 300);
Result: Only one API call fires once the user stops typing, which means smoother UX and less load.
Performance Impact
- Without debouncing: 1,000 users typing 5 characters = 5,000 requests
- With debouncing: same scenario = 1,000 requests (assuming nobody pauses longer than the delay mid-word)
That's an 80% reduction in unnecessary API calls!
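To make the arithmetic concrete, here is a minimal sketch that counts how many requests a trailing-edge debounce would let through for a given typing rhythm. The function name `countDebouncedCalls` and the keystroke timestamps are made up for illustration; real typing is less regular, so treat the 5-to-1 ratio as a best case.

```javascript
// Count how many calls a trailing-edge debounce would let through.
// `timestamps` are keystroke times in ms, sorted ascending (hypothetical input).
function countDebouncedCalls(timestamps, delay) {
  let calls = 0;
  for (let i = 0; i < timestamps.length; i++) {
    const next = timestamps[i + 1];
    // The timer set by keystroke i fires only if no later keystroke resets it in time
    if (next === undefined || next - timestamps[i] >= delay) {
      calls++;
    }
  }
  return calls;
}

const typingApple = [0, 100, 200, 300, 400]; // one keystroke every 100 ms
const withoutDebounce = typingApple.length;
const withDebounce = countDebouncedCalls(typingApple, 300);
console.log(withoutDebounce, withDebounce); // 5 1

const saved = Math.round((1 - withDebounce / withoutDebounce) * 100);
console.log(`${saved}% fewer requests`);
```

If a user pauses mid-word for longer than the delay, an extra request fires, which is why the delay is worth tuning against real typing data.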
2. Ranking: Making Search Results Relevant
Once the query hits your backend, you need to decide which results should appear first.
Ranking is the "brain" of search: it determines what's most relevant.
Common Ranking Factors
- Text relevance: how well the content matches the query
- Popularity: e.g., click counts, purchases
- Recency: newer results may rank higher
- Personalization: user preferences or history
- User signals: likes, shares, time spent
Simplified Ranking Formula
Final Score = (text_score × 0.6) + (popularity × 0.3) + (recency × 0.1)
The weights reflect how much each factor should matter:
- 60% text relevance (user is looking for specific content)
- 30% popularity (trusted/purchased products rank higher)
- 10% recency (fresh products get a small boost)
Example Ranking Scenario
Imagine searching for "wireless headphones". Here's how three products score:
Sony WH-1000XM5
- Text Score: 95/100 (matches "wireless" + "headphones")
- Popularity: 90/100 (50K+ purchases)
- Recency: 85/100 (released 2 years ago, still current)
- Final Score: (95 × 0.6) + (90 × 0.3) + (85 × 0.1) = 92.5 (ranked #1)
Generic Headphones
- Text Score: 85/100 (matches query but less specific)
- Popularity: 40/100 (5K purchases)
- Recency: 95/100 (just released)
- Final Score: (85 × 0.6) + (40 × 0.3) + (95 × 0.1) = 72.5 (not enough popularity)
Vintage Headphones
- Text Score: 88/100 (matches but labeled "vintage")
- Popularity: 30/100 (old product, few recent purchases)
- Recency: 20/100 (released 10 years ago)
- Final Score: (88 × 0.6) + (30 × 0.3) + (20 × 0.1) = 63.8 (low overall score)
Result: Sony ranks first because it scores well on every factor, so users see the most relevant product first.
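The weighted sum is easy to check in code. The sketch below applies the 60/30/10 weights to the three products' component scores and sorts by the result. The names `WEIGHTS` and `finalScore` are illustrative, and the weights are kept as whole percentages only to sidestep floating-point noise.

```javascript
// Weights from the simplified formula, expressed as percentages so the
// arithmetic stays in whole numbers until the final division.
const WEIGHTS = { text: 60, popularity: 30, recency: 10 };

function finalScore({ text, popularity, recency }) {
  const weighted =
    text * WEIGHTS.text +
    popularity * WEIGHTS.popularity +
    recency * WEIGHTS.recency;
  return weighted / 100;
}

const products = [
  { name: "Generic Headphones", text: 85, popularity: 40, recency: 95 },
  { name: "Vintage Headphones", text: 88, popularity: 30, recency: 20 },
  { name: "Sony WH-1000XM5", text: 95, popularity: 90, recency: 85 },
];

// Highest score first
const ranked = products
  .map((p) => ({ name: p.name, score: finalScore(p) }))
  .sort((a, b) => b.score - a.score);

for (const { name, score } of ranked) {
  console.log(`${name}: ${score}`);
}
```

In a real system the component scores would come from the search engine and analytics data rather than hand-picked numbers, and the weights would be tuned against click or conversion metrics.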
Real-World Algorithms
Search engines use sophisticated algorithms:
- TF-IDF (Term Frequency-Inverse Document Frequency): the classic approach
- BM25: a modern improvement on TF-IDF (used by Elasticsearch)
- Learning to Rank (LTR): machine learning-based ranking
BM25 Formula (Simplified):
score(D, Q) = Σ IDF(qi) × (f(qi, D) × (k1 + 1)) / (f(qi, D) + k1 × (1 - b + b × |D| / avgdl))
Where:
- D = document
- Q = query, made up of terms qi
- IDF = inverse document frequency
- f = term frequency
- |D| = length of document D; avgdl = average document length in the collection
- k1, b = tuning parameters
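The formula is compact enough to sketch directly. The version below is a toy implementation over pre-tokenized documents, not what Elasticsearch runs internally. It assumes the commonly used defaults k1 = 1.2 and b = 0.75 and the Lucene-style IDF `ln(1 + (N - n + 0.5) / (n + 0.5))`; the sample documents are invented.

```javascript
// Minimal BM25 over pre-tokenized documents (arrays of lowercase terms).
function bm25Scores(docs, queryTerms, k1 = 1.2, b = 0.75) {
  const N = docs.length;
  const avgdl = docs.reduce((sum, d) => sum + d.length, 0) / N;

  return docs.map((doc) => {
    let score = 0;
    for (const term of queryTerms) {
      const f = doc.filter((t) => t === term).length; // term frequency in this doc
      if (f === 0) continue;
      const n = docs.filter((d) => d.includes(term)).length; // docs containing the term
      // Lucene-style IDF, which never goes negative
      const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));
      // Longer-than-average documents are penalized in proportion to b
      const lengthNorm = 1 - b + b * (doc.length / avgdl);
      score += idf * ((f * (k1 + 1)) / (f + k1 * lengthNorm));
    }
    return score;
  });
}

const docs = [
  "wireless bluetooth headphones".split(" "),
  "wired studio headphones with long cable".split(" "),
  "wireless charging pad".split(" "),
];
const scores = bm25Scores(docs, ["wireless", "headphones"]);
console.log(scores.map((s) => s.toFixed(3)));
```

The first document matches both query terms, so it scores highest. The other two each match one term and are separated mainly by document length.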
3. Elasticsearch: The Engine Powering Modern Search
Elasticsearch is a distributed search and analytics engine built on Apache Lucene. It's used by companies like Netflix, Uber, and Shopify to power their search systems.
Why Elasticsearch?
- Blazing fast full-text search: searches millions of documents in milliseconds
- Scalable: can handle billions of documents across clusters
- Powerful ranking & scoring out-of-the-box (BM25)
- Supports fuzzy search, filtering, and aggregations
- RESTful API: easy to integrate with any backend
Basic Architecture
Example: Index and Document
PUT /products/_doc/1
{
"id": 1,
"title": "Wireless Bluetooth Headphones",
"description": "Premium sound quality with noise cancellation",
"price": 299.99,
"rating": 4.8,
"reviews_count": 1250
}
Example: Simple Query
GET /products/_search
{
"query": {
"match": {
"title": "wireless headphones"
}
},
"size": 10,
"from": 0
}
Response Example
{
"hits": {
"total": { "value": 42 },
"hits": [
{
"_id": "1",
"_score": 8.95,
"_source": {
"title": "Wireless Bluetooth Headphones",
"price": 299.99,
"rating": 4.8
}
}
]
}
}
Advanced: Combining Multiple Queries
GET /products/_search
{
"query": {
"bool": {
"must": [
{ "match": { "title": "headphones" } }
],
"filter": [
{ "range": { "price": { "lte": 500 } } },
{ "term": { "in_stock": true } }
]
}
}
}
This query returns products matching "headphones" that are under $500 and in stock.
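In application code you would usually assemble that bool query from user input instead of hard-coding it. Here is a small sketch of one way to do that. The function name `buildProductQuery` and its option names are hypothetical; the field names follow the examples above.

```javascript
// Build an Elasticsearch bool query body from optional search options.
// Field names (title, price, in_stock) follow the examples above.
function buildProductQuery({ text, maxPrice, inStockOnly = false }) {
  const filter = [];
  if (Number.isFinite(maxPrice)) {
    filter.push({ range: { price: { lte: maxPrice } } });
  }
  if (inStockOnly) {
    filter.push({ term: { in_stock: true } });
  }
  return {
    query: {
      bool: {
        // `must` clauses contribute to _score; `filter` clauses only include or exclude
        must: [{ match: { title: text } }],
        filter,
      },
    },
  };
}

const body = buildProductQuery({ text: "headphones", maxPrice: 500, inStockOnly: true });
console.log(JSON.stringify(body, null, 2));
```

Keeping price and stock checks in `filter` rather than `must` means they do not affect relevance scoring, and Elasticsearch can cache them.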
3.5 Search Engine Alternatives to Elasticsearch
Elasticsearch is powerful but complex to set up and operate. Here are production-ready alternatives depending on your needs:
Typesense: Fast, Developer-Friendly Alternative
Typesense is a modern search engine optimized for speed and ease of use.
Pros:
- Instant search results (~100ms)
- Built-in typo tolerance
- Facets & filtering out of the box
- Easy deployment (Docker, Heroku, etc.)
- Open-source & self-hosted
- Great for mobile apps
Cons:
- Document limits on free tier
- Smaller ecosystem than Elasticsearch
- Fewer advanced features
Quick Example:
// Install: npm install typesense
import Typesense from "typesense";
const client = new Typesense.Client({
nodes: [{ host: "localhost", port: 8108, protocol: "http" }],
apiKey: "xyz789",
});
// Search
const results = await client.collections("products").documents().search({
q: "wireless headphones",
query_by: "title,description",
filter_by: "price:<= 300",
limit: 10,
});
Meilisearch: The User-Experience Champion
Meilisearch prioritizes beautiful search UX with minimal configuration.
Pros:
- Great search UX by default
- Fast (HTTP response in ~50ms)
- Zero-config relevance
- Excellent documentation
- Open-source & MIT licensed
- REST API only (simpler than Elasticsearch)
Cons:
- Slower than Typesense in some benchmarks
- Less flexible for custom ranking
- Less flexible for custom ranking
Quick Example:
// Install: npm install meilisearch
import { MeiliSearch } from "meilisearch";
const client = new MeiliSearch({
host: "http://localhost:7700",
apiKey: "masterKey",
});
// Add documents
await client.index("products").addDocuments([
{ id: 1, title: "Sony Headphones", price: 299 },
{ id: 2, title: "Bose Headphones", price: 279 },
]);
// Search
const results = await client.index("products").search("headphones");
Algolia: The Enterprise SaaS Option
Algolia is a fully managed SaaS solution for teams that need turnkey search.
Pros:
- Zero infrastructure management
- Global CDN (fast everywhere)
- Outstanding documentation
- Analytics & insights included
- Premium support
Cons:
- Can be expensive at scale ($0.008+ per query)
- Vendor lock-in (proprietary platform)
- Less control over algorithms
Ideal for: Startups, high-traffic sites where ops overhead is a concern.
Whoosh: Lightweight Python Alternative
For small to medium projects in Python, Whoosh is a pure-Python search library.
Pros:
- Single Python package (no servers to run)
- Great for simple use cases
- Fully customizable
Cons:
- No distributed/scaling capability
- Limited to local/single-machine
- Slower than Elasticsearch/Typesense
Quick Example:
import os
from whoosh.fields import Schema, TEXT, ID
from whoosh.index import create_in
# Define schema
schema = Schema(
    id=ID(stored=True),
    title=TEXT(stored=True),
    content=TEXT
)
# Create index (the directory must exist before create_in is called)
os.makedirs('indexdir', exist_ok=True)
ix = create_in('indexdir', schema)
writer = ix.writer()
# Index documents
writer.add_document(id='1', title='Wireless Headphones', content='...')
writer.commit()
# Search the title field
with ix.searcher() as searcher:
    results = searcher.find('title', 'wireless')
    for result in results:
        print(result['title'])
How to Choose?
- Enterprise, millions of docs, complex queries? → Elasticsearch
- Want speed + easy setup? → Typesense
- Prioritize UX + open source? → Meilisearch
- Don't want to manage infrastructure? → Algolia
- Small Python project? → Whoosh
4. Change Data Capture (CDC): Keeping Search Fresh
Even the best search index becomes outdated if your source data changes.
When products are added, deleted, or updated in your main database, your search index must stay in sync.
That's where Change Data Capture (CDC) comes in.
What Is CDC?
CDC continuously monitors your database for changes and streams them to another system, such as Elasticsearch.
The Problem CDC Solves
Imagine an e-commerce platform:
- A product's price is updated in MySQL, dropping from $199 to $99
- The search index still shows $199
- A customer expecting the $99 price sees a different number in search than on the product page
- Result: lost revenue, an angry customer, and a bad review
CDC prevents this by keeping the search index in step with the database.
Example Flow
- User updates product price in MySQL (from $299 → $199)
- CDC tool (Debezium) captures the change from the MySQL binlog
- Debezium sends a message to Kafka with the update event
- A Kafka consumer reads the event
- The consumer updates Elasticsearch with the new price
- The next search query returns the updated price
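The consumer step in that flow mostly comes down to translating each change event into an index operation. Below is a minimal sketch of that translation. It assumes Debezium's usual event envelope (`op`, `before`, `after`), a primary key column named `id`, and a target index called `products`; the function name `toBulkAction` is made up for illustration.

```javascript
// Map a Debezium-style change event to Elasticsearch bulk API lines.
// Assumes envelope fields `op`, `before`, `after` and a primary key named `id`.
function toBulkAction(event, index = "products") {
  const { op, before, after } = event;
  switch (op) {
    case "c": // create
    case "r": // snapshot read
    case "u": // update
      return [{ index: { _index: index, _id: String(after.id) } }, after];
    case "d": // delete
      return [{ delete: { _index: index, _id: String(before.id) } }];
    default:
      return []; // ignore any other event type
  }
}

const priceDrop = {
  op: "u",
  before: { id: 1, title: "Wireless Headphones", price: 299.99 },
  after: { id: 1, title: "Wireless Headphones", price: 199.99 },
};
console.log(JSON.stringify(toBulkAction(priceDrop)));
```

Indexing the full `after` row makes the operation idempotent, so replaying the same event after a consumer restart leaves the index in the same state.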
CDC Architecture Diagram
[ MySQL Database ]
  Product updated (price: $199)
        |
        |  (binlog)
        v
[ Debezium (CDC) ]
  Captures changes
        |
        v
[ Apache Kafka ]
  Streams changes
        |
        v
[ Elasticsearch ]
  (Index updated)
4.5 CDC Tools Deep Dive
Debezium: The Industry Standard
Debezium is the most popular open-source CDC tool for databases. It is used by companies like Walmart, Booking.com, and Square.
Pros:
- Works with major databases (MySQL, PostgreSQL, MongoDB, Oracle)
- Battle-tested, enterprise-grade
- Free & open-source
- Sub-second latency
- Rich community & documentation
Cons:
- Usually deployed on Kafka Connect, so you have to operate Kafka (plus ZooKeeper on older, non-KRaft clusters)
- Learning curve
Architecture:
MySQL → Debezium Connector → Kafka → Elasticsearch Sink → Elasticsearch Index
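Example: Debezium + Kafka Setup
# 0. Put the containers on one Docker network so they can resolve each other by name
docker network create search-net
# 1. Start Zookeeper
# (these commands rely on ZooKeeper mode, so pin image tags from a release line that still supports it)
docker run -d --name zookeeper --network search-net \
  -e ZOOKEEPER_CLIENT_PORT=2181 \
  confluentinc/cp-zookeeper
# 2. Start Kafka
docker run -d --name kafka --network search-net \
  -e KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181 \
  -e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092 \
  -e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 \
  confluentinc/cp-kafka
# 3. Create Debezium MySQL connector
# (assumes a Kafka Connect worker with the Debezium MySQL plugin on port 8083
#  and a MySQL container named "mysql" on the same network)
curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "mysql-debezium",
    "config": {
      "connector.class": "io.debezium.connector.mysql.MySqlConnector",
      "database.hostname": "mysql",
      "database.port": 3306,
      "database.user": "root",
      "database.password": "password",
      "database.server.id": 184054,
      "topic.prefix": "mysql",
      "table.include.list": "ecommerce.products",
      "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
      "schema.history.internal.kafka.topic": "schema-history.ecommerce"
    }
  }'
# 4. Listen to Kafka topic (named <topic.prefix>.<database>.<table>)
docker exec kafka kafka-console-consumer \
  --bootstrap-server kafka:9092 \
  --topic mysql.ecommerce.products \
  --from-beginning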
Supabase Realtime: PostgreSQL + Real-time
Supabase combines PostgreSQL with built-in real-time subscriptions.
Pros:
- Extremely fast (< 100ms latency)
- Built on PostgreSQL replication, so there is no Kafka cluster to run
- Real-time WebSocket subscriptions included
- Open-source
- Hosted option available
Cons:
- PostgreSQL only
- Smaller ecosystem than Debezium
Example:
// Real-time subscription with Supabase (supabase-js v2 channel API)
import { createClient } from "@supabase/supabase-js";
// Placeholders: supply your own project URL and anon key
const supabase = createClient(
  process.env.SUPABASE_URL,
  process.env.SUPABASE_ANON_KEY
);
// Subscribe to changes on products table
supabase
  .channel("products-changes")
  .on(
    "postgres_changes",
    { event: "UPDATE", schema: "public", table: "products" },
    (payload) => {
      console.log("Product updated:", payload.new);
      // Sync to Elasticsearch (updateElasticsearch is your own helper, not part of supabase-js)
      updateElasticsearch(payload.new);
    }
  )
  .subscribe();
Deploy: supabase.com (managed) or self-hosted
AWS DMS (Database Migration Service): Fully Managed
AWS DMS handles CDC without the infrastructure burden.
Pros:
- Fully managed by AWS
- Works with 10+ database types
- Easy setup (Console/CLI)
- CloudWatch monitoring included
- Near-real-time ongoing replication
Cons:
- Pricing per instance-hour
- Vendor lock-in (AWS)
Example:
import json
import boto3
dms = boto3.client('dms', region_name='us-east-1')
# Create replication task
response = dms.create_replication_task(
    ReplicationTaskIdentifier='mysql-to-es-cdc',
    SourceEndpointArn='arn:aws:dms:...mysql-endpoint...',
    TargetEndpointArn='arn:aws:dms:...es-endpoint...',
    ReplicationInstanceArn='arn:aws:dms:...instance...',
    MigrationType='cdc',  # Change Data Capture only (no initial full load)
    # TableMappings must be a JSON string, hence json.dumps
    TableMappings=json.dumps({
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "1",
                "object-locator": {
                    "schema-name": "ecommerce",
                    "table-name": "products"
                },
                "rule-action": "include"
            }
        ]
    })
)
Cost: ~$0.50-1.50/hour per instance
Custom Webhook Solution: Simplest for Small Teams
For small projects, a simple webhook approach might be enough.
Concept:
- Application updates database
- App triggers webhook to search API
- Search API updates Elasticsearch
Pros:
- No complex infrastructure
- Easy to understand
- Minimal cost
Cons:
- Webhook failures → stale data
- Not distributed/scalable
- Application-level coupling
Example:
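# Flask: Update product → trigger search update
from flask import Flask, jsonify, request
import requests
app = Flask(__name__)
@app.route('/products/<id>', methods=['PUT'])
def update_product(id):
    # Update database (db is a placeholder for your own data-access layer)
    product = db.products.update(id, request.json)
    # Trigger search index update; PUT replaces the document stored under this id
    resp = requests.put('http://localhost:9200/products/_doc/' + id,
                        json=product, timeout=5)
    # Surface indexing failures instead of silently serving stale data
    resp.raise_for_status()
    return jsonify(product)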
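Since a failed webhook means stale search data, the usual mitigation is to retry with a growing delay. The sketch below shows one way to do that. The names `backoffDelays` and `sendWithRetry` and the default timings are assumptions for illustration, and `send` stands in for whatever function actually calls your search API.

```javascript
// Exponential backoff schedule: base, 2x base, 4x base, ... capped at maxMs.
function backoffDelays(attempts, baseMs = 200, maxMs = 5000) {
  return Array.from({ length: attempts }, (_, i) => Math.min(baseMs * 2 ** i, maxMs));
}

// Retry `send` (any async function that throws on failure) using that schedule.
async function sendWithRetry(send, attempts = 5) {
  const delays = backoffDelays(attempts);
  let lastError;
  for (let i = 0; i < delays.length; i++) {
    try {
      return await send();
    } catch (err) {
      lastError = err;
      // Wait before the next attempt, but not after the final failure
      if (i < delays.length - 1) {
        await new Promise((resolve) => setTimeout(resolve, delays[i]));
      }
    }
  }
  throw lastError;
}

console.log(backoffDelays(6)); // [ 200, 400, 800, 1600, 3200, 5000 ]
```

Retries only cover short outages. If the process crashes between the database write and the webhook call, the update is still lost, which is the gap log-based CDC closes.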
Learn more:
- Debezium: CDC for MySQL and PostgreSQL
- Supabase Realtime Docs
- AWS DMS Documentation
- CDC Patterns with Kafka Connect
5. Putting It All Together: Architecture Overview
Here's how a modern, end-to-end search system looks:
Visual System Architecture
A layered architecture showing how frontend, backend, search engine, and data sync work together.
Complete System Architecture Diagram
[ FRONTEND LAYER ]
  Search Input Component (React/Vue/Angular)
    - Text input field
    - Debounce logic (300-500ms)
    - Display suggestions/autocomplete
        |
        |  HTTP / debounced API request
        v
[ BACKEND/API LAYER ]
  REST/GraphQL API Endpoint
    - Route: /api/search?q=query
    - Validation & sanitization
    - Rate limiting & caching
        |
        v
  Search Query Engine
    - Build query DSL
    - Apply filters & facets
    - Implement ranking logic
        |
        |  Query + ranking parameters
        v
[ SEARCH ENGINE LAYER ]
  Elasticsearch (Advanced)        Typesense (Fast/Easy)
    - BM25                          - Typo tolerance
    - Millions of docs              - Facets
    - Complex queries               - 100K+ docs
                                    - Easy setup
        |
        |  Scored/ranked results
        v
[ BACKEND RESPONSE ]
    - Results with scores
    - Cache in Redis
    - Serialize to JSON
        |
        v
[ RENDER IN FRONTEND ]
    - Display results list
    - Highlight relevance
    - Load more/pagination

[ DATA SYNC LAYER (CDC) ]
  Source Database (MySQL/PostgreSQL/MongoDB)
    - Products table
    - Inventory changes
    - Price updates
        |
        |  Binary logs
        v
  CDC Tool (Debezium/Maxwell/Custom)
        |
        v
  Message Broker (Kafka/Kinesis/Pub-Sub)
        |
        v
  Search Index Updater (Consumer)
        |
        v
  Search Index (Elasticsearch/Typesense)
    Updated in real time
Request Flow Example
User searches: "wireless headphones under $300"
- Frontend (Debounce): User types "wireless headphones"; the request fires 300ms after the last keystroke
- API Call: /api/search?q=wireless%20headphones&maxPrice=300&limit=10
- Backend Processing:
  - Validate & sanitize query
  - Check Redis cache for identical query (if configured)
  - Build search query for chosen engine
- Search Engine Query (e.g., Elasticsearch): match("wireless headphones") AND price <= 300
  - Engine returns 42 results, ranked by BM25 score + custom factors
- Backend Response:
{
  "results": [
    {
      "id": 1,
      "title": "Sony WH-1000XM5",
      "price": 299.99,
      "score": 8.95,
      "highlights": "<strong>Wireless</strong> <strong>Headphones</strong>"
    },
    {
      "id": 2,
      "title": "Bose QC45 Headphones",
      "price": 279.99,
      "score": 8.42,
      "highlights": "<strong>Wireless</strong> Premium <strong>Headphones</strong>"
    }
  ],
  "total": 42,
  "facets": {
    "brands": [
      { "name": "Sony", "count": 15 },
      { "name": "Bose", "count": 12 }
    ]
  }
}
- Frontend Display: Renders results with highlighting, so the user sees relevant products right away
- Real-Time Sync: If a product price drops, CDC captures it and updates the index within seconds
6. Step-by-Step Easy Implementation
Let's tie it all together with a working example setup.
Frontend: Debounced Search Input (React)
import { useMemo, useRef, useState } from "react";
// Debounce helper, defined once outside the component
function debounce(func, delay) {
  let timeout;
  return (...args) => {
    clearTimeout(timeout);
    timeout = setTimeout(() => func(...args), delay);
  };
}
function SearchComponent() {
  const [query, setQuery] = useState("");
  const [results, setResults] = useState([]);
  const [loading, setLoading] = useState(false);
  // Tracks the newest request so slower, older responses are ignored
  const latestRequest = useRef(0);
  // Search handler: useMemo keeps a single debounced function across renders
  const handleSearch = useMemo(
    () =>
      debounce(async (searchQuery) => {
        const requestId = ++latestRequest.current;
        if (!searchQuery.trim()) {
          setResults([]);
          setLoading(false);
          return;
        }
        setLoading(true);
        try {
          const response = await fetch(
            `/api/search?q=${encodeURIComponent(searchQuery)}`
          );
          if (!response.ok) throw new Error(`HTTP ${response.status}`);
          const data = await response.json();
          if (requestId === latestRequest.current) setResults(data.results);
        } catch (error) {
          console.error("Search error:", error);
        } finally {
          if (requestId === latestRequest.current) setLoading(false);
        }
      }, 300),
    []
  );
  return (
    <div>
      <input
        type="text"
        placeholder="Search products..."
        value={query}
        onChange={(e) => {
          setQuery(e.target.value);
          handleSearch(e.target.value);
        }}
      />
      {loading && <p>Searching...</p>}
      <ul>
        {results.map((item) => (
          <li key={item.id}>
            {item.title} - ${item.price}
          </li>
        ))}
      </ul>
    </div>
  );
}
export default SearchComponent;
Backend: Node.js + Elasticsearch API
import express from "express";
import { Client } from "@elastic/elasticsearch";
const client = new Client({ node: "http://localhost:9200" });
const app = express();
app.get("/api/search", async (req, res) => {
try {
const query = req.query.q || "";
const maxPrice = req.query.maxPrice ? parseFloat(req.query.maxPrice) : null;
if (!query.trim()) {
return res.json({ results: [] });
}
// Build Elasticsearch query
const esQuery = {
index: "products",
body: {
query: {
bool: {
must: [
{
multi_match: {
query: query,
fields: ["title^2", "description", "tags"],
fuzziness: "AUTO",
},
},
],
},
},
size: 20,
sort: [{ _score: "desc" }],
},
};
// Add price filter if provided
if (Number.isFinite(maxPrice)) {
esQuery.body.query.bool.filter = [
{ range: { price: { lte: maxPrice } } },
];
}
const result = await client.search(esQuery);
// Transform results
const results = result.hits.hits.map((hit) => ({
id: hit._id,
score: hit._score,
...hit._source,
}));
res.json({
results,
total: result.hits.total.value,
});
} catch (error) {
console.error("Search error:", error);
res.status(500).json({ error: "Search failed" });
}
});
app.listen(3000, () => {
console.log("Search API running on http://localhost:3000");
});
Setting Up Elasticsearch Locally (Docker)
# Start Elasticsearch with Docker
docker run -d \
-p 9200:9200 \
-e "discovery.type=single-node" \
-e "xpack.security.enabled=false" \
docker.elastic.co/elasticsearch/elasticsearch:8.0.0
# Create products index with mapping
curl -X PUT "http://localhost:9200/products" -H "Content-Type: application/json" -d '{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
"properties": {
"title": { "type": "text" },
"description": { "type": "text" },
"price": { "type": "float" },
"rating": { "type": "float" },
"in_stock": { "type": "boolean" }
}
}
}'
# Index sample data
curl -X POST "http://localhost:9200/products/_doc/1" -H "Content-Type: application/json" -d '{
"title": "Wireless Bluetooth Headphones",
"description": "Premium sound quality with noise cancellation",
"price": 299.99,
"rating": 4.8,
"in_stock": true
}'
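Database Sync: CDC with Debezium
Step 1: Start Kafka & Zookeeper
# Assumes a docker-compose.yml (not shown here) that defines ZooKeeper, Kafka, Kafka Connect, MySQL, and Elasticsearch
docker-compose up -d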
Step 2: Create Debezium MySQL Connector
curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "mysql-cdc-connector",
    "config": {
      "connector.class": "io.debezium.connector.mysql.MySqlConnector",
      "database.hostname": "mysql",
      "database.port": 3306,
      "database.user": "root",
      "database.password": "password",
      "database.server.id": 184054,
      "topic.prefix": "mysql-server",
      "database.include.list": "ecommerce",
      "table.include.list": "ecommerce.products",
      "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
      "schema.history.internal.kafka.topic": "schema-history.ecommerce"
    }
  }'
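Step 3: Create Elasticsearch Sink Connector
# A sketch using the Confluent Elasticsearch sink connector. The transforms unwrap
# Debezium's change envelope, route the topic to the "products" index, and use the
# row's id column as the document id.
curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "es-sink-connector",
    "config": {
      "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
      "topics": "mysql-server.ecommerce.products",
      "connection.url": "http://elasticsearch:9200",
      "connection.username": "elastic",
      "connection.password": "password",
      "key.ignore": "false",
      "schema.ignore": "true",
      "transforms": "unwrap,route,key",
      "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
      "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
      "transforms.route.regex": "mysql-server\\.ecommerce\\.products",
      "transforms.route.replacement": "products",
      "transforms.key.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
      "transforms.key.field": "id",
      "key.converter": "org.apache.kafka.connect.json.JsonConverter",
      "value.converter": "org.apache.kafka.connect.json.JsonConverter"
    }
  }'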
Step 4: Test the Flow
# Update a product (run this statement in a MySQL client)
UPDATE products SET price = 199.99 WHERE id = 1;
# Check Elasticsearch from a shell (the price should now be 199.99)
curl "http://localhost:9200/products/_doc/1"
Result: Price updated in MySQL → CDC captures it → Kafka streams it → Elasticsearch updates it!
7. Performance Tips & Optimization
Frontend Optimization
// 1. Adjust debounce delay based on API latency
const DEBOUNCE_DELAY = 300; // ms
// 2. Add loading state to prevent duplicate requests
const [isSearching, setIsSearching] = useState(false);
// 3. Cache results for repeated queries
const resultsCache = new Map();
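A bare `Map` grows without bound, so in practice the results cache usually needs a size limit. Here is a minimal sketch of a small least-recently-used cache. The class name `QueryCache`, the default limit of 50, and the normalization rule are assumptions for illustration.

```javascript
// Small LRU cache built on Map, which iterates keys in insertion order.
class QueryCache {
  constructor(maxEntries = 50) {
    this.maxEntries = maxEntries;
    this.entries = new Map();
  }

  // Treat "Headphones " and "headphones" as the same query
  static normalize(query) {
    return query.trim().toLowerCase();
  }

  get(query) {
    const key = QueryCache.normalize(query);
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key);
    // Re-insert so this key becomes the most recently used
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(query, results) {
    const key = QueryCache.normalize(query);
    this.entries.delete(key);
    this.entries.set(key, results);
    if (this.entries.size > this.maxEntries) {
      // The first key in iteration order is the least recently used
      this.entries.delete(this.entries.keys().next().value);
    }
  }
}

const cache = new QueryCache(2);
cache.set("wireless headphones", [{ id: 1 }]);
cache.set("bose", [{ id: 2 }]);
cache.get("Wireless Headphones "); // refreshes the first entry
cache.set("sony", [{ id: 3 }]); // evicts "bose", the least recently used
console.log([...cache.entries.keys()]);
```

Because search data changes, a production cache would also want a time-to-live per entry so stale prices are not served indefinitely.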
Elasticsearch Optimization
// 1. Use keyword fields for exact matching
"product_id": { "type": "keyword" }
// 2. Add analyzers for better text search
"title": {
"type": "text",
"analyzer": "standard",
"fields": {
"keyword": { "type": "keyword" }
}
}
// 3. Enable sharding for scale
"settings": {
"number_of_shards": 5,
"number_of_replicas": 2
}
Backend Optimization
// 1. Add query result caching
const Redis = require("redis");
const redis = Redis.createClient();
// 2. Implement pagination
const size = 20;
const from = (page - 1) * size;
// 3. Monitor query performance
console.time("elasticsearch_query");
const result = await client.search(query);
console.timeEnd("elasticsearch_query");
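The pagination snippet above assumes `page` is already a sane number. A hedged sketch of a helper that validates it first is shown below. The name `toPagination` and the page-size cap of 100 are illustrative choices, while the 10,000 limit mirrors Elasticsearch's default `index.max_result_window`.

```javascript
// Translate a 1-based page number into Elasticsearch `from`/`size`.
// The 10,000 cap mirrors the default index.max_result_window setting.
function toPagination(page, size = 20, maxWindow = 10000) {
  // Fall back to 1 for missing or non-numeric input; cap page size at 100
  const safeSize = Math.min(Math.max(1, Math.floor(size) || 1), 100);
  const lastPage = Math.floor(maxWindow / safeSize);
  const safePage = Math.min(Math.max(1, Math.floor(page) || 1), lastPage);
  return { from: (safePage - 1) * safeSize, size: safeSize };
}

console.log(toPagination(1)); // { from: 0, size: 20 }
console.log(toPagination(3)); // { from: 40, size: 20 }
console.log(toPagination("abc")); // { from: 0, size: 20 }
```

For result sets deeper than that window, Elasticsearch's `search_after` is the usual alternative to `from`/`size`.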
8. Final Thoughts
A great search system is not just about speed; it's about smart engineering across every layer.
- Debouncing smooths user input, reducing unnecessary API calls
- Ranking ensures relevance, keeping users engaged
- Elasticsearch delivers results at lightning speed, powering scalability
- CDC keeps everything up to date in real time, maintaining data integrity
When combined, they create a seamless experience where users get relevant results, instantly and accurately.
Whether you're building a small product search or a massive e-commerce platform, these principles apply. Start simple, measure performance, and optimize as you scale.
Ready to build the next great search system? Start with debouncing, move to Elasticsearch, and level up with CDC.
Happy searching!