Building a Modern Search System: From Debouncing to Ranking and Real-Time Updates

📅 October 25, 2025
⏱️ 22 min read
✍️ Subhadeep Datta
search
elasticsearch
performance
backend
CDC
system-design

🧭 Introduction: Why Search Is Harder Than It Looks

Type a few letters into Google or Amazon, and results appear instantly: ranked, relevant, and fresh.

Behind that magic lies a chain of complex engineering ideas:

  • Debouncing makes typing feel smooth
  • Ranking algorithms decide what appears first
  • Elasticsearch retrieves relevant results at scale
  • Change Data Capture (CDC) keeps everything up-to-date in real time

In this article, we'll break each piece down, step by step, and end with a small, working example showing how to connect them all together.


💡 1. Debouncing: The First Layer of Performance

When a user types in a search box, every keystroke can trigger an API call, wasting bandwidth and hammering your backend.

Example: typing "apple" fires 5 requests:

a → ap → app → appl → apple

That's inefficient. Debouncing solves this by waiting until the user pauses typing before running the search.

🧩 How Debouncing Works

Wait X milliseconds after the last keystroke before executing the function.

If another keystroke happens before the delay ends, the timer resets.

🧠 Example (JavaScript)

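function debounce(func, delay) {
  let timeout;
  return (...args) => {
    clearTimeout(timeout); // cancel the pending call, if any
    timeout = setTimeout(() => func(...args), delay);
  };
}

// Usage
const handleSearch = debounce((query) => {
  // Encode the query so spaces, "&", and "#" don't break the URL
  fetch(`/api/search?q=${encodeURIComponent(query)}`)
    .then((res) => res.json())
    .then((data) => console.log(data))
    .catch((err) => console.error("Search failed:", err));
}, 300);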

✅ Result: Only one API call after the user stops typing, which means smoother UX and less load.

Performance Impact

  • Without debouncing: 1,000 users typing 5 characters = 5,000 requests
  • With debouncing: same scenario = 1,000 requests (assuming nobody pauses longer than the delay mid-word)

That's an 80% reduction in unnecessary API calls!
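
The 80% figure is the best case, where each user types the whole word without pausing. As a rough way to check other typing patterns, the sketch below counts how many calls a debounced function would make for a given list of keystroke timestamps. `countDebouncedCalls` is a hypothetical helper written for this illustration, not part of any library.

```javascript
// Hypothetical helper: given keystroke timestamps (ms, ascending) and a
// debounce delay, count how many times the debounced function would fire.
function countDebouncedCalls(keystrokeTimes, delay) {
  let calls = 0;
  for (let i = 0; i < keystrokeTimes.length; i++) {
    const next = keystrokeTimes[i + 1];
    // A call fires after a keystroke only if no later keystroke arrives
    // before the delay has elapsed.
    if (next === undefined || next - keystrokeTimes[i] >= delay) {
      calls++;
    }
  }
  return calls;
}

// Typing "apple" steadily, one key every 100 ms
const steady = countDebouncedCalls([0, 100, 200, 300, 400], 300);
// Pausing for 500 ms after "app", then finishing the word
const withPause = countDebouncedCalls([0, 100, 200, 700, 800], 300);
console.log(steady, withPause);
```

A user who pauses mid-word triggers an extra request, so real-world savings depend on typing rhythm and on the delay you pick.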


⚖️ 2. Ranking: Making Search Results Relevant

Once the query hits your backend, you need to decide which results should appear first.

Ranking is the "brain" of search: it determines what's most relevant.

Common Ranking Factors

  1. Text relevance: how well the content matches the query
  2. Popularity: e.g., click counts, purchases
  3. Recency: newer results may rank higher
  4. Personalization: user preferences or history
  5. User signals: likes, shares, time spent

🔢 Simplified Ranking Formula

Final Score = (text_score × 0.6) + (popularity × 0.3) + (recency × 0.1)

This formula is weighted because:

  • 60% text relevance (user is looking for specific content)
  • 30% popularity (trusted/purchased products rank higher)
  • 10% recency (fresh products get a small boost)

Example Ranking Scenario

Imagine searching for "wireless headphones". Here's how three products score:

[Figure: Ranking Comparison. Visual breakdown showing why Sony ranks #1: balanced excellence across all ranking factors.]

Sony WH-1000XM5

  • Text Score: 95/100 (matches "wireless" + "headphones")
  • Popularity: 90/100 (50K+ purchases)
  • Recency: 85/100 (released 2 years ago, still current)
  • Final Score: 92.5 (57 + 27 + 8.5) ⭐ #1 Ranked

Generic Headphones

  • Text Score: 85/100 (matches query but less specific)
  • Popularity: 40/100 (5K purchases)
  • Recency: 95/100 (just released)
  • Final Score: 72.5 (51 + 12 + 9.5; not enough popularity)

Vintage Headphones

  • Text Score: 88/100 (matches but labeled "vintage")
  • Popularity: 30/100 (old product, few recent purchases)
  • Recency: 20/100 (released 10 years ago)
  • Final Score: 63.8 (52.8 + 9 + 2; low overall score)

Result: Sony ranks first because it leads on text relevance and popularity, which together carry 90% of the weight, and stays strong on recency. Users see the most relevant product first! 🎉
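
To make the arithmetic easy to check, here is a minimal sketch that applies the simplified formula to the three example products. The weights are written as percentages so the sums stay in integers until the final division. `finalScore` and the `products` array are illustrative names, not part of any search engine API.

```javascript
// Weights from the simplified formula, expressed as percentages
const WEIGHTS = { text: 60, popularity: 30, recency: 10 };

// Weighted sum of the three factor scores (each on a 0-100 scale)
function finalScore({ text, popularity, recency }) {
  const weighted =
    text * WEIGHTS.text +
    popularity * WEIGHTS.popularity +
    recency * WEIGHTS.recency;
  return weighted / 100;
}

// Factor scores taken from the example scenario
const products = [
  { name: "Vintage Headphones", text: 88, popularity: 30, recency: 20 },
  { name: "Sony WH-1000XM5", text: 95, popularity: 90, recency: 85 },
  { name: "Generic Headphones", text: 85, popularity: 40, recency: 95 },
];

// Score every product, then sort from highest to lowest
const ranked = products
  .map((p) => ({ name: p.name, score: finalScore(p) }))
  .sort((a, b) => b.score - a.score);

for (const { name, score } of ranked) {
  console.log(`${name}: ${score}`);
}
```

Changing the weights here is a quick way to see how sensitive the ordering is: for example, a much heavier recency weight would push the just-released generic product up the list.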


🧮 Real-World Algorithms

Search engines use sophisticated algorithms:

  • TF-IDF (Term Frequency-Inverse Document Frequency): the classic approach
  • BM25: a refinement of TF-IDF (the default similarity in Elasticsearch)
  • Learning to Rank (LTR): machine learning-based ranking

BM25 Formula (Simplified):

score(D, Q) = Σ IDF(qi) × (f(qi, D) × (k1 + 1)) / (f(qi, D) + k1 × (1 - b + b × |D| / avgdl))

Where:

  • D = document
  • Q = query, made up of terms qi (the sum runs over these terms)
  • IDF(qi) = inverse document frequency of term qi
  • f(qi, D) = frequency of term qi in document D
  • |D| = length of document D, in words
  • avgdl = average document length across the collection
  • k1, b = tuning parameters
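
The formula is easier to follow with numbers running through it. Below is a minimal sketch of BM25 over a tiny, pre-tokenized corpus. It assumes the commonly used defaults k1 = 1.2 and b = 0.75 and a non-negative IDF variant, log(1 + (N - n + 0.5) / (n + 0.5)). `bm25Scores` is an illustrative function, not how Elasticsearch implements scoring internally.

```javascript
// Minimal BM25 over pre-tokenized documents (arrays of lowercase terms).
// k1 and b are the tuning parameters from the formula above.
function bm25Scores(docs, queryTerms, k1 = 1.2, b = 0.75) {
  const N = docs.length;
  const avgdl = docs.reduce((sum, d) => sum + d.length, 0) / N;

  return docs.map((doc) => {
    let score = 0;
    for (const term of queryTerms) {
      const f = doc.filter((t) => t === term).length; // f(qi, D)
      if (f === 0) continue; // term absent: contributes nothing
      const n = docs.filter((d) => d.includes(term)).length; // docs containing qi
      const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));
      const lengthNorm = 1 - b + b * (doc.length / avgdl);
      score += (idf * (f * (k1 + 1))) / (f + k1 * lengthNorm);
    }
    return score;
  });
}

const docs = [
  "wireless bluetooth headphones".split(" "),
  "wired studio headphones".split(" "),
  "wireless phone charger".split(" "),
];
const scores = bm25Scores(docs, ["wireless", "headphones"]);
console.log(scores);
```

The first document matches both query terms, so it scores highest; the other two each match one term and, being the same length, tie with each other.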


⚙️ 3. Elasticsearch: The Engine Powering Modern Search

Elasticsearch is a distributed search and analytics engine built on Apache Lucene. It's used by companies like Netflix, Uber, and Shopify to power their search systems.

🧠 Why Elasticsearch?

  • Blazing fast full-text search: searches millions of documents in milliseconds
  • Scalable: can handle billions of documents across clusters
  • Powerful ranking & scoring out-of-the-box (BM25)
  • Supports fuzzy search, filtering, and aggregations
  • RESTful API: easy to integrate with any backend

🗃️ Basic Architecture

Example: Index and Document

PUT /products/_doc/1
{
  "id": 1,
  "title": "Wireless Bluetooth Headphones",
  "description": "Premium sound quality with noise cancellation",
  "price": 299.99,
  "rating": 4.8,
  "reviews_count": 1250
}

Example: Simple Query

GET /products/_search
{
  "query": {
    "match": {
      "title": "wireless headphones"
    }
  },
  "size": 10,
  "from": 0
}

Response Example

{
  "hits": {
    "total": { "value": 42 },
    "hits": [
      {
        "_id": "1",
        "_score": 8.95,
        "_source": {
          "title": "Wireless Bluetooth Headphones",
          "price": 299.99,
          "rating": 4.8
        }
      }
    ]
  }
}

Advanced: Combining Multiple Queries

GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "headphones" } }
      ],
      "filter": [
        { "range": { "price": { "lte": 500 } } },
        { "term": { "in_stock": true } }
      ]
    }
  }
}

This query returns products matching "headphones" that are under $500 and in stock.
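
In application code, this request body is usually assembled from user input rather than written by hand. The sketch below shows one way to do that. `buildProductQuery` is a hypothetical helper, and the field names (`title`, `price`, `in_stock`) simply follow the example above.

```javascript
// Hypothetical helper that assembles the bool query from request parameters.
function buildProductQuery({ text, maxPrice, inStockOnly = false }) {
  const filter = [];
  if (Number.isFinite(maxPrice)) {
    filter.push({ range: { price: { lte: maxPrice } } });
  }
  if (inStockOnly) {
    filter.push({ term: { in_stock: true } });
  }
  return {
    query: {
      bool: {
        must: [{ match: { title: text } }], // scored: contributes to _score
        filter, // not scored: only narrows the result set
      },
    },
  };
}

const body = buildProductQuery({
  text: "headphones",
  maxPrice: 500,
  inStockOnly: true,
});
console.log(JSON.stringify(body, null, 2));
```

Keeping price and stock conditions in `filter` rather than `must` means they restrict the results without influencing relevance scores.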


🔄 3.5 Search Engine Alternatives to Elasticsearch

Elasticsearch is powerful but complex to set up and operate. Here are production-ready alternatives depending on your needs:

🚀 Typesense: Fast, Developer-Friendly Alternative

Typesense is a modern search engine optimized for speed and ease of use.

Pros:

  • ⚡ Instant search results (~100ms)
  • 🎯 Built-in typo tolerance
  • 🔍 Facets & filtering out-of-box
  • 📦 Easy deployment (Docker, Heroku, etc.)
  • 💰 Open-source & self-hosted
  • 📱 Great for mobile apps

Cons:

  • Document limits on free tier
  • Smaller ecosystem than Elasticsearch
  • Fewer advanced features

Quick Example:

// Install: npm install typesense
import Typesense from "typesense";

const client = new Typesense.Client({
  nodes: [{ host: "localhost", port: 8108, protocol: "http" }],
  apiKey: "xyz789",
});

// Search
const results = await client.collections("products").documents().search({
  q: "wireless headphones",
  query_by: "title,description",
  filter_by: "price:<= 300",
  limit: 10,
});


🔎 Meilisearch: The User-Experience Champion

Meilisearch prioritizes beautiful search UX with minimal configuration.

Pros:

  • 🎨 Amazing UX by default
  • ⚡ Fast (HTTP response in ~50ms)
  • 🧙 Zero-config relevance
  • 📚 Excellent documentation
  • 🔓 Open-source & MIT licensed
  • 🌐 REST API only (simpler than Elasticsearch)

Cons:

  • Slower than Typesense in some benchmarks
  • Less flexible for custom ranking

Quick Example:

// Install: npm install meilisearch
import { MeiliSearch } from "meilisearch";

const client = new MeiliSearch({
  host: "http://localhost:7700",
  apiKey: "masterKey",
});

// Add documents (indexing is asynchronous: this call only enqueues a task)
await client.index("products").addDocuments([
  { id: 1, title: "Sony Headphones", price: 299 },
  { id: 2, title: "Bose Headphones", price: 279 },
]);

// Search (results appear once the indexing task has finished)
const results = await client.index("products").search("headphones");


💼 Algolia: The Enterprise SaaS Option

Algolia is a fully managed SaaS solution for teams that need turnkey search.

Pros:

  • ✅ Zero infrastructure management
  • ✅ Global CDN (fast everywhere)
  • ✅ Outstanding documentation
  • ✅ Analytics & insights included
  • ✅ Premium support

Cons:

  • 💰 Can be expensive at scale ($0.008+ per query)
  • Vendor lock-in (proprietary platform)
  • Less control over algorithms

Ideal for: Startups, high-traffic sites where ops overhead is a concern.


🐍 Whoosh: Lightweight Python Alternative

For small to medium projects in Python, Whoosh is a pure-Python search library.

Pros:

  • 📦 Single Python package (no servers to run)
  • 🚀 Great for simple use cases
  • 🔧 Fully customizable

Cons:

  • ❌ No distributed/scaling capability
  • Limited to local/single-machine
  • Slower than Elasticsearch/Typesense

Quick Example:

import os

from whoosh.fields import Schema, TEXT, ID
from whoosh.index import create_in

# Define schema
schema = Schema(
  id=ID(stored=True),
  title=TEXT(stored=True),
  content=TEXT
)

# Create index (the directory must exist before create_in is called)
os.makedirs('indexdir', exist_ok=True)
ix = create_in('indexdir', schema)
writer = ix.writer()

# Index documents
writer.add_document(id='1', title='Wireless Headphones', content='...')
writer.commit()

# Search the "title" field
with ix.searcher() as searcher:
  results = searcher.find('title', 'wireless')
  for result in results:
    print(result['title'])

🎯 How to Choose?

  • Enterprise, millions of docs, complex queries? → Elasticsearch
  • Want speed + easy setup? → Typesense
  • Prioritize UX + open source? → Meilisearch
  • Don't want to manage infrastructure? → Algolia
  • Small Python project? → Whoosh

🔄 4. Change Data Capture (CDC): Keeping Search Fresh

Even the best search index becomes outdated if your source data changes.

When products are added, deleted, or updated in your main database, your search index must stay in sync.

That's where Change Data Capture (CDC) comes in.

🧠 What Is CDC?

CDC continuously monitors your database for changes and streams them to another system, such as Elasticsearch.

The Problem CDC Solves

Imagine an e-commerce platform:

  • Product gets updated in MySQL
  • Price rises from $99 to $199
  • But search still shows $99
  • Customer clicks through, expecting the $99 price
  • Revenue loss. Angry customer. Bad review.

CDC prevents this by keeping the search index in sync with the database, typically within seconds.

Example Flow

  1. User updates product price in MySQL (from $299 → $199)
  2. CDC tool (Debezium) captures the change in MySQL binlog
  3. Message sent to Kafka with the update event
  4. Kafka Consumer reads the event
  5. Elasticsearch updated with new price
  6. Next search query returns updated price ✅
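
Steps 4 and 5 come down to translating each change event into an index operation. The sketch below assumes the Debezium event envelope, with `op` set to `c` (create), `u` (update), `d` (delete), or `r` (snapshot read) plus `before` and `after` row images, and assumes rows carry an `id` column. `toIndexAction` is a hypothetical helper; a real consumer would pass its result to the search client.

```javascript
// Map a Debezium-style change event to a search index action.
// Assumes the event value has already been parsed from JSON and
// unwrapped from any outer "payload" wrapper.
function toIndexAction(event, index = "products") {
  const { op, before, after } = event;
  switch (op) {
    case "c": // row inserted
    case "u": // row updated
    case "r": // row read during the initial snapshot
      return { action: "index", index, id: String(after.id), document: after };
    case "d": // row deleted: only the "before" image is available
      return { action: "delete", index, id: String(before.id) };
    default:
      return null; // unknown operation: skip it
  }
}

const priceDrop = {
  op: "u",
  before: { id: 1, title: "Wireless Headphones", price: 299 },
  after: { id: 1, title: "Wireless Headphones", price: 199 },
};
console.log(toIndexAction(priceDrop));
```

Using the row's primary key as the document ID makes the operation idempotent, so replaying the same event after a consumer restart leaves the index unchanged.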

CDC Architecture Diagram

┌─────────────────────┐
│   MySQL Database    │
│   Product updated   │
│   (price: $199)     │
└──────────┬──────────┘
           │ (binlog)
           ▼
┌─────────────────────┐
│  Debezium (CDC)     │
│  Captures changes   │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Apache Kafka       │
│  Streams changes    │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Elasticsearch      │
│  (Index updated)    │
└─────────────────────┘


🔄 4.5 CDC Tools Deep Dive

🔥 Debezium: The Industry Standard

Debezium is the most popular open-source CDC tool for databases. Used by companies like Walmart, Booking.com, and Square.

Pros:

  • ✅ Works with major databases (MySQL, PostgreSQL, MongoDB, Oracle)
  • ✅ Battle-tested, enterprise-grade
  • ✅ Free & open-source
  • ✅ Sub-second latency
  • ✅ Rich community & documentation

Cons:

  • Requires Kafka and Kafka Connect, plus Zookeeper on older Kafka versions (operational overhead)
  • 📈 Learning curve

Architecture:

MySQL → Debezium Connector → Kafka → Elasticsearch Sink → Elasticsearch Index

Example: Debezium + Kafka Setup

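# 0. Create a shared network so the containers can resolve each other by name
docker network create cdc-net

# 1. Start Zookeeper
# (pin image tags that still support Zookeeper mode in a real setup)
docker run -d --name zookeeper --network cdc-net \
  -e ZOOKEEPER_CLIENT_PORT=2181 \
  confluentinc/cp-zookeeper

# 2. Start Kafka
docker run -d --name kafka --network cdc-net \
  -e KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181 \
  -e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092 \
  -e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 \
  confluentinc/cp-kafka

# 3. Create Debezium MySQL connector
# (assumes a Kafka Connect worker with the Debezium MySQL plugin is already
#  running on the same network and reachable at localhost:8083)
curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "mysql-debezium",
    "config": {
      "connector.class": "io.debezium.connector.mysql.MySqlConnector",
      "database.hostname": "mysql",
      "database.port": 3306,
      "database.user": "root",
      "database.password": "password",
      "database.server.id": 184054,
      "table.include.list": "ecommerce.products",
      "topic.prefix": "mysql",
      "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
      "schema.history.internal.kafka.topic": "schema-changes.ecommerce"
    }
  }'

# 4. Listen to Kafka topic (named <topic.prefix>.<database>.<table>)
docker exec kafka kafka-console-consumer \
  --bootstrap-server localhost:9092 \
  --topic mysql.ecommerce.products \
  --from-beginning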

⚡ Supabase Realtime: PostgreSQL + Real-time

Supabase combines PostgreSQL with built-in real-time subscriptions.

Pros:

  • 🚀 Extremely fast (< 100ms latency)
  • 🎯 Built on PostgreSQL logical replication, no Kafka required
  • 💡 Real-time WebSocket subscriptions included
  • 🔓 Open-source
  • ☁️ Hosted option available

Cons:

  • 🔒 PostgreSQL only
  • Smaller ecosystem than Debezium

Example:

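// Real-time subscription with Supabase (supabase-js v2)
import { createClient } from "@supabase/supabase-js";

// Project URL and key come from your Supabase project settings
const supabase = createClient(
  process.env.SUPABASE_URL,
  process.env.SUPABASE_ANON_KEY
);

// Subscribe to changes on products table
// (in v2, subscriptions are created on a named channel)
supabase
  .channel("products-changes")
  .on(
    "postgres_changes",
    { event: "UPDATE", schema: "public", table: "products" },
    (payload) => {
      console.log("Product updated:", payload.new);
      // Sync to Elasticsearch (updateElasticsearch is your own function)
      updateElasticsearch(payload.new);
    }
  )
  .subscribe();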

Deploy: supabase.com (managed) or self-hosted


🚀 AWS DMS (Database Migration Service): Fully Managed

AWS DMS handles CDC without the infrastructure burden.

Pros:

  • ☁️ Fully managed by AWS
  • ✅ Works with 10+ database types
  • 🔧 Easy setup (Console/CLI)
  • 📊 CloudWatch monitoring included
  • ✅ Sub-second replication

Cons:

  • 💰 Pricing per instance-hour
  • Vendor lock-in (AWS)

Example:

import json

import boto3

dms = boto3.client('dms', region_name='us-east-1')

# Create replication task
response = dms.create_replication_task(
    ReplicationTaskIdentifier='mysql-to-es-cdc',
    SourceEndpointArn='arn:aws:dms:...mysql-endpoint...',
    TargetEndpointArn='arn:aws:dms:...es-endpoint...',
    ReplicationInstanceArn='arn:aws:dms:...instance...',
    MigrationType='cdc',  # Change Data Capture
    TableMappings=json.dumps({
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "1",
                "object-locator": {
                    "schema-name": "ecommerce",
                    "table-name": "products"
                },
                "rule-action": "include"
            }
        ]
    })
)

Cost: ~$0.50-1.50/hour per instance


🎯 Custom Webhook Solution: Simplest for Small Teams

For small projects, a simple webhook approach might be enough.

Concept:

  1. Application updates database
  2. App triggers webhook to search API
  3. Search API updates Elasticsearch

Pros:

  • 🧠 No complex infrastructure
  • 📝 Easy to understand
  • 💸 Minimal cost

Cons:

  • ⚠️ Webhook failures → stale data
  • Not distributed/scalable
  • Application-level coupling

Example:

# Flask: Update product → trigger search update
from flask import Flask, request
import requests

app = Flask(__name__)

@app.route('/products/<id>', methods=['PUT'])
def update_product(id):
    # Update database (`db` stands in for your own data-access layer)
    product = db.products.update(id, request.json)

    # Trigger search index update; fail loudly so stale data gets noticed
    resp = requests.post('http://localhost:9200/products/_doc/' + id,
                         json=product, timeout=5)
    resp.raise_for_status()

    return product
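
A single failed request to the search index leaves that product stale until its next update. One common mitigation is to retry with exponential backoff. The sketch below is written in JavaScript, like the article's other client code, rather than Python; `backoffDelay` and `withRetry` are hypothetical helpers, and the delays (200 ms base, 5 s cap) are arbitrary starting points.

```javascript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelay(attempt, baseMs = 200, maxMs = 5000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Run an async task, retrying on failure with exponential backoff.
// `task` is any function returning a promise, e.g. the index update call.
async function withRetry(task, maxAttempts = 4) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts - 1) {
        const delay = backoffDelay(attempt);
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError; // caller can log it or queue the update for later repair
}

// Delay schedule for the first six retries
const delays = [0, 1, 2, 3, 4, 5].map((n) => backoffDelay(n));
console.log(delays);
```

Retries reduce stale data but do not eliminate it: if every attempt fails, the update still has to be recorded somewhere and replayed, which is the gap that log-based CDC closes.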


🔗 5. Putting It All Together: Architecture Overview

Here's how a modern, end-to-end search system looks:

[Figure: Search System Architecture. A layered architecture showing how frontend, backend, search engine, and data sync work together.]

Complete System Architecture Diagram

┌────────────────────────────────────────────────────┐
│                   FRONTEND LAYER                   │
│  ┌──────────────────────────────────────────────┐  │
│  │  Search Input Component (React/Vue/Angular)  │  │
│  │  • Text input field                          │  │
│  │  • Debounce logic (300-500ms)                │  │
│  │  • Display suggestions/autocomplete          │  │
│  └──────────────────────────────────────────────┘  │
└──────────────┬─────────────────────────────────────┘
               │ HTTP/Debounced API Request
               ▼
    ┌───────────────────────────────────────┐
    │      BACKEND/API LAYER                │
    │  ┌────────────────────────────────┐   │
    │  │  REST/GraphQL API Endpoint     │   │
    │  │  • Route: /api/search?q=query  │   │
    │  │  • Validation & sanitization   │   │
    │  │  • Rate limiting & caching     │   │
    │  └────────────────────────────────┘   │
    │               │                       │
    │               ▼                       │
    │  ┌────────────────────────────────┐   │
    │  │  Search Query Engine           │   │
    │  │  • Build query DSL             │   │
    │  │  • Apply filters & facets      │   │
    │  │  • Implement ranking logic     │   │
    │  └────────────────────────────────┘   │
    └────────────────┬──────────────────────┘
                     │ Query + Ranking Parameters
                     ▼
┌────────────────────────────────────────────────┐
│      SEARCH ENGINE LAYER                       │
│  ┌──────────────┐  ┌──────────────┐            │
│  │Elasticsearch │  │  Typesense   │            │
│  │  (Advanced)  │  │(Fast/Easy)   │            │
│  │              │  │              │            │
│  │ • BM25       │  │ • Typo-tol.  │            │
│  │ • Millions   │  │ • Facets     │            │
│  │   of docs    │  │ • 100K+ docs │            │
│  │ • Complex    │  │ • Easy setup │            │
│  │   queries    │  │              │            │
│  └──────────────┘  └──────────────┘            │
└────────────────────┬───────────────────────────┘
                     │ Scored/Ranked Results
                     ▼
            ┌────────────────────────┐
            │  BACKEND RESPONSE      │
            │  • Results with scores │
            │  • Cache in Redis      │
            │  • Serialize to JSON   │
            └────────────────┬───────┘
                             │
                             ▼
              ┌──────────────────────────┐
              │   RENDER IN FRONTEND     │
              │  • Display results list  │
              │  • Highlight relevance   │
              │  • Load more/pagination  │
              └──────────────────────────┘

┌──────────────────────────────────────┐
│    DATA SYNC LAYER (CDC)             │
│  ┌────────────────────────────────┐  │
│  │  Source Database               │  │
│  │  (MySQL/PostgreSQL/MongoDB)    │  │
│  │  • Products table              │  │
│  │  • Inventory changes           │  │
│  │  • Price updates               │  │
│  └────────────┬───────────────────┘  │
│               │ Binary Logs          │
│               ▼                      │
│  ┌────────────────────────────────┐  │
│  │  CDC Tool                      │  │
│  │  (Debezium/Maxwell/Custom)     │  │
│  └────────────┬───────────────────┘  │
│               │                      │
│               ▼                      │
│  ┌────────────────────────────────┐  │
│  │  Message Broker                │  │
│  │  (Kafka/Kinesis/Pub-Sub)       │  │
│  └────────────┬───────────────────┘  │
│               │                      │
│               ▼                      │
│  ┌────────────────────────────────┐  │
│  │  Search Index Updater          │  │
│  │  (Consumer)                    │  │
│  └────────────┬───────────────────┘  │
│               │                      │
│               ▼                      │
│  ┌────────────────────────────────┐  │
│  │  Search Index                  │  │
│  │  (Elasticsearch/Typesense)     │  │
│  │  Updated in real-time ✓        │  │
│  └────────────────────────────────┘  │
└──────────────────────────────────────┘

Request Flow Example

User searches: "wireless headphones under $300"

  1. Frontend (Debounce): User types "wireless headphones", waits 300ms after last keystroke
  2. API Call: /api/search?q=wireless%20headphones&maxPrice=300&limit=10
  3. Backend Processing:
    • Validate & sanitize query
    • Check Redis cache for identical query (if configured)
    • Build search query for chosen engine
  4. Search Engine Query (e.g., Elasticsearch):
    match("wireless headphones") AND price <= 300
    
    • Engine returns 42 results, ranked by BM25 score + custom factors
  5. Backend Response:
    {
      "results": [
        {
          "id": 1,
          "title": "Sony WH-1000XM5",
          "price": 299.99,
          "score": 8.95,
          "highlights": "<strong>Wireless</strong> <strong>Headphones</strong>"
        },
        {
          "id": 2,
          "title": "Bose QC45 Headphones",
          "price": 279.99,
          "score": 8.42,
          "highlights": "<strong>Wireless</strong> Premium <strong>Headphones</strong>"
        }
      ],
      "total": 42,
      "facets": {
        "brands": [
          { "name": "Sony", "count": 15 },
          { "name": "Bose", "count": 12 }
        ]
      }
    }
    
  6. Frontend Display: Renders results with highlighting, user sees relevant products instantly 🎉
  7. Real-Time Sync: If a product price drops, CDC captures it and updates the index within seconds
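
The cache check in step 3 only helps if equivalent queries map to the same key. The sketch below shows one way to normalize the request into a cache key, with a tiny in-memory TTL cache standing in for Redis. `cacheKey` and `TtlCache` are hypothetical names, and the 30-second TTL is an arbitrary choice; a shorter TTL limits how long a cached result can lag behind a CDC update.

```javascript
// Build a stable cache key so "Wireless  Headphones" and
// "wireless headphones" share one cache entry.
// Parameter names mirror the example API call above.
function cacheKey({ q, maxPrice = null, limit = 10 }) {
  const normalized = q.trim().toLowerCase().replace(/\s+/g, " ");
  return JSON.stringify([normalized, maxPrice, limit]);
}

// Tiny in-memory TTL cache standing in for Redis in this sketch.
// `now` is injectable so expiry can be checked without waiting.
class TtlCache {
  constructor(ttlMs) {
    this.ttlMs = ttlMs;
    this.entries = new Map();
  }

  get(key, now = Date.now()) {
    const entry = this.entries.get(key);
    if (!entry || entry.expiresAt <= now) {
      this.entries.delete(key); // drop expired entries lazily
      return undefined;
    }
    return entry.value;
  }

  set(key, value, now = Date.now()) {
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }
}

const cache = new TtlCache(30000);
const key = cacheKey({ q: "  Wireless   Headphones ", maxPrice: 300 });
cache.set(key, { total: 42 }, 0);

// A differently formatted but equivalent query hits the same entry
const sameKey = cacheKey({ q: "wireless headphones", maxPrice: 300 });
console.log(cache.get(sameKey, 1000));
```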

🧩 6. Step-by-Step Easy Implementation

Let's tie it all together with a working example setup 👇

🖥️ Frontend: Debounced Search Input (React)

import { useState, useMemo } from "react";

// Defined once, outside the component, so it is not recreated on every render
function debounce(func, delay) {
  let timeout;
  return (...args) => {
    clearTimeout(timeout);
    timeout = setTimeout(() => func(...args), delay);
  };
}

function SearchComponent() {
  const [query, setQuery] = useState("");
  const [results, setResults] = useState([]);
  const [loading, setLoading] = useState(false);

  // useMemo keeps one debounced function (and its timer) across renders
  const handleSearch = useMemo(
    () =>
      debounce(async (searchQuery) => {
        if (!searchQuery.trim()) {
          setResults([]);
          return;
        }

        setLoading(true);
        try {
          const response = await fetch(
            `/api/search?q=${encodeURIComponent(searchQuery)}`
          );
          if (!response.ok) throw new Error(`HTTP ${response.status}`);
          const data = await response.json();
          setResults(data.results);
        } catch (error) {
          console.error("Search error:", error);
        } finally {
          setLoading(false);
        }
      }, 300),
    []
  );

  return (
    <div>
      <input
        type="text"
        placeholder="Search products..."
        value={query}
        onChange={(e) => {
          setQuery(e.target.value);
          handleSearch(e.target.value);
        }}
      />
      {loading && <p>Searching...</p>}
      <ul>
        {results.map((item) => (
          <li key={item.id}>
            {item.title} - ${item.price}
          </li>
        ))}
      </ul>
    </div>
  );
}

export default SearchComponent;

⚙️ Backend: Node.js + Elasticsearch API

import express from "express";
import { Client } from "@elastic/elasticsearch";

const client = new Client({ node: "http://localhost:9200" });
const app = express();

app.get("/api/search", async (req, res) => {
  try {
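    // Express parses repeated parameters as arrays; accept only strings
    const query = typeof req.query.q === "string" ? req.query.q : "";
    // NaN when maxPrice is absent or not a number
    const maxPrice = Number.parseFloat(req.query.maxPrice);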

    if (!query.trim()) {
      return res.json({ results: [] });
    }

    // Build Elasticsearch query
    const esQuery = {
      index: "products",
      body: {
        query: {
          bool: {
            must: [
              {
                multi_match: {
                  query: query,
                  fields: ["title^2", "description", "tags"],
                  fuzziness: "AUTO",
                },
              },
            ],
          },
        },
        size: 20,
        sort: [{ _score: "desc" }],
      },
    };

    // Add price filter if a valid number was provided (0 counts as valid)
    if (Number.isFinite(maxPrice)) {
      esQuery.body.query.bool.filter = [
        { range: { price: { lte: maxPrice } } },
      ];
    }

    const result = await client.search(esQuery);

    // Transform results
    const results = result.hits.hits.map((hit) => ({
      id: hit._id,
      score: hit._score,
      ...hit._source,
    }));

    res.json({
      results,
      total: result.hits.total.value,
    });
  } catch (error) {
    console.error("Search error:", error);
    res.status(500).json({ error: "Search failed" });
  }
});

app.listen(3000, () => {
  console.log("Search API running on http://localhost:3000");
});

🗄️ Setting Up Elasticsearch Locally (Docker)

# Start Elasticsearch with Docker
docker run -d \
  -p 9200:9200 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  docker.elastic.co/elasticsearch/elasticsearch:8.0.0

# Create products index with mapping
curl -X PUT "http://localhost:9200/products" -H "Content-Type: application/json" -d '{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "description": { "type": "text" },
      "price": { "type": "float" },
      "rating": { "type": "float" },
      "in_stock": { "type": "boolean" }
    }
  }
}'

# Index sample data
curl -X POST "http://localhost:9200/products/_doc/1" -H "Content-Type: application/json" -d '{
  "title": "Wireless Bluetooth Headphones",
  "description": "Premium sound quality with noise cancellation",
  "price": 299.99,
  "rating": 4.8,
  "in_stock": true
}'

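🔄 Database Sync: CDC with Debezium

Step 1: Start Kafka & Zookeeper

# Assumes a docker-compose.yml that defines Zookeeper, Kafka, Kafka Connect,
# MySQL, and Elasticsearch on one network
docker-compose up -d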

Step 2: Create Debezium MySQL Connector

curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "mysql-cdc-connector",
    "config": {
      "connector.class": "io.debezium.connector.mysql.MySqlConnector",
      "database.hostname": "mysql",
      "database.port": 3306,
      "database.user": "root",
      "database.password": "password",
      "database.server.id": 184054,
      "topic.prefix": "mysql-server",
      "database.include.list": "ecommerce",
      "table.include.list": "ecommerce.products",
      "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
      "schema.history.internal.kafka.topic": "schema-changes.ecommerce"
    }
  }'

Step 3: Create Elasticsearch Sink Connector

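curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "es-sink-connector",
    "config": {
      "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
      "topics": "mysql-server.ecommerce.products",
      "connection.url": "http://elasticsearch:9200",
      "connection.username": "elastic",
      "connection.password": "password",
      "key.ignore": "false",
      "schema.ignore": "true",
      "transforms": "unwrap,key,route",
      "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
      "transforms.key.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
      "transforms.key.field": "id",
      "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
      "transforms.route.regex": "mysql-server\\.ecommerce\\.(.*)",
      "transforms.route.replacement": "$1",
      "key.converter": "org.apache.kafka.connect.json.JsonConverter",
      "value.converter": "org.apache.kafka.connect.json.JsonConverter"
    }
  }'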

Step 4: Test the Flow

# Update a product in MySQL
mysql -u root -p ecommerce -e "UPDATE products SET price = 199.99 WHERE id = 1;"

# Check Elasticsearch (should be updated)
curl "http://localhost:9200/products/_doc/1"

✅ Result: Price updated in MySQL → CDC captures it → Kafka streams it → Elasticsearch updates it!


🚀 7. Performance Tips & Optimization

Frontend Optimization

// 1. Adjust debounce delay based on API latency
const DEBOUNCE_DELAY = 300; // ms

// 2. Add loading state to prevent duplicate requests
const [isSearching, setIsSearching] = useState(false);

// 3. Cache results for repeated queries
const resultsCache = new Map();

Elasticsearch Optimization

// 1. Use keyword fields for exact matching
"product_id": { "type": "keyword" }

// 2. Add analyzers for better text search
"title": {
  "type": "text",
  "analyzer": "standard",
  "fields": {
    "keyword": { "type": "keyword" }
  }
}

// 3. Enable sharding for scale
"settings": {
  "number_of_shards": 5,
  "number_of_replicas": 2
}

Backend Optimization

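// 1. Add query result caching
import { createClient } from "redis";
const redis = createClient();
await redis.connect(); // node-redis v4+ requires an explicit connect

// 2. Implement pagination (page is the 1-based page number from the request)
const size = 20;
const from = (page - 1) * size;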

// 3. Monitor query performance
console.time("elasticsearch_query");
const result = await client.search(query);
console.timeEnd("elasticsearch_query");

🌟 8. Final Thoughts

A great search system is not just about speed. It's about smart engineering across every layer.

  • Debouncing smooths user input, reducing unnecessary API calls
  • Ranking ensures relevance, keeping users engaged
  • Elasticsearch delivers results at lightning speed, powering scalability
  • CDC keeps everything up to date in real time, maintaining data integrity

When combined, they create a seamless experience where users get relevant results, instantly and accurately.

Whether you're building a small product search or a massive e-commerce platform, these principles apply. Start simple, measure performance, and optimize as you scale.



Ready to build the next great search system? Start with debouncing, move to Elasticsearch, and level up with CDC. 🚀

Happy searching! 🔍
