Scaling Node.js: From Single Server to Handling Millions of Requests

📅 February 1, 2026
⏱️ 8 min read
✍️ Subhadeep Datta
Node.js
Backend
Performance
System Design
Architecture

🚀 Introduction: Node.js Can Scale — If You Let It

There's a myth that Node.js can't handle heavy loads because it's single-threaded. I used to believe it too. Then I scaled a Node.js system to handle 3 million daily transactions at Noisiv Consulting, and it's been running at 99.9% uptime for over two years.

Node.js isn't slow. But it won't scale itself. You need to understand its concurrency model and layer the right patterns on top.

This article covers what I've learned — the practical stuff, not the theoretical. From clustering on a single box to distributing across a fleet behind a load balancer.


🧠 1. Understanding the Event Loop (The 60-Second Version)

Node.js is single-threaded, but that doesn't mean it can only do one thing at a time. It uses an event loop to handle concurrent I/O without blocking.

Here's the mental model:

Client Request → Event Loop → I/O Operation (DB, File, Network)
                     ↓                    ↓
             Handle next request    Callback when done
                     ↓                    ↓
              Continue processing  ← Result returned

This works beautifully for I/O-heavy workloads (APIs, web servers, chat apps). It falls apart for CPU-heavy tasks (image processing, data crunching, PDF generation).

The rule: Keep the event loop free. Never block it with synchronous work.
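
To make the rule concrete, here's a minimal sketch contrasting a blocking and a non-blocking version of the same task (password hashing with Node's built-in crypto module), assuming an Express-style app object like the ones used later in this article:

import crypto from "node:crypto";

// ❌ Blocks the event loop: nothing else is served while this hash computes
app.get("/login-sync", (req, res) => {
  const hash = crypto.pbkdf2Sync(req.query.password, "salt", 100_000, 64, "sha512");
  res.send(hash.toString("hex"));
});

// ✅ Runs in libuv's thread pool: the event loop keeps handling other requests
app.get("/login-async", (req, res) => {
  crypto.pbkdf2(req.query.password, "salt", 100_000, 64, "sha512", (err, hash) => {
    if (err) return res.status(500).end();
    res.send(hash.toString("hex"));
  });
});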


📈 2. Vertical Scaling: Use All Your CPU Cores

By default, a single Node.js process runs your JavaScript on one CPU core. If you're running on an 8-core machine, 87.5% of your compute is sitting idle.

Node.js Cluster Module

The cluster module forks your app into multiple worker processes, one per CPU core:

import cluster from "node:cluster";
import { cpus } from "node:os";
import http from "node:http";

const numCPUs = cpus().length;

if (cluster.isPrimary) {
  console.log(`Primary ${process.pid} starting ${numCPUs} workers`);

  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  cluster.on("exit", (worker, code) => {
    console.log(`Worker ${worker.process.pid} died (code: ${code}). Restarting...`);
    cluster.fork(); // Auto-restart dead workers
  });
} else {
  http.createServer((req, res) => {
    res.writeHead(200);
    res.end(`Handled by worker ${process.pid}\n`);
  }).listen(3000);

  console.log(`Worker ${process.pid} started`);
}

Or Just Use PM2

In practice, I use PM2 instead of the raw cluster module. It handles clustering, logging, restarts, and monitoring in one tool:

# Start in cluster mode, 1 instance per CPU core
pm2 start server.js -i max --name api

# Monitor
pm2 monit

# Zero-downtime reload
pm2 reload api
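
If you prefer configuration over CLI flags, PM2 can read the same settings from an ecosystem file. A minimal sketch (the script path and memory limit are assumptions — adjust them to your app):

// ecosystem.config.js
module.exports = {
  apps: [
    {
      name: "api",
      script: "./server.js",
      instances: "max",            // one worker per CPU core
      exec_mode: "cluster",        // use Node's cluster module under the hood
      max_memory_restart: "512M",  // restart a worker that leaks past this
      env: { NODE_ENV: "production" },
    },
  ],
};

Then pm2 start ecosystem.config.js brings the whole cluster up with one command.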

Impact: Throughput on an 8-core machine goes from ~5,000 req/s to ~35,000 req/s.


🌐 3. Horizontal Scaling: Multiple Servers

Once you've maxed out a single machine, add more machines. This is where load balancing comes in.

Nginx as a Load Balancer

# /etc/nginx/conf.d/api.conf
upstream api_servers {
    least_conn;  # Send to the server with fewest active connections
    server 10.0.1.10:3000;
    server 10.0.1.11:3000;
    server 10.0.1.12:3000;
}

server {
    listen 80;
    server_name api.example.com;

    location / {
        proxy_pass http://api_servers;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;

        # WebSocket support
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
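
Open-source Nginx doesn't actively health-check backends, but it will passively eject a failing server if you add max_fails and fail_timeout to the upstream — a sketch of the same upstream block with those knobs:

upstream api_servers {
    least_conn;
    server 10.0.1.10:3000 max_fails=3 fail_timeout=30s;
    server 10.0.1.11:3000 max_fails=3 fail_timeout=30s;
    server 10.0.1.12:3000 max_fails=3 fail_timeout=30s;
}

After 3 failed attempts within 30 seconds, Nginx stops sending traffic to that server for the next 30 seconds, then tries it again.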

Load Balancing Strategies

Nginx supports a few distribution strategies: round-robin (the default — rotate requests evenly across servers), least_conn (send to the server with the fewest active connections), and ip_hash (pin each client IP to one server, only useful if you still hold state in memory). For stateless APIs (which yours should be), Least Connections is usually the best choice.

Making Your App Stateless

For horizontal scaling to work, your app can't store state in memory. Move these to external services:

❌ In-memory sessions → ✅ Redis sessions
❌ Local file uploads  → ✅ S3 / Cloud Storage
❌ In-process cache    → ✅ Redis cache
❌ Local job queues    → ✅ BullMQ + Redis
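
As a sketch of the first swap — sessions in Redis instead of process memory — assuming an ioredis client like the one set up in the caching section below (the helper names here are illustrative, not from any session library):

import { randomUUID } from "node:crypto";

const SESSION_TTL = 60 * 60 * 24; // 24 hours, in seconds

// Any worker on any server can read a session written by any other
async function createSession(userId) {
  const sessionId = randomUUID();
  await redis.setex(`session:${sessionId}`, SESSION_TTL, JSON.stringify({ userId }));
  return sessionId;
}

// Look the session up by the token the client sends back (e.g. in a cookie)
async function getSession(sessionId) {
  const raw = await redis.get(`session:${sessionId}`);
  return raw ? JSON.parse(raw) : null;
}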

🧊 4. Caching: The Biggest Performance Win

Before adding more servers, ask yourself: can I just cache this?

Multi-Layer Caching

Client → CDN (static assets) → Nginx (reverse proxy cache) → Redis (app cache) → Database

Each layer catches requests before they hit the next one. In a well-cached system, only 10-20% of requests actually reach your database.
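
For the Nginx layer in that chain, a minimal reverse-proxy cache sketch (the cache path and zone name are illustrative):

# /etc/nginx/conf.d/api-cache.conf
proxy_cache_path /var/cache/nginx/api levels=1:2 keys_zone=api_cache:10m
                 max_size=1g inactive=10m use_temp_path=off;

server {
    listen 80;

    location /api/ {
        proxy_cache api_cache;
        proxy_cache_valid 200 60s;                     # cache successful responses for 60s
        proxy_cache_use_stale error timeout updating;  # serve stale copies if the backend struggles
        add_header X-Cache-Status $upstream_cache_status;
        proxy_pass http://api_servers;
    }
}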

Redis Caching in Node.js

import Redis from "ioredis";

const redis = new Redis({
  host: "redis-cluster.internal",
  port: 6379,
  maxRetriesPerRequest: 3,
});

// Cache-aside pattern
async function getUser(userId) {
  const cacheKey = `user:${userId}`;

  // Try cache first
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  // Cache miss: fetch from DB
  const user = await db.users.findById(userId);
  if (user) {
    await redis.setex(cacheKey, 600, JSON.stringify(user)); // 10 min TTL
  }

  return user;
}

// Invalidate on write
async function updateUser(userId, data) {
  await db.users.updateById(userId, data);
  await redis.del(`user:${userId}`); // Clear cache
}

What to Cache (and What Not To)

Cache read-heavy data that changes rarely: user profiles, reference data, configuration, expensive aggregations. Don't cache data that must be strongly consistent or changes on nearly every request: payment and transaction state, inventory at checkout, anything security-sensitive.


⚡ 5. Worker Threads: When You Need CPU Power

For CPU-intensive work (image processing, PDF generation, data transformations), don't block the event loop. Offload to worker threads:

// main.js
import { Worker } from "node:worker_threads";

app.post("/api/report", async (req, res) => {
  const result = await runInWorker("./workers/generate-report.js", {
    reportId: req.body.reportId,
    dateRange: req.body.dateRange,
  });
  res.json(result);
});

function runInWorker(workerFile, data) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerFile, { workerData: data });
    worker.on("message", resolve);
    worker.on("error", reject);
    worker.on("exit", (code) => {
      // Don't leave the promise hanging if the worker dies without posting a message
      if (code !== 0) reject(new Error(`Worker exited with code ${code}`));
    });
  });
}

// workers/generate-report.js
import { parentPort, workerData } from "node:worker_threads";

// CPU-intensive work happens here, off the main thread
const report = generateExpensiveReport(workerData);
parentPort.postMessage(report);

For recurring heavy work, consider a job queue like BullMQ:

import { Queue, Worker as BullWorker } from "bullmq";

const reportQueue = new Queue("reports", { connection: redis });

// Producer: add job
await reportQueue.add("generate", { reportId: "abc-123" });

// Consumer: process job (can run on a separate server)
new BullWorker("reports", async (job) => {
  const report = await generateReport(job.data.reportId);
  await saveReport(report);
}, { connection: redis, concurrency: 5 });

📊 6. Monitoring: You Can't Scale What You Can't Measure

Essential Metrics

// Simple request timing middleware
app.use((req, res, next) => {
  const start = Date.now();

  res.on("finish", () => {
    const duration = Date.now() - start;
    console.log(JSON.stringify({
      method: req.method,
      path: req.path,
      status: res.statusCode,
      duration_ms: duration,
      timestamp: new Date().toISOString(),
    }));

    // Alert if response > 2 seconds
    if (duration > 2000) {
      console.warn(`Slow request: ${req.method} ${req.path} took ${duration}ms`);
    }
  });

  next();
});

What to Monitor

At minimum: request rate, p95/p99 latency, error rate, memory and CPU per worker, and event loop delay. Event loop delay is the most Node-specific signal — if it climbs, something is blocking the loop.

Event Loop Monitoring

import { monitorEventLoopDelay } from "node:perf_hooks";

const histogram = monitorEventLoopDelay({ resolution: 20 });
histogram.enable();

setInterval(() => {
  console.log({
    eventLoopDelay: {
      min: histogram.min / 1e6,   // Convert ns to ms
      max: histogram.max / 1e6,
      mean: histogram.mean / 1e6,
      p99: histogram.percentile(99) / 1e6,
    },
  });
  histogram.reset();
}, 60000); // Log every minute

🎯 Scaling Playbook: What to Do at Each Stage

  • Single server, traffic growing: run PM2 in cluster mode and use every core
  • Database under load: add Redis caching (cache-aside) and put static assets on a CDN
  • Single machine maxed out: make the app stateless, put Nginx in front, add servers
  • CPU-heavy endpoints: move the work to worker threads or a BullMQ queue
  • At every stage: measure latency and event loop delay before and after each change


🎯 Conclusion

Scaling Node.js isn't about rewriting everything in Go or switching to a "more scalable" language. It's about understanding the event loop, using all available cores, caching aggressively, and distributing load.

The system I run at Noisiv handles 3 million daily transactions on a modest fleet of servers. The secret isn't exotic technology — it's applying these fundamentals consistently.

Start with PM2 clustering and Redis. You'll be surprised how far that gets you.


📚 Key Takeaways

  • Cluster mode is free performance — use all your CPU cores
  • Horizontal scaling requires stateless design (move sessions and cache to Redis)
  • Caching eliminates 80% of database load when done right
  • Worker threads keep the event loop free for I/O
  • Monitor everything — you can't optimize what you can't measure

💡 Found this helpful? Share it with others!