Zero-Downtime Kubernetes Deployments: Readiness, Rolling Updates, and Graceful Shutdown

Zero-downtime deployment in Kubernetes is not just a rolling update setting. For backend APIs, queues, database migrations, long-running requests, and SIGTERM handling, availability depends on how the application lifecycle and cluster lifecycle cooperate.

Kubernetes

Zero-downtime deployment in Kubernetes is often treated as a deployment strategy problem: set strategy: RollingUpdate, add a readiness probe, and the platform will handle the rest. In real backend systems, that is not enough. The difficult part is not starting a new pod. The difficult part is stopping the old one without dropping requests, duplicating jobs, blocking database writes, or routing traffic to an application that is technically running but not ready.

For a production backend API, deployment safety depends on several moving parts: readiness probes, rolling update limits, graceful shutdown, queue worker behavior, database migration strategy, and how the application reacts to SIGTERM. Kubernetes can coordinate traffic and pod lifecycle, but it cannot infer your application’s business rules. Your service must participate.

What “zero downtime” actually means

Zero downtime does not mean no pod ever exits. It means users and dependent systems do not observe an outage during deployment. For an API, that usually means:

Existing requests are allowed to complete where practical.
New traffic stops reaching pods that are shutting down.
New pods receive traffic only after the application is ready.
Queue jobs are not interrupted in unsafe ways.
Database changes remain compatible with both old and new code.
Load balancers, ingress controllers, and service endpoints converge without a visible gap.

This is why “zero downtime” is less about one Kubernetes feature and more about lifecycle alignment.

Kubernetes can remove a pod from service endpoints, but only the application can decide when it is safe to stop accepting work.

The common failure mode: alive is not ready

A frequent mistake is using a simple health endpoint for everything. The app returns 200 OK, so the pod is considered ready. But a backend service can be alive while still being unable to serve production traffic.

For example, the process may be running while:

The database connection pool is not initialized.
Required configuration is missing.
A cache warm-up is still in progress.
Background consumers have not started correctly.
The service is draining and should not receive new requests.

Kubernetes separates this into different concerns:

Probe	What it should answer	Traffic impact	Typical failure consequence
startupProbe	Has the application finished booting?	Delays liveness and readiness checks	Prevents slow-starting apps from being killed too early
readinessProbe	Can this pod receive traffic now?	Removes pod from service endpoints when failing	Stops new requests from reaching unavailable or draining pods
livenessProbe	Is the process stuck beyond recovery?	Restarts the container when failing	Recovers from deadlocks or broken process state

A production API should not rely on livenessProbe to control traffic. Use readinessProbe for traffic eligibility and reserve livenessProbe for cases where restart is the correct recovery action.

A deployment spec that supports draining

A practical Kubernetes deployment starts by making rollout behavior explicit. The goal is to avoid taking down too many old pods before new pods are actually ready.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend-api
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  minReadySeconds: 10
  selector:
    matchLabels:
      app: backend-api
  template:
    metadata:
      labels:
        app: backend-api
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: api
          image: registry.example/backend-api:2026-04-22
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            periodSeconds: 5
            failureThreshold: 1
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            periodSeconds: 10
            failureThreshold: 3
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 10"]

Several details matter here.

maxUnavailable: 0 tells Kubernetes not to reduce available capacity during rollout. maxSurge: 1 allows one extra pod above the desired replica count, so a new pod can become ready before an old pod is removed. This is usually safer for APIs than replacing pods one-for-one under load.

terminationGracePeriodSeconds gives the application time to finish in-flight work after receiving SIGTERM. It should be longer than the normal upper bound of request duration or job checkpoint interval, but not so high that broken pods remain around indefinitely.

The preStop sleep is not a replacement for graceful shutdown. It gives service endpoint updates and external load balancers a short buffer before the process exits. The application still needs to stop accepting new work and finish active work properly.

Readiness must change during shutdown

A pod that receives SIGTERM should usually become not ready before it exits. This prevents new traffic from being routed to a process that is already draining.

A minimal Node.js API pattern looks like this:

import http from "node:http";

let ready = false;
let shuttingDown = false;
let activeRequests = 0;

const server = http.createServer(async (req, res) => {
  if (req.url === "/health/live") {
    res.writeHead(200);
    return res.end("ok");
  }

  if (req.url === "/health/ready") {
    const status = ready && !shuttingDown ? 200 : 503;
    res.writeHead(status);
    return res.end(status === 200 ? "ready" : "not ready");
  }

  activeRequests++;

  try {
    await handleRequest(req, res);
  } finally {
    activeRequests--;
  }
});

async function shutdown() {
  shuttingDown = true;

  server.close(() => {
    process.exit(0);
  });

  setTimeout(() => {
    process.exit(1);
  }, 55_000);
}

process.on("SIGTERM", shutdown);

await initializeDatabasePool();
await warmRequiredState();

ready = true;
server.listen(8080);

The important behavior is not specific to Node.js. The same principle applies in PHP, Go, Java, Python, or any backend stack:

Expose readiness based on real traffic eligibility.
Flip readiness to false when shutdown starts.
Stop accepting new connections.
Let active requests complete within a bounded window.
Exit before Kubernetes sends SIGKILL.

If your framework or server already handles parts of this lifecycle, use it. But verify the actual behavior under deployment, not just under local process termination.

Long requests need explicit policy

Long-running requests are where zero-downtime assumptions often break. A 30-second report generation endpoint, file upload, export, or payment callback may outlive the default expectations of a rolling update.

There are only a few defensible options:

Request type	Preferred deployment behavior	Operational requirement
Short API call	Finish during grace period	Request timeout below termination grace period
Long read operation	Allow completion or move to async job	Clear timeout and cancellation policy
File upload	Avoid mid-stream termination	Ingress, app, and pod timeouts aligned
Payment or webhook callback	Make handler idempotent	Safe retry handling and durable state
Export or report generation	Move work to queue	Client polls or receives completion notification

Trying to make every request survive indefinitely is usually the wrong target. A better design is to keep synchronous HTTP work bounded and move long work to queues with explicit retry and idempotency.

Queue workers are not API pods

Backend systems often deploy API servers and queue workers from the same image. That is fine. Treating them as the same lifecycle is not.

An API pod must stop accepting new HTTP traffic. A worker must stop fetching new jobs, finish or safely release the current job, then exit. If a worker dies mid-job, the queue system must either retry safely or preserve enough state to resume.

A worker shutdown loop can be simple:

#!/usr/bin/env bash
set -euo pipefail

term_received=false

trap 'term_received=true' TERM INT

while true; do
  if [ "$term_received" = true ]; then
    echo "Shutdown requested, not fetching new jobs"
    exit 0
  fi

  php artisan queue:work \
    --once \
    --timeout=45 \
    --tries=3

  sleep 1
done

This pattern is intentionally conservative. The worker processes one job at a time, checks whether shutdown was requested, and avoids fetching another job after SIGTERM. In real systems, the exact command differs, but the lifecycle requirement is the same: do not start new work when the pod is already terminating.

For queue-heavy services, API and worker deployments should often be separated:

Deployment	Scaling signal	Shutdown concern	Rollout risk
API pods	Request rate, latency, CPU	In-flight HTTP requests	Dropped traffic or failed requests
Worker pods	Queue depth, job duration, CPU	Active job completion	Duplicate jobs or lost progress
Scheduler pods	Time-based triggers	Avoid duplicate scheduling	Multiple pods triggering same task
Migration job	Schema version	Compatibility with live code	Breaking old or new release

Database migrations are the hidden downtime source

Many Kubernetes rollouts fail because the application deployment is safe, but the database migration is not. Rolling updates mean old and new application versions may run at the same time. Your schema must support that overlap.

The safe pattern is expand, deploy, contract:

Expand the schema with backward-compatible changes.
Deploy application code that can work with both old and new schema state.
Backfill data if needed.
Switch reads or writes to the new path.
Remove old columns, indexes, or code only after all old versions are gone.

For example, adding a nullable column is usually easier to roll out than renaming a column used by live code. A safer migration might look like this:

ALTER TABLE orders
ADD COLUMN external_reference text;

CREATE INDEX CONCURRENTLY idx_orders_external_reference
ON orders (external_reference);

The exact SQL depends on the database, but the principle is stable: avoid schema changes that instantly break the currently running version. Destructive migrations should be delayed until the system no longer depends on the old structure.

This is especially important with maxSurge, because during a rollout your old and new pods intentionally coexist. That coexistence is what preserves availability. The database must be compatible with it.

Rolling update is not enough without capacity

maxUnavailable: 0 helps only if the cluster can schedule surge pods. If the cluster is already at its CPU, memory, pod, or connection limit, Kubernetes may not be able to create the extra pod before terminating old capacity.

Before assuming a rollout is safe, check the operational constraints:

Is there enough node capacity for maxSurge?
Are pod disruption budgets aligned with deployment strategy?
Can the database handle temporary extra connections?
Are connection pools sized per pod, not just globally?
Does autoscaling react fast enough for rollout traffic?
Do ingress and load balancer timeouts exceed normal request duration?
Are readiness failures visible in metrics and alerts?

A rollout can be correct in YAML and still fail because the surrounding system has no spare capacity.

What to adopt first

A practical adoption sequence is better than trying to fix every lifecycle edge case at once.

Start with the API lifecycle:

Add real readinessProbe and livenessProbe endpoints.
Make readiness fail during shutdown.
Handle SIGTERM in the application.
Set terminationGracePeriodSeconds based on observed request duration.
Use maxUnavailable: 0 for critical API deployments.
Test rollout under traffic, not just in an idle environment.

Then separate worker behavior:

Run workers in their own deployment.
Stop fetching jobs after shutdown starts.
Make job handlers idempotent.
Align job timeout, retry visibility timeout, and termination grace period.
Avoid running migrations inside every application pod startup.

Finally, clean up database delivery:

Use backward-compatible migrations.
Avoid destructive schema changes in the same release as dependent code changes.
Treat long backfills as operational jobs, not as blocking deploy steps.
Keep old and new application versions compatible during rollout overlap.

Test the lifecycle, not only the code

Unit tests will not prove zero-downtime deployment. You need deployment tests that exercise the lifecycle under realistic conditions.

A useful test plan includes:

Send steady traffic while running kubectl rollout restart deployment/backend-api.
Include slow requests that approach normal timeout limits.
Verify no new traffic reaches pods after readiness fails.
Confirm SIGTERM is logged and handled.
Check that queue workers do not start new jobs after termination begins.
Run migrations in the same order used by production delivery.
Watch p95 and p99 latency, error rate, queue retries, and database connection count during rollout.

The goal is not to prove that failure is impossible. The goal is to make shutdown behavior predictable enough that deployment is a routine operation rather than a production gamble.

For engineers who work with Kubernetes delivery, probes, rollouts, and workload lifecycle in production, the most relevant certification to review is Kubernetes Specialist.

Conclusion

Zero-downtime deployment in Kubernetes is not achieved by enabling rolling updates alone. Rolling updates create the opportunity for safe replacement. Readiness probes decide when traffic should flow. Graceful shutdown decides what happens to work already in progress. Database migrations decide whether old and new versions can coexist.

For backend APIs, queues, migrations, long requests, and SIGTERM, the reliable approach is explicit lifecycle design. Make readiness truthful, make shutdown bounded, make jobs idempotent, make migrations backward-compatible, and test deployment under real traffic. That is what turns Kubernetes from a restart mechanism into a dependable delivery platform.