
Graceful Shutdown in Go: APIs, Workers, and Queue Consumers in Kubernetes

Graceful shutdown in Go is not just handling SIGTERM. In production, it is a lifecycle contract between your application, Kubernetes, load balancers, background workers, and message brokers.


Graceful shutdown in Go becomes important the first time a deployment causes dropped HTTP requests, duplicated queue messages, or half-finished background jobs. The problem is rarely Go itself. The problem is that many services treat shutdown as an afterthought: catch SIGTERM, call Close(), and hope the process exits cleanly.

In Kubernetes, that is not enough. During a rollout, a pod may still receive traffic for a short period, active requests may still be running, workers may still be processing jobs, and Kafka or RabbitMQ consumers may still hold messages that are not safe to acknowledge. A correct shutdown path must coordinate all of these moving parts within a bounded termination window.

Graceful shutdown is a lifecycle, not a signal handler

A production Go service usually has several concurrent responsibilities:

  • An HTTP API accepting external requests

  • Background workers running scheduled or internal jobs

  • Kafka, RabbitMQ, or other queue consumers

  • Database connections and transactions

  • Metrics, tracing, and log flushing

  • Kubernetes readiness and termination behavior

The mistake is to wire all of this directly to os.Signal and let every component decide what to do. That creates races. The HTTP server may still accept requests while the consumer is closing. A worker may start a new job after shutdown has already begun. A message may be acknowledged before its side effects are safely committed.

A better model is explicit application lifecycle management:

  1. Receive termination signal.

  2. Mark the process as not ready.

  3. Stop accepting new work.

  4. Let in-flight work finish within a deadline.

  5. Cancel remaining work.

  6. Flush telemetry and close resources.

  7. Exit with a predictable status.

A graceful shutdown is not successful because the process exits. It is successful because the process exits without creating ambiguous work.

What Kubernetes changes during rollout

In a typical Kubernetes rollout, old pods are terminated while new pods are started. The old pod receives SIGTERM, and Kubernetes waits for the configured termination grace period before forcing the container to stop.

From the application point of view, the risky part is the gap between “the pod is terminating” and “no traffic or work can reach it.” Service endpoint updates, load balancer behavior, client retries, and connection reuse can overlap. This means the application should not assume that receiving SIGTERM instantly removes it from all traffic paths.

The service should become unready early, then stop accepting new work, then drain existing work. Readiness is not a replacement for shutdown logic, but it is an important part of the contract.

Naive shutdown versus production shutdown

| Area | Naive approach | Production-oriented approach | Runtime behavior |
|---|---|---|---|
| HTTP API | Exit on SIGTERM | Stop accepting new requests and drain active ones | Lower risk of dropped requests |
| Readiness | Always returns 200 until exit | Returns failure once shutdown starts | Pod leaves traffic rotation earlier |
| Workers | Loop until process dies | Stop polling, finish current job, respect deadline | Fewer partial jobs |
| Kafka consumers | Close immediately | Stop fetching, finish processing, commit completed offsets | Lower duplicate or lost processing risk |
| RabbitMQ consumers | Ack early or close channel | Ack only after successful processing, nack or requeue unfinished work | Clearer message ownership |
| Shutdown deadline | No explicit timeout | Context with bounded grace period | Predictable exit behavior |
| Observability | Logs disappear on exit | Final logs and metrics are flushed where possible | Easier rollout debugging |

The goal is not to make shutdown infinitely patient. It is to make shutdown bounded and understandable.

A practical Go shutdown skeleton

The core pattern is a root context controlled by signals, plus a separate shutdown context with a deadline. The root context tells components to stop starting new work. The shutdown context limits how long the service will wait.

package main

import (
	"context"
	"errors"
	"log/slog"
	"net/http"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

func main() {
	var shuttingDown atomic.Bool

	rootCtx, stop := signal.NotifyContext(
		context.Background(),
		syscall.SIGINT,
		syscall.SIGTERM,
	)
	defer stop()

	mux := http.NewServeMux()

	mux.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
		if shuttingDown.Load() {
			http.Error(w, "shutting down", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})

	mux.HandleFunc("/api/orders", func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-r.Context().Done():
			return
		default:
			// Handle request using r.Context().
			w.WriteHeader(http.StatusAccepted)
		}
	})

	server := &http.Server{
		Addr:              ":8080",
		Handler:           mux,
		ReadHeaderTimeout: 5 * time.Second,
	}

	errCh := make(chan error, 1)

	go func() {
		slog.Info("http server started", "addr", server.Addr)
		errCh <- server.ListenAndServe()
	}()

	select {
	case <-rootCtx.Done():
		slog.Info("shutdown signal received")
	case err := <-errCh:
		if !errors.Is(err, http.ErrServerClosed) {
			slog.Error("http server failed", "error", err)
			os.Exit(1)
		}
	}

	shuttingDown.Store(true)

	// Optionally pause briefly here so the failing readiness probe removes
	// the pod from rotation before the listener stops accepting connections.

	shutdownCtx, cancel := context.WithTimeout(context.Background(), 25*time.Second)
	defer cancel()

	if err := server.Shutdown(shutdownCtx); err != nil {
		slog.Error("http shutdown deadline exceeded", "error", err)
		_ = server.Close()
	}

	slog.Info("service stopped")
}

This does three important things:

  • Readiness fails after shutdown starts.

  • http.Server.Shutdown stops accepting new connections and waits for active handlers.

  • The shutdown wait is bounded.

The timeout should be lower than the Kubernetes termination grace period, leaving time for final logs and cleanup. For example, if the pod has a 30-second grace period, the application should not spend all 30 seconds inside HTTP shutdown.

Kubernetes configuration should match the app lifecycle

The Kubernetes side should reflect the same assumptions. The termination grace period must be long enough for realistic request and worker completion, but not so long that rollouts stall when a pod is unhealthy.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-api
spec:
  replicas: 3
  template:
    spec:
      terminationGracePeriodSeconds: 35
      containers:
        - name: app
          image: orders-api:latest
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /readyz
              port: 8080
            periodSeconds: 5
            failureThreshold: 1

A preStop hook can sometimes help when external load balancers need a short delay before traffic fully drains, but it should not be the main shutdown mechanism. The application still needs to handle SIGTERM correctly. Sleeping in preStop without application-level draining only moves the race somewhere else.
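If a short delay is genuinely needed for an external load balancer, it can be layered onto the container spec above as a preStop sleep. The 5-second value below is purely illustrative, and it complements rather than replaces SIGTERM handling in the application:

```yaml
          lifecycle:
            preStop:
              exec:
                command: ["sleep", "5"]
```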

Workers: stop polling before stopping execution

Workers need a different shutdown strategy from HTTP handlers. A worker should stop taking new jobs as soon as shutdown begins, but it may continue the job it already owns if there is enough time left.

The worker loop should be driven by context. Avoid loops that ignore cancellation until the next long sleep or blocking operation finishes.

func runWorker(ctx context.Context, jobs <-chan Job, handle func(context.Context, Job) error) error {
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()

		case job, ok := <-jobs:
			if !ok {
				return nil
			}

			jobCtx, cancel := context.WithTimeout(ctx, 20*time.Second)
			err := handle(jobCtx, job)
			cancel()

			if err != nil {
				// Record failure, retry, or let the queue mechanism redeliver.
				// The exact behavior should match the job's idempotency model.
				return err
			}
		}
	}
}

The important rule is simple: shutdown should prevent new work from being claimed. It should not blindly kill work that is already past the point of no return.

For long-running jobs, add checkpoints. A job that takes several minutes should not be treated as one uninterruptible unit unless the platform allows that much termination time. Store progress explicitly, make operations idempotent, and design retries as part of the normal path.

Kafka consumers: commit only what is complete

Kafka shutdown is mostly about offset ownership. The unsafe pattern is to read a message, commit the offset, and then process the message. If the process dies after the commit but before the side effect, the message may be skipped from the consumer group’s point of view.

A safer pattern is:

  1. Stop polling when shutdown starts.

  2. Finish messages already handed to handlers.

  3. Commit offsets only after successful processing.

  4. Close the consumer before the shutdown deadline expires.

Pseudocode varies by client library, but the lifecycle should look like this:

func consumeKafka(ctx context.Context, consumer Consumer, handle func(context.Context, Message) error) error {
	defer consumer.Close()

	for {
		select {
		case <-ctx.Done():
			return nil

		default:
			msg, err := consumer.Poll(ctx)
			if err != nil {
				if ctx.Err() != nil {
					return nil
				}
				return err
			}

			if err := handle(ctx, msg); err != nil {
				// Do not commit a failed message.
				// Let retry, dead-letter, or redelivery policy handle it.
				return err
			}

			if err := consumer.Commit(ctx, msg); err != nil {
				return err
			}
		}
	}
}

This does not eliminate duplicate processing. Kafka consumers should still be idempotent because a process can crash between the side effect and the commit. Graceful shutdown reduces unnecessary duplicates during planned rollouts, but it is not a substitute for idempotency.

RabbitMQ consumers: acknowledge after durable success

RabbitMQ has a similar but not identical concern. The key decision is when to acknowledge a delivery. If the consumer sends ack before the business operation is durable, a shutdown can lose work. If it never acknowledges successful work, the message may be redelivered and processed again.

A production consumer should usually:

  • Use manual acknowledgements.

  • Ack only after successful processing.

  • Nack or requeue when work cannot finish safely.

  • Stop consuming new deliveries when shutdown starts.

  • Keep handler concurrency bounded.

func handleDelivery(ctx context.Context, d Delivery, process func(context.Context, []byte) error) {
	err := process(ctx, d.Body)

	if err == nil {
		_ = d.Ack(false)
		return
	}

	if ctx.Err() != nil {
		// Shutdown interrupted processing. Requeue unless the message is known unsafe to retry.
		_ = d.Nack(false, true)
		return
	}

	// Non-shutdown failure. Route according to retry or dead-letter policy.
	_ = d.Nack(false, false)
}

The exact retry policy depends on the system. Some messages should be requeued. Some should go to a dead-letter exchange after bounded attempts. The shutdown path should not invent a different reliability model from the normal failure path.

Coordination: use one shutdown budget

One common production bug is giving every component its own full timeout. The API waits 30 seconds, then the worker waits 30 seconds, then the consumer waits 30 seconds, while Kubernetes only allows 45 seconds. This works in local testing and fails during rollout.

Use one process-level shutdown budget and divide it deliberately:

| Component | Shutdown action | Typical constraint |
|---|---|---|
| Readiness | Fail immediately | Should happen first |
| HTTP server | Drain active requests | Bound by request timeout |
| Workers | Finish current job or checkpoint | Bound by job design |
| Kafka consumer | Stop polling, finish, commit | Bound by broker session and app deadline |
| RabbitMQ consumer | Stop consuming, ack or nack owned messages | Bound by handler deadline |
| Telemetry | Flush logs, traces, metrics | Small remaining budget |

The budget should be based on real request timeouts and job behavior, not wishful thinking. If a handler can take two minutes but the pod gets 30 seconds to terminate, the system is already inconsistent.

Testing graceful shutdown

Graceful shutdown should be tested as a behavior, not reviewed as code style. Useful tests include:

  • Send a long HTTP request, trigger SIGTERM, verify the request completes.

  • Trigger SIGTERM, verify readiness fails before process exit.

  • Start a worker job, cancel the context, verify no new job is claimed.

  • Process a Kafka message, terminate before commit, verify it can be processed again.

  • Process a RabbitMQ delivery, terminate before ack, verify requeue behavior.

  • Run a Kubernetes rollout under load and inspect errors, retries, and duplicate work.

Local tests can cover most lifecycle bugs. Cluster tests reveal integration timing issues: readiness propagation, load balancer behavior, connection reuse, and shutdown budget mismatches.

What to adopt first

A team does not need a framework to improve graceful shutdown. The most useful first steps are practical:

  1. Add a single application shutdown context.

  2. Make readiness fail as soon as shutdown begins.

  3. Use http.Server.Shutdown instead of abruptly closing the process.

  4. Stop workers and consumers from claiming new work.

  5. Acknowledge or commit messages only after durable success.

  6. Make handlers and jobs context-aware.

  7. Align application timeouts with terminationGracePeriodSeconds.


Conclusion

Graceful shutdown in Go is not a cleanup function at the end of main. It is part of the service contract. In Kubernetes, that contract includes readiness, signal handling, HTTP draining, worker cancellation, broker acknowledgements, and a realistic shutdown budget.

The production goal is not to avoid every retry or duplicate. Distributed systems cannot promise that during every failure mode. The goal is to make planned termination boring: no new work after shutdown starts, active work gets a fair deadline, completed work is recorded correctly, and unfinished work has a clear retry path. That is what turns Kubernetes rollouts from a reliability risk into a routine operation.