
Why Your Python API Is Slow: Async, Pools, ORM Queries, and JSON

Slow Python API endpoints are rarely caused by Python alone. The real causes usually sit at the boundaries: fake async code, blocking calls inside event loops, exhausted database pools, inefficient ORM queries, and expensive JSON serialization.


A slow Python API is often misdiagnosed as a framework problem, an async problem, or a language problem. In real services, the endpoint is usually slow because one part of the request path blocks everything else: a synchronous library inside an async handler, a database pool with too few connections, an ORM query that loads too much data, or a response model that spends too much CPU turning objects into JSON.

The central mistake is treating async as a performance switch. It is not. Async helps when the service spends much of its time waiting on I/O and every I/O operation cooperates with the event loop. If one dependency blocks, the endpoint can still look modern while behaving like a single-file queue.

Async is a concurrency model, not a speed feature

An async endpoint does not make the database faster, reduce query complexity, compress payloads, or remove CPU cost. It mainly changes what the worker can do while waiting.

That distinction matters in APIs where each request performs several remote operations:

  • reads from PostgreSQL or MySQL

  • calls another internal service

  • queries Redis

  • validates and serializes response objects

  • writes logs, metrics, or audit events

If those operations are properly asynchronous, the worker can serve other requests while waiting. If any operation blocks the event loop, other requests wait behind it.

A common mistake is writing an async route but calling blocking code inside it:

# Bad: async route, blocking work inside it

import time
import requests
from fastapi import FastAPI

app = FastAPI()

@app.get("/orders/{order_id}")
async def get_order(order_id: int):
    time.sleep(0.2)  # blocks the event loop

    response = requests.get(f"https://billing.internal/orders/{order_id}")
    return response.json()

This endpoint is syntactically async but operationally blocking. Under low traffic, it may appear fine. Under concurrent traffic, requests queue behind time.sleep() and requests.get() because neither yields control back to the event loop.
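The queuing effect is easy to reproduce without a web framework. This sketch runs five concurrent "requests" twice: once with a blocking sleep standing in for a sync library call, once with an awaitable sleep. The handlers and timings are illustrative, not a real service:

```python
import asyncio
import time

async def blocking_handler():
    # Stand-in for a sync library call inside an async route:
    # the event loop cannot run anything else during this sleep.
    time.sleep(0.1)

async def cooperative_handler():
    # Yields control back to the event loop while waiting.
    await asyncio.sleep(0.1)

async def measure(handler):
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(5)))
    return time.perf_counter() - start

async def main():
    return await measure(blocking_handler), await measure(cooperative_handler)

blocked, cooperative = asyncio.run(main())
# Five blocking "requests" run back to back (~0.5 s total);
# five cooperative ones overlap (~0.1 s total).
```

The code is identical except for one line, which is exactly why this failure mode survives code review.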

A better version uses non-blocking libraries and avoids artificial blocking:

# Better: async route with cooperative I/O

import httpx
from fastapi import FastAPI

app = FastAPI()

@app.get("/orders/{order_id}")
async def get_order(order_id: int):
    async with httpx.AsyncClient(timeout=2.0) as client:
        response = await client.get(f"https://billing.internal/orders/{order_id}")
        response.raise_for_status()
        return response.json()

This still does not guarantee low latency. The billing service can be slow. DNS can be slow. JSON parsing can be costly. But the worker is no longer blocked while waiting for the network.

Async code improves concurrency only when the slow part of the request cooperates with the concurrency model.
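When a dependency simply has no async client, one hedge is to push the blocking call onto a worker thread so the loop stays free. A minimal sketch using `asyncio.to_thread`; `fetch_order_sync` is a hypothetical stand-in for any blocking client call:

```python
import asyncio
import time

def fetch_order_sync(order_id: int) -> dict:
    # Stand-in for a blocking call (e.g. requests.get to an internal service).
    time.sleep(0.1)
    return {"id": order_id, "status": "paid"}

async def get_order(order_id: int) -> dict:
    # Runs the blocking call on a worker thread; the event loop keeps
    # serving other tasks while the thread waits.
    return await asyncio.to_thread(fetch_order_sync, order_id)

async def main():
    start = time.perf_counter()
    results = await asyncio.gather(*(get_order(i) for i in range(5)))
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(main())
# The five calls overlap on separate threads instead of queueing on the loop.
```

Note that the default thread pool has a bounded size, so this also acts as an implicit concurrency limit; it is a bridge, not a substitute for a real async client.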

The sync vs async decision in production

Sync Python APIs are not automatically wrong. For many services, a synchronous stack with enough worker processes is easier to debug, easier to operate, and predictable under load. Async becomes useful when the API has many concurrent I/O-bound requests and the team can keep the whole request path non-blocking.

| Runtime model | Concurrency behavior | Failure mode | Operational complexity | Best fit |
| --- | --- | --- | --- | --- |
| Sync workers | One request occupies one worker | Worker exhaustion under slow I/O | Low to Medium | CPU-light APIs, simple CRUD, predictable traffic |
| Async workers | One worker can interleave many waiting tasks | Event loop blocked by sync calls | Medium | I/O-heavy APIs with async-compatible dependencies |
| Background jobs | Request delegates work to a queue | Queue delay, retry complexity | Medium to High | Slow side effects, emails, reports, webhooks |
| More replicas | Adds process and connection capacity | Higher database pressure | Medium | Horizontal scaling after bottlenecks are understood |

The table shows why “switch to async” is not a complete plan. If the API is slow because each request runs a bad SQL query, async may increase the number of concurrent bad queries. That can make the database slower and push latency up for everyone.

Database pools: the hidden queue inside your API

A database connection pool is not just a configuration detail. It is a concurrency limit. Every request that needs a database connection either gets one or waits.

When p95 latency rises while CPU usage stays moderate, the pool is one of the first places to inspect. A small pool can serialize requests even when the application has many workers. A large pool can overload the database and increase lock contention, memory usage, and query scheduling overhead.

The failure pattern often looks like this:

  1. API traffic increases.

  2. More requests reach the database layer at the same time.

  3. The pool reaches its limit.

  4. Requests wait for a connection.

  5. Request duration grows.

  6. Connections are held longer.

  7. The pool stays saturated.

This is a feedback loop, not a simple capacity problem.

A simplified SQLAlchemy-style async setup might make the pool explicit:

from sqlalchemy.ext.asyncio import create_async_engine, async_sessionmaker

engine = create_async_engine(
    "postgresql+asyncpg://app:secret@db/app",
    pool_size=10,
    max_overflow=5,
    pool_timeout=2,
)

SessionLocal = async_sessionmaker(engine, expire_on_commit=False)

The right values depend on worker count, database capacity, query duration, transaction length, and whether other services share the same database. The important part is not the specific number. The important part is that the pool must be treated as a production control point and observed directly.

Track at least:

  • time spent waiting for a connection

  • active vs idle connections

  • query duration by endpoint

  • transaction duration

  • database CPU, locks, and slow query logs

  • request p95 and p99 latency

If pool wait time is high, increasing the pool may help only if the database has spare capacity. If the database is already overloaded, a larger pool can make the incident worse.
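Pool wait time can be measured directly at the acquisition point. As a stdlib-only sketch, this models the pool as a bounded queue and records how long each caller waits for a connection; real pools (SQLAlchemy's included) expose similar hooks through their event systems:

```python
import queue
import threading
import time

class InstrumentedPool:
    """Toy connection pool that records acquisition wait time."""

    def __init__(self, size: int):
        self._conns = queue.Queue()
        for i in range(size):
            self._conns.put(f"conn-{i}")  # stand-ins for real connections
        self.wait_times = []

    def acquire(self, timeout: float = 2.0):
        start = time.perf_counter()
        conn = self._conns.get(timeout=timeout)  # blocks when the pool is empty
        self.wait_times.append(time.perf_counter() - start)
        return conn

    def release(self, conn):
        self._conns.put(conn)

pool = InstrumentedPool(size=2)

def handle_request():
    conn = pool.acquire()
    try:
        time.sleep(0.05)  # stand-in for query time while the connection is held
    finally:
        pool.release(conn)

threads = [threading.Thread(target=handle_request) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# With 6 requests and 2 connections, later requests wait for a slot;
# max(pool.wait_times) exposes the hidden queue.
```

Exporting that wait-time distribution as a metric is what turns "the API feels slow" into "requests spend 80 ms waiting for a connection."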

ORM queries: readable code can hide expensive behavior

ORMs make data access easier to write, but they can hide the shape of the actual database work. Slow endpoints often contain one of these patterns:

  • N+1 queries from lazy-loaded relationships

  • loading full rows when only a few columns are needed

  • filtering in Python after fetching too much data

  • missing pagination on list endpoints

  • serializing deeply nested relationship graphs

  • long transactions around unrelated work

The application code may look harmless:

# Bad: likely N+1 query pattern

@app.get("/customers")
async def list_customers(session: AsyncSession = Depends(get_session)):  # get_session: session dependency, defined elsewhere
    customers = await session.scalars(select(Customer))

    return [
        {
            "id": customer.id,
            "email": customer.email,
            "orders": [order.id for order in customer.orders],
        }
        for customer in customers
    ]

If customer.orders is lazy-loaded, this can trigger one query for customers and then one query per customer. (With SQLAlchemy's AsyncSession, an implicit lazy load typically raises an error instead of silently querying, but the sync equivalent degrades exactly this way.) On a small local database, the endpoint looks acceptable. In production, network round trips, locks, and row counts expose the real cost.

A better approach is to make the query shape intentional:

# Better: explicit loading and bounded result size

from sqlalchemy import select
from sqlalchemy.orm import selectinload

@app.get("/customers")
async def list_customers(session: AsyncSession = Depends(get_session), limit: int = 100):
    stmt = (
        select(Customer)
        .options(selectinload(Customer.orders))
        .order_by(Customer.id.desc())
        .limit(min(limit, 100))
    )

    result = await session.scalars(stmt)
    customers = result.all()

    return [
        {
            "id": customer.id,
            "email": customer.email,
            "order_ids": [order.id for order in customer.orders],
        }
        for customer in customers
    ]

This does not mean every query should eager-load everything. It means every endpoint should have a deliberate data access pattern. A detail endpoint, a list endpoint, and an export endpoint have different query shapes and should not reuse the same broad “load the object graph” approach.
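The difference in query count is observable with nothing but the stdlib. This sketch builds a small SQLite schema and counts statements two ways: per-parent lookups (the N+1 shape) versus one batched IN query, which is roughly the shape selectinload issues. The schema and data are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1), (2), (3);
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

queries = []
conn.set_trace_callback(lambda sql: queries.append(sql))  # count every statement

def orders_n_plus_one():
    customer_ids = [row[0] for row in conn.execute("SELECT id FROM customers")]
    # One extra query per customer: the lazy-loading shape.
    return {
        cid: [r[0] for r in conn.execute(
            "SELECT id FROM orders WHERE customer_id = ?", (cid,))]
        for cid in customer_ids
    }

def orders_batched():
    customer_ids = [row[0] for row in conn.execute("SELECT id FROM customers")]
    result = {cid: [] for cid in customer_ids}
    placeholders = ",".join("?" * len(customer_ids))
    # One batched query regardless of customer count.
    rows = conn.execute(
        f"SELECT id, customer_id FROM orders WHERE customer_id IN ({placeholders})",
        customer_ids,
    )
    for order_id, cid in rows:
        result[cid].append(order_id)
    return result

queries.clear()
orders_n_plus_one()
n_plus_one_count = len(queries)  # 1 + one query per customer = 4

queries.clear()
orders_batched()
batched_count = len(queries)  # 2, no matter how many customers exist
```

A counter like this, attached to the ORM's execution hook, is also the cheapest regression test against reintroducing the N+1 pattern.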

JSON serialization can become the endpoint

Once database queries are optimized, the next bottleneck is often response construction. JSON serialization is CPU work. It can dominate latency when the endpoint returns large lists, nested objects, or models with expensive computed fields.

This is especially common when teams return ORM entities directly and rely on the framework to figure out the response. That can create accidental coupling between persistence models and API contracts.

Prefer explicit response shapes:

# Better: return only the fields the API contract needs

@app.get("/orders")
async def list_orders(session: AsyncSession = Depends(get_session)):
    rows = await session.execute(
        select(Order.id, Order.status, Order.total_cents)
        .order_by(Order.created_at.desc())
        .limit(100)
    )

    return [
        {
            "id": row.id,
            "status": row.status,
            "total_cents": row.total_cents,
        }
        for row in rows
    ]

This reduces database work, Python object construction, validation cost, and JSON size. It also makes the API contract more stable. The database model can change without accidentally changing the response payload.

For large responses, also consider:

  • pagination or cursor-based navigation

  • field selection for expensive nested data

  • streaming only when the client can consume it safely

  • moving exports to background jobs

  • caching stable computed responses

  • compressing responses at the edge when appropriate

The point is not to avoid JSON. The point is to stop treating serialization as free.
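The cost of an untrimmed payload is measurable. This sketch compares the serialized size and dump time of a "full row" shape against the three-field contract shape, using stdlib json on invented data; absolute numbers vary by machine, the ratio is the point:

```python
import json
import time

# Stand-in rows: a full shape with extra columns vs. the contract shape.
full_rows = [
    {
        "id": i,
        "status": "paid",
        "total_cents": 1999,
        "internal_notes": "x" * 200,   # fields the client never asked for
        "audit_trail": list(range(20)),
    }
    for i in range(1000)
]
trimmed_rows = [
    {"id": r["id"], "status": r["status"], "total_cents": r["total_cents"]}
    for r in full_rows
]

def measure(rows):
    start = time.perf_counter()
    payload = json.dumps(rows)
    return len(payload), time.perf_counter() - start

full_size, full_time = measure(full_rows)
trimmed_size, trimmed_time = measure(trimmed_rows)
# The trimmed payload is several times smaller, and dump time shrinks with it.
```

The same measurement applied to a real endpoint's response often explains a surprising share of its p95.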

A practical debugging sequence

When an endpoint is slow, start with the request path rather than the framework brand. Measure each segment before changing architecture.

A useful sequence is:

  1. Confirm whether latency is network, application, database, or serialization heavy.

  2. Check whether async endpoints call sync libraries.

  3. Measure database pool wait time, not just query time.

  4. Inspect the exact SQL generated by ORM code.

  5. Count queries per request.

  6. Measure response payload size and serialization time.

  7. Separate slow side effects from the request path.

  8. Re-test under realistic concurrency, not only single-request local tests.

This order prevents expensive rewrites. It also keeps the team focused on the bottleneck with the highest operational impact.

For example, a 900 ms endpoint might be spending time like this:

| Segment | What to inspect | Typical fix direction |
| --- | --- | --- |
| Pool wait | Connection acquisition time | Tune worker and pool counts, shorten transactions |
| SQL execution | Slow query log, query plan | Add index, change query shape, reduce joins |
| ORM hydration | Rows loaded, relationships loaded | Select fewer columns, avoid broad object graphs |
| Remote calls | Timeout, retries, dependency latency | Use async client, set budgets, move side effects |
| JSON serialization | Payload size, nested models | Reduce response shape, paginate, cache |

The exact numbers will vary by workload, but the diagnostic structure stays useful.
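One way to produce those per-segment numbers is to time each part of the request explicitly. A stdlib sketch of a segment timer that could wrap pool acquisition, query execution, and serialization; the segment names and sleeps are illustrative stand-ins:

```python
import time
from contextlib import contextmanager

class SegmentTimer:
    """Collects wall-clock durations per named request segment."""

    def __init__(self):
        self.segments = {}

    @contextmanager
    def track(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.segments[name] = time.perf_counter() - start

    def slowest(self):
        return max(self.segments, key=self.segments.get)

timer = SegmentTimer()

with timer.track("pool_wait"):
    time.sleep(0.01)   # stand-in: waiting for a connection
with timer.track("sql"):
    time.sleep(0.05)   # stand-in: query execution
with timer.track("serialize"):
    time.sleep(0.02)   # stand-in: building the JSON response

# timer.slowest() names the segment to investigate first.
```

In production the same idea is usually expressed as tracing spans, but even a dict of durations logged per request is enough to stop guessing.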

What to adopt first

For most teams, the highest return changes are not dramatic:

  • Put timeouts on every outbound call.

  • Make blocking calls visible in code review.

  • Track database pool wait time.

  • Log slow queries with endpoint context.

  • Add query count checks to performance-sensitive tests.

  • Limit list endpoint payloads by default.

  • Keep API response models separate from ORM entities.

  • Move slow side effects to queues when the client does not need the result immediately.

Async migration should come after this analysis, not before it. Otherwise, the team risks making the runtime more complex while preserving the original bottlenecks.

If you work with Python APIs in production and want a structured way to validate senior-level backend judgment, the Senior Python Developer certification is the most relevant DevCerts track to review.


Conclusion

A slow Python API is usually slow at the boundaries. The endpoint waits for connections, runs inefficient queries, blocks the event loop, creates too many Python objects, or serializes more JSON than the client needs.

The practical answer is not “use async” or “avoid async.” The answer is to understand where time is spent, keep the concurrency model consistent, make database access explicit, control pool behavior, and treat serialization as real work. Once those pieces are visible, performance work becomes engineering rather than guessing.