
PHP Memory Limit: Finding Leaks in Imports, Queues, and Cron Jobs

PHP memory limit errors in imports, queues, and cron jobs are rarely solved by raising memory_limit alone. The real work is finding where objects, arrays, ORM models, XML trees, and job state accumulate across long-running execution.


PHP memory limit errors usually appear at the worst point: halfway through a CSV import, during an overnight report, or inside a queue worker that has been stable for weeks. The immediate reaction is often to increase memory_limit. Sometimes that buys time. It rarely fixes the design problem.

The core issue is lifecycle. Classic PHP requests are short-lived, so memory disappears when the request ends. Imports, queue workers, report generators, and cron commands behave differently. They can process thousands or millions of records in one process, which means every retained reference, cached model, accumulated array, and parsed XML node can become production risk.

The mistake: treating memory_limit as capacity planning

memory_limit is a guardrail, not a scaling strategy. Raising it from 256M to 1024M may prevent an immediate crash, but it also hides whether the process grows linearly with input size.

A healthy long-running command should have a memory profile that rises during a batch, then returns close to a stable baseline. A risky command grows with every processed row.

Typical symptoms include:

  • CSV import crashes only on large files

  • Queue workers fail after processing many jobs, not the first job

  • Cron reports work in staging but fail in production data volume

  • Memory grows even when records are processed in chunks

  • Switching to a larger server delays the failure but does not remove it

In long-running PHP processes, the dangerous question is not “How much memory does this job need?” It is “Does memory keep growing after each unit of work?”

First, measure memory inside the process

Before changing implementation, add lightweight memory logging around the unit of work. For imports, that unit may be every 1,000 rows. For queues, it may be every job. For reports, it may be every customer, tenant, or date range.

<?php

function logMemory(string $label): void
{
    printf(
        "[%s] current=%s MB peak=%s MB\n",
        $label,
        round(memory_get_usage(true) / 1024 / 1024, 2),
        round(memory_get_peak_usage(true) / 1024 / 1024, 2)
    );
}

logMemory('start');

foreach ($batches as $index => $batch) {
    processBatch($batch);

    if ($index % 10 === 0) {
        logMemory("after batch {$index}");
    }
}

logMemory('end');

This does not replace profiling, but it quickly separates two cases:

  1. Expected peak usage: memory rises during a large operation but stabilizes.

  2. Retention leak: memory keeps growing after each batch or job.

In PHP, “leak” often means application-level retention, not a runtime bug. You may be keeping references in arrays, static properties, service singletons, closures, event listeners, ORM identity maps, logger buffers, or report builders.
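A minimal, self-contained illustration of application-level retention, using a hypothetical lookup cache: the static array below survives every batch, so the process grows even though each batch's local variables are freed. The `CountryMapper` class and its lookup are invented for this sketch.

<?php

// Hypothetical mapper whose static cache is never cleared.
// Every new key it sees stays in memory for the lifetime of the process.
final class CountryMapper
{
    /** @var array<string, string> grows for the lifetime of the process */
    private static array $cache = [];

    public static function resolve(string $code): string
    {
        if (!isset(self::$cache[$code])) {
            // Imagine a DB or API lookup here; the result is cached forever.
            self::$cache[$code] = strtoupper($code);
        }

        return self::$cache[$code];
    }

    public static function cacheSize(): int
    {
        return count(self::$cache);
    }

    public static function clear(): void
    {
        self::$cache = [];
    }
}

// Each "batch" adds new keys; without clear(), the cache only grows.
foreach (['de', 'fr', 'pl'] as $code) {
    CountryMapper::resolve($code);
}

In a request-per-process model this cache is harmless; in a worker that sees millions of distinct keys, it is unbounded growth. Either clear it at batch boundaries or cap its size.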

CSV imports: streaming beats loading

A common failure pattern is reading the entire file before processing it.

<?php

// Bad for large files: the full file becomes an array in memory.
$rows = array_map('str_getcsv', file($path));

foreach ($rows as $row) {
    importRow($row);
}

This pattern scales memory with file size. A 20 MB CSV can become much larger after parsing because each field becomes a PHP string in an array structure. The safer pattern is streaming line by line.

<?php

$handle = fopen($path, 'rb');

if ($handle === false) {
    throw new RuntimeException("Cannot open CSV file.");
}

try {
    while (($row = fgetcsv($handle)) !== false) {
        importRow($row);
    }
} finally {
    fclose($handle);
}

For real imports, add batching around database writes instead of collecting the full file. Keep the batch size explicit and clear the batch after flushing.

<?php

$batch = [];
$batchSize = 1000;

while (($row = fgetcsv($handle)) !== false) {
    $batch[] = mapRowToPayload($row);

    if (count($batch) >= $batchSize) {
        insertBatch($batch);
        $batch = [];
    }
}

if ($batch !== []) {
    insertBatch($batch);
}

The important detail is not the exact batch size. It is that memory is bounded by the batch size, not by the file size.

XML imports: avoid building the whole tree

XML is a frequent source of memory problems because DOM-style parsing builds an in-memory tree. That can be acceptable for small documents, but it is the wrong default for large feeds.

A streaming parser keeps the process focused on one node at a time.

<?php

$reader = new XMLReader();

if (!$reader->open($path)) {
    throw new RuntimeException("Cannot open XML file.");
}

while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'product') {
        $xml = $reader->readOuterXML();

        if ($xml === '') {
            continue;
        }

        $product = new SimpleXMLElement($xml);
        importProduct($product);

        unset($product, $xml);
    }
}

$reader->close();

This still creates a temporary object for each product, but it does not keep the whole feed in memory. For very large feeds, also avoid collecting validation errors, skipped rows, or transformed payloads in a single array. Write them to a table, file, or log stream.
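The same bounded-memory idea applies to the error path. One possible sketch, with an illustrative `ErrorLog` class and file path: errors are streamed to a file as they occur, and only a counter stays in memory.

<?php

// Stream errors to a file instead of collecting them in an array.
// Only an integer counter is retained, so the error path stays bounded
// no matter how many rows fail.
final class ErrorLog
{
    /** @var resource */
    private $handle;
    private int $count = 0;

    public function __construct(string $path)
    {
        $handle = fopen($path, 'ab');

        if ($handle === false) {
            throw new RuntimeException("Cannot open error log.");
        }

        $this->handle = $handle;
    }

    public function write(int $line, string $message): void
    {
        fwrite($this->handle, "line {$line}: {$message}\n");
        $this->count++;
    }

    public function count(): int
    {
        return $this->count;
    }

    public function close(): void
    {
        fclose($this->handle);
    }
}

At the end of the import, the summary reports `$errors->count()` and points at the log file, instead of holding every error message in an array.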

chunk() vs cursor(): different memory contracts

In Laravel-style applications, chunk() and cursor() are often discussed as if one is simply “better” than the other. That misses the production trade-off. They offer different runtime behavior.

Approach      Query pattern            Memory behavior            Latency per item   Failure recovery              Best fit
get()         One large query          High, all rows loaded      Low after load     Poor for large sets           Small bounded result sets
chunk()       One query per batch      Medium, one batch loaded   Batch-level        Good with checkpoints         Batch writes, imports, bulk updates
chunkById()   One query per ID range   Medium, one batch loaded   Batch-level        Better under changing data    Large tables with stable numeric keys
cursor()      Streamed result          Low per row                Row-level          Needs careful checkpointing   Read-heavy iteration, exports, simple transforms

chunk() limits memory only if each batch is allowed to die. It does not help if the code appends every model to $allRows, keeps relations loaded, or stores closures that capture batch data.

<?php

// Risky: chunking the query, but accumulating everything anyway.
$allUsers = [];

User::query()->chunk(1000, function ($users) use (&$allUsers) {
    foreach ($users as $user) {
        $allUsers[] = buildReportRow($user);
    }
});

A better approach writes each batch to its destination and then releases it.

<?php

User::query()
    ->select(['id', 'email', 'created_at'])
    ->chunkById(1000, function ($users) use ($writer) {
        foreach ($users as $user) {
            $writer->write(buildReportRow($user));
        }

        unset($users);
    });

Use cursor() when the work is naturally row-by-row and you do not need batch-level writes. Be careful with relationship access during lazy iteration: cursor() cannot eager load relations, so touching a relation on each row triggers an extra query per row. A cursor can keep memory low for the main query while still causing many additional queries or object allocations if each row loads more data.
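A cursor-based export might look like this sketch, assuming a Laravel-style User model and a $writer that streams rows to their destination:

<?php

// Stream rows one at a time; only the current model lives in memory.
// Select only the columns the export needs.
foreach (User::query()->select(['id', 'email'])->cursor() as $user) {
    $writer->write([$user->id, $user->email]);
}

Because nothing is appended to a result array, memory stays flat regardless of how many users the query matches.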

Queues: memory leaks survive between jobs

Queue workers are long-lived by design. That makes them faster than booting the framework for every job, but it also means state can survive longer than expected.

Common sources of queue memory growth include:

  • Large payloads stored on job objects

  • Services that keep processed items in properties

  • Static caches that are never cleared

  • Image, PDF, spreadsheet, or XML libraries retaining buffers

  • ORM models with loaded relations kept beyond the job

  • Logging handlers buffering records in memory

  • Dependency injection singletons storing per-job state

A safer job keeps the payload small and reloads the required data inside handle().

<?php

final class ImportCustomerRowJob
{
    public function __construct(
        public readonly int $importId,
        public readonly int $rowNumber
    ) {}

    public function handle(CustomerImporter $importer): void
    {
        $row = ImportRow::query()
            ->where('import_id', $this->importId)
            ->where('row_number', $this->rowNumber)
            ->firstOrFail();

        $importer->import($row->payload);

        unset($row);
    }
}

Operational limits still matter. A worker should have a maximum number of jobs or a memory cap so it can restart before fragmentation or retained state becomes harmful. That is not a substitute for fixing leaks, but it is a reasonable safety boundary for production.
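With Laravel's queue worker, those limits can be set from the command line; a process supervisor then boots a fresh worker when the old one exits. The flag values here are illustrative starting points, not recommendations:

```shell
# Restart the worker after 500 jobs, 1 hour, or 128 MB of usage,
# whichever comes first. Supervisor/systemd then starts a fresh process.
php artisan queue:work --max-jobs=500 --max-time=3600 --memory=128
```

The right numbers depend on job size and restart cost; the point is that the worker has an explicit upper bound on how long retained state can accumulate.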

Cron jobs: split work by checkpoint, not by hope

Cron scripts often start as simple commands and slowly become data pipelines. The dangerous version is a nightly command that tries to process everything in one run and only succeeds when data volume is low.

A more stable pattern is checkpointed processing:

  1. Select a bounded range of work.

  2. Process it in batches.

  3. Persist progress.

  4. Exit cleanly.

  5. Let the next run continue.

<?php

$lastId = (int) $state->get('reports.last_order_id', 0);

Order::query()
    ->where('id', '>', $lastId)
    ->orderBy('id')
    ->chunkById(500, function ($orders) use ($state) {
        foreach ($orders as $order) {
            appendOrderToReport($order);
            $state->set('reports.last_order_id', $order->id);
        }
    });

This improves failure recovery and reduces the pressure to complete everything in one process. It also makes memory testing easier because the command has a predictable unit of work.

How to investigate a suspected leak

Use a repeatable process before rewriting the job.

1. Reproduce with production-shaped input

Small fixtures rarely expose memory problems. Use anonymized or synthetic data that has similar row counts, string sizes, relation depth, and error rates.
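One quick way to get production-shaped input is to synthesize it. The sketch below writes a CSV with wide, variable-length text fields; the column layout, row count, and field widths are assumptions to adjust toward your real data.

<?php

// Generate a synthetic CSV with wide, variable text fields so parsing
// costs resemble production data rather than tiny fixtures.
function writeSyntheticCsv(string $path, int $rows): void
{
    $handle = fopen($path, 'wb');

    if ($handle === false) {
        throw new RuntimeException("Cannot open output file.");
    }

    fwrite($handle, "id,email,description\n");

    for ($i = 1; $i <= $rows; $i++) {
        // Fields contain no commas, so plain joining is safe in this sketch.
        $description = str_repeat('lorem ipsum ', random_int(5, 40));
        fwrite($handle, "{$i},user{$i}@example.com,{$description}\n");
    }

    fclose($handle);
}

Generating a file with the same order of magnitude of rows as production makes the memory log from the previous section meaningful before the job ever sees real data.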

2. Log memory at stable boundaries

Measure before and after every batch, not inside every row. Track both current and peak memory.

3. Remove accumulation points

Search for arrays that grow during the whole process:

  • $results[]

  • $errors[]

  • $models[]

  • $payloads[]

  • report rows collected before writing

  • imported rows saved for a final summary

Replace them with streaming writes, counters, or persisted diagnostics.

4. Reduce model weight

For bulk processing, avoid loading more data than needed. Select explicit columns, avoid unnecessary relations, and prefer scalar payloads when the ORM model lifecycle adds no value.

5. Restart long-lived workers intentionally

Use process boundaries as operational protection. A clean restart after a bounded number of jobs is easier to reason about than a worker that is expected to survive indefinitely.

Practical defaults for production PHP workloads

No universal batch size exists. The right value depends on row width, model complexity, database latency, downstream APIs, and write strategy. Still, these defaults are safe starting points:

  • Prefer streaming file reads over loading full files.

  • Use chunkById() for large mutable database tables with stable numeric keys.

  • Use cursor() for simple read streams where per-row processing is cheap.

  • Write report output incrementally instead of building it in memory.

  • Store import errors externally when the error list can grow.

  • Keep queue job payloads small.

  • Add memory logs to long-running commands by default.

  • Set worker restart limits as a production safety measure.

For engineers who work with PHP and Laravel in production, memory behavior in workers, imports, and batch processing is a senior-level operational concern. The Senior Laravel Developer certification is the most relevant DevCerts track to review if this is part of your day-to-day backend work.


Conclusion

PHP memory limit failures in imports, queues, and cron tasks are usually lifecycle problems, not configuration problems. The fix is to make memory bounded by design: stream files, process database records in controlled batches, avoid hidden accumulation, checkpoint long-running work, and restart workers intentionally.

chunk() and cursor() are both useful, but neither protects you from retaining data yourself. The production-grade approach is to define the unit of work, measure memory around it, and make sure the process returns to a stable baseline. Once that behavior is predictable, memory_limit becomes a guardrail again, not the only thing standing between your job and failure.