
JavaScript async/await Mistakes That Cause Race Conditions and Lost Errors

async/await makes asynchronous JavaScript easier to read, but it does not make concurrency safe. The real risks appear in batch operations, retries, timeouts, abort handling, and error aggregation.


async/await solved one readability problem in JavaScript, but it did not remove the hard parts of asynchronous programming. Production failures still come from unclear ownership of concurrency: promises started but never awaited, Promise.all used without understanding fail-fast behavior, retries that keep running after cancellation, and batch jobs that lose half their errors.

The central issue is not syntax. It is lifecycle control. In real systems, every asynchronous operation needs an owner, a timeout policy, a cancellation path, and a decision about how errors are reported. Without those boundaries, async/await can make unsafe code look clean.

The misconception: async code that looks sequential is not necessarily safe

An async function pauses at await, but any promise created before that point may already be running. This matters when code starts multiple operations, mutates shared state, or assumes that errors will naturally bubble up.

A common example is using forEach with an async callback:

// Bad: errors are not awaited by the caller, and completion is not controlled.
async function updateUsers(users) {
  const updated = [];

  users.forEach(async (user) => {
    const profile = await fetchProfile(user.id);
    updated.push(profile);
  });

  return updated;
}

This function returns before the async callbacks complete. Errors inside the callback are not handled by updateUsers, and the returned array is likely incomplete. The code reads like a batch operation, but it has no batch lifecycle.

A safer version makes concurrency explicit:

async function updateUsers(users) {
  const profiles = await Promise.all(
    users.map((user) => fetchProfile(user.id))
  );

  return profiles;
}

This is still concurrent, but now the caller owns completion and failure. The important change is not Promise.all itself. The important change is that every promise is returned to a parent operation.

Async code should have a visible ownership tree: if an operation starts work, it must also define how that work finishes, fails, times out, or gets cancelled.
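Sometimes detaching work is the right call, for example best-effort telemetry. The ownership rule still applies: the detachment must be explicit, with its own error path. A minimal sketch, where auditLog is a hypothetical async sink standing in for a real log service:

```javascript
// Hypothetical async sink; in a real system this might write to a log service.
async function auditLog(event) {
  // Simulate an async write that can fail on bad input.
  if (!event || !event.type) {
    throw new Error("invalid audit event");
  }
}

// Intentionally detached work: the trailing .catch makes the decision
// "failures here are logged, not propagated" visible to every reader.
function logAuditEvent(event) {
  auditLog(event).catch((error) => {
    console.error("audit log write failed", error.message);
  });
}
```

The point is not the .catch itself. It is that a reviewer can see at a glance that this promise was detached on purpose, not forgotten.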

Promise.all is not a batch error reporting mechanism

Promise.all is often the right primitive for dependent batch work, but it has one behavior teams frequently underestimate: it rejects as soon as one input promise rejects. Other promises may still be running, but the caller receives only the first rejection.

That behavior is useful when one failure invalidates the whole result. It is risky when each item should produce its own outcome.
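The fail-fast behavior is easy to demonstrate: the caller sees only the first rejection, while the slower promise keeps running because nothing cancels it. A self-contained sketch:

```javascript
// Demonstration: Promise.all rejects on the first failure, but the
// remaining promises keep running; rejection does not cancel them.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function demoFailFast() {
  let slowFinished = false;

  const fast = delay(10).then(() => {
    throw new Error("first failure");
  });
  const slow = delay(50).then(() => {
    slowFinished = true;
  });

  let caught;
  let slowFinishedAtCatch;
  try {
    await Promise.all([fast, slow]);
  } catch (error) {
    caught = error.message;             // "first failure"
    slowFinishedAtCatch = slowFinished; // false: `slow` was not cancelled
  }

  await slow; // the slow promise still completes on its own
  return { caught, slowFinishedAtCatch, slowFinished };
}
```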

| Pattern | Failure behavior | Result shape | Failure isolation | Best fit |
| --- | --- | --- | --- | --- |
| Sequential for...of with await | Stops unless caught manually | One item at a time | High | Rate-limited APIs, ordered workflows |
| Promise.all | Rejects on first rejection | Single success array or one error | Low | All-or-nothing work |
| Promise.allSettled | Waits for all promises | Per-item status records | High | Batch imports, notifications, cleanup jobs |
| Controlled concurrency pool | Depends on implementation | Per-item or aggregate | Medium to High | Large batches, external service limits |

For batch operations, Promise.allSettled is often more honest:

async function sendBatchNotifications(users) {
  const results = await Promise.allSettled(
    users.map((user) => sendNotification(user))
  );

  return results.map((result, index) => {
    const user = users[index];

    if (result.status === "fulfilled") {
      return {
        userId: user.id,
        ok: true,
        messageId: result.value.id,
      };
    }

    return {
      userId: user.id,
      ok: false,
      error: result.reason instanceof Error
        ? result.reason.message
        : String(result.reason),
    };
  });
}

This does not make failures disappear. It makes them observable. That distinction is critical for job dashboards, retry queues, audit logs, and customer support workflows.

Race conditions usually come from shared state, not from Promise.all itself

Promise.all does not create a race condition by default. A race condition appears when concurrent operations read or write shared state in an order the program depends on.

This is risky:

// Bad: concurrent writes depend on completion order.
async function calculateTotal(orderIds) {
  let total = 0;

  await Promise.all(
    orderIds.map(async (id) => {
      const order = await loadOrder(id);
      total += order.amount;
    })
  );

  return total;
}

In single-threaded JavaScript this specific code happens to work: no await separates the read of total from the write, so each increment runs atomically on the event loop. But the design is fragile, because it is one refactor away from a real race. Rewriting the increment as total = total + (await loadOrder(id)).amount reads total before the await and writes it after, so concurrent callbacks silently overwrite each other's updates. The code also mixes data fetching with mutation. The safer approach separates concurrent I/O from deterministic aggregation:

async function calculateTotal(orderIds) {
  const orders = await Promise.all(
    orderIds.map((id) => loadOrder(id))
  );

  return orders.reduce((sum, order) => sum + order.amount, 0);
}

The same rule applies to maps, caches, counters, progress tracking, and write buffers. Collect results first, then reduce them in a predictable step. When shared state is unavoidable, isolate the mutation behind a single writer or a queue.
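The controlled concurrency pool from the table above follows the same discipline: each worker claims an index synchronously, so no two workers ever write the same slot. A minimal sketch:

```javascript
// Run `task` over `items` with at most `limit` tasks in flight.
// Each worker claims the next index synchronously (no await between
// reading and advancing nextIndex), so writes never collide.
async function mapWithConcurrency(items, task, limit) {
  const results = new Array(items.length);
  let nextIndex = 0;

  async function worker() {
    while (nextIndex < items.length) {
      const index = nextIndex;
      nextIndex += 1;
      results[index] = await task(items[index], index);
    }
  }

  const workers = Array.from(
    { length: Math.min(limit, items.length) },
    () => worker()
  );

  await Promise.all(workers);
  return results;
}
```

Writing results into pre-assigned slots keeps output order stable regardless of which item finishes first, which is the same collect-then-reduce idea in pooled form.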

Retries can amplify failures when they ignore cancellation

Retries are useful for transient failures, but they can also make outages worse. A retry loop that does not respect abort signals may keep sending requests after the caller has given up. A retry loop without per-attempt timeout may wait forever on one stuck operation. A retry loop without error classification may repeat permanent failures.

A production-grade retry policy should answer four questions:

  1. Which errors are retryable?

  2. How many attempts are allowed?

  3. Does each attempt have a timeout?

  4. What happens if the parent operation is aborted?

A compact retry helper can make these decisions explicit:

async function retry(task, {
  attempts = 3,
  shouldRetry = () => true,
  delayMs = () => 0,
  signal,
} = {}) {
  let lastError;

  for (let attempt = 1; attempt <= attempts; attempt += 1) {
    if (signal?.aborted) {
      throw new Error("Operation aborted");
    }

    try {
      return await task({ attempt, signal });
    } catch (error) {
      lastError = error;

      const canRetry = attempt < attempts && shouldRetry(error);
      if (!canRetry) {
        throw error;
      }

      const wait = delayMs(attempt);
      if (wait > 0) {
        await sleep(wait, signal);
      }
    }
  }

  // Unreachable when attempts >= 1: the final failed attempt rethrows above.
  throw lastError ?? new Error("retry: no attempts were made");
}

function sleep(ms, signal) {
  return new Promise((resolve, reject) => {
    if (signal?.aborted) {
      reject(new Error("Operation aborted"));
      return;
    }

    const onAbort = () => {
      clearTimeout(timer);
      reject(new Error("Operation aborted"));
    };

    const timer = setTimeout(() => {
      // Remove the listener on normal completion so repeated sleeps
      // do not accumulate abort handlers on a long-lived signal.
      signal?.removeEventListener("abort", onAbort);
      resolve();
    }, ms);

    signal?.addEventListener("abort", onAbort, { once: true });
  });
}

This avoids a common lost-error problem: returning a generic retry failure while discarding the actual last error. In more advanced implementations, you may want to preserve the original error as cause, record all attempts, or classify errors into retryable and non-retryable categories.
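For example, a shouldRetry classifier for HTTP-style failures might look like this. The numeric status property is an assumption about how the task reports errors, not a standard:

```javascript
// Hypothetical classifier: retry rate limits and server errors,
// never client errors. Assumes errors carry a numeric `status`.
const isTransientHttpError = (error) =>
  error.status === 429 ||
  (typeof error.status === "number" &&
    error.status >= 500 &&
    error.status < 600);

// Usage sketch with the retry helper above (fetchJson is hypothetical):
// await retry(({ signal }) => fetchJson(url, { signal }), {
//   attempts: 4,
//   shouldRetry: isTransientHttpError,
//   delayMs: (attempt) => 100 * 2 ** attempt, // exponential backoff
// });
```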

Timeout and abort are separate concerns

A timeout is a local deadline. An abort signal is a cancellation mechanism. They often work together, but they are not the same thing.

Timeout answers: “How long should this operation be allowed to run?”

Abort answers: “Who can cancel this operation from the outside?”

Combining both gives the caller control while preventing stuck work:

async function fetchJsonWithTimeout(url, {
  timeoutMs = 5000,
  signal: parentSignal,
} = {}) {
  const controller = new AbortController();
  const timeout = setTimeout(() => controller.abort(), timeoutMs);

  const abortFromParent = () => controller.abort();

  if (parentSignal) {
    if (parentSignal.aborted) {
      controller.abort();
    } else {
      parentSignal.addEventListener("abort", abortFromParent, { once: true });
    }
  }

  try {
    const response = await fetch(url, { signal: controller.signal });

    if (!response.ok) {
      throw new Error(`HTTP request failed with status ${response.status}`);
    }

    return await response.json();
  } finally {
    clearTimeout(timeout);
    parentSignal?.removeEventListener("abort", abortFromParent);
  }
}

The finally block matters. Without cleanup, timeout handlers and abort listeners can accumulate across many calls. That may not be visible in a unit test, but it becomes relevant in long-lived browser sessions, Node.js workers, job processors, and services handling sustained traffic.

Batch operations need an error contract

The biggest mistake in batch async code is treating errors as an implementation detail. In production, a batch operation needs a result contract. Consumers should know whether the operation is all-or-nothing, partial success, or best-effort.

A practical batch result often includes:

  • stable item identifier

  • success flag

  • normalized error code or message

  • retryable flag

  • attempt count

  • optional raw diagnostic detail for logs, not user output

This makes downstream behavior easier to test. It also prevents ambiguous states such as “the job failed, but some records were written” without a record of which ones.

For example, an import job should not only throw "Import failed". It should say which rows succeeded, which rows failed validation, which rows failed due to external dependencies, and which failures can be retried safely.
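As a sketch, the contract can be enforced with a small normalizer that turns each settled outcome into a stable record. The field names, and the code and retryable properties on errors, are illustrative assumptions rather than a standard:

```javascript
// Illustrative normalizer: one stable record per batch item.
// The `code` and `retryable` properties on errors are assumed conventions.
function toBatchRecord(itemId, settled, attempts = 1) {
  if (settled.status === "fulfilled") {
    return { itemId, ok: true, value: settled.value, attempts };
  }

  const reason = settled.reason;
  return {
    itemId,
    ok: false,
    code: reason?.code ?? "UNKNOWN",
    message: reason instanceof Error ? reason.message : String(reason),
    retryable: Boolean(reason?.retryable),
    attempts,
  };
}
```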

What to adopt first

Teams do not need a large async framework to avoid most of these problems. They need a few consistent conventions.

Start with these:

  • Never start a promise inside a function unless it is returned, awaited, stored with clear ownership, or intentionally detached with logging.

  • Use Promise.all only when first failure should fail the whole operation.

  • Use Promise.allSettled or a structured result type when partial success matters.

  • Avoid mutating shared state from concurrent callbacks. Collect, then reduce.

  • Add timeouts to external I/O, especially network calls and service-to-service requests.

  • Pass cancellation signals through retry loops and nested async helpers.

  • Log or return enough context to identify which item failed in a batch.

The testing strategy should match the failure modes. Test partial failures, timeout behavior, cancellation during retry delay, and mixed success in batch operations. These tests often catch more production-relevant bugs than simple “happy path” async tests.
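One of those tests, cancellation during a delay, needs no framework at all: just an AbortController and an abortable delay. The helper is re-declared here so the sketch is self-contained:

```javascript
// Minimal abortable delay, equivalent to the sleep helper above,
// re-declared so this test sketch stands on its own.
function abortableDelay(ms, signal) {
  return new Promise((resolve, reject) => {
    if (signal?.aborted) {
      reject(new Error("Operation aborted"));
      return;
    }
    const onAbort = () => {
      clearTimeout(timer);
      reject(new Error("Operation aborted"));
    };
    const timer = setTimeout(() => {
      signal?.removeEventListener("abort", onAbort);
      resolve();
    }, ms);
    signal?.addEventListener("abort", onAbort, { once: true });
  });
}

// Frameworkless test: aborting during a delay must reject promptly,
// not after the full delay has elapsed.
async function testAbortDuringDelay() {
  const controller = new AbortController();
  const pending = abortableDelay(1000, controller.signal);

  setTimeout(() => controller.abort(), 10);

  const started = Date.now();
  try {
    await pending;
    return { outcome: "resolved" };
  } catch (error) {
    return { outcome: error.message, elapsedMs: Date.now() - started };
  }
}
```

A passing run rejects within milliseconds of the abort; a regression that ignores the signal would wait the full second and fail a deadline assertion.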

For engineers who work with JavaScript concurrency and production async flows regularly, the Senior JavaScript Developer certification is the most relevant DevCerts track to review.


Conclusion

async/await improves readability, but it does not define safe concurrency by itself. The hard parts are still ownership, isolation, cancellation, timeout policy, and error contracts.

In real systems, the difference between reliable and fragile async code is usually visible in the edges: what happens when one item fails, when the caller disconnects, when an external service hangs, when a retry should stop, or when a batch partially succeeds. Make those behaviors explicit, and async/await becomes a tool for clarity rather than a way to hide uncontrolled concurrency.