
Safe User Input in JavaScript: XSS, Regex Injection, and HTML Escaping

User input is not dangerous because it is user input. It becomes dangerous when a system places it into the wrong execution context. This article explains how to handle search, comments, admin screens, rich text, and dangerouslySetInnerHTML without turning convenience into an injection surface.


User input security in JavaScript is often reduced to one vague rule: sanitize everything. That advice is too imprecise for production systems. A search query, a comment body, an admin-controlled label, and rich text content all require different treatment because they end up in different execution contexts.

The practical rule is stricter and more useful: treat input as data until the exact output context is known. Validation decides whether data is acceptable. Escaping decides how data can be safely represented. Sanitization decides which parts of structured content, usually HTML, are allowed to survive. Mixing those concerns is how teams accidentally ship XSS, regex injection, broken previews, and unsafe dangerouslySetInnerHTML usage.

The real problem: context changes the risk

The same string can be safe in one place and unsafe in another.

<script>alert(1)</script> is just text in a database column. It becomes a problem when rendered as HTML. A search string like .* is harmless in a plain substring match but dangerous when passed directly into new RegExp(). An admin-provided banner title may feel trusted, but if an attacker gets admin access through another route, that field can become a stored XSS payload.

Input should not be classified only by who entered it. It should be classified by where it will execute, render, match, or compose.

In JavaScript applications, the most common risky contexts are:

  • HTML body content

  • HTML attributes

  • URLs

  • inline styles

  • JavaScript strings

  • regular expressions

  • SQL or NoSQL queries built server-side

  • Markdown or rich text converted to HTML

This article focuses on the browser-facing JavaScript layer, but the same boundary thinking applies to backend rendering, admin panels, APIs, queues, and notification templates.

XSS is usually an output bug, not an input bug

Cross-site scripting usually happens when untrusted data is interpreted as executable markup or script. The instinct is often to clean the value when it enters the system. That can help in narrow cases, but it is not enough as a general strategy.

A user name may be rendered in a React component, exported into CSV, inserted into an email template, and shown in an admin audit log. Each output has a different escaping rule. If the input is destructively modified too early, you may lose valid data while still failing to protect another output context.
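To make the idea of context-specific escaping concrete, here is a minimal escaper for the HTML body context only (a sketch; frameworks like React do this for you, and CSV or email output would need entirely different rules):

```javascript
// Escape for the HTML body context. Applied at output time,
// so the stored value stays unmodified and other outputs can
// apply their own rules.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;") // must run first, or it double-escapes
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// escapeHtml('<script>alert(1)</script>')
// → "&lt;script&gt;alert(1)&lt;/script&gt;"
```

The same value exported to CSV would instead need quote doubling and formula-injection handling, which is exactly why escaping belongs at each output, not at input.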

Unsafe pattern: composing HTML with user input

function renderSearchLabel(query) {
  return `<p>Results for <strong>${query}</strong></p>`;
}

container.innerHTML = renderSearchLabel(new URLSearchParams(location.search).get("q"));

This is unsafe because query is treated as HTML. If the value contains markup, the browser parses it as markup. The bug is not that the query came from the URL. The bug is that the query crossed from data into HTML without context-aware escaping.

Safer pattern: assign text as text

function renderSearchLabel(query) {
  const p = document.createElement("p");
  const prefix = document.createTextNode("Results for ");

  const strong = document.createElement("strong");
  strong.textContent = query ?? "";

  p.append(prefix, strong);
  return p;
}

container.replaceChildren(renderSearchLabel(new URLSearchParams(location.search).get("q")));

Using textContent tells the browser to handle the value as text, not markup. In React, normal interpolation follows the same principle for text nodes:

function SearchHeading({ query }) {
  return <p>Results for <strong>{query}</strong></p>;
}

This is safe for text rendering because React escapes text content by default. It does not mean every React rendering path is safe. The guarantee changes when you opt into raw HTML.

Search: protect both the DOM and the regex engine

Search screens are a common source of subtle injection issues because user input is often used twice: once to query or filter data, and once to highlight matched text.

A typical mistake is to build a regex directly from the query:

function highlight(text, query) {
  const pattern = new RegExp(query, "gi");
  return text.replace(pattern, match => `<mark>${match}</mark>`);
}

There are two separate problems here.

First, query is interpreted as a regular expression. A user who searches for [ produces an invalid pattern, and new RegExp() throws a SyntaxError. A user who enters a complex pattern may trigger catastrophic backtracking and excessive CPU usage, depending on the pattern and the size of the data being matched.

Second, the function returns HTML. If the highlighted text is later assigned with innerHTML, the output must be safe as HTML, not just correct as text.

Escape regex syntax before building a dynamic pattern

function escapeRegExp(value) {
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function findMatches(text, query) {
  const safeQuery = escapeRegExp(query.trim());

  if (!safeQuery) {
    return [];
  }

  return [...text.matchAll(new RegExp(safeQuery, "gi"))];
}

Escaping regex metacharacters makes the query behave like a literal search string. For many search UIs, a plain includes() comparison is even simpler and avoids regex entirely.
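When no regex features are needed, the includes() alternative can be this small (a sketch; the function name is illustrative):

```javascript
// Literal, case-insensitive substring match. No regex engine involved,
// so there are no metacharacters to escape and no pathological
// patterns to worry about.
function matchesQuery(text, query) {
  const needle = (query ?? "").trim().toLowerCase();
  if (!needle) return false;
  return text.toLowerCase().includes(needle);
}

// matchesQuery("Hello World", "wor") → true
// matchesQuery("Hello World", ".*")  → false (treated as literal text)
```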

For highlighting, prefer rendering structured nodes rather than returning HTML strings. In React:

function HighlightedText({ text, query }) {
  const trimmedQuery = query.trim();

  if (!trimmedQuery) {
    return text;
  }

  const lowerText = text.toLowerCase();
  const lowerQuery = trimmedQuery.toLowerCase();
  const index = lowerText.indexOf(lowerQuery);

  if (index === -1) {
    return text;
  }

  return (
    <>
      {text.slice(0, index)}
      <mark>{text.slice(index, index + trimmedQuery.length)}</mark>
      {text.slice(index + trimmedQuery.length)}
    </>
  );
}

This avoids both regex injection and raw HTML composition. It also makes the rendering logic easier to test because the output is component structure, not a string that must later be interpreted by the browser.

Comments: store original text, escape on output

Comment systems are a classic stored XSS target. The payload is saved once and executed later for every reader. That makes comments more dangerous than reflected search input because the attacker does not need to convince each victim to open a crafted URL.

For plain text comments, the production-grade approach is usually:

  1. Validate length and shape at the API boundary.

  2. Store the original text.

  3. Render it as text, not HTML.

  4. Convert newlines to visual line breaks at render time.

  5. Apply rate limits and moderation separately from escaping.

Avoid saving already-escaped HTML as the canonical value. It creates double-escaping bugs, makes search worse, and couples storage to one presentation format.
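The first two steps above might look like this at the API boundary (a sketch; the length limit and the return shape are assumptions for illustration):

```javascript
// Hypothetical boundary check for a plain-text comment.
// Validation decides whether the value is acceptable; it does not
// escape or rewrite it. The original text is what gets stored.
const MAX_COMMENT_LENGTH = 5000;

function validateCommentBody(body) {
  if (typeof body !== "string") {
    return { ok: false, reason: "not-a-string" };
  }
  if (body.trim().length === 0) {
    return { ok: false, reason: "empty" };
  }
  if (body.length > MAX_COMMENT_LENGTH) {
    return { ok: false, reason: "too-long" };
  }
  // Deliberately no escaping here: markup survives as plain text
  // and is neutralized at render time instead.
  return { ok: true, value: body };
}
```

Note that a comment containing <script> passes validation unchanged; it only becomes dangerous if a renderer later treats it as HTML.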

import { Fragment } from "react";

function CommentBody({ body }) {
  return (
    <p className="comment-body">
      {body.split("\n").map((line, index) => (
        <Fragment key={index}>
          {index > 0 && <br />}
          {line}
        </Fragment>
      ))}
    </p>
  );
}

The key detail is that each line is still rendered as text. The <br /> elements are controlled by the application, not by the user.

Admin screens are not automatically trusted

Teams often treat admin input as safe because only internal users can access it. That is a weak assumption. Admin accounts can be phished, internal tools can be exposed, roles can be overbroad, and support users may paste content received from customers.

Admin panels often write to high-impact locations:

  • marketing banners

  • transactional email templates

  • CMS pages

  • feature flag descriptions

  • product configuration

  • customer support notes

  • audit logs

A stored XSS bug in an admin-managed field may execute for every user, not just for admins. The safer default is to apply the same output rules to admin data as to customer data. Use permissions to control who can write. Use escaping and sanitization to control how the value renders.

Rich text: escaping is not enough

Plain text should be escaped. Rich text is different because some markup is expected to survive. A comment field should not allow HTML. A product description editor might allow paragraphs, links, bold text, lists, and code blocks.

For rich text, escaping everything destroys the feature. Trusting everything creates XSS. The correct control is allowlist-based sanitization.

| Input type | Storage format | Render strategy | Runtime risk | Operational note |
| --- | --- | --- | --- | --- |
| Search query | Raw string | Text rendering, escaped regex if needed | Low when not used as HTML | Prefer substring search unless regex is required |
| Plain comment | Raw string | Text nodes, controlled line breaks | Medium if rendered incorrectly | Stored XSS risk if later moved to HTML |
| Admin label | Raw string | Text nodes or escaped attributes | Medium | Do not rely on role trust as a security boundary |
| Rich text | HTML, Markdown, or editor JSON | Sanitize to an allowlist before rendering | Higher | Needs policy tests and regression coverage |
| Raw HTML embed | HTML | Isolated and strictly controlled | High | Avoid for general users and most admins |

A sanitizer should remove unsafe elements and attributes rather than trying to detect attacks by pattern. Blocklists age poorly because browser behavior, HTML parsing, SVG, URL schemes, and attribute interactions are difficult to model with ad hoc string replacement.

For rich text, define a policy explicitly:

  • allowed tags, such as p, strong, em, ul, ol, li, code, pre, a

  • allowed attributes, such as href on links

  • allowed URL schemes, usually http, https, and sometimes mailto

  • whether images, iframes, inline styles, SVG, and embeds are allowed

  • whether sanitization happens on write, on read, or both

Sanitizing on write makes stored content safer and reduces repeated work. Sanitizing on read protects old content when the policy changes. In higher-risk systems, doing both can be reasonable, with the original editor document stored separately from the rendered HTML.
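Using DOMPurify as one example of an allowlist sanitizer, the policy above could be expressed roughly as follows (option names follow DOMPurify's API; verify them against the version you use):

```javascript
// One way to express the rich text policy as a DOMPurify config.
// The tag and attribute lists mirror the policy bullets above;
// the URL regexp limits links to http, https, and mailto.
const richTextPolicy = {
  ALLOWED_TAGS: ["p", "strong", "em", "ul", "ol", "li", "code", "pre", "a"],
  ALLOWED_ATTR: ["href"],
  ALLOWED_URI_REGEXP: /^(?:https?|mailto):/i,
};

// Applied on write and/or on read:
// const sanitizedHtml = DOMPurify.sanitize(rawHtml, richTextPolicy);
```

Keeping the policy in one shared module, rather than scattered per call site, is what makes it testable and reviewable.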

dangerouslySetInnerHTML: treat it as a boundary, not a shortcut

React named dangerouslySetInnerHTML accurately. The problem is not that it always causes XSS. The problem is that it bypasses React’s normal text escaping and tells the browser to parse a string as HTML.

That means the safety of this code depends entirely on the source and sanitization of html:

function ArticlePreview({ html }) {
  return <div dangerouslySetInnerHTML={{ __html: html }} />;
}

A safer implementation makes the boundary visible:

function SafeRichText({ sanitizedHtml }) {
  return (
    <article
      className="rich-text"
      dangerouslySetInnerHTML={{ __html: sanitizedHtml }}
    />
  );
}

The naming matters. html is ambiguous. sanitizedHtml documents the expectation and makes code review more precise. It should still be backed by tests and by a real sanitizer in the data flow, not by naming alone.

A practical review checklist for dangerouslySetInnerHTML:

  • Where was the HTML produced?

  • Was it sanitized with an allowlist policy?

  • Are links restricted to safe URL schemes?

  • Are scripts, event handlers, inline styles, SVG, and iframes handled intentionally?

  • Can admin users, customers, imports, or integrations modify this content?

  • Is there a test fixture for common XSS payload shapes?

  • Is this component isolated from privileged UI state?
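For the test-fixture item on that checklist, a small shared list of payload shapes is often enough to catch regressions (the payloads below are illustrative, not exhaustive; the assertion sketch assumes a sanitize function exists in your codebase):

```javascript
// Hypothetical payload fixtures for sanitizer regression tests.
// Each covers a different shape: script tags, event handler
// attributes, javascript: URLs, SVG, and iframes.
const xssFixtures = [
  "<script>alert(1)</script>",
  '<img src=x onerror="alert(1)">',
  '<a href="javascript:alert(1)">link</a>',
  "<svg onload=alert(1)></svg>",
  '<iframe src="https://evil.example"></iframe>',
];

// In a test suite, each fixture should come out of the sanitizer
// with no executable parts, e.g.:
// for (const payload of xssFixtures) {
//   expect(sanitize(payload)).not.toMatch(/<script|onerror|javascript:/i);
// }
```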

The safest code is often the code that does not need raw HTML at all. Use normal JSX rendering for plain text, structured content, and known UI fragments. Reserve raw HTML rendering for content that is truly authored as rich text.

What to adopt first

Security improvements work best when they become routine engineering constraints, not special cleanup projects. A practical adoption order looks like this:

  1. Remove innerHTML and dangerouslySetInnerHTML where plain text rendering is enough.

  2. Escape user input before using it in dynamic regular expressions.

  3. Standardize rich text through one sanitizer policy instead of scattered helpers.

  4. Add tests for search highlighting, comments, admin-rendered fields, and rich text rendering.

  5. Name variables by trust state, such as rawComment, plainText, sanitizedHtml, and trustedTemplate.

  6. Review admin workflows as potential stored XSS entry points.

  7. Keep validation, escaping, and sanitization as separate steps in code.

For engineers who work deeply with browser security, rendering boundaries, and production JavaScript code, the Senior JavaScript Developer certification is the most relevant DevCerts track to review.


Conclusion

Safe user input handling in JavaScript is not a single function call. It is a set of boundaries: validate at entry, preserve data honestly, escape for the output context, sanitize only when structured markup is allowed, and treat raw HTML rendering as an explicit risk.

Search queries, comments, admin fields, rich text, and dangerouslySetInnerHTML fail in different ways. The common mistake is treating them as the same problem. The production approach is to make the context visible in code and make unsafe transitions hard to hide in reviews. That gives teams a system that is easier to maintain, easier to test, and less likely to turn routine product features into injection surfaces.