Lesson 4

Common HTML Entity Mistakes

Avoid double encoding, decode-as-sanitize, and whitespace bugs.

Entity bugs often look like a "wrong character on the page" problem, but they usually come from the order of escape and unescape steps in a pipeline, or from invisible characters.

Double encoding

Encoding already-escaped content is the most frequent issue. Symptoms:

Users see &copy; instead of ©
Search finds the literal string &lt; in rendered text

Always know whether your input is raw user text or pre-escaped storage, and escape it exactly once.
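Here is a minimal sketch of how the &copy; symptom arises and one way to avoid it. It uses Python's standard-library `html.escape`. The `pre_escaped` flag and the `to_html` and `looks_double_encoded` helpers are hypothetical names for illustration. The regex is only a rough heuristic, and it will false-positive on text that legitimately discusses entities, such as this lesson.

```python
import html
import re

# Heuristic: an escaped ampersand immediately followed by something reference-shaped.
_DOUBLE = re.compile(r"&amp;(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")

def to_html(text: str, *, pre_escaped: bool) -> str:
    """Return text ready for an HTML text node, escaping it exactly once.

    pre_escaped (hypothetical flag): True if text already went through an
    escaping step, e.g. legacy storage; False for raw user input.
    """
    if pre_escaped:
        return text  # trusted as already escaped; escaping again gives &amp;copy;
    return html.escape(text)

def looks_double_encoded(markup: str) -> bool:
    """Flag markup that probably went through escaping twice."""
    return _DOUBLE.search(markup) is not None

stored = "&copy; 2024 Acme"        # pre-escaped storage
buggy = html.escape(stored)        # '&amp;copy; 2024 Acme': the page shows "&copy;"
print(looks_double_encoded(buggy))                # True
print(to_html(stored, pre_escaped=True))          # &copy; 2024 Acme
print(to_html("5 < 6", pre_escaped=False))        # 5 &lt; 6
```

The flag makes the caller state what kind of string it holds. Guessing from the content alone is exactly what produces double encoding.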

Treating decode as sanitize

Decoding &lt;script&gt;alert(1)&lt;/script&gt; produces <script>alert(1)</script> as text, but inserting that string into HTML without escaping or sanitization is still dangerous if your renderer interprets tags.

Decode for inspection and editing. Sanitize separately before rendering untrusted HTML.
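In Python terms, `html.unescape` is the decode step and `html.escape` makes text safe to place in an HTML text node. This sketch decodes a value for an edit form and escapes it again on output, so the script tag is displayed instead of run. `editable_value` and `render_as_text` are hypothetical helper names. The sketch only covers rendering untrusted input as plain text. Keeping some of its markup would require a dedicated allowlist-based HTML sanitizer, which this does not attempt.

```python
import html

def editable_value(stored: str) -> str:
    """Decode entities so an editor shows real characters, not &lt; sequences."""
    return html.unescape(stored)

def render_as_text(value: str) -> str:
    """Escape right before inserting into HTML so tags display rather than execute."""
    return html.escape(value)

stored = "&lt;script&gt;alert(1)&lt;/script&gt;"
decoded = editable_value(stored)
print(decoded)                  # <script>alert(1)</script>  (live markup if inserted verbatim)
print(render_as_text(decoded))  # &lt;script&gt;alert(1)&lt;/script&gt;
```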

Confusing space and nbsp

&nbsp; is a non-breaking space (U+00A0), not a normal space (U+0020). When the two are mixed, layouts break, copy-pasted text stops matching, and string comparisons fail silently.

Use a preview that visualizes nbsp and tabs when debugging.
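A small Python sketch covers both sides of this advice. It makes NBSP and tabs visible while debugging. It also normalizes NBSP to a plain space, which you should do only where a comparison is meant to ignore the difference. The `[nbsp]` and `[tab]` markers and the helper names are arbitrary choices for this example.

```python
import html

NBSP = "\u00a0"  # what &nbsp; decodes to

def visualize_whitespace(s: str) -> str:
    """Replace invisible characters with visible markers for debugging output."""
    return s.replace(NBSP, "[nbsp]").replace("\t", "[tab]")

def normalize_spaces(s: str) -> str:
    """Map NBSP to U+0020; apply only where the distinction should not matter."""
    return s.replace(NBSP, " ")

title = html.unescape("Chapter&nbsp;1")
print(title == "Chapter 1")                    # False, despite looking identical
print(visualize_whitespace(title))             # Chapter[nbsp]1
print(normalize_spaces(title) == "Chapter 1")  # True
```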

Wrong entity for the context

Escaping < as &lt; is essential in HTML. Encoding every ASCII letter "just to be safe" produces unreadable CMS fields and breaks full-text search. Escape what the output format requires, and no more.
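Python's `html.escape` already covers the two common HTML contexts through its `quote` parameter. This sketch contrasts minimal escaping for text nodes and quoted attribute values with the escape-everything anti-pattern. The helper names are illustrative. Other output contexts, such as URLs or inline scripts, follow different rules.

```python
import html

def escape_text_node(s: str) -> str:
    # Element content: escaping &, < and > is sufficient.
    return html.escape(s, quote=False)

def escape_attribute(s: str) -> str:
    # Quoted attribute values: quotes must be escaped as well.
    return html.escape(s, quote=True)

def over_escape(s: str) -> str:
    # Anti-pattern: every character as a numeric reference.
    return "".join(f"&#{ord(c)};" for c in s)

print(escape_text_node('Tom & "Jerry" <3'))  # Tom &amp; "Jerry" &lt;3
print(escape_attribute('say "hi"'))          # say &quot;hi&quot;
print(over_escape("Hi"))                     # &#72;&#105;
print("Hi" in over_escape("Hi"))             # False: a plain-text search misses it
```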

Invalid numeric entities

Malformed or out-of-range references, such as &#9999999; (beyond the Unicode maximum, U+10FFFF) or &#xZZ; (not valid hex digits), should fail loudly during batch processing instead of being silently passed through.
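A lenient decoder will not do this for you. Python's `html.unescape`, for example, turns &#9999999; into U+FFFD and leaves &#xZZ; untouched. Below is a minimal strict checker that assumes a hypothetical policy, which you should adjust to your pipeline. Under that policy, a numeric reference must end in `;`, use valid decimal or hex digits, and name a code point between 1 and U+10FFFF that is outside the surrogate range. `EntityError`, `check_numeric_refs`, and `validate_batch` are illustrative names.

```python
import html
import re

# Strict form of a numeric reference: decimal or hex digits, terminating ';' required.
STRICT_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

class EntityError(ValueError):
    """Raised for malformed or out-of-range numeric character references."""

def check_numeric_refs(text: str) -> None:
    """Raise EntityError at the first bad '&#...' sequence in text."""
    pos = text.find("&#")
    while pos != -1:
        m = STRICT_REF.match(text, pos)
        if m is None:
            raise EntityError(f"malformed reference at offset {pos}: {text[pos:pos + 12]!r}")
        hex_digits, dec_digits = m.groups()
        cp = int(hex_digits, 16) if hex_digits is not None else int(dec_digits)
        # Assumed policy: reject NUL, surrogates and anything past U+10FFFF.
        if cp == 0 or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            raise EntityError(f"code point {cp:#x} out of range at offset {pos}")
        pos = text.find("&#", m.end())

def validate_batch(records: list[str]) -> None:
    """Check every record and fail loudly, listing all problems at once."""
    problems = []
    for i, text in enumerate(records):
        try:
            check_numeric_refs(text)
        except EntityError as exc:
            problems.append(f"record {i}: {exc}")
    if problems:
        raise EntityError("; ".join(problems))

# The lenient stdlib decoder, for comparison:
print(html.unescape("&#9999999; &#xZZ;") == "\ufffd &#xZZ;")  # True

validate_batch(["&#169; 2024", "caf&#xE9;"])  # passes without raising
try:
    validate_batch(["ok", "&#9999999;", "bad &#xZZ;"])
except EntityError as exc:
    print(exc)  # reports record 1 and record 2
```

Collecting every problem before raising means a single batch run reports all bad records, not just the first one.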
