Lesson 4

Common HTML Entity Mistakes

Avoid double encoding, decode-as-sanitize, and whitespace bugs.

Entity bugs often look like "wrong character on the page" but come from pipeline ordering or invisible characters.

Double encoding

Encoding already-escaped content is the most frequent issue. Symptoms:

  • Users see the literal text &copy; instead of ©
  • Search finds the literal string &lt; in rendered text

Always know whether your input is raw user text or pre-escaped storage.
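
A minimal sketch of how the symptom arises, using Python's standard html module. The sample string is made up for illustration, and html.unescape stands in here for what a browser displays when it renders text content.

```python
import html

raw = "Terms © 2024 <b>Acme</b> & Co."   # raw user text (illustrative)

once = html.escape(raw)     # correct: escape raw text exactly once
twice = html.escape(once)   # bug: escaping content that is already escaped

# A browser decodes one level of entities when it renders text.
print(html.unescape(once))   # the original text, as intended
print(html.unescape(twice))  # still shows &lt;b&gt; and &amp; literally
```

One way to avoid this is to store raw text and escape only at output time, so no value in the pipeline is ever ambiguous about its state.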

Treating decode as sanitize

Decoding &lt;script&gt;alert(1)&lt;/script&gt; produces the text <script>alert(1)</script>. That is harmless while it stays text, but inserting it into HTML without sanitization is dangerous, because the renderer will interpret the tags.

Decode for inspection and editing. Sanitize separately before rendering untrusted HTML.
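
The difference can be sketched in a few lines of Python. The render_comment function is a hypothetical helper, not part of any framework, and it covers only the simplest case, where the untrusted value should appear as plain text. Allowing a subset of tags would need a dedicated allowlist sanitizer, which is not shown here.

```python
import html

stored = "&lt;script&gt;alert(1)&lt;/script&gt;"

# Decoding is fine for inspection and editing.
decoded = html.unescape(stored)
print(decoded)

# Unsafe: the decoded string is dropped straight into markup.
unsafe = f"<p>{decoded}</p>"


def render_comment(text: str) -> str:
    """Hypothetical renderer: escape untrusted text at output time."""
    return f"<p>{html.escape(text)}</p>"


safe = render_comment(decoded)
print(unsafe)
print(safe)
```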

Confusing space and nbsp

&nbsp; is a non-breaking space (U+00A0), not the same as a normal space (U+0020). Mixing the two causes layout bugs, copy-paste mismatches, and string comparisons that fail silently.

Use a preview that visualizes nbsp and tabs when debugging.
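
If no such preview is at hand, a small helper can make the difference visible. This is a rough sketch: show_whitespace and the marker symbols are arbitrary choices for illustration, not a standard.

```python
import html


def show_whitespace(text: str) -> str:
    """Replace invisible characters with visible markers (arbitrary choice)."""
    markers = {"\u00a0": "⍽", "\t": "→", " ": "·"}
    return "".join(markers.get(ch, ch) for ch in text)


a = html.unescape("10&nbsp;kg")  # contains U+00A0
b = "10 kg"                      # contains U+0020

print(a == b)               # False, although both look identical on screen
print(show_whitespace(a))   # 10⍽kg
print(show_whitespace(b))   # 10·kg
```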

Wrong entity for the context

Encoding < is essential in HTML. Encoding every ASCII letter "just to be safe" creates unreadable CMS fields and breaks full-text search. Escape what the output format requires—no more.
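
As an illustration, Python's html.escape can be compared with an "encode everything" approach. The sample text is made up. The quote parameter is part of the standard library function, and the choice of when to set it depends on whether the value lands in a text node or an attribute.

```python
import html

text = 'Tom & "Jerry" <3'

# Text node: only &, < and > need escaping.
minimal = html.escape(text, quote=False)

# Attribute value: quotes must be escaped as well.
attribute = html.escape(text, quote=True)

# Over-encoding: every character becomes a numeric reference.
over = "".join(f"&#{ord(ch)};" for ch in text)

print(minimal)
print(attribute)
print(over)

# A plain substring search still works on the minimal form only.
print("Jerry" in minimal, "Jerry" in over)
```

All three forms decode to the same text, but only the first two stay readable and searchable in storage.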

Invalid numeric entities

Invalid references such as &#9999999; (beyond the Unicode maximum, U+10FFFF) or &#xZZ; (not hexadecimal digits) should fail loudly during batch processing instead of being silently passed through.
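
A strict check along these lines might look like the sketch below. The function name check_numeric_entities and its policy are assumptions made for illustration: it rejects non-digit bodies, a missing semicolon, zero, surrogates, and anything above U+10FFFF. A real pipeline may want a looser or stricter rule.

```python
import re

# Anything that starts like a numeric reference, well-formed or not.
CANDIDATE = re.compile(r"&#([0-9A-Za-z]*)(;?)")
DECIMAL = re.compile(r"[0-9]+")
HEX = re.compile(r"[xX][0-9A-Fa-f]+")


def check_numeric_entities(text: str) -> None:
    """Raise ValueError on the first invalid numeric reference in text."""
    for match in CANDIDATE.finditer(text):
        body, semicolon = match.groups()
        where = f"{match.group(0)!r} at offset {match.start()}"
        if HEX.fullmatch(body):
            code = int(body[1:], 16)
        elif DECIMAL.fullmatch(body):
            code = int(body)
        else:
            raise ValueError(f"malformed numeric entity {where}")
        if not semicolon:
            raise ValueError(f"missing semicolon in {where}")
        if code == 0 or code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
            raise ValueError(f"code point out of range in {where}")


for sample in ("&#169; and &#xA9;", "&#9999999;", "&#xZZ;"):
    try:
        check_numeric_entities(sample)
        print("ok:", sample)
    except ValueError as err:
        print("rejected:", err)
```

Running the check before any decoding step means a bad record stops the batch with its offset in the message, rather than surfacing later as a replacement character on the page.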
