Lesson 4
Common HTML Entity Mistakes
Avoid double encoding, decode-as-sanitize, and whitespace bugs.
Entity bugs often look like a "wrong character on the page" problem, but they come from pipeline ordering or invisible characters.
Double encoding
Encoding already-escaped content is the most frequent issue. Symptoms:
- Users see &copy; instead of ©
- Search finds the literal string &lt; in rendered text
Always know whether your input is raw user text or pre-escaped storage.
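Here is a minimal sketch in Python using html.escape from the standard library. The stored string and the helpers render_text and looks_double_encoded are hypothetical illustrations. The detector is only a heuristic that flags &amp; followed by something shaped like another entity, so it will also flag text that legitimately discusses entities, such as this lesson.

```python
import html
import re

# Heuristic: "&amp;" immediately followed by something shaped like an entity
# (named, decimal, or hex) usually means content was escaped twice.
_DOUBLE = re.compile(r"&amp;(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")


def looks_double_encoded(s: str) -> bool:
    """Return True if s contains an escaped entity such as '&amp;copy;'."""
    return _DOUBLE.search(s) is not None


def render_text(value: str, already_escaped: bool) -> str:
    """Escape for HTML text content exactly once.

    already_escaped should come from knowing where the value was loaded
    from (raw input vs. pre-escaped storage), not from guessing at content.
    """
    return value if already_escaped else html.escape(value, quote=False)


# Hypothetical value read from storage that already holds escaped HTML.
stored = "&copy; ACME &lt;Tools&gt;"

wrong = html.escape(stored, quote=False)  # escaped a second time
right = render_text(stored, already_escaped=True)

print(wrong)                        # &amp;copy; ACME &amp;lt;Tools&amp;gt;
print(looks_double_encoded(wrong))  # True
print(looks_double_encoded(right))  # False
```

A browser displays wrong as the literal text &copy; ACME &lt;Tools&gt;, which produces both symptoms listed above. The already_escaped flag is deliberately explicit, because inferring it from the content is how double encoding usually starts.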
Treating decode as sanitize
Decoding &lt;script&gt;alert(1)&lt;/script&gt; produces the string <script>alert(1)</script>. Inserting that string into HTML without sanitization is still dangerous if your renderer interprets tags, because it is no longer inert text but a live script element.
Decode for inspection and editing. Sanitize separately before rendering untrusted HTML.
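A short Python sketch of the distinction, using html.unescape and html.escape from the standard library. The show_as_text helper is a hypothetical name. The standard library has no HTML sanitizer, so rendering untrusted markup as markup still requires a dedicated allow-list sanitizer, which is not shown here.

```python
import html

encoded = "&lt;script&gt;alert(1)&lt;/script&gt;"

# Decoding is fine for inspection or editing: it recovers the literal characters.
decoded = html.unescape(encoded)
print(decoded)  # <script>alert(1)</script>

# Unsafe: pasting the decoded string into an HTML template creates a live tag.
unsafe_fragment = f"<p>{decoded}</p>"


def show_as_text(value: str) -> str:
    """Display untrusted text as text by escaping it on output."""
    return html.escape(value)


safe_fragment = f"<p>{show_as_text(decoded)}</p>"
print(safe_fragment)  # <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```

Escaping on output is the right step when the value should appear as text. If the product genuinely needs to render user-supplied HTML, that is the job of a sanitizer; decoding substitutes for neither.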
Confusing space and nbsp
&nbsp; is a non-breaking space (U+00A0), not the same character as a normal space (U+0020). When the two are mixed, layout shifts unexpectedly, copy-pasted text stops matching, and string comparisons fail silently.
Use a preview that visualizes nbsp and tabs when debugging.
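A minimal Python sketch of both ideas: a debugging view that makes U+00A0 and tabs visible, and a comparison that treats them as equal. The helper names and the [NBSP]/[TAB] markers are arbitrary choices for illustration.

```python
import html

NBSP = "\u00a0"


def visualize_whitespace(text: str) -> str:
    """Replace invisible whitespace with visible markers for debugging."""
    return text.replace(NBSP, "[NBSP]").replace("\t", "[TAB]")


def equal_ignoring_nbsp(a: str, b: str) -> bool:
    """Compare two strings after mapping U+00A0 to an ordinary space."""
    return a.replace(NBSP, " ") == b.replace(NBSP, " ")


decoded = html.unescape("10&nbsp;kg")

print(decoded == "10 kg")                     # False: U+00A0 is not U+0020
print(visualize_whitespace(decoded))          # 10[NBSP]kg
print(visualize_whitespace("a\tb"))           # a[TAB]b
print(equal_ignoring_nbsp(decoded, "10 kg"))  # True
```

Whether to normalize nbsp to a plain space is a product decision. It suits search keys and comparisons, but not necessarily stored content where the non-breaking behavior is intentional.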
Wrong entity for the context
Encoding < as &lt; is essential in HTML text. Encoding every ASCII letter "just to be safe" creates unreadable CMS fields and breaks full-text search. Escape what the output format requires, and no more.
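A minimal Python sketch contrasting context-appropriate escaping with over-encoding. html.escape with quote=False escapes &, <, and > for text content; quote=True also escapes double and single quotes for attribute values. The encode_everything function is a hypothetical anti-pattern, included only for comparison.

```python
import html


def escape_text(s: str) -> str:
    """Escape for HTML text content: &, <, > only."""
    return html.escape(s, quote=False)


def escape_attr(s: str) -> str:
    """Escape for a quoted attribute value: also " and '."""
    return html.escape(s, quote=True)


def encode_everything(s: str) -> str:
    """Anti-pattern: a numeric reference for every single character."""
    return "".join(f"&#{ord(c)};" for c in s)


title = 'Tom & "Jerry" <3'

print(escape_text(title))        # Tom &amp; "Jerry" &lt;3
print(escape_attr(title))        # Tom &amp; &quot;Jerry&quot; &lt;3
print(encode_everything("Tom"))  # &#84;&#111;&#109;

# Stored over-encoded, the field no longer contains the word a user searches for.
print("Tom" in encode_everything(title))  # False
```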
Invalid numeric entities
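Malformed numeric references, such as ones that decode to the replacement character � (for example, code points beyond the Unicode range) or invalid ones like &#xZZ;, should fail loudly during batch processing instead of being silently passed through.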
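Python's html.unescape is lenient in exactly the way described: it maps &#0; to U+FFFD and leaves &#xZZ; untouched, so malformed input passes through without an error. The sketch below is a hypothetical strict pre-check (check_numeric_refs and EntityError are illustrative names). It assumes a policy of rejecting any numeric reference that is not well formed, lacks the closing semicolon, or names zero, a surrogate, or a value above U+10FFFF. It covers numeric references only; named references would need a separate lookup.

```python
import html
import re


class EntityError(ValueError):
    """Raised when a numeric character reference is malformed or invalid."""


# Strict form: "&#" + decimal digits, or "&#x"/"&#X" + hex digits, then ";".
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def check_numeric_refs(text: str) -> None:
    """Raise EntityError on the first bad numeric reference in text."""
    pos = 0
    while (start := text.find("&#", pos)) != -1:
        m = _NUMERIC_REF.match(text, start)
        if m is None:
            snippet = text[start:start + 12]
            raise EntityError(f"malformed reference at offset {start}: {snippet!r}")
        hex_digits, dec_digits = m.groups()
        cp = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Zero, surrogates, and out-of-range values are what HTML decoders
        # (including html.unescape) silently turn into U+FFFD.
        if cp == 0 or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            raise EntityError(f"invalid code point U+{cp:04X} at offset {start}")
        pos = m.end()


# Lenient decoding hides the problem:
print(html.unescape("&#0;") == "\ufffd")  # True
print(html.unescape("&#xZZ;"))            # &#xZZ;

# Batch usage: report failures per record instead of passing them through.
records = ["A &#65; &#x42;", "bad &#xZZ;", "null &#0;"]
for i, rec in enumerate(records):
    try:
        check_numeric_refs(rec)
    except EntityError as err:
        print(f"record {i}: {err}")
```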