Lesson 4

Common HTML Entity Mistakes

A guide to common HTML entity mistakes: how to avoid double encoding, decode-as-sanitize, and whitespace bugs.

Entity bugs often look like "wrong character on the page" but come from pipeline ordering or invisible characters.

Double encoding

Encoding already-escaped content is the most frequent issue. Symptoms:

  • Users see the literal text &copy; instead of ©
  • Search finds the literal string &lt; in rendered text

Always know whether your input is raw user text or pre-escaped storage.
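A minimal Python sketch of the failure and one way to guard against it. The `Escaped` marker type and the `escape_once` helper are hypothetical names introduced for illustration; only `html.escape` comes from the standard library.

```python
import html


class Escaped(str):
    """Hypothetical marker type: a string that has already been HTML-escaped."""


def escape_once(value: str) -> Escaped:
    """Escape raw text, but pass already-escaped values through unchanged."""
    if isinstance(value, Escaped):
        return value
    return Escaped(html.escape(value))


raw = "a < b"
once = html.escape(raw)    # "a &lt; b"
twice = html.escape(once)  # "a &amp;lt; b"  (the browser now shows the literal text &lt;)

safe = escape_once(escape_once(raw))  # still "a &lt; b"
```

The type carries the "is this pre-escaped?" answer along with the value, so the decision does not depend on guessing from the content.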

Treating decode as sanitize

Decoding &lt;script&gt;alert(1)&lt;/script&gt; produces the text <script>alert(1)</script>. Decoding does not make that string safe: inserting it into HTML without sanitization is still dangerous if your renderer interprets tags.

Decode for inspection and editing. Sanitize separately before rendering untrusted HTML.
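As a rough illustration in Python, decoding and output escaping can be kept as two separate steps. The `render_as_text` helper is a hypothetical name, and it covers only the simplest case, where the untrusted value is shown as plain text. Allowing a subset of HTML through would need a dedicated sanitizer, which this sketch does not attempt.

```python
import html

stored = "&lt;script&gt;alert(1)&lt;/script&gt;"

# Step 1: decode for inspection or editing. The result is live markup.
decoded = html.unescape(stored)


def render_as_text(untrusted: str) -> str:
    """Step 2: escape again at output time so tags are displayed, not executed."""
    return "<p>" + html.escape(untrusted) + "</p>"


print(decoded)                  # <script>alert(1)</script>
print(render_as_text(decoded))  # <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```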

Confusing space and nbsp

&nbsp; is a non-breaking space (U+00A0), not the same as a normal space (U+0020). Mixing the two causes layout bugs and copy-paste mismatches, and string comparisons fail silently because the characters look identical on screen.

Use a preview that visualizes nbsp and tabs when debugging.
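A small Python sketch of such a preview. The helper names and the `[NBSP]` / `[TAB]` markers are illustrative choices, and whether to normalize U+00A0 to a regular space before comparing depends on your data.

```python
import html

NBSP = "\u00a0"


def visualize_whitespace(text: str) -> str:
    """Replace invisible characters with visible markers for debugging."""
    markers = {NBSP: "[NBSP]", "\t": "[TAB]"}
    return "".join(markers.get(ch, ch) for ch in text)


def normalize_spaces(text: str) -> str:
    """Assumption: for comparison purposes, nbsp may be treated as a plain space."""
    return text.replace(NBSP, " ")


from_html = html.unescape("10&nbsp;kg")  # contains U+00A0
typed = "10 kg"                          # contains U+0020

print(from_html == typed)                    # False
print(visualize_whitespace(from_html))       # 10[NBSP]kg
print(normalize_spaces(from_html) == typed)  # True
```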

Wrong entity for the context

Encoding < as &lt; is essential in HTML. Encoding every ASCII letter "just to be safe" creates unreadable CMS fields and breaks full-text search. Escape what the output format requires and no more.
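The difference can be seen in a short Python comparison. `over_encode` is a hypothetical function that reproduces the anti-pattern, and `html.escape` with `quote=False` stands in for "escape only what HTML text content requires".

```python
import html


def over_encode(text: str) -> str:
    """Anti-pattern: turn every character into a numeric character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)


title = "Tom & Jerry <remastered>"
minimal = html.escape(title, quote=False)
bloated = over_encode(title)

print(minimal)             # Tom &amp; Jerry &lt;remastered&gt;
print("Jerry" in minimal)  # True: a plain substring search still works
print("Jerry" in bloated)  # False: the stored value is unreadable and unsearchable
```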

Invalid numeric entities

Malformed references such as &#9999999; (beyond the Unicode range, which ends at U+10FFFF) or &#xZZ; (invalid hexadecimal digits) should fail loudly during batch processing instead of being silently passed through.
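One way to fail loudly is a strict validation pass before any decoding. The sketch below is a hypothetical Python validator, not a standard API. The function names and the exact rejection rules (zero, surrogates, values above U+10FFFF) are assumptions chosen for illustration.

```python
import re

# Anything that looks like a numeric reference, valid or not.
CANDIDATE = re.compile(r"&#([^;&\s]*);")
DECIMAL = re.compile(r"[0-9]+")
HEX = re.compile(r"[xX]([0-9a-fA-F]+)")


def parse_numeric_ref(body: str) -> int:
    """Return the code point for the text between '&#' and ';', or raise ValueError."""
    hex_match = HEX.fullmatch(body)
    if hex_match:
        code_point = int(hex_match.group(1), 16)
    elif DECIMAL.fullmatch(body):
        code_point = int(body)
    else:
        raise ValueError(f"malformed numeric reference: &#{body};")
    if code_point == 0 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"code point out of range: &#{body};")
    return code_point


def validate_numeric_refs(text: str) -> None:
    """Raise on the first invalid numeric reference found in text."""
    for match in CANDIDATE.finditer(text):
        parse_numeric_ref(match.group(1))


validate_numeric_refs("&#169; and &#xA9; are fine")  # no exception

for bad in ("&#9999999;", "&#xZZ;"):
    try:
        validate_numeric_refs(bad)
    except ValueError as error:
        print(error)
```

In a batch job, the raised error would typically be logged with the record identifier so the source data can be corrected instead of being passed along unchanged.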
