Lesson 3

Padding and Length

Why `=` appears at the end and how length relates to input bytes.

Base64 output length is predictable once you grasp six-bit grouping. Padding characters (=, sometimes double ==) communicate “this final sextet was synthesized from fewer than three trailing bytes”—no magic, only bookkeeping.

When padding appears

Input length modulo three drives everything:

Input bytes `% 3`	Padding needed
`0`	none
`1`	`==`
`2`	`=`

Why? Each output quantum encodes three bytes ⇒ four Base64 symbols. Partial quanta still emit four symbols, but trailing sextets that would exceed real bits are marked as padded so decoders truncate correctly.

Skipping padding sometimes seems to work because some tooling auto-infers—but cross-language APIs break when one side trims = aggressively and another requires canonical form.

What `=` is not

= is not part of the 64-symbol payload alphabet in classical Base64—it is metachar filler. Decoders discard or interpret it strictly as length signal; accidentally editing it (URL concatenation mishaps, copy/paste truncation) silently corrupts output length checks.

Libraries should reject invalid padding patterns (misplaced equals, trailing garbage after equals) rather than guessing.

Quick mental model with bit counts

Every three octets ⇒ four alphabet symbols.
Extra byte? You still emit four symbols; two of them collectively carry padding semantics.
Two extra bytes yield = once; lone leftover byte yields ==.

Seeing the modulus table beats memorizing bitwise tables for debugging.

Line wrapping and MIME

Historic email encoders inserted newline breaks every 76-ish characters to survive ancient terminals. Plain Base64 APIs often omit breaks; PEM blocks deliberately wrap fixed column width.

If you concatenate lines without stripping \n/\r before decoding, many strict decoders choke. Conversely, some parsers ignore whitespace everywhere—know your stack’s stance.

Example shape (conceptual only)

Imagine encoding a single ASCII letter a (0x61). You emit four printable characters ending with == in standard encodings—libraries show the literal string clearly in docs; verifying with your language’s primitive is worthwhile once.

Never hand-edit padding to “optimize” URLs—URL-safe dialects preserve =-less forms only when explicitly standardized (see next lesson).

Length checks in security-sensitive code

Comparing Base64 ciphertext or signatures? Normalized length mismatches expose:

truncation bugs
double-encoding mistakes
client-side truncation in query strings

Treat Base64 payloads as opaque strings plus strict validation.

Key takeaway

Padding exists because Base64 insists on emitting four output symbols per processing chunk, even when fewer than twenty-four real input bits remain. Respect canonical padding when APIs demand it—and strip/normalize whitespace according to documented rules, not folklore.

← Back to course overview