Lesson 2
Literals and Metacharacters
Character classes, quantifiers, anchors, and escaping.
Most regex syntax falls into two buckets: literals, which match themselves, and metacharacters, which are symbols with a special meaning.
Literals and escaping
Letters and digits usually match themselves: cat matches the substring cat.
Metacharacters such as . * + ? [ ] ( ) { } | ^ $ \ need care:
- . matches any one character (line terminators excluded unless the engine's dot-all mode is enabled).
- To match a literal dot, escape it: \.
When in doubt, put punctuation in a character class: [.] matches a literal dot, because most metacharacters lose their special meaning inside brackets.
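The difference between an unescaped dot, an escaped dot, and a dot inside a class can be checked directly. The following is a minimal sketch in JavaScript; the variable names and sample strings are illustrative assumptions, not part of the lesson.

```javascript
// An unescaped dot matches any single character, so /a.c/ also accepts "abc".
const anyChar = /a.c/;
// Escaping the dot, or wrapping it in a class, restricts it to a literal ".".
const escapedDot = /a\.c/;
const classDot = /a[.]c/;

console.log(anyChar.test("abc"));    // true
console.log(anyChar.test("a.c"));    // true
console.log(escapedDot.test("abc")); // false
console.log(escapedDot.test("a.c")); // true
console.log(classDot.test("a.c"));   // true

// Without the s (dot-all) flag, "." does not match a newline.
console.log(anyChar.test("a\nc"));   // false
console.log(/a.c/s.test("a\nc"));    // true
```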
Character classes
[abc] matches one character from the set a, b, or c.
Useful shorthands (JavaScript-style):
| Class | Meaning |
|---|---|
| \d | Digit [0-9] |
| \w | “Word” char (engine-specific; often [A-Za-z0-9_]) |
| \s | Whitespace |
| \D, \W, \S | Negated versions |
Ranges: [a-z] matches any one lowercase ASCII letter. To include a literal hyphen in a class, put - first or last, as in [a-z-].
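As a quick check of the classes and shorthands above, here is a small JavaScript sketch. The sample strings and variable names are made up for illustration.

```javascript
// \d with the g flag collects every digit in the input.
const digits = "Order 66, aisle 7".match(/\d/g);
console.log(digits); // [ '6', '6', '7' ]

// A set matches exactly one character from its members.
const vowels = "regex".match(/[aeiou]/g);
console.log(vowels); // [ 'e', 'e' ]

// A hyphen placed last in the class is a literal hyphen, not a range.
const slug = /^[a-z-]+$/;
console.log(slug.test("well-known")); // true
console.log(slug.test("well known")); // false (space is not in the class)

// \W is the negation of \w; the underscore counts as a word character.
console.log(/\W/.test("snake_case")); // false
console.log(/\W/.test("kebab-case")); // true
```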
Quantifiers
Repeat the previous atom:
| Quantifier | Meaning |
|---|---|
| * | Zero or more |
| + | One or more |
| ? | Zero or one |
| {3} | Exactly 3 |
| {2,5} | From 2 to 5, inclusive |
| {2,} | Two or more |
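Greedy vs lazy: + consumes as much as possible; +? consumes as little as possible. This matters when parsing delimited fields: against "a","b" the pattern ".+" runs to the last quote and matches the whole string, while ".+?" stops at the first closing quote and matches only "a".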
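The greedy and lazy behaviours are easiest to see side by side on a line of quoted fields. This is an illustrative JavaScript sketch; the sample line and variable names are assumptions chosen for the example.

```javascript
const line = '"alpha","beta"';

// Greedy: .+ runs to the end, then backs off only as far as the last quote.
const greedy = line.match(/".+"/)[0];
console.log(greedy); // "alpha","beta"

// Lazy: .+? stops at the first closing quote it can reach.
const lazy = line.match(/".+?"/)[0];
console.log(lazy); // "alpha"

// With the g flag, the lazy form yields one match per quoted field.
const fields = line.match(/".+?"/g);
console.log(fields); // [ '"alpha"', '"beta"' ]

// Counted quantifier: between 2 and 5 digits, and nothing else.
const twoToFiveDigits = /^\d{2,5}$/;
console.log(twoToFiveDigits.test("123"));    // true
console.log(twoToFiveDigits.test("1"));      // false
console.log(twoToFiveDigits.test("123456")); // false
```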
Anchors
| Anchor | Meaning (typical) |
|---|---|
| ^ | Start of input, or of each line with the multiline flag |
| $ | End of input, or of each line with the multiline flag |
| \b | Word boundary |
^https:// matches only when https:// sits at the very start of the input (or of a line, in multiline mode), not in the middle of a sentence.
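A short JavaScript sketch of the anchors in action follows. The URL uses the reserved example.com domain as a placeholder, and the variable names are illustrative.

```javascript
// In a JavaScript regex literal the slashes must be escaped.
const startsWithHttps = /^https:\/\//;
console.log(startsWithHttps.test("https://example.com"));     // true
console.log(startsWithHttps.test("see https://example.com")); // false

// Without the m flag, ^ only matches at the start of the whole input.
const notes = "notes\nhttps://example.com";
console.log(/^https:\/\//.test(notes));  // false
console.log(/^https:\/\//m.test(notes)); // true

// \b keeps "cat" from matching inside a longer word.
const wholeWordCat = /\bcat\b/;
console.log(wholeWordCat.test("the cat sat")); // true
console.log(wholeWordCat.test("concatenate")); // false
```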
Grouping
( ... ) groups atoms so quantifiers apply to the group: (ab)+ matches ab, abab, etc.
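To see why the parentheses matter, compare the grouped pattern with the same pattern without them. This is a minimal JavaScript sketch; the anchors are added here only so that partial matches do not hide the difference.

```javascript
// Grouped: the + repeats the whole pair "ab".
const grouped = /^(ab)+$/;
console.log(grouped.test("ab"));   // true
console.log(grouped.test("abab")); // true
console.log(grouped.test("aba"));  // false

// Ungrouped: the + applies only to the preceding atom, "b".
const ungrouped = /^ab+$/;
console.log(ungrouped.test("abbb")); // true
console.log(ungrouped.test("abab")); // false
```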
Key takeaway
Build patterns from small tested pieces: anchor + class + quantifier. Complex patterns become readable when each segment has one job.
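One way to apply this in JavaScript is to write each segment as its own small regex, test it, and then join the sources. The sketch below assumes a hypothetical ID format such as AB-1234 (two uppercase letters, a hyphen, four digits); the format and all names are invented for illustration.

```javascript
// Each piece has one job and can be tested on its own.
const prefix = /[A-Z]{2}/; // class + quantifier
const separator = /-/;     // literal
const serial = /\d{4}/;    // shorthand class + quantifier

console.log(prefix.test("AB"));   // true
console.log(serial.test("1234")); // true

// Join the pieces and add anchors so the whole input must match.
const idPattern = new RegExp(
  "^" + prefix.source + separator.source + serial.source + "$"
);

console.log(idPattern.source);           // ^[A-Z]{2}-\d{4}$
console.log(idPattern.test("AB-1234"));  // true
console.log(idPattern.test("ab-1234"));  // false
console.log(idPattern.test("XAB-1234")); // false
```

Joining via .source keeps each segment readable, at the cost of building the final pattern at runtime.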