Lesson 1
What Is a Regular Expression?
Patterns as search programs and where regex appears in development.
A regular expression (regex) is a compact language for describing text patterns. Instead of searching for an exact string like error, you search for a shape: digits, quoted values, email-like tokens, or log lines that match a template.
Patterns, not plain text
Think of regex as a tiny program run by a regex engine:
- You provide a pattern (the regex)
- You provide input text (a string, file excerpt, or buffer)
- The engine returns matches—substrings that fit the pattern—and optionally capture groups
Example intuition: \d{4}-\d{2}-\d{2} describes “four digits, hyphen, two digits, hyphen, two digits”—a common ISO date shape. It is not the same as searching for one fixed date.
Where developers use regex
Common real-world uses:
- Validation hints — email, phone, slug formats (often combined with stronger validators)
- Log parsing — extract timestamps, status codes, or request IDs from unstructured lines
- Refactors — find-and-replace across many files with capture groups
- Syntax highlighting and linters — tokenize code (usually with carefully tuned patterns)
- Routing and rewrite rules — Apache/Nginx location blocks, some framework path rules
Regex excels at local structure in text. It is a poor fit for nested HTML, full JSON documents, or arbitrary programming languages—use parsers for those.
Engines differ
JavaScript’s RegExp, Python’s re, PCRE in PHP/Ri grep, and Rust’s regex crate do not behave identically. Differences include:
- Which metacharacters exist and how Unicode is handled
- Lookahead/lookbehind support
- Default greediness and line-ending rules
When you copy a pattern from another language, re-test it in your target engine.
Key takeaway
A regex describes families of strings, not one string. The next lessons build the vocabulary—literals, classes, quantifiers, flags, and groups—that those families are made from.