The regex patterns every backend developer should bookmark
20 reusable regex patterns for emails, URLs, UUIDs, JWTs, semver, ISO datetimes, IPs and more — each with the edge case that bites you.
I have a ~/notes/regex.md file I've been adding to for about a decade. Every time I rewrite the same pattern from scratch, I add a line. Here's the curated version, with the edge case that bit me for each one.
A note before we start: regex is a screwdriver, not a Swiss Army knife. If you're validating an email to send a verification link, just send the link. If you're parsing HTML, use a parser. Use these patterns where structure is genuinely regular: IDs, dates, prefixes, log lines.
All patterns assume PCRE / ECMAScript flavor unless noted. Test in the Regex Tester before shipping.
Identifiers and tokens
1. UUID (any version, RFC 4122 + 9562)
^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-8][0-9a-fA-F]{3}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}$
The [1-8] enforces versions 1-8 (v7 and v8 are now standard per RFC 9562). The [89abAB] enforces the RFC 4122 variant bits. Edge case: the nil UUID 00000000-0000-0000-0000-000000000000 won't match, and that's usually what you want.
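A minimal Python sketch of the nil-UUID edge case. The helper name is_uuid is mine, and I use re.fullmatch because Python's $ also matches just before a trailing newline, which the PCRE/ECMAScript reading of the pattern would not expect.

```python
import re
import uuid

# Pattern copied from above: versions 1-8, RFC 4122 variant bits.
UUID_RE = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-8][0-9a-fA-F]{3}"
    r"-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}$"
)

NIL = "00000000-0000-0000-0000-000000000000"


def is_uuid(s: str) -> bool:
    """Hypothetical helper: True if s has the version/variant shape above."""
    # fullmatch requires the whole string to be consumed, so "…\n" fails.
    return UUID_RE.fullmatch(s) is not None


print(is_uuid(str(uuid.uuid4())))   # True
print(is_uuid(NIL))                 # False: version nibble is 0
print(uuid.UUID(NIL).version)       # None: stdlib parses it, but it has no version
```

If your schema uses the nil UUID as a sentinel, allow it explicitly with a string comparison rather than loosening the pattern.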
2. UUID v4 only (when you want to reject v1 timestamp UUIDs)
^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-4[0-9a-fA-F]{3}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}$
3. UUID v7 only (sortable, timestamp-prefixed)
^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-7[0-9a-fA-F]{3}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}$
4. JWT shape (three base64url segments)
^[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*$
Note the trailing * not + — unsigned JWTs (alg:none) have an empty signature segment. This is shape validation; it does not verify the signature. Use the JWT Decoder for actual inspection.
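Here is a small Python sketch of why the last quantifier is * and not +. The helpers b64url and looks_like_jwt are hypothetical names, and the token is built by hand purely for illustration; nothing here verifies a signature.

```python
import base64
import json
import re

JWT_SHAPE = re.compile(r"^[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*$")


def b64url(data: dict) -> str:
    """Encode a dict as unpadded base64url JSON, the way JWT segments are."""
    raw = json.dumps(data, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def looks_like_jwt(token: str) -> bool:
    """Shape check only: three dot-separated segments, the last may be empty."""
    return JWT_SHAPE.fullmatch(token) is not None


header = b64url({"alg": "none"})
payload = b64url({"sub": "demo"})
unsigned = header + "." + payload + "."   # empty signature segment

print(looks_like_jwt(unsigned))                 # True
print(looks_like_jwt(header + "." + payload))   # False: only two segments
```

Accepting the shape of an unsigned token is not the same as trusting it; whether alg:none is allowed is a decision for your JWT library's configuration.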
5. Semantic version (official, from semver.org)
^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$
Yes, it's long. It rejects 01.2.3 (leading zero) and accepts 1.0.0-alpha.1+build.42. Don't try to simplify this; the official one handles every edge case the spec defines.
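The five capture groups are the useful part. A sketch in Python, with parse_semver as a hypothetical helper name; I pass re.ASCII so \d means [0-9] as it does in ECMAScript, since Python's \d otherwise matches any Unicode digit.

```python
import re

# The semver.org pattern from above, split across lines for readability only.
SEMVER = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$",
    re.ASCII,
)


def parse_semver(s: str):
    """Return (major, minor, patch, prerelease, build) or None if invalid."""
    m = SEMVER.fullmatch(s)
    if m is None:
        return None
    major, minor, patch, pre, build = m.groups()
    return int(major), int(minor), int(patch), pre, build


print(parse_semver("1.0.0-alpha.1+build.42"))  # (1, 0, 0, 'alpha.1', 'build.42')
print(parse_semver("1.2.3"))                   # (1, 2, 3, None, None)
print(parse_semver("01.2.3"))                  # None
```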
6. Loose semver (when you control the input)
^\d+\.\d+\.\d+$
Time
7. ISO 8601 date (YYYY-MM-DD)
^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$
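This rejects 2025-13-01 but happily accepts 2025-02-30. Calendar correctness is not a regex job; pass the string to a strict parser such as Go's time.Parse or Python's date.fromisoformat and check for errors. Be careful with JavaScript's Date.parse: some engines roll an out-of-range day over into the next month instead of returning NaN.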
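The two-step check might look like this in Python: the regex pins the shape, then the standard library decides whether the day exists. parse_iso_date is a hypothetical helper name.

```python
import re
from datetime import date

ISO_DATE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$", re.ASCII)


def parse_iso_date(s: str):
    """Return a date for a real YYYY-MM-DD calendar day, else None."""
    if ISO_DATE.fullmatch(s) is None:
        return None           # wrong shape, e.g. month 13
    try:
        return date.fromisoformat(s)
    except ValueError:
        return None           # right shape, impossible day, e.g. Feb 30


print(parse_iso_date("2025-02-28"))  # 2025-02-28
print(parse_iso_date("2025-02-30"))  # None
print(parse_iso_date("2025-13-01"))  # None
```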
8. ISO 8601 datetime with timezone
^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$
I match the T literally. Some producers emit a space instead, which the NOTE in RFC 3339 §5.6 allows; if you accept it, document it. Like the date pattern, this checks shape only: 25:61:61 passes.
9. Unix timestamp (10 digits, seconds, post-2001)
^1\d{9}$
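For milliseconds: ^1\d{12}$. The leading 1 means the seconds pattern covers 2001-09-09 through 2033-05-18, so it will stop matching current timestamps in 2033. If you're mixing seconds and millis in the same column, you have a different problem; the Unix Timestamp Converter helps you eyeball which is which.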
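If you do inherit a mixed column, the two patterns are enough to tell the units apart. A sketch in Python, with to_utc as a hypothetical helper name:

```python
import re
from datetime import datetime, timezone

SECONDS = re.compile(r"^1\d{9}$", re.ASCII)    # 10 digits
MILLIS = re.compile(r"^1\d{12}$", re.ASCII)    # 13 digits


def to_utc(raw: str):
    """Interpret raw as epoch seconds or milliseconds; None if it is neither."""
    if SECONDS.fullmatch(raw):
        return datetime.fromtimestamp(int(raw), tz=timezone.utc)
    if MILLIS.fullmatch(raw):
        return datetime.fromtimestamp(int(raw) / 1000, tz=timezone.utc)
    return None


print(to_utc("1000000000"))      # 2001-09-09 01:46:40+00:00
print(to_utc("1700000000000"))   # 2023-11-14 22:13:20+00:00
print(to_utc("999999999"))       # None: 9 digits, before the 10-digit era
```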
Network
10. IPv4 (strict)
^((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)$
The naive \d{1,3}(\.\d{1,3}){3} matches 999.999.999.999. Use the strict one in validation paths.
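A quick side-by-side in Python. I anchored the naive pattern with ^ and $ so the comparison is fair; the sample addresses are arbitrary.

```python
import re

STRICT = re.compile(
    r"^((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)$",
    re.ASCII,
)
NAIVE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$", re.ASCII)

for candidate in ("192.168.0.1", "255.255.255.255", "999.999.999.999", "01.2.3.4"):
    strict_ok = STRICT.fullmatch(candidate) is not None
    naive_ok = NAIVE.fullmatch(candidate) is not None
    print(f"{candidate:17} strict={strict_ok!s:5} naive={naive_ok}")
```

Note that the strict pattern also rejects leading zeros such as 01.2.3.4, which some parsers would otherwise read as octal.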
11. IPv6 (lenient — covers compressed forms)
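^(([0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|::([0-9a-fA-F]{1,4}:){0,6}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,6}(:[0-9a-fA-F]{1,4}){1,6}|::)$
This handles :: at the start, middle, or end of the address, including the bare ::. It is lenient in the other direction too: it does not cap the total at eight groups when :: sits mid-address, and it does not handle IPv4-embedded forms such as ::ffff:192.0.2.1. For correctness, parse with net.ParseIP (Go) or ipaddress (Python). For grep-ing log files, this is good enough.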
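For the validation path, the parser route in Python is only a few lines. parse_ipv6 is a hypothetical wrapper name; ipaddress.IPv6Address and its ipv4_mapped attribute are standard library.

```python
import ipaddress


def parse_ipv6(s: str):
    """Return an IPv6Address if s is a valid IPv6 literal, else None."""
    try:
        return ipaddress.IPv6Address(s)
    except ValueError:   # AddressValueError is a ValueError subclass
        return None


print(parse_ipv6("2001:db8::8a2e:370:7334"))       # 2001:db8::8a2e:370:7334
print(parse_ipv6("::ffff:192.0.2.1").ipv4_mapped)  # 192.0.2.1
print(parse_ipv6("1::2::3"))                       # None: two "::" runs
```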
12. URL (pragmatic)
^https?:\/\/[^\s/$.?#].[^\s]*$
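This rejects a bare http:// and accepts https://example.com/path?q=1. Edge case: it needs at least two characters after the scheme, so http://a fails. It does not validate that the host resolves. For strict validation, use a real parser: the WHATWG URL parser (new URL(s) in modern JS) is the reference. Python's urllib.parse only splits a string into components and raises on very little, so check the scheme and hostname yourself after parsing.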
13. Email (HTML5 spec)
^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$
This is the regex the HTML5 spec uses for <input type="email">. It rejects some technically-valid RFC 5321 addresses (quoted local parts) and accepts some that won't deliver. That tradeoff is usually fine.
Strings, codes, colors
14. Hex color (#fff, #ffffff, #ffffffff)
^#([0-9a-fA-F]{3}|[0-9a-fA-F]{4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})$
The 4 and 8 forms include alpha.
15. Base64 (standard, padded)
^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$
For URL-safe (used in JWT): swap +/ for -_ and drop the padding rule.
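One way to read "swap and drop" is sketched below in Python. The unpadded pattern B64_URL and the helper decode_urlsafe are my assumptions about what the URL-safe variant should look like; the length check is there because an unpadded segment can never have a length of 1 modulo 4.

```python
import base64
import re

B64_STD = re.compile(
    r"^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$"
)
# Assumed URL-safe variant: -_ instead of +/, no padding (as in JWT segments).
B64_URL = re.compile(r"^[A-Za-z0-9_-]*$")


def decode_urlsafe(segment: str):
    """Decode unpadded base64url, or return None if the shape is wrong."""
    if B64_URL.fullmatch(segment) is None or len(segment) % 4 == 1:
        return None
    padded = segment + "=" * (-len(segment) % 4)   # restore padding to decode
    return base64.urlsafe_b64decode(padded)


print(B64_STD.fullmatch("aGk=") is not None)   # True
print(B64_STD.fullmatch("aGk") is not None)    # False: padding is required
print(decode_urlsafe("aGk"))                   # b'hi'
print(decode_urlsafe("-_8"))                   # b'\xfb\xff'
```

Both patterns accept the empty string, which is valid Base64 for zero bytes; reject it separately if an empty value makes no sense in your field.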
16. Slug (URL-friendly)
^[a-z0-9]+(?:-[a-z0-9]+)*$
Rejects double-hyphens, leading/trailing hyphens, uppercase. Pair with the Slugify tool to generate compliant ones from arbitrary input.
17. MAC address (colon or hyphen)
^([0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}$
18. Phone E.164
^\+[1-9]\d{1,14}$
E.164 caps at 15 digits including country code. Anything fancier and you need libphonenumber.
Files and paths
19. Filename without traversal (allowlist style)
^[a-zA-Z0-9._-]+$
Use this and check that the resolved path stays inside your intended directory. Regex alone is not path-traversal protection.
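The reason the second check matters: the string .. is made entirely of allowed characters. A sketch of the pairing in Python, where resolve_upload and the base directory argument are hypothetical:

```python
import re
from pathlib import Path

SAFE_NAME = re.compile(r"^[a-zA-Z0-9._-]+$")


def resolve_upload(base_dir: str, name: str):
    """Return the resolved path for name inside base_dir, or None if unsafe."""
    if SAFE_NAME.fullmatch(name) is None:
        return None                      # slashes, spaces, NULs, etc.
    base = Path(base_dir).resolve()
    candidate = (base / name).resolve()  # also follows symlinks that exist
    # "." and ".." pass the regex; the containment check is what stops them.
    if candidate == base or base not in candidate.parents:
        return None
    return candidate


print(resolve_upload(".", "report.pdf") is not None)  # True
print(resolve_upload(".", ".."))                      # None
print(resolve_upload(".", "../etc/passwd"))           # None: "/" fails the regex
```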
20. Common log line (Apache combined)
^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) ([^"]*) (\S+)" (\d{3}) (\d+|-)
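Captures IP, time, method, path, protocol, status, size. The pattern has no end anchor, so it matches Common Log Format lines and the leading fields of combined-format lines; the referer and user agent that combined appends are ignored. Drop into awk -v FS=... or pipe to jq once you've converted.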
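A sketch of the "convert, then pipe to jq" step in Python. The field names, the parse_line helper, and the sample line are all made up for illustration.

```python
import json
import re

LOG_RE = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) ([^"]*) (\S+)" (\d{3}) (\d+|-)'
)
FIELDS = ("ip", "time", "method", "path", "protocol", "status", "size")


def parse_line(line: str):
    """Map the seven capture groups to named fields; None if no match."""
    m = LOG_RE.match(line)   # match, not fullmatch: trailing fields are allowed
    if m is None:
        return None
    return dict(zip(FIELDS, m.groups()))


sample = (
    '203.0.113.7 - alice [10/Oct/2025:13:55:36 +0000] '
    '"GET /index.html HTTP/1.1" 200 2326 '
    '"https://example.com/" "curl/8.0"'
)
print(json.dumps(parse_line(sample)))
```

Every value comes back as a string, including status and a size of "-", so cast them only after deciding how to treat the dash.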
A note on performance
Most of these are linear. Watch out for nested quantifiers like (a+)+b against aaaaaaaaaa — catastrophic backtracking is a real DoS vector. Go's regexp uses RE2 and won't backtrack, which is one of its quiet superpowers. Node, Python, PHP, and Ruby all use backtracking engines; profile any regex that touches user input.
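You can watch the blow-up on your own machine with a few lines of Python. This is a rough sketch: time_search is a hypothetical helper, the input lengths are kept small on purpose, and absolute timings will vary. The flat pattern a+b matches exactly the same strings as (a+)+b without the nested quantifier.

```python
import re
import time


def time_search(pattern: str, text: str):
    """Return (match_or_None, elapsed_seconds) for one re.search call."""
    start = time.perf_counter()
    result = re.search(pattern, text)
    return result, time.perf_counter() - start


# No "b" in the input, so every attempt has to fail the slow way.
for n in (12, 14, 16, 18):
    text = "a" * n
    _, nested = time_search(r"(a+)+b", text)
    _, flat = time_search(r"a+b", text)
    print(f"n={n:2d}  (a+)+b: {nested:.4f}s   a+b: {flat:.6f}s")
```

The nested column should grow much faster than the input does, while the flat one stays negligible; that gap is what an attacker exploits with a long input.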
If you'd rather describe what you want in English and get a tested pattern back, the AI Regex Generator is what I reach for these days when I need a pattern for some odd log format.
Try it
- Regex Tester — paste a pattern and a sample, see matches highlighted, runs in your browser
- AI Regex Generator — describe the format in English, get a pattern back