
What Is Regex and How Do You Write Your First Pattern

ToolsFuel Team
Web development tools & tips

The Syntax That Scared Me Off for Two Years

The first time I saw a real-world regex pattern, I thought someone had sat on their keyboard.

```
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
```


That's an email validation pattern, by the way. I couldn't read it at all in 2021. I copy-pasted it from Stack Overflow, it worked, and I moved on with my life.


I did that for a couple years. Regex snippets from Stack Overflow, regex patterns from whatever library I was using, never really understanding them. Then one afternoon I needed to extract specific data from log files and there was no Stack Overflow answer that matched my exact format. I had to write the pattern from scratch.


I sat down and learned regex properly in about three hours. Turned out the syntax that looked like noise actually has a pretty consistent logic. Once you know maybe fifteen concepts, you can read and write most patterns you'll encounter. This post covers those fifteen concepts, with examples you'll actually recognize from real dev work.

What Regex Actually Is


Regex (short for regular expression) is a pattern-matching language. You write a pattern, and the regex engine checks whether a string matches that pattern — or finds parts of a string that match.

It's not a programming language on its own. It's a notation that most programming languages support natively: JavaScript, Python, Ruby, Java, PHP, Go — they all have built-in regex support. The syntax is mostly the same across languages with a few small differences.


Here's what regex can do:

- **Test** whether a string matches a pattern (`"does this string contain a phone number?"`)
- **Extract** parts of a string (`"give me all the email addresses in this document"`)
- **Replace** matched patterns (`"replace all whitespace with a single space"`)
- **Split** a string on a pattern (split on any whitespace, not just spaces)


Every programming language gives you these operations via different method names but the same underlying idea. JavaScript has `.test()`, `.match()`, `.replace()`. Python has `re.search()`, `re.findall()`, `re.sub()`.
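Here's a rough sketch of those four operations in JavaScript. The sample string is made up for illustration, and the patterns use syntax covered in the next section, so don't worry about reading them yet.

```javascript
// Hypothetical sample text, invented for this example
const text = "Call 555-1234 or 555-9876   before   noon";

// Test: does the string contain a phone-like fragment?
const hasPhone = /\d{3}-\d{4}/.test(text);       // true

// Extract: every phone-like fragment (the g flag asks for all matches)
const phones = text.match(/\d{3}-\d{4}/g);       // ["555-1234", "555-9876"]

// Replace: collapse each run of whitespace into a single space
const tidy = text.replace(/\s+/g, " ");          // "Call 555-1234 or 555-9876 before noon"

// Split: break on any run of whitespace, not just single spaces
const words = text.split(/\s+/);                 // ["Call", "555-1234", "or", "555-9876", "before", "noon"]

console.log(hasPhone, phones, tidy, words);
```

Same pattern language each time; only the method you hand the pattern to changes.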


Before writing any pattern, it helps to think about what you're actually asking the regex engine to do. Am I checking if a string matches? Finding parts of it? Replacing something? That shapes how you write the pattern and what flags you'll need.

The Core Syntax: Literals, Character Classes, and Quantifiers


Most regex patterns combine three things: literals, character classes, and quantifiers. Learn these and you can read 80% of real-world patterns.

**Literals** — just characters that match themselves. The pattern `cat` matches the word "cat" literally. No magic. Most characters work this way.


**Character classes**: match any one of a set of characters:

- `[abc]`: matches either a, b, or c
- `[a-z]`: matches any lowercase letter
- `[A-Za-z0-9]`: matches any letter or digit
- `[^abc]`: the `^` inside a class means NOT, so this matches anything except a, b, or c


There are also built-in shorthand classes:

- `\d`: any digit (same as `[0-9]`)
- `\w`: any word character (letters, digits, underscore; same as `[A-Za-z0-9_]`)
- `\s`: any whitespace (spaces, tabs, newlines)
- `.`: any character except newline (use `[\s\S]` if you need to match newlines too)
- Capital versions negate: `\D` = not a digit, `\W` = not a word char, `\S` = not whitespace


**Quantifiers**: control how many times something must match:

- `?`: zero or one (optional)
- `*`: zero or more
- `+`: one or more
- `{3}`: exactly 3 times
- `{2,5}`: between 2 and 5 times
- `{3,}`: 3 or more


So `\d+` means "one or more digits." `[A-Z]{2}` means "exactly two uppercase letters." `colou?r` matches both "color" and "colour" because the `u` is marked optional with `?`.


Put it together: `\d{3}-\d{4}` matches a phone number fragment like "555-1234". Three digits, a literal hyphen, four digits. You've already written a real pattern.
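If you want to check these for yourself, here's a small sketch you can paste into a browser console or Node. The input strings are made up; the patterns are the ones from this section.

```javascript
// \d+ : one or more digits anywhere in the string
const hasDigits = /\d+/.test("order 42");                  // true

// [A-Z]{2} : exactly two uppercase letters in a row
const twoUpper = /[A-Z]{2}/.test("ny");                    // false (lowercase)

// colou?r : the u is optional
const american = /colou?r/.test("color");                  // true
const british = /colou?r/.test("colour");                  // true

// [^abc] : any character that is NOT a, b, or c
const hasOther = /[^abc]/.test("abc");                     // false (nothing else in there)

// \d{3}-\d{4} : three digits, a literal hyphen, four digits
const fragment = "Call 555-1234 today".match(/\d{3}-\d{4}/)[0];  // "555-1234"

console.log(hasDigits, twoUpper, american, british, hasOther, fragment);
```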

Anchors, Groups, and Alternation


These three concepts are what take you from beginner to actually useful.

**Anchors** pin the match to a position in the string:

- `^`: start of the string (outside a character class)
- `$`: end of the string
- `\b`: a word boundary (the transition between a word character and a non-word character)


`^hello` matches "hello world" but NOT "say hello." The ^ forces the match to start at the beginning of the string.


`hello$` matches "say hello" but NOT "hello world." The $ forces the match to end at the end.


Word boundaries are great for whole-word matching. The pattern `\bcat\b` matches "the cat sat" but not "catalog" or "concatenate."
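The anchor examples above can be checked directly with `.test()`. This is only a quick sketch using the same strings as the text:

```javascript
// ^ pins the match to the start of the string
const startOk = /^hello/.test("hello world");    // true
const startNo = /^hello/.test("say hello");      // false

// $ pins the match to the end of the string
const endOk = /hello$/.test("say hello");        // true
const endNo = /hello$/.test("hello world");      // false

// \b only matches where a word starts or ends
const wholeWord = /\bcat\b/.test("the cat sat");   // true
const inCatalog = /\bcat\b/.test("catalog");       // false
const inConcat = /\bcat\b/.test("concatenate");    // false

console.log(startOk, startNo, endOk, endNo, wholeWord, inCatalog, inConcat);
```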


**Groups** use parentheses to treat part of a pattern as a unit:

- `(abc)+`: matches "abc" one or more times
- `(foo|bar)`: alternation inside a group, matching either "foo" or "bar"
- Groups also capture what they matched, which you can reference later


Capture groups are how you extract data. In JavaScript:


```javascript
const date = "Today is 2026-04-11";
const match = date.match(/(\d{4})-(\d{2})-(\d{2})/);
// match[1] = "2026", match[2] = "04", match[3] = "11"
```


Group 1 captured the year, group 2 the month, group 3 the day. That's how you pull structured data out of unstructured text.


If you don't need to capture — you just want to group for quantifier purposes — use a non-capturing group: `(?:abc)+`. Same behavior, no capture overhead.


**Alternation** with `|` is like a logical OR:

- `cat|dog` matches either "cat" or "dog"
- `(jpg|jpeg|png|gif)` matches any of those file extensions
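One thing that's easy to miss: `|` has very low precedence, so it splits the whole pattern unless a group fences it in. Here's a small sketch of groups and alternation working together. The filenames are invented for the example.

```javascript
// (abc)+ : the group repeats as a unit
const repeated = /^(abc)+$/.test("abcabc");              // true

// Alternation inside a group, anchored to the end of a filename
const imagePattern = /\.(jpg|jpeg|png|gif)$/;
const isImage = imagePattern.test("photo.png");          // true
const isText = imagePattern.test("notes.txt");           // false
const extension = "photo.jpeg".match(imagePattern)[1];   // "jpeg"

// Without a group, | splits everything: this reads as (^cat) OR (dog$)
const loose = /^cat|dog$/.test("catalog");               // true, because it starts with "cat"

// A non-capturing group limits the alternation to the two words
const strict = /^(?:cat|dog)$/.test("catalog");          // false

console.log(repeated, isImage, isText, extension, loose, strict);
```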


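Combine these pieces and you can read the end of that email pattern: `\.[a-zA-Z]{2,}$` matches a dot followed by two or more letters at the end of the string, which covers .com, .org, .io, and so on. For an address ending in .co.uk, this part matches only the final `.uk`; the `.co` is picked up by the domain part before it, which allows dots.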

Reading That Email Regex (Finally)


Now let's actually break down that email pattern from the beginning:

```
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
```


Piece by piece:


- `^`: start of string
- `[a-zA-Z0-9._%+-]+`: one or more of: letters, digits, dot, underscore, percent, plus, hyphen (the local part of the email, before @)
- `@`: a literal @ symbol
- `[a-zA-Z0-9.-]+`: one or more of: letters, digits, dot, hyphen (the domain name)
- `\.`: a literal dot (the `\` escapes it, because `.` on its own means "any character")
- `[a-zA-Z]{2,}`: two or more letters (the TLD: com, org, io, museum...)
- `$`: end of string


See? It's not magic. Every piece is doing something specific.
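To see the pieces working together, here's a quick sketch that runs the pattern against a few made-up addresses. The last case shows that the pattern is a loose check, not a full validator.

```javascript
const emailPattern = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

// Hypothetical inputs, chosen to exercise each piece of the pattern
const plain = emailPattern.test("dev@example.com");                      // true
const tagged = emailPattern.test("first.last+tag@mail.example.co.uk");   // true
const noAt = emailPattern.test("no-at-sign.example.com");                // false: no @
const noTld = emailPattern.test("user@localhost");                       // false: no dot + TLD
const withSpace = emailPattern.test("has space@example.com");            // false: space not in the class

// The domain class allows dots anywhere, so a doubled dot slips through
const doubleDot = emailPattern.test("a@b..com");                         // true

console.log(plain, tagged, noAt, noTld, withSpace, doubleDot);
```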


The backslash escape is worth noting — in regex, certain characters have special meaning (`.`, `*`, `+`, `?`, `(`, `)`, `[`, `]`, `{`, `}`, `^`, `$`, `|`, `\`). If you want to match one of those characters literally, you put a backslash before it. `\.` matches a literal dot. `\(` matches a literal opening parenthesis.


Missing an escape is one of the most common regex bugs. If your pattern isn't working and it contains any of those special characters, check whether they should be escaped.
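Here's what that bug looks like in practice, plus one way to dodge it when the pattern is built from arbitrary text. `escapeRegExp` is a helper name I'm introducing for this sketch; it isn't a built-in, and it just puts a backslash in front of every special character.

```javascript
// Unescaped dot: "." means "any character", so this matches too much
const unescaped = /example.com/.test("exampleXcom");     // true (probably not what you wanted)

// Escaped dot: only a literal "." is accepted
const escapedBad = /example\.com/.test("exampleXcom");   // false
const escapedGood = /example\.com/.test("example.com");  // true

// Hypothetical helper: backslash-escape every regex special character in a string
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Building a pattern from raw text: "+" is read as a quantifier unless escaped
const raw = new RegExp("1+1=2").test("1+1=2");                 // false
const safe = new RegExp(escapeRegExp("1+1=2")).test("1+1=2");  // true

console.log(unescaped, escapedBad, escapedGood, raw, safe);
```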


I'd suggest pulling up MDN's RegExp documentation alongside this. It has a complete syntax reference and interactive examples you can run in the browser. Much better than a static cheat sheet because you can test patterns live.

You can also use ToolsFuel's free text processing tools to run find-and-replace operations without writing code. That's useful when you need to clean up data but don't want to write a full script for a one-time task.

Flags, Greedy vs Lazy, and Common Dev Patterns

**Flags** modify how the regex engine runs:

In JavaScript: `/pattern/flags` — e.g., `/hello/gi`


- `i`: case-insensitive matching
- `g`: global (find all matches, not just the first)
- `m`: multiline (`^` and `$` match at each line boundary, not just the whole string)
- `s`: dotAll mode (`.` matches newlines too)


Forget the `g` flag and `.replace()` will only replace the first match. That one's bitten me more times than I'd like to admit.
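A short sketch of each flag changing the result, using made-up strings:

```javascript
const greeting = "hello hello HELLO";

// No flags: only the first match is replaced
const first = greeting.replace(/hello/, "hi");       // "hi hello HELLO"

// g: every match, but still case-sensitive
const all = greeting.replace(/hello/g, "hi");        // "hi hi HELLO"

// g + i: every match, ignoring case
const allAnyCase = greeting.replace(/hello/gi, "hi");  // "hi hi hi"

// m: ^ and $ also match at line boundaries
const lines = "first\nsecond";
const wholeOnly = /^second$/.test(lines);            // false
const perLine = /^second$/m.test(lines);             // true

// s: lets "." match a newline
const dotDefault = /a.b/.test("a\nb");               // false
const dotAll = /a.b/s.test("a\nb");                  // true

console.log(first, all, allAnyCase, wholeOnly, perLine, dotDefault, dotAll);
```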


**Greedy vs lazy quantifiers.** By default, quantifiers are greedy — they match as much as possible. `.+` will match everything from the first character to the last. If you're trying to match an HTML tag with `<.+>`, you'll get everything from the first < to the last > in your string — including all the text in between.


Add a `?` after any quantifier to make it lazy, so it matches as little as possible:

- `.+?`: matches the shortest possible string
- `.*?`: same, but allows zero-length matches


So `<.+?>` matches one tag at a time rather than gobbling everything up.
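Here's the difference on a small, made-up HTML snippet. The third pattern is an alternative I often reach for instead of a lazy quantifier: a negated character class that simply can't run past the closing `>`.

```javascript
const html = "<b>bold</b> and <i>italic</i>";

// Greedy: runs from the first < to the last >
const greedy = html.match(/<.+>/)[0];     // "<b>bold</b> and <i>italic</i>"

// Lazy: stops at the first > it can
const lazy = html.match(/<.+?>/g);        // ["<b>", "</b>", "<i>", "</i>"]

// Negated class: "anything except >" gives the same tags here
const negated = html.match(/<[^>]+>/g);   // ["<b>", "</b>", "<i>", "</i>"]

console.log(greedy, lazy, negated);
```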


**Real patterns I use all the time:**


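```javascript
// Sample inputs (hypothetical) so the snippets below run as-is
const str = "a { color: #fff; }  see https://example.com/docs";
const slug = "my-blog-post-2026";

// Check if a string contains only digits (a non-negative integer)
/^\d+$/.test(str);

// Extract hex color codes from CSS (loose: accepts any 3 to 6 hex digits)
str.match(/#[0-9a-fA-F]{3,6}/g);

// Remove extra whitespace (runs of whitespace → one space)
str.replace(/\s+/g, ' ').trim();

// Check for valid slug format (lowercase, hyphens, no spaces)
/^[a-z0-9]+(?:-[a-z0-9]+)*$/.test(slug);

// Extract all URLs from a string
str.match(/https?:\/\/[^\s"'<>]+/g);

// Match an ISO date: 2026-04-11
/\d{4}-\d{2}-\d{2}/.test("2026-04-11");
```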


That slug pattern is worth studying: `^[a-z0-9]+(?:-[a-z0-9]+)*$`. It means "start with one or more lowercase letters/digits, optionally followed by one or more groups of a hyphen and more letters/digits." It matches "hello-world" and "my-blog-post-2026" but rejects "hello--world" (double hyphen), "-start" (starts with hyphen), and "CAPS" (uppercase).
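You can confirm each of those cases in a few lines. This sketch adds two inputs of my own ("single" and "trailing-") to show the edges:

```javascript
const slugPattern = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

// Each input is mapped to whether the pattern accepts it
const results = {
  "hello-world": slugPattern.test("hello-world"),               // true
  "my-blog-post-2026": slugPattern.test("my-blog-post-2026"),   // true
  "single": slugPattern.test("single"),                         // true: the hyphen group may repeat zero times
  "hello--world": slugPattern.test("hello--world"),             // false: double hyphen
  "-start": slugPattern.test("-start"),                         // false: starts with a hyphen
  "trailing-": slugPattern.test("trailing-"),                   // false: hyphen with nothing after it
  "CAPS": slugPattern.test("CAPS"),                             // false: uppercase
};

console.log(results);
```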


Once you're writing patterns like that from scratch rather than copying from Stack Overflow, you know regex has clicked. And honestly, it's one of those tools, like understanding URL encoding or how hashing works, where learning it properly once saves you hours of debugging over the next few years.

Frequently Asked Questions

Is regex the same across all programming languages?

The core syntax is similar across most languages: character classes, quantifiers, anchors, and groups work the same way in JavaScript, Python, Ruby, Go, and Java. But there are differences in how you call the functions, which flags are supported, and some advanced features like lookbehind assertions (which older JavaScript versions didn't support, and which Go's standard regexp package doesn't support at all). Python's re module and JavaScript's built-in RegExp are the most commonly used, and if you learn one you can transfer most of your knowledge to the other. The MDN docs and Python docs both have good syntax references.

Why is my regex matching too much (or too little)?

The most common culprit for matching too much is greedy quantifiers — .+ and .* match as much as possible, often consuming more of the string than you intended. Add a ? to make them lazy (.+? and .*?) to match the shortest possible string. Matching too little often comes from missing anchors or using ^ and $ without multiline mode when you're working with multi-line strings. Also check whether you need the g flag — without it, methods like .match() and .replace() only act on the first match.

How do I test a regex pattern before using it in code?

Regex101.com is my go-to tool — paste your pattern and test string and it highlights matches in real time with an explanation of what each part of the pattern does. You can also select the language (JavaScript, Python, etc.) to get language-specific behavior. ToolsFuel's [text processing tools](/tools) can handle search-and-replace tasks without needing to write code at all. For quick testing in JavaScript, you can open browser DevTools and run a pattern directly in the console.

Should I use regex to validate email addresses?

For basic validation (catching obviously wrong inputs like a missing @ or stray spaces), yes, a simple regex is fine. But truly accurate email validation via regex is surprisingly hard. The email specs (RFC 5321 and RFC 5322) allow unusual characters that most regex patterns don't handle, and even "perfect" email regex patterns can't actually confirm an address exists or accepts mail. The practical approach: use a simple regex to catch obvious typos, then send a confirmation email to verify the address is real. That's what most production systems end up doing.

What is a capture group in regex?

A capture group is a part of a regex pattern wrapped in parentheses that saves what it matched so you can reference it later. In JavaScript, when you call .match() or .exec(), the capture groups are available as match[1], match[2], etc. (match[0] is always the full match). This is how you extract specific parts of structured text — dates, version numbers, IP addresses, whatever. If you just need to group pattern pieces without capturing, use a non-capturing group (?:...) which is slightly more efficient.
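Here's a minimal sketch of that on an invented log line. It also shows a gotcha: with the `g` flag, `.match()` returns only the full matches and drops the groups, so `matchAll()` is the method to reach for when you want every match with its groups.

```javascript
// Hypothetical log line, made up for this example
const log = "Upgraded from v1.4.2 to v2.0.0";
const versionPattern = /v(\d+)\.(\d+)\.(\d+)/;

// Without g: index 0 is the full match, 1..3 are the capture groups
const first = log.match(versionPattern);
const fullMatch = first[0];                      // "v1.4.2"
const parts = [first[1], first[2], first[3]];    // ["1", "4", "2"]

// With g, match() returns full matches only (no groups)
const fullOnly = log.match(/v(\d+)\.(\d+)\.(\d+)/g);   // ["v1.4.2", "v2.0.0"]

// matchAll() yields every match with its groups; convert each group to a number
const versions = [...log.matchAll(/v(\d+)\.(\d+)\.(\d+)/g)]
  .map((m) => m.slice(1).map(Number));           // [[1, 4, 2], [2, 0, 0]]

console.log(fullMatch, parts, fullOnly, versions);
```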

Is there a free tool to test and debug regex patterns?

Yes — ToolsFuel's tools section at toolsfuel.com/tools has text processing utilities for common find-and-replace operations. For full regex debugging with real-time highlighting and pattern explanation, regex101.com is the developer standard. It shows you exactly which part of your pattern matched which text, which makes it much easier to understand why a pattern isn't working the way you expected.
