What Is URL Encoding? Here’s Why %20 Keeps Showing Up
The %20 Rabbit Hole
That’s when it hit me that I’d been building web apps for years without ever properly understanding URL encoding. I just sort of... knew that spaces became %20 and left it at that. Most developers I know are in the same boat. You reach for encodeURIComponent() when something breaks, you skip it when things work, and you never think about why.
But URL encoding isn’t just a quirk of the web. It’s a fundamental piece of how browsers, servers, and APIs communicate. And misunderstanding it causes real bugs — broken API calls, mangled query parameters, security holes, and those frustrating “page not found” errors where the URL looks perfectly fine to human eyes.
I got curious enough to actually dig into the spec. Turns out the rabbit hole goes deeper than you’d expect.
How Percent Encoding Actually Works
Everything else needs to be encoded. The process is called percent encoding (URL encoding is just the casual name for it), and it works like this: take the character, convert it to its UTF-8 byte representation, and then write each byte as a % followed by two hex digits.
A space becomes %20 because the ASCII value of a space is 32, which is 0x20 in hexadecimal. Simple enough. But a character like é becomes %C3%A9 because in UTF-8 it’s represented as two bytes: 0xC3 and 0xA9. And an emoji like 🔥 becomes %F0%9F%94%A5 — four bytes, four percent-encoded chunks.
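To make the byte-level step concrete, here is a minimal sketch of percent encoding done by hand. The helper name `percentEncodeChar` is made up for illustration; it relies only on the standard `TextEncoder`, which always produces UTF-8.

```javascript
// Percent-encode a string by hand: UTF-8 bytes first, then %XX for each byte.
// percentEncodeChar is a hypothetical helper name, not a built-in.
function percentEncodeChar(char) {
  const bytes = new TextEncoder().encode(char); // TextEncoder always emits UTF-8
  return Array.from(
    bytes,
    (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

console.log(percentEncodeChar(" "));  // %20
console.log(percentEncodeChar("é"));  // %C3%A9
console.log(percentEncodeChar("🔥")); // %F0%9F%94%A5
```

Note that this sketch encodes every byte it is given, including safe characters like letters. In real code you would reach for `encodeURIComponent()`, which skips the characters that don't need encoding.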
I tested this myself using ToolsFuel’s URL encoder/decoder and it immediately made the byte-level conversion visible. Way easier than staring at a hex table trying to work out the math in my head.
The whole system exists because URLs were designed in the early ’90s when ASCII was king. The creators needed a way to represent characters outside the safe set without breaking URL parsers. Percent encoding was the solution — ugly, sure, but it works and it’s universal. Every browser, every server, every HTTP library on the planet understands it.
Characters That Break URLs (and Ones That Don’t)
Safe characters that never need encoding: letters A-Z and a-z, digits 0-9, and four unreserved marks: - _ . ~. These pass through URLs untouched.
Reserved characters are the tricky ones: : / ? # [ ] @ ! $ & ' ( ) * + , ; =. These have special meaning in URL syntax. A / separates path segments. A ? starts the query string. A # marks the fragment. A & separates query parameters. If you want to use these characters as data — like a search query that contains an & — you need to encode them. If they’re serving their structural purpose in the URL, you leave them alone.
Everything else always needs encoding: spaces, non-ASCII characters like accented letters or CJK characters or emoji, and unsafe characters like <, >, {, }, |, \, ^, and backticks.
The space character deserves special attention because there are two competing conventions. In URL paths, spaces become %20. In HTML form submissions using application/x-www-form-urlencoded, spaces become +. This inconsistency has caused more bugs than I can count. I’ve seen APIs that accept + in query strings but choke on %20, and vice versa. If you’re ever unsure which convention a server expects, just test both — or better yet, run your string through a URL encoder tool to see exactly what bytes get produced.
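A quick way to see both conventions side by side, using only built-ins. `encodeURIComponent()` follows the percent-encoding style, while `URLSearchParams` serializes in the form style:

```javascript
const value = "hello world";

// Percent-encoding style: a space becomes %20
const percentStyle = encodeURIComponent(value);

// Form style (application/x-www-form-urlencoded): a space becomes +
const formStyle = new URLSearchParams({ q: value }).toString();

console.log(percentStyle); // hello%20world
console.log(formStyle);    // q=hello+world

// A literal plus sign gets escaped in both styles,
// otherwise a form-style decoder would read it back as a space.
console.log(encodeURIComponent("1+1"));                    // 1%2B1
console.log(new URLSearchParams({ q: "1+1" }).toString()); // q=1%2B1
```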
Here’s a quick reference for the most common encodings you’ll run into:
| Character | Encoded | Why |
|-----------|---------|-----|
| Space | %20 or + | Most common encoding you’ll see |
| & | %26 | Separates query params |
| = | %3D | Separates key=value pairs |
| ? | %3F | Starts query string |
| # | %23 | Fragment identifier |
| / | %2F | Path separator |
| @ | %40 | Email addresses in URLs |
| + | %2B | Already means space in forms |
encodeURI vs encodeURIComponent — Pick Wrong and Watch Things Break
encodeURI() encodes a full URL but leaves structural characters alone. It won’t touch :, /, ?, #, or = because it assumes those are doing their jobs as URL delimiters. Use this when you have a complete URL and just want to clean up weird characters in it.
encodeURIComponent() encodes everything except letters, digits, and a small set of marks: - _ . ! ~ * ' ( ). It WILL encode :, /, ?, #, &, =, +, and most of the other reserved characters. Use this when you’re encoding a value that goes into a URL, like a query parameter value or a path segment.
Here’s where it goes wrong in practice:
```javascript
// WRONG: this destroys the URL structure
encodeURIComponent("https://example.com/search?q=hello world")
// → "https%3A%2F%2Fexample.com%2Fsearch%3Fq%3Dhello%20world"

// RIGHT: encode the full URL, preserve structure
encodeURI("https://example.com/search?q=hello world")
// → "https://example.com/search?q=hello%20world"

// ALSO RIGHT: encode just the parameter value
"https://example.com/search?q=" + encodeURIComponent("hello world")
// → "https://example.com/search?q=hello%20world"
```
The second and third examples look identical for this input. But if the search query itself contains an & or =, encodeURI() won’t encode them and you’ll get a mangled query string. encodeURIComponent() handles it correctly.
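Here is a small sketch of that failure mode. The search string and the example.com URL are made up for illustration:

```javascript
const query = "salt & pepper=spice";

// encodeURI leaves & and = alone, so the value bleeds into the query structure.
const mangled = encodeURI("https://example.com/search?q=" + query);
console.log(mangled);
// https://example.com/search?q=salt%20&%20pepper=spice

// encodeURIComponent escapes them, so the value stays a single parameter.
const safe = "https://example.com/search?q=" + encodeURIComponent(query);
console.log(safe);
// https://example.com/search?q=salt%20%26%20pepper%3Dspice

// Parsing both back shows what a server would actually see for q.
console.log(JSON.stringify(new URL(mangled).searchParams.get("q"))); // "salt "
console.log(JSON.stringify(new URL(safe).searchParams.get("q")));    // "salt & pepper=spice"
```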
My rule of thumb: if you’re building a URL by concatenating parts, always use encodeURIComponent() on each value. If you have a complete URL string with some bad characters, use encodeURI(). And if you’re not sure, default to encodeURIComponent() — it’s safer to over-encode than under-encode.
If you’ve read about how JWT tokens work, you might’ve noticed they use Base64URL encoding — which is regular Base64 with + and / swapped for - and _. That swap exists specifically because + and / have special meaning in URLs and would need percent encoding otherwise. The JWT folks decided to sidestep the problem entirely.
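The swap itself is tiny. A minimal sketch, assuming you already have a regular Base64 string; `toBase64Url` is a hypothetical helper name, and it also drops the trailing = padding, which is what JWTs do:

```javascript
// toBase64Url is a hypothetical helper: swap + and / for - and _,
// then strip trailing = padding (JWTs omit it).
function toBase64Url(base64) {
  return base64.replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// Three bytes picked on purpose so the Base64 output contains + and /.
const base64 = btoa("\xfb\xff\xfe");

console.log(base64);              // +//+
console.log(toBase64Url(base64)); // -__-
```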
Double Encoding and Other Real-World Nightmares
Double encoding happens more often than you’d think. It’s common when you’re passing data through multiple layers — your JavaScript encodes it, then an HTTP library encodes the whole URL again, or the browser encodes on top of what you already encoded. The fix is always the same: encode exactly once, at the boundary where you’re constructing the URL. Never encode “just in case.”
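You can reproduce the problem in two lines. This sketch just calls `encodeURIComponent()` twice on the same value, which is roughly what happens when two layers each encode "to be safe":

```javascript
const once = encodeURIComponent("hello world");
const twice = encodeURIComponent(once); // a second layer encodes the % itself

console.log(once);  // hello%20world
console.log(twice); // hello%2520world

// One decode pass on the receiving side yields the literal text, not a space.
console.log(decodeURIComponent(twice)); // hello%20world
```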
There’s another gotcha with non-ASCII characters. JavaScript strings are UTF-16 internally, but URL encoding uses UTF-8 bytes. The encodeURIComponent() function handles this conversion automatically, but if you’re working in another language or doing manual encoding, you need to convert to UTF-8 first or you’ll get garbage.
I ran into this exact issue building an API that accepted Japanese search queries. The query string looked correct in Chrome’s address bar (Chrome decodes percent-encoded characters for display), but the actual encoded bytes were wrong because a middleware layer was using Latin-1 encoding instead of UTF-8. The server decoded them as mojibake. Took me two hours to figure out because everything looked fine visually.
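To show what a charset mismatch does to the bytes, here is an illustrative sketch. It is not a reconstruction of that middleware; `decodeAsLatin1` is a hypothetical stand-in for any layer that treats each percent-encoded byte as one Latin-1 character instead of reading the bytes as UTF-8.

```javascript
// decodeAsLatin1 is a hypothetical stand-in for a misconfigured layer:
// it maps every %XX byte straight to one character (Latin-1 style).
function decodeAsLatin1(encoded) {
  return encoded.replace(/%([0-9A-Fa-f]{2})/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

const encoded = encodeURIComponent("日本語");
console.log(encoded);                     // %E6%97%A5%E6%9C%AC%E8%AA%9E
console.log(decodeURIComponent(encoded)); // 日本語 (bytes read as UTF-8)

// Nine bytes become nine junk characters instead of three Japanese ones.
console.log(decodeAsLatin1(encoded).length); // 9
```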
And here’s a fun one: different servers treat URL encoding differently. Some servers decode + as a space in the URL path, not just query strings. Some don’t. Some are case-sensitive about hex digits — %2f vs %2F. The spec says they should be equivalent, but reality and specs don’t always agree.
Oh, and proxy servers. If you’re running behind Nginx or Cloudflare, they might decode and re-encode your URLs as requests pass through. I’ve seen paths with encoded slashes (%2F) get decoded by a reverse proxy, which then changes the routing entirely. You think you’re hitting /api/search%2Fresults but the server sees /api/search/results — a completely different route.
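A rough simulation of that routing change. The `decodeURIComponent()` call here stands in for whatever decoding the proxy does; real proxies have their own rules and settings, so treat this as an illustration only:

```javascript
const original = "/api/search%2Fresults";

// Stand-in for a proxy that decodes the path before forwarding it.
const afterProxy = decodeURIComponent(original);

const segmentsBefore = original.split("/").filter(Boolean);
const segmentsAfter = afterProxy.split("/").filter(Boolean);

console.log(JSON.stringify(segmentsBefore)); // ["api","search%2Fresults"]
console.log(JSON.stringify(segmentsAfter));  // ["api","search","results"]
```

Two path segments turn into three, which is exactly why the router ends up matching a different route.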
The URL API That Fixes Most of This
```javascript
const url = new URL("https://api.example.com/search");
url.searchParams.set("q", "hello world & goodbye");
url.searchParams.set("lang", "日本語");
console.log(url.toString());
// https://api.example.com/search?q=hello+world+%26+goodbye&lang=%E6%97%A5%E6%9C%AC%E8%AA%9E
```
The URL constructor and URLSearchParams handle all the encoding for you. No guessing about which function to use. I’ve been pushing my team to use this instead of manual concatenation and our URL-related bugs dropped to basically zero.
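The same API works in the other direction, so you rarely need to call a decode function yourself. A short sketch using a made-up api.example.com URL; the file name and `/files/` path are also just illustrative:

```javascript
const incoming = new URL(
  "https://api.example.com/search?q=hello+world+%26+goodbye&lang=%E6%97%A5%E6%9C%AC%E8%AA%9E"
);

// searchParams.get() decodes both + and %XX sequences for you.
console.log(incoming.searchParams.get("q"));    // hello world & goodbye
console.log(incoming.searchParams.get("lang")); // 日本語

// searchParams only covers the query string. For a value that goes into
// the path, encodeURIComponent() is still the tool.
const fileName = "my document.pdf";
const fileUrl = new URL(
  "https://api.example.com/files/" + encodeURIComponent(fileName)
);
console.log(fileUrl.pathname); // /files/my%20document.pdf
```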
Other situations where encoding still bites you: file download URLs with spaces or Unicode in the filename, redirect chains where encoded URLs pass through multiple hops, webhook URLs that contain query parameters as parameter values (yes, URLs inside URLs — it’s as cursed as it sounds), and API integrations where some endpoints expect + for spaces and others expect %20.
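The URL-inside-a-URL case is less cursed if you let `URLSearchParams` do the nesting. A minimal sketch with made-up hostnames and a hypothetical `callback` parameter:

```javascript
// The inner URL has its own query string, including an & that must not leak out.
const callback = "https://client.example.com/done?order=42&status=ok";

const hook = new URL("https://hooks.example.com/register");
hook.searchParams.set("callback", callback);

console.log(hook.toString());
// https://hooks.example.com/register?callback=https%3A%2F%2Fclient.example.com%2Fdone%3Forder%3D42%26status%3Dok

// The inner URL comes back out unchanged.
console.log(hook.searchParams.get("callback") === callback); // true
```

The inner URL is encoded exactly once, as a value, so its own ? and & never get mistaken for the outer URL's structure.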
And if you’re doing any kind of Base64 encoding for images, the resulting Base64 string often contains + and /, which will break if you drop them into a URL query parameter without encoding. That’s why data URIs work fine in CSS and HTML attributes but cause problems when you try to pass them as URL parameters.
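Here is a sketch of that breakage, using three bytes picked so the Base64 output contains both + and /. The example.com URL and the `data` parameter name are assumptions for illustration:

```javascript
const base64 = btoa("\xfb\xff\xfe"); // "+//+"

// Dropped in raw: a form-style parser reads each + back as a space.
const raw = new URL("https://example.com/img?data=" + base64);
console.log(JSON.stringify(raw.searchParams.get("data"))); // " // "

// Encoded first: the value round-trips intact.
const encodedUrl = new URL(
  "https://example.com/img?data=" + encodeURIComponent(base64)
);
console.log(encodedUrl.searchParams.get("data")); // +//+
```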
Honestly, the biggest lesson I’ve taken away from all of this is that URL encoding bugs are almost never about not knowing what %20 means. They’re about encoding at the wrong layer, encoding twice, or assuming all servers interpret the same bytes the same way. Once you internalize that, you stop chasing phantom bugs and start asking the right question: where exactly is this string getting encoded, and is it happening more than once?
Frequently Asked Questions
What does %20 mean in a URL?
The %20 sequence represents a space character. URL encoding (also called percent encoding) converts characters that aren’t allowed in URLs into a % followed by their hexadecimal byte value. Since a space has the ASCII value 32, which is 0x20 in hex, it becomes %20. You’ll see this whenever a URL contains spaces. For example, a filename like “my document.pdf” becomes “my%20document.pdf” in a URL. Some systems use + instead of %20 for spaces in query strings, which is a separate convention that comes from HTML form encoding.
What’s the difference between encodeURI and encodeURIComponent in JavaScript?
encodeURI() is meant for encoding a complete URL. It encodes unsafe characters but leaves structural characters like :, /, ?, and # alone because it assumes they’re serving their purpose as URL delimiters. encodeURIComponent() encodes everything except letters, digits, and the marks - _ . ! ~ * ' ( ), so it does escape those structural characters. Use encodeURIComponent() when encoding a value that goes inside a URL (like a query parameter), and encodeURI() when you have a full URL that just needs cleanup. Getting this wrong is one of the most common JavaScript URL bugs I’ve seen.
Why do some URLs use + for spaces and others use %20?
There are two different encoding standards at play. RFC 3986 (the URL spec) says spaces should be encoded as %20 everywhere. But the application/x-www-form-urlencoded format, which is what HTML forms use when submitting data, encodes spaces as +. Most web servers accept both in query strings, but technically + only means “space” in form-encoded data. In the URL path portion, + is just a literal plus sign. This inconsistency trips up developers all the time, especially when building APIs that need to handle both conventions.
Is URL encoding the same as Base64 encoding?
No, they’re completely different things that solve different problems. URL encoding (percent encoding) makes individual characters safe for use in URLs by converting them to %XX hex sequences. Base64 encoding converts arbitrary binary data into a text string using 64 ASCII characters. You’d use URL encoding when putting a value into a URL query parameter, and Base64 when you need to embed binary data (like an image) as text. Confusingly, Base64 output contains + and / characters which themselves need URL encoding if you put them in a URL — that’s why Base64URL exists as a URL-safe variant.
What is double encoding and how do I avoid it?
Double encoding happens when an already-encoded string gets encoded again. A space becomes %20, then the % in %20 gets encoded to %25, producing %2520. The server receives the literal text “%20” instead of a space. It’s surprisingly common when data passes through multiple layers that each try to encode it. The fix is simple: encode exactly once, at the point where you’re constructing the URL. Don’t encode “just in case” and don’t assume your HTTP library won’t encode things you’ve already encoded. If you’re seeing %25 in your URLs, you’ve almost certainly got a double encoding bug.
Do emoji need URL encoding?
Yes. Emoji are multi-byte Unicode characters and they’re definitely not in the safe ASCII set that URLs allow. A single emoji like 🔥 gets encoded as %F0%9F%94%A5 — four percent-encoded bytes because that’s how many bytes UTF-8 uses to represent it. Modern browsers hide this from you by displaying the decoded emoji in the address bar, but the actual HTTP request always sends the percent-encoded version. If you’re building URLs that might contain emoji (user-generated content, for example), always encode them properly.