URL Encoding Guide: Percent Encoding, RFC 3986 & Common Mistakes
Learn how URL encoding works, which characters need percent-encoding, and common pitfalls. Covers RFC 3986 rules, encodeURIComponent vs encodeURI, double encoding, and programming examples.
If you have ever seen %20 in a web address or wondered why your API request broke when you included an & in a query parameter, you have encountered URL encoding. Also known as percent encoding, it is the mechanism that makes it possible to include special characters in URLs without breaking the web.
This guide covers everything developers need to know about URL encoding: which characters need encoding, how the algorithm works, common mistakes that break applications, and encoding functions in seven programming languages. Try our free URL encoder and decoder to see percent encoding in action.
What Is URL Encoding?
URL encoding (officially called percent encoding in RFC 3986) is the process of converting characters into a format that can be safely included in a URL. Characters that have special meaning in URL syntax, or that fall outside the allowed ASCII range, are converted to their UTF-8 bytes, and each byte is written as a % sign followed by its two-digit hexadecimal value.
For example:
- Space → %20
- Ampersand & → %26
- Equals = → %3D
- Slash / → %2F
- Euro sign € → %E2%82%AC (three UTF-8 bytes)
Why URL Encoding Is Necessary
URLs have a strict syntax defined in RFC 3986. Certain characters serve as structural delimiters:
https://example.com:8080/path/to/page?key=value&other=data#section
- Scheme: https
- Host: example.com
- Port: 8080
- Path: /path/to/page
- Query string: key=value&other=data
- Fragment: section
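A real URL parser performs exactly this split. As a quick illustration (a sketch, not tied to any particular framework), Python's standard urllib.parse.urlsplit() breaks the example URL into the labeled components:

```python
from urllib.parse import urlsplit

url = "https://example.com:8080/path/to/page?key=value&other=data#section"
parts = urlsplit(url)

# Each attribute corresponds to one component of the URL breakdown.
print(parts.scheme)    # https
print(parts.hostname)  # example.com
print(parts.port)      # 8080 (returned as an int)
print(parts.path)      # /path/to/page
print(parts.query)     # key=value&other=data
print(parts.fragment)  # section
```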
Each delimiter character has a defined role:
| Character | Role in URL | What Happens Without Encoding |
|---|---|---|
| : | Separates scheme from host, host from port | key=time:30 confuses parsers |
| / | Separates path segments | /search/cats/dogs looks like 3 segments |
| ? | Starts query string | ?query=what? has ambiguous meaning |
| # | Starts fragment identifier | key=C# truncates at # |
| & | Separates query parameters | name=Tom&Jerry creates 2 parameters |
| = | Separates key from value | equation=2+2=4 has 2 equals signs |
| @ | Separates userinfo from host | email=user@host looks like authentication |
| + | Space (in form encoding only) | Ambiguous: plus sign or space? |
When these characters appear as data rather than delimiters, they must be percent-encoded.
Which Characters Need Encoding?
RFC 3986 defines two character classes, unreserved and reserved. Together with everything else, that gives three groups:
Unreserved Characters (Never Need Encoding)
These characters can appear anywhere in a URL without encoding:
A-Z a-z 0-9 - _ . ~
Total: 66 characters.
Reserved Characters (Encode When Used as Data)
These characters have special meaning in URL syntax. They must be encoded when used as data values:
| Character | Hex Code | Role |
|---|---|---|
| : | %3A | Scheme/port separator |
| / | %2F | Path separator |
| ? | %3F | Query start |
| # | %23 | Fragment start |
| [ | %5B | IPv6 brackets |
| ] | %5D | IPv6 brackets |
| @ | %40 | Userinfo separator |
| ! | %21 | Sub-delimiter |
| $ | %24 | Sub-delimiter |
| & | %26 | Parameter separator |
| ' | %27 | Sub-delimiter |
| ( | %28 | Sub-delimiter |
| ) | %29 | Sub-delimiter |
| * | %2A | Sub-delimiter |
| + | %2B | Sub-delimiter / space |
| , | %2C | Sub-delimiter |
| ; | %3B | Sub-delimiter |
| = | %3D | Key-value separator |
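The first two groups can be written down directly as sets, and anything not in either falls into the third. This is an illustrative Python sketch; the names UNRESERVED, RESERVED, and classify() are hypothetical, chosen for this example, and classify() looks at one character at a time.

```python
import string

# RFC 3986, section 2.3: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
# RFC 3986, section 2.2: gen-delims plus sub-delims
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")
RESERVED = GEN_DELIMS | SUB_DELIMS

def classify(ch: str) -> str:
    """Return which of the three groups a single character belongs to."""
    if ch in UNRESERVED:
        return "unreserved"  # never needs encoding
    if ch in RESERVED:
        return "reserved"    # encode only when used as data
    return "other"           # always encode

print(len(UNRESERVED))  # 66
print(len(RESERVED))    # 18
for ch in "a~&/ é":
    print(repr(ch), classify(ch))
```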
Everything Else (Always Encode)
All other characters, including spaces, non-ASCII characters, and control characters, must always be encoded:
| Character | Encoding | Notes |
|---|---|---|
| Space | %20 | Most common encoding |
| " | %22 | Double quote |
| % | %25 | Percent sign (the escape character itself) |
| < | %3C | Less than |
| > | %3E | Greater than |
| { | %7B | Left brace |
| } | %7D | Right brace |
| \| | %7C | Vertical bar (pipe) |
| \ | %5C | Backslash |
| ^ | %5E | Caret |
| ` | %60 | Backtick |
| Non-ASCII | Multi-byte | UTF-8 encoded, then percent-encoded |
How Percent Encoding Works
The encoding algorithm is straightforward:
- Check whether the character needs encoding: unreserved characters never do, reserved characters do only when they are used as data, and everything else always does
- Convert the character to its UTF-8 byte sequence
- Write each byte as %XX, where XX is the byte's two-digit hexadecimal value
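These steps translate almost line for line into code. The Python sketch below uses a hypothetical helper, percent_encode(), which treats its entire input as data, so the first step reduces to "is it unreserved?" and reserved characters get encoded too. In practice Python's own urllib.parse.quote(text, safe="") does the same job.

```python
import string
from urllib.parse import quote

# Unreserved characters (RFC 3986, section 2.3) pass through unchanged.
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")

def percent_encode(text: str) -> str:
    """Percent-encode every character that is not unreserved (input is all data)."""
    out = []
    for ch in text:
        if ch in UNRESERVED:
            out.append(ch)
        else:
            # Step 2: UTF-8 bytes. Step 3: each byte as %XX with uppercase hex.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)

print(percent_encode("café"))         # caf%C3%A9
print(percent_encode("hello world"))  # hello%20world
print(percent_encode("€"))            # %E2%82%AC

# Cross-check against the standard library with no extra safe characters.
assert percent_encode("a&b=c/d") == quote("a&b=c/d", safe="")
```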
Example: Encoding "café"
| Character | UTF-8 Bytes | Percent-Encoded |
|---|---|---|
| c | 63 | c (unreserved) |
| a | 61 | a (unreserved) |
| f | 66 | f (unreserved) |
| é | C3 A9 | %C3%A9 |
Result: caf%C3%A9
Example: Encoding a Full Query String
Original URL:
https://example.com/search?q=hello world&lang=en&tag=c#
Properly encoded:
https://example.com/search?q=hello%20world&lang=en&tag=c%23
Note: & and = are kept unencoded because they serve their structural role. Only the data values (hello world, c#) are encoded.
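Building the query string programmatically keeps that split between structure and data automatic. A minimal Python sketch, assuming the parameters are held in a dict: urllib.parse.urlencode() inserts the & and = delimiters itself and encodes only the keys and values.

```python
from urllib.parse import urlencode, quote

params = {"q": "hello world", "lang": "en", "tag": "c#"}

# urlencode() joins key=value pairs with & and encodes each key and value.
# quote_via=quote yields %20 for spaces; the default (quote_plus) would yield +.
query = urlencode(params, quote_via=quote)
url = "https://example.com/search?" + query

print(url)  # https://example.com/search?q=hello%20world&lang=en&tag=c%23
```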
Spaces: %20 vs + (Plus Sign)
The two encodings for spaces cause endless confusion:
| Format | Space Encoding | Standard | Where Used |
|---|---|---|---|
| Percent encoding | %20 | RFC 3986 | URL paths, general use |
| Form encoding | + | HTML spec (application/x-www-form-urlencoded) | HTML form submissions |
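Best practice: Use %20 for spaces in URL paths, where + is just a literal plus sign. In query strings, servers that decode form encoding treat + as a space, but %20 is decoded as a space everywhere, so it is the more compatible choice. Either way, a literal plus sign in a value must be sent as %2B.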
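Python exposes both conventions side by side, which makes the difference easy to see. A short sketch using the standard urllib.parse functions with an arbitrary example value:

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

value = "a b+c"

print(quote(value, safe=""))  # a%20b%2Bc  (RFC 3986 percent encoding)
print(quote_plus(value))      # a+b%2Bc    (form encoding)

# Decoding must match the encoding that was used:
print(unquote_plus("a+b"))    # a b  (form decoding turns + into a space)
print(unquote("a+b"))         # a+b  (percent decoding leaves + alone)
```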
encodeURI vs encodeURIComponent
JavaScript provides two URL encoding functions with critically different behavior:
encodeURI()
Encodes a complete URI. Preserves characters that have structural meaning in URLs:
Does NOT encode: A-Z a-z 0-9 ; , / ? : @ & = + $ - _ . ! ~ * ' ( ) #
encodeURIComponent()
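Encodes a URI component (e.g., a single query parameter value). Encodes everything except the unreserved characters and five sub-delimiters (! * ' ( )), which it leaves alone for historical reasons (they were unreserved in the older RFC 2396):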
Does NOT encode: A-Z a-z 0-9 - _ . ! ~ * ' ( )
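Python has no built-in twins of these two functions, but their documented behavior can be approximated with urllib.parse.quote() and a custom safe set. The helpers encode_uri() and encode_uri_component() below are hypothetical and assume the "Does NOT encode" lists above are complete; edge cases such as lone surrogates (a URIError in JavaScript) are not modeled faithfully.

```python
from urllib.parse import quote

# Hypothetical approximations of JavaScript's encodeURIComponent()/encodeURI().
# quote() never encodes letters, digits, "-", "_", ".", "~", so only the extra
# characters each JavaScript function preserves need to be listed.
URI_COMPONENT_SAFE = "!*'()"
URI_SAFE = URI_COMPONENT_SAFE + ";,/?:@&=+$#"

def encode_uri_component(s: str) -> str:
    return quote(s, safe=URI_COMPONENT_SAFE)

def encode_uri(s: str) -> str:
    return quote(s, safe=URI_SAFE)

value = "cats & dogs/?"
print(encode_uri(value))            # cats%20&%20dogs/?
print(encode_uri_component(value))  # cats%20%26%20dogs%2F%3F
```

The first result shows why encodeURI() is the wrong tool for values: the & and ? survive and would be read as structure.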
When to Use Which
| Scenario | Function | Example |
|---|---|---|
| Encoding a complete URL | encodeURI() | encodeURI(fullUrl) |
| Encoding a query parameter value | encodeURIComponent() | ?q=${encodeURIComponent(searchTerm)} |
| Encoding a path segment | encodeURIComponent() | /users/${encodeURIComponent(username)} |
| Encoding a redirect URL as a parameter | encodeURIComponent() | ?redirect=${encodeURIComponent(returnUrl)} |
Rule of thumb: If the value might contain &, =, ?, or /, use encodeURIComponent().
Common Mistakes
1. Double Encoding
The most common URL encoding bug. It happens when you encode a string that is already encoded:
Original: hello world
First encode: hello%20world
Double encode: hello%2520world ← WRONG
The % in %20 gets encoded as %25, producing %2520. The server receives literal %20 instead of a space.
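How to prevent: Keep values raw (unencoded) inside your application and encode each one exactly once, at the point where you build the URL. If you must accept input that may already be encoded, track that state explicitly; blindly decoding before re-encoding can corrupt values that legitimately contain a sequence such as %20.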
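The bug is easy to reproduce. A minimal Python sketch using urllib.parse.quote() twice on the same value, then decoding once the way a server would:

```python
from urllib.parse import quote, unquote

raw = "hello world"

once = quote(raw, safe="")    # hello%20world
twice = quote(once, safe="")  # hello%2520world  <- the % became %25

print(once, twice)
# A server decodes once, so the double-encoded value arrives as literal "%20".
print(unquote(twice))         # hello%20world
print(unquote(once))          # hello world
```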
2. Not Encoding Query Parameter Values
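If searchTerm is "cats & dogs" and you concatenate it into the URL without encoding, the result is /search?q=cats & dogs, which the server splits into two parameters: q=cats and dogs (or worse). Encoding the value with encodeURIComponent() produces /search?q=cats%20%26%20dogs instead.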
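Parsing both versions shows the damage. A Python sketch, using the standard urllib.parse.parse_qs() as a stand-in for a server's query parser (real servers may differ in details such as whitespace handling):

```python
from urllib.parse import parse_qs, quote

search_term = "cats & dogs"

naive = "q=" + search_term                    # q=cats & dogs
encoded = "q=" + quote(search_term, safe="")  # q=cats%20%26%20dogs

# keep_blank_values=True reveals the stray parameter created by the raw &.
print(parse_qs(naive, keep_blank_values=True))  # {'q': ['cats '], ' dogs': ['']}
print(parse_qs(encoded))                        # {'q': ['cats & dogs']}
```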
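3. Using encodeURI() for Parameter Values
encodeURI() deliberately leaves &, =, ?, /, #, and + unencoded, so it cannot protect a value that contains them. encodeURI("cats & dogs") returns cats%20&%20dogs, and the & still splits the query string. Use encodeURIComponent() for individual values.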
4. Forgetting Non-ASCII Characters
URLs with non-ASCII characters (accented letters, CJK, emoji) must be encoded:
❌ https://example.com/café
✅ https://example.com/caf%C3%A9
Modern browsers display the decoded version in the address bar, but the actual HTTP request uses percent encoding.
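To see what the browser actually sends, encode the path yourself. A short Python sketch with urllib.parse.quote(), whose default safe="/" keeps path separators intact while encoding the UTF-8 bytes of accented letters, CJK characters, and emoji:

```python
from urllib.parse import quote

# quote() keeps "/" by default (safe="/"), which suits a whole path.
print(quote("/café"))  # /caf%C3%A9
print(quote("/日本"))  # /%E6%97%A5%E6%9C%AC
print(quote("/😀"))    # /%F0%9F%98%80
```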
URL Encoding in Programming Languages
| Language | Encode | Decode | Spaces become |
|---|---|---|---|
| JavaScript | encodeURIComponent(str) | decodeURIComponent(str) | %20 |
| Python | urllib.parse.quote(str, safe="") | urllib.parse.unquote(str) | %20 |
| Java | URLEncoder.encode(str, "UTF-8") | URLDecoder.decode(str, "UTF-8") | + |
| PHP | rawurlencode($str) | rawurldecode($str) | %20 |
| C# | Uri.EscapeDataString(str) | Uri.UnescapeDataString(str) | %20 |
| Go | url.QueryEscape(str) | url.QueryUnescape(str) | + |
| Ruby | CGI.escape(str) | CGI.unescape(str) | + |
Note: Java's URLEncoder.encode(), Go's url.QueryEscape(), Ruby's CGI.escape(), and PHP's urlencode() follow form encoding and produce + for spaces. For RFC 3986 style %20, use rawurlencode() in PHP and url.PathEscape() in Go. Python's quote() leaves / unencoded unless you pass safe="".
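The Python default is easy to trip over because quote() assumes you are encoding a path. A short sketch with an arbitrary example value that contains a slash:

```python
from urllib.parse import quote

value = "AC/DC rocks"

# The default safe="/" keeps slashes: fine for paths, wrong for a single value.
print(quote(value))           # AC/DC%20rocks
print(quote(value, safe=""))  # AC%2FDC%20rocks
```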
URL Encoding vs HTML Encoding
These two encoding systems are frequently confused:
| Feature | URL Encoding | HTML Encoding |
|---|---|---|
| Purpose | Safe characters in URLs | Safe characters in HTML |
| Format | %XX (hex bytes) | `&name;` or `&#number;` |
| Space | %20 | `&#32;` or just a space |
| & | %26 | `&amp;` |
| < | %3C | `&lt;` |
| " | %22 | `&quot;` |
| Standard | RFC 3986 | HTML specification |
| Used for | URL query strings, paths | HTML attributes, content |
They serve completely different purposes and are not interchangeable.
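When a URL ends up inside HTML, the two are applied in layers rather than swapped: first URL-encode the data value, then HTML-escape the finished URL for the attribute. A Python sketch using the standard html.escape() and urllib.parse.quote(); the /search path and lang parameter are illustrative assumptions.

```python
import html
from urllib.parse import quote

term = 'cats & "dogs"'

# Layer 1: URL-encode the value so it is safe inside the query string.
href = "/search?q=" + quote(term, safe="") + "&lang=en"
# Layer 2: HTML-escape the whole URL so it is safe inside an attribute.
tag = f'<a href="{html.escape(href)}">Search</a>'

print(href)  # /search?q=cats%20%26%20%22dogs%22&lang=en
print(tag)   # <a href="/search?q=cats%20%26%20%22dogs%22&amp;lang=en">Search</a>
```

Note that the structural & between parameters becomes &amp; only in the HTML layer; the & inside the value was already %26.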
Frequently Asked Questions
What does %20 mean in a URL?
%20 is the percent-encoded representation of a space character. The % indicates encoding, and 20 is the hexadecimal value of the space character (decimal 32). When a browser or server encounters %20 in a URL, it replaces it with a space.
Is URL encoding case-sensitive?
The hex digits in percent encoding (%3A vs %3a) are case-insensitive per RFC 3986, but the standard recommends uppercase. Most implementations accept both. However, the rest of the URL path may be case-sensitive depending on the server: paths mapped to a typical Linux filesystem are case-sensitive, while IIS on Windows usually treats them case-insensitively.
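Because only the hex digits are case-insensitive, a normalizer may uppercase them without touching anything else (RFC 3986, section 6.2.2.1). A Python sketch; normalize_percent_case() is a hypothetical helper, not a standard-library function.

```python
import re
from urllib.parse import unquote

def normalize_percent_case(url: str) -> str:
    """Uppercase the hex digits of every %XX triplet, leaving other text as-is."""
    return re.sub(r"%[0-9a-fA-F]{2}", lambda m: m.group(0).upper(), url)

print(unquote("%3a") == unquote("%3A"))     # True: both decode to ":"
print(normalize_percent_case("caf%c3%a9"))  # caf%C3%A9
print(normalize_percent_case("/Path%2f"))   # /Path%2F (path case untouched)
```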
How do international domain names work?
International Domain Names (IDN) use a system called Punycode to convert Unicode domain names to ASCII. For example, münchen.de becomes xn--mnchen-3ya.de. The path and query components use standard percent encoding for non-ASCII characters.
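The host and the path are therefore converted by different mechanisms. A Python sketch, assuming Python's built-in "idna" codec is acceptable; that codec implements the older IDNA 2003 rules, and projects needing IDNA 2008 typically use a third-party library instead. The /straße path is an illustrative assumption.

```python
from urllib.parse import quote

host = "münchen.de"
path = "/straße"

# Host: Punycode via the built-in "idna" codec (IDNA 2003, "xn--" labels).
ascii_host = host.encode("idna").decode("ascii")
# Path: ordinary percent encoding of the UTF-8 bytes.
url = "https://" + ascii_host + quote(path)

print(ascii_host)  # xn--mnchen-3ya.de
print(url)         # https://xn--mnchen-3ya.de/stra%C3%9Fe
```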
Ready to encode or decode URLs? Try our URL encoder and decoder for instant percent encoding with a character breakdown. For other encoding tools, check out our Base64 encoder and hex to text converter.