Unicode Escape/Unescape

Convert between plain text and \uXXXX Unicode escape sequences

What is Unicode?

Unicode is a universal character encoding standard that assigns a unique code point to every character it covers, across the world's writing systems. Maintained by the Unicode Consortium, the standard defines over 149,000 characters covering 161 modern and historic scripts (as of version 15.1), as well as symbols, emoji, and control characters. Unicode replaced the fragmented set of older character encodings (like ASCII, Latin-1, Shift_JIS) with a single, unified system.

Unicode code points are written in the format U+XXXX, where XXXX is a hexadecimal number. For example, the Latin letter “A” is U+0041, the Greek letter alpha is U+03B1, and the emoji grinning face is U+1F600. The full Unicode range spans from U+0000 to U+10FFFF.
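The U+XXXX notation can be reproduced in a few lines of Python. This is only a minimal sketch; the helper name code_point_label is hypothetical and is not part of this tool.

```python
def code_point_label(ch: str) -> str:
    """Return the U+XXXX label for a single character."""
    # ord() gives the integer code point; pad to at least 4 hex digits.
    return f"U+{ord(ch):04X}"


print(code_point_label("A"))           # U+0041
print(code_point_label("\u03b1"))      # U+03B1 (Greek alpha)
print(code_point_label("\U0001F600"))  # U+1F600 (grinning face)
```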

What is Unicode Escaping?

Unicode escaping is the process of converting characters into a text-based representation using their hexadecimal code points. The most common format is \uXXXX, used in JavaScript, Java, C#, and JSON. For characters outside the Basic Multilingual Plane (above U+FFFF), surrogate pairs or extended syntax like \u{1F600} may be used.
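One way to picture the conversion is the Python sketch below. It is an illustration under stated assumptions, not this tool's actual implementation: the function name escape_unicode and the keep_ascii option are hypothetical. Characters above U+FFFF are written as a surrogate pair of two \uXXXX escapes.

```python
def escape_unicode(text: str, keep_ascii: bool = False) -> str:
    """Convert text to backslash-u escapes based on UTF-16 code units."""
    out = []
    for ch in text:
        cp = ord(ch)
        if keep_ascii and 0x20 <= cp <= 0x7E and ch != "\\":
            # Optionally leave printable ASCII (except the backslash) as-is.
            out.append(ch)
        elif cp <= 0xFFFF:
            # Basic Multilingual Plane: one escape per character.
            out.append(f"\\u{cp:04X}")
        else:
            # Supplementary plane: split into a high and a low surrogate.
            offset = cp - 0x10000
            high = 0xD800 + (offset >> 10)
            low = 0xDC00 + (offset & 0x3FF)
            out.append(f"\\u{high:04X}\\u{low:04X}")
    return "".join(out)


print(escape_unicode("A"))                         # \u0041
print(escape_unicode("\U0001F600"))                # \uD83D\uDE00
print(escape_unicode("caf\u00e9", keep_ascii=True))  # caf\u00E9
```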

Unicode escaping is needed when you want to include non-ASCII characters in source code files that use ASCII encoding, represent special characters in JSON strings, transmit Unicode data through systems that only support ASCII, or debug character encoding issues.

How to Use the Unicode Escape/Unescape Tool

  1. Paste your text or escaped sequences into the input area
  2. Click “Escape” to convert text to \uXXXX sequences, “Unescape” to convert back, or “Code Points” to view the Unicode code point for each character
  3. Copy the result with the “Copy” button or Ctrl+Shift+C

Unicode Escape Formats Across Languages

Language          Escape Format            Example (for “A”)
JavaScript/JSON   \u0041                   \u0041
Python            \u0041 or \U00000041     \u0041
Java              \u0041                   \u0041
C#                \u0041                   \u0041
HTML              &#x0041; or &#65;        &#x0041;
CSS               \0041                    \0041
Ruby              \u0041 or \u{41}         \u0041

Common Use Cases for Unicode Escaping

Internationalization (i18n): When building multilingual applications, Unicode escaping ensures that non-Latin characters in translation files and resource bundles are correctly preserved regardless of the file encoding.

JSON Data: The JSON specification requires that certain characters be escaped, and Unicode escaping is the standard way to include non-ASCII characters in JSON payloads when UTF-8 encoding isn’t available.
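As an illustration of the JSON case, Python's standard json module escapes every non-ASCII character by default (ensure_ascii=True) and emits raw characters when that option is turned off. The payload below is made up for the example.

```python
import json

# Hypothetical payload containing one BMP character and one emoji.
payload = {"city": "Z\u00fcrich", "mood": "\U0001F600"}

# Default: ASCII-only output, non-ASCII characters become \uXXXX escapes
# (the emoji is written as a surrogate pair).
ascii_json = json.dumps(payload)

# ensure_ascii=False keeps the characters as-is; the result must then be
# stored or transmitted in an encoding such as UTF-8.
raw_json = json.dumps(payload, ensure_ascii=False)

print(ascii_json)  # {"city": "Z\u00fcrich", "mood": "\ud83d\ude00"}

# Both forms decode to the same data.
assert json.loads(ascii_json) == json.loads(raw_json) == payload
```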

Debugging Encoding Issues: When text appears garbled or contains unexpected characters, viewing the Unicode code points helps identify whether the issue is a wrong encoding, a missing font, or corrupted data.

Source Code Portability: Escaping non-ASCII characters in source code ensures that the code works correctly even if the file is opened in an editor or system that doesn’t support UTF-8.

Common Unicode Escape Characters — Quick Reference

Here are frequently escaped characters developers encounter in everyday work:

Character   Name                  Code Point       Escape
©           Copyright sign        U+00A9           \u00A9
®           Registered sign       U+00AE           \u00AE
™           Trademark             U+2122           \u2122
€           Euro sign             U+20AC           \u20AC
£           Pound sign            U+00A3           \u00A3
¥           Yen sign              U+00A5           \u00A5
°           Degree sign           U+00B0           \u00B0
—           Em dash               U+2014           \u2014
’           Right single quote    U+2019           \u2019
“ ”         Smart quotes          U+201C/U+201D    \u201C / \u201D
…           Ellipsis              U+2026           \u2026
•           Bullet                U+2022           \u2022
→           Right arrow           U+2192           \u2192
≠           Not equal             U+2260           \u2260
≤ ≥         Less/greater-equal    U+2264/U+2265    \u2264 / \u2265

These characters frequently cause issues when copy-pasted from word processors, PDFs, or web pages into source code or configuration files. Escaping them prevents encoding mismatches across different systems and editors.

Understanding UTF-8, UTF-16, and Code Points

Unicode defines code points, but the actual byte representation depends on the encoding:

  • UTF-8 uses 1 to 4 bytes per character and is the dominant encoding on the web
  • UTF-16 uses 2 or 4 bytes per character and is used internally by JavaScript and Java
  • UTF-32 uses exactly 4 bytes per character, providing direct code point mapping

The \uXXXX escape format corresponds to UTF-16 code units. Characters in the Basic Multilingual Plane (U+0000 to U+FFFF) use a single \uXXXX escape, while characters above U+FFFF (like emoji) require a surrogate pair of two \uXXXX escapes.
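The byte counts for each encoding can be checked directly in Python. This is a small sketch; the helper name encoded_sizes is hypothetical.

```python
def encoded_sizes(ch: str) -> dict:
    """Return the number of bytes each encoding needs for one character."""
    # The big-endian variants are used so that no byte order mark
    # is added to the count.
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-be")),
        "utf-32": len(ch.encode("utf-32-be")),
    }


for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

In this sample, U+0041 takes 1 byte in UTF-8 while U+1F600 takes 4 bytes in all three encodings. In UTF-16 those 4 bytes are the two code units of a surrogate pair.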

Troubleshooting Unicode Escape Issues

Surrogate pair errors: If you see \uD83D\uDE00 instead of a readable emoji, you are looking at a UTF-16 surrogate pair. The pair \uD83D\uDE00 decodes to the grinning face emoji (U+1F600). Modern JavaScript engines handle this automatically, but older tools may require manual pairing. This tool correctly decodes surrogate pairs back to their original characters.
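Manual pairing comes down to a little arithmetic on the two code units. The sketch below shows the standard UTF-16 formula applied to the pair \uD83D\uDE00; the function name combine_surrogates is hypothetical.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    # Each surrogate carries 10 bits of the value above U+FFFF.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


cp = combine_surrogates(0xD83D, 0xDE00)
print(f"U+{cp:04X}")  # U+1F600
```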

Mojibake (garbled text): Text like Ã© instead of é or â€™ instead of ’ means UTF-8 bytes were interpreted as Latin-1 or Windows-1252. The fix is to ensure every layer in your stack (file encoding, database charset, HTTP Content-Type header, and HTML <meta charset>) consistently uses UTF-8.
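When the garbled text is all you have, the misreading can often be reversed by re-encoding with the wrong encoding and decoding as UTF-8. The Python sketch below assumes the text was misread as Windows-1252 exactly once; the helper name repair_mojibake is hypothetical, and the call raises an error if the text cannot be mapped back.

```python
def repair_mojibake(text: str, wrong_encoding: str = "cp1252") -> str:
    """Undo one round of UTF-8 bytes having been decoded with the wrong encoding."""
    # Recover the original bytes, then decode them as UTF-8.
    return text.encode(wrong_encoding).decode("utf-8")


# Simulate the problem: UTF-8 bytes read as Windows-1252.
broken = "caf\u00e9 \u2019".encode("utf-8").decode("cp1252")
print(broken)                   # cafÃ© â€™
print(repair_mojibake(broken))  # café ’
```

This treats the symptom only. The lasting fix is still to make every layer agree on UTF-8.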

Mixed escaped and plain text: It’s valid to have a string like Hello \u0057orld, where only some characters are escaped. The unescape operation in this tool handles mixed content correctly, converting only the \uXXXX sequences while leaving plain text untouched.
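A rough idea of how mixed content can be unescaped is sketched below in Python. It is an illustration, not this tool's code: the name unescape_unicode is hypothetical. The regular expression matches a surrogate pair first and falls back to a single \uXXXX escape, so plain text is never touched.

```python
import re

# Either a high+low surrogate pair (groups 1 and 2) or a single escape (group 3).
_ESCAPE = re.compile(
    r"\\u([Dd][89ABab][0-9A-Fa-f]{2})\\u([Dd][C-Fc-f][0-9A-Fa-f]{2})"
    r"|\\u([0-9A-Fa-f]{4})"
)


def _replace(match: re.Match) -> str:
    if match.group(3) is not None:
        # Single BMP escape.
        return chr(int(match.group(3), 16))
    # Surrogate pair: combine the two code units into one code point.
    high = int(match.group(1), 16)
    low = int(match.group(2), 16)
    return chr(0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00))


def unescape_unicode(text: str) -> str:
    """Convert backslash-u escapes back to characters, leaving other text as-is."""
    return _ESCAPE.sub(_replace, text)


print(unescape_unicode(r"Hello \u0057orld"))  # Hello World
```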

Escape format mismatch: Different languages use different escape syntax. If \u00E9 doesn’t work in your context, check whether your language expects \x{E9} (Perl/PHP regex), \U000000E9 (Python 32-bit), &#x00E9; (HTML), or %C3%A9 (URL encoding). Use the format table above to match the correct syntax.
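To compare the formats side by side, the Python sketch below prints each variant for one character. The function name escape_variants and the dictionary keys are hypothetical labels, and the sketch assumes a character in the Basic Multilingual Plane.

```python
from urllib.parse import quote


def escape_variants(ch: str) -> dict:
    """Return several escape spellings for a single BMP character."""
    cp = ord(ch)
    if cp > 0xFFFF:
        raise ValueError("this sketch only covers BMP characters")
    return {
        "backslash_u": f"\\u{cp:04X}",        # JavaScript, Java, C#, JSON
        "python_long": f"\\U{cp:08X}",        # Python 8-digit form
        "perl_php_regex": f"\\x{{{cp:X}}}",   # \x{...} form
        "html_hex": f"&#x{cp:04X};",          # HTML numeric reference
        "url": quote(ch),                     # percent-encoded UTF-8 bytes
    }


for name, value in escape_variants("\u00e9").items():
    print(f"{name:15} {value}")
```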

Frequently Asked Questions

What is Unicode escaping?

Unicode escaping converts characters into their \uXXXX representation, where XXXX is the hexadecimal Unicode code point. For example, the letter 'A' becomes \u0041. This is commonly used in programming languages like JavaScript, Java, Python, and C# to represent non-ASCII characters in source code.

How do I convert text to Unicode escape sequences?

Paste your text into the input area and click the 'Escape' button. Every character will be converted to its \uXXXX representation. You can then copy the result for use in your code.

What are Unicode code points?

A Unicode code point is a unique number assigned to each character in the Unicode standard. Code points are written as U+XXXX (e.g., U+0041 for 'A'). The Unicode standard covers over 149,000 characters from 161 modern and historic scripts (as of version 15.1).

How do I unescape Unicode sequences?

Paste your escaped text containing \uXXXX sequences into the input area and click the 'Unescape' button. The tool will convert all escape sequences back to their original characters.

Which characters need Unicode escaping?

Any character outside the printable ASCII range (U+0020 to U+007E) should be escaped when working with ASCII-only systems. In JSON, the characters that must be escaped are the double quote (\"), backslash (\\), and control characters (U+0000 to U+001F). In practice, developers escape accented letters, CJK characters, emoji, currency symbols, and mathematical operators.

What is the difference between Unicode escape codes and HTML entities?

Unicode escape codes like \u00E9 are used in programming languages (JavaScript, Java, Python, C#) and JSON. HTML entities like &eacute; or &#233; are used in HTML markup. Both represent the same character but in different contexts. Use Unicode escapes in code and HTML entities in web pages.

What are Unicode escape sequences in JavaScript?

JavaScript supports two Unicode escape formats: \uXXXX for characters in the Basic Multilingual Plane (U+0000 to U+FFFF), and \u{XXXXX} (ES6+) for any code point including emoji and supplementary characters. For example, \u0041 represents 'A' and \u{1F600} represents the grinning face emoji.

How do I fix garbled Unicode text?

Garbled text (mojibake) usually means the text was decoded with the wrong encoding. Paste the garbled text into this tool and try unescaping it. If the text contains sequences like Ã© or â€™, it was likely UTF-8 text read as Latin-1 or Windows-1252. Check that your file, database, and HTTP headers all specify UTF-8 consistently.

Is my data safe when converting?

Yes. All processing happens entirely in your browser using JavaScript. Your data is never sent to any server.