String Inspector
Inspect Unicode characters and invisible strings
String Inspector / Unicode Debugger
How to use String Inspector
- Paste or type your string into the input field. This can be any text: a string from your code, text copied from a web page, or content from a file.
- View the character-by-character breakdown displayed below. Each character is shown alongside its Unicode codepoint (e.g., U+0041 for "A"), its UTF-8 byte length, and its Unicode category.
- Identify invisible characters highlighted in the output. Zero-width spaces, non-breaking spaces, byte order marks, and other invisible characters are flagged so you can spot them instantly.
- Use the findings to debug encoding issues, remove hidden characters, or understand exactly what your string contains at the byte level.
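For readers who want to reproduce a similar breakdown in their own code, the following is a minimal sketch, not the tool's actual implementation. The helper name `inspect` and the short `INVISIBLE` list are assumptions made for illustration, and the Unicode category column is omitted because JavaScript has no built-in lookup for it.

```javascript
// Hypothetical helper for illustration; not the tool's source code.
// A deliberately short list of invisible code points (assumption: extend as needed).
const INVISIBLE = new Set([0x00a0, 0x200b, 0x200c, 0x200d, 0x200e, 0x200f, 0xfeff]);
const encoder = new TextEncoder();

function inspect(str) {
  const rows = [];
  // for...of walks code points rather than UTF-16 code units,
  // so a character outside the BMP is reported as a single row.
  for (const char of str) {
    const cp = char.codePointAt(0);
    rows.push({
      char,
      codepoint: "U+" + cp.toString(16).toUpperCase().padStart(4, "0"),
      utf8Bytes: encoder.encode(char).length,
      invisible: INVISIBLE.has(cp),
    });
  }
  return rows;
}

// "A", a zero-width space, and a precomposed "e with acute".
console.table(inspect("A\u200B\u00E9"));
```

The zero-width space in the sample shows up as its own row with `invisible: true`, which is the kind of signal the highlighted output is meant to give.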
What are Unicode and invisible characters?
Unicode is a universal character standard that aims to assign a unique codepoint to every character in every writing system, plus symbols, emoji, and control characters. A codepoint is written as U+ followed by four to six hexadecimal digits (e.g., U+0041 for the Latin letter "A", U+1F600 for the grinning face emoji). As of Unicode 15.1, the standard defines over 149,000 characters across 161 scripts.
UTF-8 is the dominant encoding for transmitting Unicode text. It uses 1 to 4 bytes per character: ASCII characters (U+0000 to U+007F) use 1 byte; characters from U+0080 to U+07FF (accented Latin, Greek, Cyrillic, Hebrew, Arabic) use 2 bytes; the rest of the Basic Multilingual Plane, including most CJK characters, uses 3 bytes; and characters above U+FFFF, including most emoji, use 4 bytes. Understanding byte lengths matters for database column sizing, API payload limits, and protocol constraints.
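The byte-length rule can be checked directly. The sketch below computes the UTF-8 length from the codepoint range and compares it with what `TextEncoder` actually produces; `utf8Length` is a hypothetical name chosen for this example.

```javascript
// Hypothetical helper: UTF-8 byte count derived from the codepoint range alone.
function utf8Length(codePoint) {
  if (codePoint <= 0x7f) return 1;   // ASCII
  if (codePoint <= 0x7ff) return 2;  // e.g. accented Latin, Greek, Cyrillic
  if (codePoint <= 0xffff) return 3; // rest of the BMP, e.g. most CJK
  return 4;                          // supplementary planes, e.g. most emoji
}

// "A", e-acute, Cyrillic Zhe, a CJK ideograph, and an emoji.
const samples = ["A", "\u00E9", "\u0416", "\u4E2D", "\u{1F600}"];
const encoder = new TextEncoder();

for (const ch of samples) {
  const cp = ch.codePointAt(0);
  const label = "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
  // The second and third values should agree for every sample.
  console.log(label, utf8Length(cp), encoder.encode(ch).length);
}
```

When sizing a column or payload in bytes, measuring the encoded length this way is safer than relying on a character count.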
Invisible characters are the source of countless subtle bugs. A zero-width space (U+200B) renders as nothing but still causes string comparisons to fail. A byte order mark (U+FEFF) at the start of a file can break JSON parsers. A non-breaking space (U+00A0) looks identical to a regular space but fails equality checks and can break CSS class names. Bidirectional controls such as the right-to-left mark (U+200F) and the right-to-left override (U+202E) can silently change the displayed order of text in security-sensitive contexts like file names.
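A few of these failure modes are easy to reproduce. The snippet below is only a demonstration with made-up sample strings; `parseJson` is a hypothetical wrapper added so the parse error can be shown without crashing the script.

```javascript
const clean = "hello";
const withZwsp = "hello\u200B";      // zero-width space appended
const withNbsp = "hello\u00A0world"; // non-breaking space instead of U+0020

console.log(clean === withZwsp);         // false
console.log(withNbsp === "hello world"); // false
console.log(withNbsp.split(" ").length); // 1, because there is no regular space to split on

// Hypothetical wrapper that reports whether JSON.parse accepted the text.
function parseJson(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, error: err.name };
  }
}

const withBom = "\uFEFF" + '{"a":1}';
console.log(parseJson(withBom).ok);                        // false: the BOM is not JSON whitespace
console.log(parseJson(withBom.replace(/^\uFEFF/, "")).ok); // true once the BOM is stripped
```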
These characters are often introduced by copying text from web pages, PDFs, word processors, or messaging apps. They are invisible in most editors and terminals, making them extremely difficult to diagnose without a dedicated inspection tool.
Common use cases
- Debugging string comparison failures: Two strings look identical in the console but === returns false. Inspect both to find the hidden character causing the mismatch.
- Cleaning data imports: CSV or JSON data copied from spreadsheets or PDFs often contains invisible formatting characters that break processing pipelines.
- Security auditing: Detect homoglyph attacks (characters that look like Latin letters but are from different scripts) and invisible bidirectional text that could disguise malicious file names or URLs.
- Database troubleshooting: Diagnose why a database query fails to match a value that appears correct, often caused by trailing whitespace, non-breaking spaces, or zero-width characters.
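The security-auditing case can be approximated in code with Unicode property escapes. This is a coarse heuristic sketched under assumptions, not a complete confusables check: the function name `audit` and the choice of Cyrillic and Greek as "lookalike" scripts are illustrative, and legitimate multilingual text will also trigger the mixed-script flag.

```javascript
// Bidirectional controls: LRM, RLM, embeddings/overrides, and isolates.
const BIDI_CONTROLS = /[\u200E\u200F\u202A-\u202E\u2066-\u2069]/u;
// Assumption: only Cyrillic and Greek are treated as Latin lookalikes here.
const LOOKALIKE_LETTERS = /[\p{Script=Cyrillic}\p{Script=Greek}]/u;
const LATIN_LETTERS = /\p{Script=Latin}/u;

// Hypothetical audit helper returning two independent warning flags.
function audit(str) {
  return {
    hasBidiControl: BIDI_CONTROLS.test(str),
    mixesLatinWithLookalikes:
      LATIN_LETTERS.test(str) && LOOKALIKE_LETTERS.test(str),
  };
}

console.log(audit("paypal.com"));            // neither flag set
console.log(audit("p\u0430ypal.com"));       // second letter is Cyrillic U+0430
console.log(audit("invoice\u202Efdp.exe"));  // contains a right-to-left override
```

A flagged string is a prompt for closer inspection rather than proof of an attack.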
FAQ
Q: What is a zero-width space and how does it get into my text?
A: A zero-width space (U+200B) is a Unicode character with no visible width. It is used as a word-break hint in scripts without spaces (Thai, Khmer) and in HTML for breaking long words. It often sneaks in when copying text from web pages.
Q: How do I remove invisible characters from my string?
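A: Once this tool identifies them, you can use a regex like /[\u200B-\u200D\uFEFF]/g to strip the most common invisible characters, or replace them manually in your editor. That range covers the zero-width space, zero-width non-joiner, zero-width joiner, and byte order mark. The two joiners (U+200C and U+200D) are meaningful in some scripts and in emoji sequences, so strip them only where that is acceptable. The regex does not touch non-breaking spaces (U+00A0), which are usually replaced with a regular space rather than removed.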
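As a minimal sketch of that cleanup, the regex from the answer can be wrapped in a small function. The names `stripInvisible` and `normalizeSpaces` are hypothetical, and the second helper is an optional extra for non-breaking spaces, which the regex itself does not cover.

```javascript
// The regex quoted above: ZWSP, ZWNJ, ZWJ, and the byte order mark.
const INVISIBLE_RE = /[\u200B-\u200D\uFEFF]/g;

// Hypothetical helper: remove the matched characters entirely.
function stripInvisible(str) {
  return str.replace(INVISIBLE_RE, "");
}

// Hypothetical helper: turn non-breaking spaces into regular spaces.
function normalizeSpaces(str) {
  return str.replace(/\u00A0/g, " ");
}

const dirty = "\uFEFFhel\u200Blo\u00A0world";
const cleaned = normalizeSpaces(stripInvisible(dirty));

console.log(dirty.length);   // 13
console.log(cleaned.length); // 11
console.log(cleaned === "hello world"); // true
```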
Q: Does string length count invisible characters?
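A: Yes. JavaScript's .length property counts UTF-16 code units, and every invisible character contributes at least one. A string that looks like "hello" has a .length of 6 if it contains one hidden character such as U+200B. The reverse surprise also exists: a single character above U+FFFF, such as most emoji, counts as 2.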
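The difference between code units and code points is easy to see in a console. The sample strings and the helper name `countCodePoints` below are assumptions made for this illustration.

```javascript
const hidden = "hello\u200B"; // looks like "hello" when printed
console.log(hidden.length);   // 6

const emoji = "\u{1F600}";    // one character, encoded as a surrogate pair
console.log(emoji.length);    // 2 UTF-16 code units

// Hypothetical helper: spreading a string iterates by code point.
function countCodePoints(str) {
  return [...str].length;
}

console.log(countCodePoints(emoji));  // 1
console.log(countCodePoints(hidden)); // 6, the hidden character is still counted
```

Counting code points fixes the emoji case but not the hidden-character case, so an unexpected length is still a useful hint that something invisible is present.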
Is my data safe?
Yes. This tool runs entirely in your browser. Your data is never sent to our servers.