Handling Invisible Characters with PHP

Mar 26, 2025 - 00:06

As developers, we often assume that the data users enter into our forms is exactly what we see. But in reality, inputs can be deceiving. Sometimes a user enters a phone number or ID that looks correct, but your validation fails. Why?

A client was entering a valid number (let’s say 51000000) into a form, but our backend validation kept rejecting it.

No errors in the logic.
The input looked fine.
Manual testing with the same number passed.
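To see how an input can look right and still fail, here is a minimal sketch. It assumes the pasted value carries a trailing zero-width space (U+200B); the variable names are illustrative and not taken from the original incident.

```php
<?php
// Hypothetical values: what a tester types vs. what a user pastes.
$typed  = "51000000";
$pasted = "51000000\u{200B}"; // digits followed by a zero-width space

// Both strings render identically, yet they are different byte sequences.
var_dump($typed === $pasted);   // bool(false)
var_dump(strlen($typed));       // int(8)
var_dump(strlen($pasted));      // int(11): U+200B takes three bytes in UTF-8
var_dump(ctype_digit($pasted)); // bool(false): a digits-only check fails

// bin2hex() exposes the hidden bytes (e2 80 8b) at the end.
echo bin2hex($pasted), PHP_EOL; // 3531303030303030e2808b
```

Dumping the raw bytes like this is often the quickest way to confirm that a "correct-looking" value contains something extra.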

Why This Happens

The usual culprit is an invisible character hidden somewhere in the input. These characters often sneak in when users copy text from:

Messaging apps (e.g., WhatsApp, Slack).
Formatted documents (e.g., Word, rich text editors).

Common Types of Invisible Characters

1. Whitespace Characters

Regular Space (U+0020): The standard space character.
Tab (U+0009): Written as \t, adds horizontal spacing.
Newlines:
- Line Feed (U+000A) → \n
- Carriage Return (U+000D) → \r
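The whitespace characters above are the easy case, because PHP's built-in trim() removes them from both ends of a string. A small sketch, using a made-up input value:

```php
<?php
// Hypothetical input padded with a space, a tab, and a CRLF line ending.
$raw = " \t51000000\r\n";

// With no second argument, trim() strips spaces, tabs, line feeds,
// carriage returns, vertical tabs, and NUL bytes from both ends.
$clean = trim($raw);

var_dump($clean);              // string(8) "51000000"
var_dump(ctype_digit($clean)); // bool(true)
```

Note that trim() only touches the start and end of the string; whitespace in the middle is left alone.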

2. Zero-Width Characters

Zero-Width Space (U+200B): Doesn’t show up visually but still exists in the text.
Zero-Width Non-Joiner (U+200C): Used in some languages (like Arabic or Persian) to prevent character joining without adding space.
Zero-Width Joiner (U+200D): Used to force characters to join without any visible space.
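One possible way to strip these three characters is a Unicode-aware regular expression. The helper name stripZeroWidth is hypothetical, and the sketch assumes a field such as an ID or phone number where these characters never carry meaning.

```php
<?php
/**
 * Remove zero-width space, non-joiner, and joiner (U+200B to U+200D).
 * Hypothetical helper, intended for fields like IDs or phone numbers.
 */
function stripZeroWidth(string $value): string
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    $result = preg_replace('/[\x{200B}-\x{200D}]/u', '', $value);

    // preg_replace() returns null on failure (for example, invalid UTF-8),
    // so fall back to the original value in that case.
    return $result ?? $value;
}

$pasted = "51\u{200B}000\u{200D}000";
var_dump(stripZeroWidth($pasted)); // string(8) "51000000"
```

Because the non-joiner and joiner are meaningful in some scripts, a filter like this is probably best kept away from free-text fields such as names or messages.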

3. Directional Characters

Left-to-Right Mark (LRM) (U+200E): Affects text direction but is invisible.
Right-to-Left Mark (RLM) (U+200F): The right-to-left counterpart of the LRM, also invisible.
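When a value is rejected, it can help to log which hidden marks it contains before deciding how to clean it. The following is a rough sketch; findDirectionalMarks is a hypothetical helper, not part of PHP or Laravel.

```php
<?php
/**
 * Report which directional marks a string contains and how often.
 * Hypothetical helper for logging or debugging rejected input.
 */
function findDirectionalMarks(string $value): array
{
    $marks = [
        'U+200E LEFT-TO-RIGHT MARK' => "\u{200E}",
        'U+200F RIGHT-TO-LEFT MARK' => "\u{200F}",
    ];

    $found = [];
    foreach ($marks as $label => $char) {
        // substr_count() compares bytes, which is safe for valid UTF-8.
        $count = substr_count($value, $char);
        if ($count > 0) {
            $found[$label] = $count;
        }
    }

    return $found;
}

// Made-up input: one LRM in front, two RLMs at the end.
$pasted = "\u{200E}51000000\u{200F}\u{200F}";
print_r(findDirectionalMarks($pasted));
```

For the sample input, the returned array maps the LRM label to 1 and the RLM label to 2.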

4. Formatting and Spacing Characters

Soft Hyphen (U+00AD): Invisible in most cases, but may show up if the word is broken across lines.
Non-Breaking Space (U+00A0): Looks like a regular space but prevents the line from breaking at that position.
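Neither of these two characters is removed by a plain trim(), since each is a multi-byte sequence in UTF-8. A minimal sketch of one way to handle them follows; normalizeSpacing is a hypothetical helper, and turning the non-breaking space into a regular space (rather than deleting it) is an assumption that may not suit every field.

```php
<?php
/**
 * Drop soft hyphens and turn non-breaking spaces into regular spaces,
 * then trim. Hypothetical helper; adjust the NBSP handling per field.
 */
function normalizeSpacing(string $value): string
{
    $value = str_replace("\u{00AD}", '', $value);  // soft hyphen
    $value = str_replace("\u{00A0}", ' ', $value); // non-breaking space

    return trim($value);
}

// Made-up input: NBSP on both ends and a soft hyphen in the middle.
$pasted = "\u{00A0}510\u{00AD}00000\u{00A0}";
var_dump(normalizeSpacing($pasted)); // string(8) "51000000"
```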

What Laravel Trims by Default

Laravel automatically trims whitespace (spaces, tabs, newlines) from the start and end of request input through the TrimStrings middleware, which is enabled by default. However, depending on the Laravel version, it may not remove invisible Unicode characters like: