PHP’s strip_tags() function seems like an easy solution for allowing certain HTML elements while blocking others. But it has a dangerous flaw.
The Problem
strip_tags() accepts a second parameter to whitelist specific tags:
$clean = strip_tags($input, '<a><b><i>');
Looks safe, right? But what if the input contains:
<a href="javascript:alert('XSS')" onclick="stealCookies()">Click me</a>
The <a> tag is whitelisted, so it passes through - complete with its malicious attributes. strip_tags() only decides which tags to keep; it never inspects or filters the attributes on the tags it lets through.
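To see this for yourself, here is a minimal check. The input string is made up for illustration: one allowed tag carrying hostile attributes, plus one tag (<u>) that is not on the list.

```php
<?php
// Illustrative input: <a> is allowed, <u> is not.
$input = '<a href="javascript:alert(\'XSS\')" onclick="stealCookies()">Click me</a> <u>now</u>';

$clean = strip_tags($input, '<a><b><i>');

// The <u> tags are removed (their text stays), but the <a> tag
// comes back with href and onclick exactly as they went in.
echo $clean, PHP_EOL;
```

The disallowed tag disappears, while the allowed one keeps everything it arrived with.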
The Solution
If you don’t need attributes, strip them entirely:
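function strip_tags_with_attributes(string $string, string $allowedTags): string {
    // First, strip disallowed tags
    $string = strip_tags($string, $allowedTags);
    // Then rewrite every remaining opening tag as a bare <tag>, dropping all attributes.
    // Closing tags such as </a> do not match, because "/" is not a word character.
    // preg_replace() returns null on a regex error, so fall back to an empty string.
    return preg_replace('/<(\w+)[^>]*>/', '<$1>', $string) ?? '';
}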
// Usage
$input = '<a href="javascript:bad()" onclick="evil()">Link</a>';
$clean = strip_tags_with_attributes($input, '<a><b><i>');
// Result: <a>Link</a>
When This Isn’t Enough
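If you actually need to preserve safe attributes (like href for links), you need a proper HTML sanitizer. Keeping an attribute is not enough on its own: an href can still carry a javascript: URL, so its value has to be validated too.
- HTML Purifier - A mature, allowlist-based sanitizer library for PHP
- DOMDocument - PHP's built-in DOM parser; parse the markup and whitelist specific attributes yourself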
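For the DOMDocument route, the idea can be sketched roughly as follows. This is a minimal illustration, not a hardened sanitizer: the function name sanitize_html_fragment(), the shape of the $allowed array, and the list of accepted URL prefixes are all assumptions made for this example, and it presumes "div" is not one of your allowed tags because it is used as a wrapper.

```php
<?php
/**
 * Hypothetical helper (not part of PHP): keep only allowlisted tags and,
 * for each tag, only allowlisted attributes.
 * $allowed maps a tag name to the attribute names permitted on it.
 */
function sanitize_html_fragment(string $html, array $allowed): string
{
    // Pass 1: let strip_tags() drop every tag that is not in the allowlist.
    $html = strip_tags($html, '<' . implode('><', array_keys($allowed)) . '>');

    // Pass 2: parse what is left and filter attributes on the DOM.
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parser warnings quietly
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
        . '<body><div>' . $html . '</div></body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The wrapper <div> is the first div in document order.
    $root = $doc->getElementsByTagName('div')->item(0);
    if ($root === null) {
        return '';
    }

    // Snapshot the live node list, because the loop mutates the tree.
    foreach (iterator_to_array($root->getElementsByTagName('*'), false) as $el) {
        $tag = strtolower($el->nodeName);
        if (!isset($allowed[$tag])) {
            // Defensive: the parser found a tag that pass 1 did not. Drop it.
            $el->parentNode->removeChild($el);
            continue;
        }
        foreach (iterator_to_array($el->attributes, false) as $attr) {
            $name = strtolower($attr->nodeName);
            $keep = in_array($name, $allowed[$tag], true);
            if ($keep && $name === 'href') {
                // Allowlist URL prefixes (an assumption for this sketch);
                // anything else, including javascript: and data:, is rejected.
                $keep = preg_match('~^(?:https?:|mailto:|/|#)~i', trim($attr->nodeValue)) === 1;
            }
            if (!$keep) {
                $el->removeAttribute($attr->nodeName);
            }
        }
    }

    // Serialize only the children of the wrapper <div>.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Usage: links may keep href and title; <b> and <i> may keep nothing.
$dirty = '<a href="javascript:alert(1)" onclick="evil()">Bad</a> '
       . '<a href="https://example.com/" title="Docs">Good</a> <b style="color:red">bold</b>';
$safe = sanitize_html_fragment($dirty, ['a' => ['href', 'title'], 'b' => [], 'i' => []]);
echo $safe, PHP_EOL;
```

The first link should lose both its javascript: href and its onclick, while the second keeps its https href and title. Checking the decoded attribute value on the DOM, rather than pattern-matching the raw markup, is what lets the href test catch entity-encoded variants of javascript: as well. For anything beyond a small, fixed set of tags, a dedicated library is still the safer choice.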
Key Takeaway
strip_tags() alone is not safe when you're whitelisting tags, because every attribute on an allowed tag survives untouched. Always consider what attributes could slip through and either:
- Remove all attributes (simple approach above)
- Use a proper HTML sanitization library (complex but flexible)
Never trust user input, even when it appears to be sanitized.