This is a sanitizing HTML "parser" written in roughly 100 lines of PHP. It whitelists tags and attributes, checks URL protocols to prevent XSS, deals with unclosed and unopened tags, and does some other things. The biggest issue is that it's not well factored. However, its shortness is appealing, because I understand how it works. I would have a hard time trusting a library with thousands of lines of code to do input validation.
https://gist.github.com/1575452
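To make the whitelisting and protocol checks concrete, here is a minimal sketch of the idea. It is not the code from the gist: the names `ALLOWED_TAGS`, `ALLOWED_PROTOCOLS`, `is_safe_url` and `filter_attributes` are made up for illustration, the whitelist contents are arbitrary, and the sketch assumes attribute values have already been entity-decoded by whatever extracted them from the markup.

```php
<?php
// Hypothetical whitelist (not taken from the gist): tag name => allowed attributes.
const ALLOWED_TAGS = [
    'a' => ['href', 'title'],
    'b' => [],
    'i' => [],
    'p' => [],
];

// Hypothetical list of URL schemes considered safe.
const ALLOWED_PROTOCOLS = ['http', 'https', 'mailto'];

// True if $url has no scheme (relative URL) or uses a whitelisted scheme.
function is_safe_url(string $url): bool
{
    // Strip whitespace and control characters first, so that tricks like
    // "java\tscript:" are seen as the scheme they would become in a browser.
    $url = preg_replace('/[\x00-\x20]+/', '', $url);

    if (!preg_match('/^([a-z][a-z0-9+.\-]*):/i', $url, $m)) {
        return true; // no scheme present
    }
    return in_array(strtolower($m[1]), ALLOWED_PROTOCOLS, true);
}

// Returns the attributes that survive the whitelist, escaped for output,
// or null if the tag itself is not allowed.
function filter_attributes(string $tag, array $attrs): ?array
{
    $tag = strtolower($tag);
    if (!array_key_exists($tag, ALLOWED_TAGS)) {
        return null;
    }

    $kept = [];
    foreach ($attrs as $name => $value) {
        $name = strtolower($name);
        if (!in_array($name, ALLOWED_TAGS[$tag], true)) {
            continue; // attribute not whitelisted for this tag
        }
        if ($name === 'href' && !is_safe_url($value)) {
            continue; // drop links with a disallowed protocol
        }
        $kept[$name] = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
    }
    return $kept;
}
```

With these definitions, calling `filter_attributes('a', [...])` on a link that carries a `javascript:` `href` and an `onclick` handler drops both and keeps only whitelisted attributes such as `title`, while a tag like `script` is rejected outright. Dealing with unclosed and unopened tags is a separate step that this sketch leaves out.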