Aug 22, 2013

Use a PHP or Perl Regular Expression to HTML Escape Ampersand & in SECONDS!

If you are trying to get your webpage HTML5 validated, you may want to escape every instance of the ampersand & that has not been HTML escaped yet. Namely, every unescaped & needs to become &amp;.

Note that we should leave the already escaped values intact. For example, &amp; or &quot; or &#34; should not be changed.
Let's see how the regular expression works. We'll be using PHP as well as Perl as they are two of the most popular programming languages.

Solution with PHP
Suppose the variable $output contains the HTML code whose ampersands you want to escape. Here's the PHP code to do that.

$output = preg_replace('/&(?![a-zA-Z0-9#]+;)/', '&amp;', $output);
By the way, preg_replace() replaces every match by default, so no global flag is needed. I use a negative lookahead, which is a zero-width assertion, to achieve this match.

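The regular expression I use may seem daunting at first but it really isn't. It simply says "replace every & that is not immediately followed by a run of letters, digits or # and then a ; with &amp;". In other words, an & that already starts something like &amp; or &#34; is left alone.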

Solution with Unix Command Line with Perl
What if you'd like to HTML escape ampersands in a file or in many files? First make sure you have Perl installed on your Unix box, and execute the following command to perform this task in-place on every PHP file in the current directory.

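perl -p -i -e 's/&(?![a-zA-Z0-9#]+;)/&amp;/g' *.php
Note that the single quotes keep the shell from interpreting the exclamation mark !, so no backslash is needed in bash. If your shell expands ! even inside quotes (csh and tcsh do), write \! instead. You can change *.php to target any files you want.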

Questions? Let me know!
One Minute Information - by Michael Wen