Aug 22, 2013


PHP's preg_replace() Adds Unwanted Backslashes in Backreference Replacement Strings! Fix it in SECONDS!

PHP's preg_replace() inserts backslashes into the strings that replace the backreferences. Why does it do that, and how do I fix it?

Background
If you use PHP's preg_replace() with the e modifier, you are in for a surprise. According to http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php, the e modifier makes preg_replace() escape some characters (namely ', ", \ and the NUL byte) in the strings that replace the backreferences.

Here's an example to see this behavior in action.

$body = "<ul>you'd need to</ul>"; // $body = <ul>you'd need to</ul>

$body = preg_replace('|(<ul>)(.+?)(</ul>)|e', '"$1"."$2"."$3"', $body); // $body = <ul>you\'d need to</ul>

As you can see, the backreference $2 is replaced with "you\'d need to", not "you'd need to"! What is that nasty backslash \ doing there?

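As explained above, this behavior is by design: the e modifier escapes the captured text before the replacement string is evaluated as PHP code. Inside the double-quoted "$2", the sequence \' is not a recognized escape, so the backslash survives in the result.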
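
The escaping can be reproduced without the e modifier. The sketch below assumes, as the manual page describes, that the modifier escapes the same characters addslashes() does; the variable names are only illustrative.

```php
<?php
// Text captured by the second group in the example above.
$captured = "you'd need to";

// addslashes() escapes ', ", \ and the NUL byte, the same set of
// characters the manual lists for the e modifier.
$escaped = addslashes($captured);

// In a double-quoted PHP string, \' is not an escape sequence,
// so the backslash is kept as a literal character.
$evaluated = "you\'d need to";

echo $escaped, "\n";   // you\'d need to
echo $evaluated, "\n"; // you\'d need to
```

Both lines print the string with the stray backslash, which matches what the e modifier produced above.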

Solution
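To fix this issue, use preg_replace_callback() wherever you would otherwise use preg_replace() with the e modifier. The callback receives the matched text exactly as it appears in the subject, with nothing escaped. As a further reason to switch, the e modifier is deprecated as of PHP 5.5.0 and was removed in PHP 7.0.0.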

Here's an example.

// Remove all instances of \r\n in the text surrounded by <ul></ul>
$body = preg_replace_callback('|(<ul>)(.+?)(</ul>)|sm',
 function ($matches) {
  return $matches[1].str_replace("\r\n","",$matches[2]).$matches[3];
 },
$body);
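
For comparison, here is a minimal sketch of the first example rewritten with preg_replace_callback(). It reuses the same pattern and input; the callback simply concatenates the three captured groups, so it only illustrates that nothing gets escaped.

```php
<?php
$body = "<ul>you'd need to</ul>";

$body = preg_replace_callback(
    '|(<ul>)(.+?)(</ul>)|',
    function ($matches) {
        // $matches holds the raw captured text; no backslashes are added.
        return $matches[1] . $matches[2] . $matches[3];
    },
    $body
);

echo $body, "\n"; // <ul>you'd need to</ul>
```

The apostrophe comes through untouched because the replacement is built by a regular function call instead of by evaluating a string as code.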

If you don't need the 'e' modifier, plain preg_replace() is fine: without it, the matched text is inserted as-is and nothing is escaped. For example, the following is perfectly fine.

$body = preg_replace('|<span class="cmd">(.+?)</span>|sm', '<div class="cmd">$1</div>', $body);
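
To check this, the same call can be run on input that contains quotes. The sample input below is made up for illustration.

```php
<?php
// Hypothetical input containing single quotes inside the span.
$body = '<span class="cmd">echo \'hi\'</span>';

// No e modifier, so $1 is substituted verbatim.
$body = preg_replace('|<span class="cmd">(.+?)</span>|sm', '<div class="cmd">$1</div>', $body);

echo $body, "\n"; // <div class="cmd">echo 'hi'</div>
```
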
Questions? Let me know!
One Minute Information - by Michael Wen