Jan 28, 2012

f Comment

In MySQL How Do You Extract and Save the Path Of an Absolute URL?

Amazon Say I have a database table called 'article' and it contains the absolute URL of each article in column 'page_url' (e.g. http://www.mensfashionforless.com/zara-black-leather-jacket.html) and I'd like to change the contents of 'page_url' so that it contains ONLY the path of the url (e.g. /zara-black-leather-jacket.html).

How do I do that in pure MySQL commands?

Solution
First you add a new column called 'page_url_old' to preserve the old absolute URLs just in case you want them back later. Remember: Always keep a backup unless you have a good reason not to.

alter table article add column page_url_old varchar(255) default '' not null
Next you copy the contents of 'page_url' over to 'page_url_old':

update article set page_url_old=page_url
Next let's update the table so that 'page_url' column now contains the paths of the URLs. HOWEVER before we do that ALWAYS use 'select' first to see what the results are in case you've made a mistake in your SQL syntax.

To that end here's the logic. We find the last '/' in the URL and extract the sub string from that slash all the way to the end of the URL. We'll use SQL's reverse(), left(), locate() functions to accomplish this task as follows:

select reverse(left(reverse(page_url), locate('/', reverse(page_url)) -1)) from article
You may be wondering why not use substr(). It's because we need to find the LAST slash but MySQL does not have such a function. So we need to reverse the page_url and find the slash index. Then we extract the sub string from the beginning up to the slash; then we reverse the sub string to get the correct path.

An Example
Here's an example if you are still confused. Say the original page_url is 'http://www.mensfashionforless.com/zara-black-leather-jacket.html'. Here's the process of extracting the path from this URL:

1. reverse(page_url) returns 'lmth.tekcaj-rehtael-kcalb-araz/moc.sselrofnoihsafsnem.www//:ptth' less quotes.

2. locate('/', reverse(page_url) returns 31. It's zero based.

3. left(reverse(page_url), 31) returns 'lmth.tekcaj-rehtael-kcalb-araz/' less quotes.

4. reverse(left(reverse(page_url), 31)) returns '/zara-black-leather-jacket.html' less quotes.

The process is straightforward once you understand it. If you find a better way to do this let me know!

You execute this 'select' command and realize it's returning the correct results. Now you can be confident that by employing the identical logic in an 'update' clause you'll be able to set 'page_url' to the correct values, as follows:

update article set page_url=concat('/',reverse(left(reverse(page_url), locate('/', reverse(page_url)) -1)))
There you go! Questions? Let me know!
Please leave a comment here!
One Minute Information - by Michael Wen
ADVERTISING WITH US - Direct your advertising requests to Michael