Jan 22, 2013

Quick IT Guide to Troubleshoot a Slow Responding Web Server in MINUTES!

One day I noticed that my men's fashion website was crawling to a halt. I kept getting 504 Gateway Timeout errors while browsing the site. What could I do? I have no formal training as an IT specialist, but with some working knowledge of how the internet and web servers work, I managed to get my web server back on its feet.

Read on to find out how I managed to resolve this issue within MINUTES!

I am running the Nginx web server (version nginx/0.7.65) on an Amazon EC2 micro instance with Ubuntu 10.04.4 LTS.
Root Cause
The root cause of a 504 Gateway Timeout error is that the backend behind your web server takes longer to handle a request than the timeout set in the web server's configuration. In my case, Nginx is set to time out after 30 seconds.

Granted, you can raise the timeout, but that wouldn't resolve the issue. Visitors who have to wait more than 5 seconds usually bounce anyway.
A common cause of this issue is that some client is hitting your server with HTTP requests so heavily that the web server is too busy serving them to respond to anyone else. This situation is known as a denial-of-service (DoS) attack, because real visitors are denied your service.

Don't just assume this is the culprit. Follow the steps below to diagnose the issue and confirm it. When you are done you'll be one step closer to being an expert on fixing slow web servers!

Step 1: Check Running Processes
Check the running processes with the 'top' command on Unix. You should see something similar to the following:
top - 07:50:58 up 310 days, 16 min,  1 user,  load average: 0.00, 0.00, 0.00
Tasks:  68 total,   1 running,  67 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.6%us,  0.1%sy,  0.0%ni, 98.3%id,  0.2%wa,  0.0%hi,  0.0%si,  0.8%st
Mem:    629976k total,   592408k used,    37568k free,    84700k buffers
Swap:        0k total,        0k used,        0k free,   280460k cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU  %MEM  TIME+   COMMAND
    1 root      20   0  2824 1028  580 S  99.0  0.2   0:15.54 php5.cgi
    2 root      20   0     0    0    0 S  0.0   0.0   0:00.03 kthreadd
    3 root      RT   0     0    0    0 S  0.0   0.0   0:00.00 migration/0
  ...
In my case the php5.cgi process is taking almost 100% of the CPU, meaning my server is crawling to a halt simply because the PHP engine is extremely busy handling the requests handed to it by the web server.
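
If you would rather not eyeball the output, the same check can be scripted. The following is a minimal Python sketch, not part of the original troubleshooting session. It assumes you have captured the text of 'top' (for example with 'top -b -n 1', which runs it once in batch mode) and that the process table uses the column layout shown above. The function name busiest_process is made up for illustration.

```python
# Sample process table copied from the 'top' output above.
SAMPLE_TOP_OUTPUT = """\
  PID USER      PR  NI  VIRT  RES  SHR S  %CPU  %MEM  TIME+   COMMAND
    1 root      20   0  2824 1028  580 S  99.0  0.2   0:15.54 php5.cgi
    2 root      20   0     0    0    0 S  0.0   0.0   0:00.03 kthreadd
    3 root      RT   0     0    0    0 S  0.0   0.0   0:00.00 migration/0
"""


def busiest_process(top_output):
    """Return (command, cpu_percent) for the row with the highest %CPU, or None."""
    cpu_col = cmd_col = None
    best = None
    for line in top_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The header row tells us which columns hold %CPU and COMMAND.
        if fields[0] == "PID" and "%CPU" in fields and "COMMAND" in fields:
            cpu_col = fields.index("%CPU")
            cmd_col = fields.index("COMMAND")
            continue
        # Skip summary lines and anything that is not a process row.
        if cpu_col is None or not fields[0].isdigit() or len(fields) <= cmd_col:
            continue
        cpu = float(fields[cpu_col])
        command = " ".join(fields[cmd_col:])
        if best is None or cpu > best[1]:
            best = (command, cpu)
    return best


print(busiest_process(SAMPLE_TOP_OUTPUT))  # ('php5.cgi', 99.0)
```

On the sample above this points at php5.cgi, the same process that stands out when reading the table by hand.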

In your case the situation may be different. If you cannot figure out what to do next, let me know!

Step 2: Find Root Cause
Now that we've identified 'php5.cgi' as the offending process, the next thing to check is the web server's access log, to get an idea of how frequently the server receives and handles HTTP requests.

I use the 'tail -f' command on my web server's access.log to see who is hitting my web server in real time. The following is a portion of the log.

...
15.5.32.12 - - [20/Jan/2013:14:57:42 +0000] "GET /memory-footprint-of-a-function.html HTTP/1.1" 403 143 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
15.5.32.12 - - [20/Jan/2013:14:57:43 +0000] "GET /operator-overloading-function.html HTTP/1.1" 403 143 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
15.5.32.12 - - [20/Jan/2013:14:57:44 +0000] "GET /solving-eight-queens-puzzle-step-3.html HTTP/1.1" 403 143 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
15.5.32.12 - - [20/Jan/2013:14:57:45 +0000] "GET /string-assign-function.html HTTP/1.1" 403 143 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
15.5.32.12 - - [20/Jan/2013:14:57:46 +0000] "GET /string-rfind-function.html HTTP/1.1" 403 143 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
...

Can you see what's happening? Microsoft's search engine, Bing, was hitting my web server once every second. In fact it's possible that Bing was hitting my server more frequently than shown here because some of the requests might not have made their way to the access log.

Why is Bing doing that to me? I have no idea. Let's fix this DoS issue in the next step!

In this step we are basically checking what makes the resource-hungry process consume an abnormal amount of resources (CPU time, memory, etc.). Again, if you cannot figure it out in your situation, let me know!
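
Watching 'tail -f' works when one client dominates the log, but counting is more reliable on a busy server. The following is a rough Python sketch of that idea, assuming the log uses the same layout as the lines shown above. The names LOG_LINE, busiest_clients and average_gap_seconds are invented for this example, and the sample lines are the ones from my log.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the access log layout shown above (an assumption about your log format).
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

UA = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
SAMPLE_LOG = [
    '15.5.32.12 - - [20/Jan/2013:14:57:42 +0000] "GET /memory-footprint-of-a-function.html HTTP/1.1" 403 143 "-" "' + UA + '"',
    '15.5.32.12 - - [20/Jan/2013:14:57:43 +0000] "GET /operator-overloading-function.html HTTP/1.1" 403 143 "-" "' + UA + '"',
    '15.5.32.12 - - [20/Jan/2013:14:57:44 +0000] "GET /solving-eight-queens-puzzle-step-3.html HTTP/1.1" 403 143 "-" "' + UA + '"',
    '15.5.32.12 - - [20/Jan/2013:14:57:45 +0000] "GET /string-assign-function.html HTTP/1.1" 403 143 "-" "' + UA + '"',
    '15.5.32.12 - - [20/Jan/2013:14:57:46 +0000] "GET /string-rfind-function.html HTTP/1.1" 403 143 "-" "' + UA + '"',
]


def parse(lines):
    """Yield (ip, timestamp, user_agent) for every line that matches LOG_LINE."""
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            when = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
            yield match.group("ip"), when, match.group("agent")


def busiest_clients(lines, n=3):
    """Return the n most frequent (ip, user_agent) pairs with their request counts."""
    counts = Counter((ip, agent) for ip, _, agent in parse(lines))
    return counts.most_common(n)


def average_gap_seconds(lines, ip):
    """Average number of seconds between consecutive requests from one IP."""
    times = sorted(when for client, when, _ in parse(lines) if client == ip)
    if len(times) < 2:
        return None
    return (times[-1] - times[0]).total_seconds() / (len(times) - 1)


print(busiest_clients(SAMPLE_LOG))
print(average_gap_seconds(SAMPLE_LOG, "15.5.32.12"))  # 1.0
```

To run it against a real log you would pass an open file instead of SAMPLE_LOG, for example busiest_clients(open("access.log")), where the path depends on your own setup.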
Step 3: Fix the Problem
In my situation I'd like Bing to stop hitting my server, so I take the following actions.

1. I go to Bing's webmaster tools to adjust the crawl rate. Bing seems to honor my request almost immediately. Not sure if I got lucky or Bing is really that efficient.

2. I always return a 403 HTTP error to the Bing search engine crawler. The following is the Nginx syntax. My Nginx configuration is at /usr/local/nginx/conf/nginx.conf.
# Block http user agent - bingbot
if ($http_user_agent ~* (bingbot)) {
  return 403;
}
Two notes on the syntax:
1. The ~ operator makes Nginx perform a regular expression match, so "$http_user_agent ~* (bingbot)" matches whenever the user agent contains the text 'bingbot'.
2. The ~* variant makes the match case insensitive, whereas a plain ~ is case sensitive.
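
If you want to try a pattern before putting it in the configuration, you can approximate the match in Python. This is only a sketch: Nginx uses PCRE rather than Python's re module, but for a simple pattern like this one the two should behave the same. The function name is_blocked and the Firefox user agent string are made up for illustration.

```python
import re

# Rough equivalent of "~* (bingbot)": an unanchored, case-insensitive search.
BLOCKED = re.compile(r"(bingbot)", re.IGNORECASE)

# Rough equivalent of "~ (bingbot)": the same search, but case sensitive.
BLOCKED_CASE_SENSITIVE = re.compile(r"(bingbot)")


def is_blocked(user_agent):
    """Return True if the user agent contains 'bingbot' in any letter case."""
    return BLOCKED.search(user_agent) is not None


bing = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
firefox = "Mozilla/5.0 (X11; Linux x86_64; rv:18.0) Gecko/20100101 Firefox/18.0"

print(is_blocked(bing))       # True
print(is_blocked("BingBot"))  # True, because the match ignores case
print(is_blocked(firefox))    # False
print(BLOCKED_CASE_SENSITIVE.search("BingBot") is None)  # True, no case-sensitive match
```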

I restarted my web server, checked the access log again, and confirmed that the web server was returning a 403 error immediately in response to requests from the Bing search engine spider. I then went to my men's fashion website in a browser and could see the web content again!
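
The log check can also be automated. The following Python sketch scans access log lines and reports any request from the crawler that did not get a 403. It assumes the log layout shown in Step 2. The function name unblocked_bot_requests is invented here, and the line with status 200 is a made-up example, not taken from my log.

```python
import re

# Pulls the status code and the user agent from the end of a log line.
STATUS_AND_AGENT = re.compile(
    r'" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)


def unblocked_bot_requests(lines, bot="bingbot", expected_status="403"):
    """Return the log lines where the bot received something other than 403."""
    leaks = []
    for line in lines:
        match = STATUS_AND_AGENT.search(line.rstrip("\n"))
        if not match:
            continue
        from_bot = bot.lower() in match.group("agent").lower()
        if from_bot and match.group("status") != expected_status:
            leaks.append(line)
    return leaks


UA = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

# Two lines from the log in Step 2, both answered with 403.
BLOCKED_LINES = [
    '15.5.32.12 - - [20/Jan/2013:14:57:42 +0000] "GET /memory-footprint-of-a-function.html HTTP/1.1" 403 143 "-" "' + UA + '"',
    '15.5.32.12 - - [20/Jan/2013:14:57:43 +0000] "GET /operator-overloading-function.html HTTP/1.1" 403 143 "-" "' + UA + '"',
]

# Hypothetical line (not from the real log) showing what a leak would look like.
HYPOTHETICAL_LEAK = '15.5.32.12 - - [20/Jan/2013:14:57:47 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "' + UA + '"'

print(unblocked_bot_requests(BLOCKED_LINES))                              # []
print(len(unblocked_bot_requests(BLOCKED_LINES + [HYPOTHETICAL_LEAK])))   # 1
```

An empty list means every crawler request in the sample was rejected, which is what you want to see after the restart.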

If you have any questions let me know and I will do my best to help you!
Please leave a comment here!
One Minute Information - by Michael Wen