The one line script that shows you how your web server is doing

Displaying httpd log file entries per minute

$ sed -n 's/.*20[0-9][0-9]:\(.*\):.. +0.00.*/\1/p' "$@" | uniq -c

Let’s say you suspect one of your web servers is misbehaving.  Users are complaining that some pages are not loading. You do a tail -f on the access log file and entries are whizzing up the screen, so it certainly seems to be doing something right.
However, your eye can’t tell the difference between 500 log entries per minute and 5000 log entries per minute.
Wouldn’t it be nice if you could see how many log file entries are being generated per minute?
With this short script – no problem!
To create a script that contains the above command and place it in your bin directory, do the following:

$ mkdir -p ~/bin
$ cat > ~/bin/apache.byminute
#!/bin/bash
sed -n 's/.*20[0-9][0-9]:\(.*\):.. +0.00.*/\1/p' "$@" | uniq -c
<CTRL-D>
$ chmod +x ~/bin/apache.byminute
$ export PATH=$PATH:~/bin

You have now created a bash shell script file called apache.byminute and made it available as a command on your command line.
The export PATH line can be put in your ~/.bash_profile if it’s not already there, then it will be available to you next time you log in.
You can call this script directly on an apache log, e.g.

$ apache.byminute /var/log/httpd/access_log

This will give you information about all entries for every kind of invocation of the web server. I.e. it will count accesses to every php, js, css, html, jpeg, png file.
You can get more specific and useful information when you combine the command with another grep to get only the patterns that are of interest.

$ grep 'GET /wp-admin/admin.php' /var/log/httpd/apache_access.log | apache.byminute

What’s going on here?
The grep command is looking for the pattern /wp-admin/admin.php in the file apache_access.log. Only lines containing that pattern will be in the output of grep.   Lines recording the GETting of jpegs, css, js, etc will not be counted.
The sed command is looking for the pattern

.*20[0-9][0-9]:\(.*\):.. +0.00.*

That means it’s looking for the following sequence of patterns:
1) any bunch of characters “.*
2) 20 followed by two digits “20[0-9][0-9]
3) a colon “:”
4a) any bunch of characters followed by “\(.*\)
4b) With the “\(.*\)“, the backslash-escaped parentheses mark a part of the pattern for later use in the second part of the sed expression. The substitution pattern “\1” is then used to output only the part of the string matching the pattern within the backlash-escaped parentheses
5) a colon followed by two arbitrary characters followed by a space “:..
6) a plus, a zero, an arbitrary character, two zeroes “+0.00
7) and another bunch of arbitrary characters “.*
What it spits out looks like the following:


20:57
20:57
20:57
20:57
...
20:57
20:58
20:58
20:58
...
20:58
20:59
20:59
20:59
20:59
21:00
21:00
...

This is the hour and minute of each log entry, separated by a colon.
This is piped through uniq.  But not just any old uniq.  This uniq has the option -c, which means “display each duplicated line only once with a count of duplicate lines before the line”
So uniq -c outputs:


317 20:57
221 20:58
4 20:59
244 21:00

This is telling us that /wp-admin/admin.php was fetched 317 times in the minute beginning at 20:57, 221 times at 20:58, only 4 times at 20:59 and 244 times at 21:00.
If you run this command over a few days’ log files you get a feel for how much traffic you normally get per minute.  Any periods of unusually low (or high) traffic are easily spotted.  Gaps, i.e. minutes with no traffic, can also be spotted easily.  If you normally see counts in the hundreds per minute, and suddenly you’re seeing tens per minute, you know something is amiss. If some times are being output in the wrong order, you know that some entries are taking too long to process.
You can easily adjust sed‘s regular expression to display just hours, or hours, minutes and seconds, or even hours, minutes and tens of seconds; the ideal pattern depends on the busy-ness of your server.
Because you are armed with grep, you can issue commands to get any patterns that are of interest to you.  You have massive flexibility to search for different things in a very short space of time.  You can see what’s happening in real time if you use tail -f  to get the output of the log file as it appears and pipe it through the command:

$ tail -f access_log | apache.byminute

As you can see, on the Unix command line you can create extremely powerful ad-hoc tools with just a line or two of code.  This kind of code is applicable to many different situations.
For analysing timestamped log entries, get used to the pattern of
- looking for interesting entries with grep
- using sed with an appropriate regular expression to cut out the pattern that is of immediate interest
- using uniq -c to count the occurrences of those patterns

Popcorn, Nachos und Code!
Hamburg 29. und 30. September 2016 – code.talks
code.talks

Post

Posted by Thursday August 18th, 2016

Leave a Reply

No Comments

appMotivate offers:

  • Consulting
  • Specification
  • Project Management
  • Programming

Last blog article