This page is a collection of one-liners I have found useful. These short command-line tools have been tested with the tcsh and bash shells on Solaris, but many of them can also be used on Windows.
It is assumed here that the web server log files (access_log*) are in Combined Log Format.
Less can easily handle huge files. The -S option chops long lines instead of wrapping them, which makes log lines easier to scan (use the arrow keys to scroll sideways).
% less -S access_log
% less access_log
To save space, log files are usually compressed. They can be browsed without creating an uncompressed copy:
% gzip -dc access_log.gz | less
% bzip2 -dc access_log.bz2 | less
% wc -l access_log
33894 access_log
% gzip -dc access_log.gz | wc -l
33894
% egrep -vc '(\.gif |\.jpg |\.png )' access_log
2569
% gzip -dc access_log.gz | egrep -vc '(\.gif |\.jpg |\.png )'
2569
% grep -c `date '+%d/%b/%Y'` access_log
2569
% grep `date '+%d/%b/%Y'` access_log | cut -d" " -f1 | sort -u | wc -l
1196
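Both of today's counts can also be produced in a single awk pass over the file. A sketch; the sample file name and log entries below are fabricated for illustration:

```shell
# Create a tiny sample log in Combined Format (fabricated entries).
cat > sample_access_log <<'EOF'
foo.example - - [20/Jan/2002:11:23:40 +0200] "GET / HTTP/1.0" 200 1043 "-" "Mozilla"
bar.example - - [20/Jan/2002:11:23:41 +0200] "GET /a.html HTTP/1.0" 200 512 "-" "Mozilla"
foo.example - - [21/Jan/2002:09:00:00 +0200] "GET /b.html HTTP/1.0" 200 77 "-" "Mozilla"
EOF

# Field 1 is the client host, field 4 starts the timestamp.
# Count hits and distinct hosts for one day in one pass.
awk -v day='20/Jan/2002' '
    $4 ~ day { hits++; seen[$1]++ }
    END { for (h in seen) hosts++; print hits " hits, " hosts " visitors" }
' sample_access_log
# prints: 2 hits, 2 visitors
```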
% grep -c 01/Jan/2001 access_log
2569
% gzip -dc access_log.gz | grep -c 01/Jan/2001
2569
% head -1 access_log; tail -1 access_log
foo.example - - [30/Dec/2000:23:55:25 +0200] "GET /~ktmatu/ ...
bar.example - - [06/Jan/2001:23:53:37 +0200] "GET /~ktmatu/rates.html ...
% gzip -dc access_log.gz | head -1 ; gzip -dc access_log.gz | tail -1
foo.example - - [30/Dec/2000:23:55:25 +0200] "GET /~ktmatu/ ...
bar.example - - [06/Jan/2001:23:53:37 +0200] "GET /~ktmatu/rates.html ...
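Decompressing the file twice can be avoided with a single awk pass: awk keeps the last record in $0 when the END block runs. A sketch with a fabricated sample log:

```shell
# Fabricated sample log.
printf '%s\n' \
  'foo.example - - [30/Dec/2000:23:55:25 +0200] "GET / HTTP/1.0" 200 100' \
  'mid.example - - [02/Jan/2001:10:00:00 +0200] "GET /a HTTP/1.0" 200 100' \
  'bar.example - - [06/Jan/2001:23:53:37 +0200] "GET /b HTTP/1.0" 200 100' \
  > sample_access_log

# Print the first and the last entry in one pass.
awk 'NR == 1 { print } END { print }' sample_access_log
# With a compressed log:
# gzip -dc access_log.gz | awk 'NR == 1 { print } END { print }'
```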
% cut -d" " -f4 access_log | cut -d"/" -f1 | uniq
[30
[31
[01
[03
[04
[05
[06
% gzip -dc wlog0101.gz | cut -d" " -f4 | cut -d"/" -f1 | uniq
[30
[31
[01
[03
[04
[05
[06
This is just a very quick and dirty way to check which days the log file covers.
A well-formed Combined Format line contains three quoted strings, i.e. six double quote characters, so splitting the line on double quotes should yield seven fields (the trailing newline forms the last one). Anything else indicates a broken line.
% perl -ane 'print $_ if (scalar (split /\"/)) != 7' access_log | wc -l
7
% gzip -dc access_log.gz | perl -ane 'print $_ if (scalar (split /\"/)) != 7' | wc -l
7
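The same check can be written in awk by using the double quote as the field separator. A sketch; the sample file and its contents are fabricated:

```shell
# Sample: two good Combined Format lines and one truncated line.
cat > sample_access_log <<'EOF'
foo.example - - [20/Jan/2002:11:23:40 +0200] "GET / HTTP/1.0" 200 10 "-" "Mozilla"
bar.example - - [20/Jan/2002:11:23:41 +0200] "GET /a HTTP/1.0" 200 10 "-" "Mozilla"
baz.example - - [20/Jan/2002:11:23:42 +0200] "GET /b HTTP/1.0
EOF

# A good line has six quote characters, so splitting on " yields 7 fields
# (awk keeps the empty trailing field). Count the lines where it does not.
awk -F'"' 'NF != 7' sample_access_log | wc -l
# prints the number of broken lines (1 for this sample)
```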
% grep -n '.*' access_log | grep '^15927\:'
15927:foo.example.com - - [20/Jan/2002:11:23:45 +0200] "GET ...
% grep -n '.*' access_log | grep '^1592.\:'
15920:foo.example.com - - [20/Jan/2002:11:23:40 +0200] "GET ...
15921:foo.example.com - - [20/Jan/2002:11:23:41 +0200] "GET ...
15922:foo.example.com - - [20/Jan/2002:11:23:41 +0200] "GET ...
...
% gzip -dc access_log.gz | grep -n '.*' | grep '^15927\:'
15927:foo.example.com - - [20/Jan/2002:11:23:45 +0200] "GET ...
% gzip -dc access_log.gz | grep -n '.*' | grep '^1592.\:'
15920:foo.example.com - - [20/Jan/2002:11:23:40 +0200] "GET ...
15921:foo.example.com - - [20/Jan/2002:11:23:41 +0200] "GET ...
15922:foo.example.com - - [20/Jan/2002:11:23:41 +0200] "GET ...
...
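sed can reach a given line by number without first numbering every line with grep -n. A sketch with a generated sample file (its contents are fabricated):

```shell
# Generate a small sample file (fabricated content).
awk 'BEGIN { for (i = 1; i <= 10; i++) print "line " i }' > sample_file

# Print a single line, then a range of lines, by number.
sed -n '7p' sample_file      # prints: line 7
sed -n '3,5p' sample_file    # prints lines 3 through 5
# For a compressed log: gzip -dc access_log.gz | sed -n '15927p'
```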
The tenth field of a Combined Format line is the size of the response in bytes, so awk can sum the traffic for any subset of requests: today, the current month, a particular robot, or a single client address.
% grep `date '+%d/%b/%Y'` access_log | awk '{ s += $10 } END {print s}'
13113756
% grep `date '+../%b/%Y'` access_log | awk '{ s += $10 } END {print s}'
569477018
% grep googlebot access_log | awk '{ s += $10 } END {print s}'
29832233
% grep ^169.254.22.12 access_log | awk '{ s += $10 } END {print s}'
46760880
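The same summing idea extends to a per-client breakdown: index the awk array by the host field and sort the totals. A sketch with fabricated data; note that a "-" size, as logged for some responses, counts as zero in awk's numeric context:

```shell
# Fabricated sample log; the 304 response has no body, logged as "-".
cat > sample_access_log <<'EOF'
foo.example - - [20/Jan/2002:11:23:40 +0200] "GET / HTTP/1.0" 200 100
bar.example - - [20/Jan/2002:11:23:41 +0200] "GET /a HTTP/1.0" 200 300
foo.example - - [20/Jan/2002:11:23:42 +0200] "GET /b HTTP/1.0" 304 -
EOF

# Sum bytes per client host and list the heaviest users first.
awk '{ bytes[$1] += $10 } END { for (h in bytes) print bytes[h], h }' \
    sample_access_log | sort -rn
# prints: 300 bar.example
#         100 foo.example
```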
Partial-content requests are usually generated by download managers to speed up the downloading of big files, and by Adobe Acrobat Reader to fetch PDF documents page by page. In this example, requests for PDF files answered with status 206 are deleted so that they don't inflate the hit count.
% grep -v '\.pdf .* 206 ' access_log > new_log
One month's entries can be extracted and compressed for archiving:
% grep ' \[../May/2002\:' access_log | gzip -9c > access_log-2002-05.gz
% grep ' \[../May/2002\:' access_log | bzip2 > access_log-2002-05.bz2
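Instead of grepping one month at a time, the whole log can be split into per-month files in a single awk pass. A sketch; the output file names like access_log-2002-May are an assumption of this example:

```shell
# Fabricated sample log spanning two months.
cat > sample_access_log <<'EOF'
a.example - - [30/Apr/2002:23:59:59 +0200] "GET / HTTP/1.0" 200 10
b.example - - [01/May/2002:00:00:01 +0200] "GET / HTTP/1.0" 200 10
c.example - - [02/May/2002:12:00:00 +0200] "GET / HTTP/1.0" 200 10
EOF

# Field 4 looks like [01/May/2002:00:00:01 -- split it on '[', '/' and ':'
# so that d[4] is the year and d[3] the month, and write each line to a
# file named after them.
awk '{ split($4, d, "[[/:]"); print > ("access_log-" d[4] "-" d[3]) }' \
    sample_access_log
```

Afterwards each month can be compressed separately, e.g. with gzip -9.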
To monitor the log in real time:
% tail -f access_log
% less +F access_log
With less, Ctrl-C stops following so the file can be browsed, and pressing F resumes following.