
Webalizer?

PostPosted: Tue Dec 23, 2003 7:34 pm
by ZiaTioN
Hey Void. I have looked into the stats program you use here on your site and was wondering what command line options you use. There are like a thousand of them and I was just curious as to what you used (or what was really needed).

Also do you run it as a cron job to update every hour or something?

PostPosted: Tue Dec 23, 2003 7:54 pm
by Void Main
Yeah, I run a script from cron every hour:

Code:
0 * * * * /home/voidmain/webalizer/runwebalizer > /home/voidmain/webalizer/runwebalizer.log 2>&1


The cron job runs under the "voidmain" user and as you can see I have a subdirectory called "webalizer" under voidmain's home directory which contains a "webalizer.conf" file and where all the history and work files are stored/created. Here is my runwebalizer script and my webalizer.conf file. You'll see where I have customized it. The only things I removed from the conf file before placing it under the link above are several "HideSite" and "HideReferrer" statements consisting of porn sites that have made their way into my "referral" and "site" lists (purposefully placed there by dumb asses). I removed them so as not to advertise for them on Google.
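To give the flavor of it, a minimal webalizer.conf along these lines might look like the following. The OutputDir is the real one mentioned below; the LogFile path, HostName, and the Hide* patterns are illustrative examples, not the actual entries:

```
# Illustrative webalizer.conf fragment -- paths/patterns are examples only
LogFile     /var/log/httpd/access_log
OutputDir   /var/www/voidmain/stats
HostName    voidmain.kicks-ass.net
Incremental yes
# Keep junk out of the reports (example patterns, not the real list):
HideSite     *spam-site.example*
HideReferrer *spam-site.example*
```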

Of course if you run the job as a user other than root (generally recommended for just about everything) you'll need to make sure that user has read access to the web server log(s), and of course write access to the target HTML directory (/var/www/voidmain/stats in my case).

PostPosted: Tue Dec 23, 2003 8:09 pm
by ZiaTioN
AHH.. I was reading through your config file and saw that you have "robots.txt" hidden. I myself have seen this numerous times in my error logs for my site and had not looked into it yet. Any ideas on what type of exploit these kiddies are looking for?

Also how would someone deliberately work a URL into your referrer pages?

*Edited
Well I just looked it up and it appears it is just Google looking for an exclusion file in my webroot.

PostPosted: Tue Dec 23, 2003 8:16 pm
by Void Main
ZiaTioN wrote: AHH.. I was reading through your config file and saw that you have "robots.txt" hidden. I myself have seen this numerous times in my error logs for my site and had not looked into it yet. Any ideas on what type of exploit these kiddies are looking for?


I'm not worried about any type of exploit. Good web crawlers (like Google) look at this file and use it for its intended purpose. There's just no sense in having it show up on the list. Here is my robots.txt BTW. Sure, people with bad intentions will look at this file to see what you don't want indexed and browse around looking for goodies. If you really have goodies that need to be hidden then you should have them protected with encryption and passwords. Any other way is "security by obscurity", which is a Microsoft security practice. :)
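For anyone who hasn't seen one: robots.txt is just a plain-text exclusion list sitting at the top of the webroot. A minimal illustrative one (made-up paths, not the actual file linked above) looks like:

```
# Illustrative robots.txt -- the Disallow paths are made-up examples
User-agent: *
Disallow: /private/
Disallow: /work/
```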

ZiaTioN wrote: Also how would someone deliberately work a URL into your referrer pages?


In the case of the porn referrer spammer, they have coded a utility to do it automatically; actually there is probably source for one out there somewhere. Typically I would see around 20-30 hits within a few seconds from a specific IP address, and the "referrer" in the log entry would be a porn address. This really is very trivial to do. I wouldn't be surprised if "wget" has a parameter you can pass so it will throw any referrer you want at the server. I know it can send any user agent you want via a command line parameter. I'm sure they have some sort of automated crawler that searches through Google for webalizer pages and then automatically hits them 20-30 times.
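For what it's worth, wget really does have a --referer=URL switch (and -U for the user agent), so faking the referrer from a shell is a one-liner. To show just how little is involved, here is a self-contained Python sketch (the spam URL and user agent are made up) that sends a request with a bogus Referer header to a throwaway local server and prints what the "web server" would log:

```python
# Sketch of how referrer spam works: the client simply lies in the
# Referer header.  Throwaway local server, so no network is needed.
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

seen = []  # Referer values the "web server" logged

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # This is the value that would end up in the access log
        seen.append(self.headers.get("Referer"))
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # keep the demo quiet

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = "http://127.0.0.1:%d/" % server.server_port
req = urllib.request.Request(url, headers={
    "Referer": "http://spam-site.example/",   # made-up spam URL
    "User-Agent": "NotARealBrowser/1.0",      # made-up agent string
})
urllib.request.urlopen(req).read()
server.shutdown()

print(seen[0])  # the forged referrer, exactly as the server saw it
```

Run that in a loop 20-30 times against a target and you get exactly the log pattern described above.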

They really CeNsOrEd me off because I also run other sites that I *used* to be able to display these stats on. I run the web sites for a couple of clubs I am in that are more family oriented with very young members. It was sort of embarrassing to see porn sites in the referrers list. I feel like grabbing these ass holes and beating the living oops out of them.

PostPosted: Tue Dec 23, 2003 9:18 pm
by ZiaTioN
Yes that would suck. Spammers should all die. :)

Anyway I have done a half-ass job of setting up this webalizer and when I try to run it I get the following error:

[ziation@thegnuage webalizer]$ ./webalizer
No valid records found!


Any ideas what this is referring to? I assume it is telling me I have no logs of hits and so forth yet, but how would the script catch these hits if you have to already have hits? LOL.. I know this cannot be what it is saying. Just wondering if you have ever seen this.

PostPosted: Tue Dec 23, 2003 9:42 pm
by Void Main
It reads your Apache log and generates the statistics from that. If you do not have a log with entries in a valid format, I assume you will get this error.

In my httpd.conf my LogFormat entries look like this:

Code:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %b" common
LogFormat "%{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent


And a sample of 4 log entries looks like this:

Code:
xxx.xxx.xxx.40 - - [23/Dec/2003:21:36:43 -0600] "GET /i/synaptic_t.png HTTP/1.1" 304 0 "http://voidmain.kicks-ass.net/redhat/redhat_9_apt-get_must_have.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; AIRF)"
xxx.xxx.xxx.160 - - [23/Dec/2003:21:40:03 -0600] "GET /redhat/winex2.html HTTP/1.1" 200 15181 "http://www.google.com/search?hl=en&lr=&ie=UTF-8&oe=UTF-8&q=free+winex+download+2.3&btnG=Google+Search" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5) Gecko/20031007"
64.68.82.204 - - [23/Dec/2003:21:40:26 -0600] "GET /robots.txt HTTP/1.0" 200 735 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
64.68.82.204 - - [23/Dec/2003:21:40:26 -0600] "GET / HTTP/1.0" 200 6267 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"


I obscured the IPs in the 1st and 2nd records. The 3rd and 4th records are obviously Google crawls.