A couple of years back I decided to start analyzing the SPAM I was getting, to see what I could figure out about the systems it was coming from:
http://voidmain.is-a-geek.net/spam/
I think I did this for a few months straight, and in 99% of the cases the SPAM came from Windows machines. Almost certainly these were infected zombies whose owners had no clue their machines were sending out SPAM.
On a somewhat related note, we caught a machine on our network at work that was infected with a mass-mailer trojan and sending out SPAM. I guess one machine out of around 50,000 isn't bad.
I have been running sendmail+spamassassin+spamass-milter here at home and it has caught nearly 100% of the SPAM that's been coming in. About once a month a SPAM message gets past it, and only once has it falsely marked a message as SPAM. I currently have the SPAM from all mail coming to all the addresses I manage at home diverted into a separate spam account.
I would like to start an automated interrogation of the addresses that the spam is coming from and insert the resulting data into either a log file or a database. Writing the interrogation process and logging to a file or database is simple, but I'm having trouble deciding on the best way to hook into the sendmail chain to kick off the interrogation. Initially I thought that hacking spamass-milter would probably be the easiest thing to do. I thought I could create a FIFO, and at the point in the spamass-milter code where it decides a message is SPAM, write the IP address the message was received from to the FIFO. I would then have a completely separate perl script reading the FIFO, acting on the addresses that come in, and logging the results of nmap, nmblookup, etc.
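To make the FIFO idea concrete, here is a rough sketch of the reader side, written in Python rather than the perl I mentioned. It assumes the milter writes one IP address per line to the FIFO. The function names, the JSON-lines log format, and the choice of `nmap -O` as the probe are all just my placeholders, not anything spamass-milter provides, and the FIFO itself would be created beforehand with mkfifo.

```python
import ipaddress
import json
import subprocess
import time


def parse_address(line):
    """Return a normalized IP string, or None if the line is not a valid address."""
    try:
        return str(ipaddress.ip_address(line.strip()))
    except ValueError:
        return None


def probe(ip):
    """Interrogate one address. The command is only an example:
    'nmap -O' does OS detection and normally needs root."""
    result = subprocess.run(
        ["nmap", "-O", ip], capture_output=True, text=True, timeout=300
    )
    return result.stdout


def process(lines, probe_fn, log, seen=None):
    """Probe each new, valid address in `lines` and append one JSON record per
    address to `log`. Returns the set of addresses handled so far."""
    seen = set() if seen is None else seen
    for line in lines:
        ip = parse_address(line)
        if ip is None or ip in seen:
            continue  # skip garbage and addresses already interrogated
        seen.add(ip)
        record = {"time": time.time(), "ip": ip, "result": probe_fn(ip)}
        log.write(json.dumps(record) + "\n")
        log.flush()
    return seen


def serve(fifo_path, log_path):
    """Read the FIFO forever. open() blocks until a writer appears, and the
    read loop ends when the writer closes, so the FIFO is simply reopened."""
    seen = set()
    while True:
        with open(fifo_path) as fifo, open(log_path, "a") as log:
            process(fifo, probe, log, seen)
```

Validating every line with `ipaddress` and passing the address to the probe as a list argument (no shell) matters here, since whatever lands in the FIFO ultimately comes from untrusted mail. The hypothetical paths would be wired up with something like `serve("/var/run/spam-ips.fifo", "/var/log/spam-probe.log")`.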
So I ran "apt-get source spamass-milter" and started looking at the source. I think I found the place to insert the code that writes to the FIFO, but I'm not quite sure how to get the address at that point. I didn't spend a lot of time on it. Another way might be to have a process that watches the /var/log/maillog file, but that isn't the easiest of tasks either, since you would have to relate several log lines to each other to get all of the needed information for a single SPAM message.
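For the log-watching route, relating the lines might not be too bad if they share a queue ID. Below is a rough Python sketch under some assumptions I have not verified against my own logs: that sendmail's "from=" line carries the relay as `relay=name [ip]`, that a later line with the same queue ID mentions the spam header (I am guessing at the text `X-Spam-Flag: YES`), and that the process name in the log is `sendmail`. The function name and patterns are mine and would need adjusting to the real log format.

```python
import re

# "sendmail[pid]: QUEUEID: rest of line" (process name is an assumption)
QID_LINE = re.compile(r"sendmail\[\d+\]: (?P<qid>\w+): (?P<rest>.*)")
# relay=host.example [192.0.2.1] or relay=[IPv6:2001:db8::1]
RELAY_IP = re.compile(r"relay=.*\[(?:IPv6:)?(?P<ip>[0-9A-Fa-f:.]+)\]")
# Assumed marker text for a message the milter flagged as spam
SPAM_MARK = "X-Spam-Flag: YES"


def spam_relays(lines):
    """Yield (queue_id, relay_ip) for each message flagged as spam.

    The relay address is remembered from the 'from=' line and emitted only
    if a later line with the same queue ID carries the spam marker."""
    relays = {}
    for line in lines:
        m = QID_LINE.search(line)
        if not m:
            continue
        qid, rest = m.group("qid"), m.group("rest")
        if rest.startswith("from="):
            r = RELAY_IP.search(rest)
            if r:
                relays[qid] = r.group("ip")
        elif SPAM_MARK in rest and qid in relays:
            yield qid, relays.pop(qid)
        elif rest.startswith("to="):
            # Delivery line: the message was not flagged, so forget it
            relays.pop(qid, None)
```

Feeding this from a file object that follows /var/log/maillog, and handing each yielded address to the interrogation step, would avoid touching the milter source at all. The cost is that it depends entirely on the log wording staying the same.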
Just wondering if anyone might have an easy idea other than what I've been able to come up with.
Thanks!


