Apprentice of Artificial Intelligence: Learn to filter spam

Sunday, October 19, 2008

Learn to filter spam

Almost every day we receive lots of spam emails. Training classifiers to filter spam is one of the most successful real-life applications of machine learning. But spammers are getting smarter at counteracting spam filters! One trick they use is the good word attack. By inserting a set of words that often appear in legitimate messages but not in spam, they make spam look legitimate and confuse the filters (increasing false negatives). Even worse, when end users label such emails as spam, adaptive spam filters learn from these new training samples and may come to associate those good words with spam (increasing false positives). These days we do sometimes receive such emails, don't we?

So this paper, "A Multiple Instance Learning Strategy for Combating Good Word Attacks on Spam Filters" (JMLR, June 2008), proposes using multi-instance learning for spam filters. Multi-instance learning learns from and makes predictions on bags of instances instead of individual instances. If at least one of the instances in a bag is positive, then the bag is positive; otherwise, the bag is negative. Let a bag be an email and an instance be a part of the email. You see this is a good match for the good word attack: the inserted good words can dilute the email as a whole, but as long as one part of it still looks like spam, the whole email is flagged. Now I'm wondering what spammers will do next....
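To make the bag rule concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `WEIGHTS` table, the threshold, and the idea of splitting an email into two halves by position are made up here and are not taken from the paper, which may split emails and train its classifier quite differently. The sketch only shows why "positive if any instance is positive" resists padding with good words.

```python
import re

# Hypothetical per-word weights: positive leans spam, negative leans legitimate.
WEIGHTS = {
    "viagra": 2.0, "winner": 1.5, "free": 1.0,
    "meeting": -1.5, "report": -1.5, "thanks": -1.0, "schedule": -1.0,
}

def tokenize(text):
    """Lowercase the text and return its words."""
    return re.findall(r"[a-z']+", text.lower())

def instance_score(tokens):
    """Sum the weights of the known words; unknown words count as 0."""
    return sum(WEIGHTS.get(t, 0.0) for t in tokens)

def single_instance_is_spam(email, threshold=1.0):
    """Baseline: score the whole email as one instance."""
    return instance_score(tokenize(email)) > threshold

def split_into_instances(email, n_parts=2):
    """Turn an email (the bag) into consecutive chunks of words (the instances)."""
    tokens = tokenize(email)
    size = max(1, -(-len(tokens) // n_parts))  # ceiling division
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def bag_is_spam(email, threshold=1.0, n_parts=2):
    """Multi-instance rule: the bag is positive if any instance is positive."""
    return any(instance_score(inst) > threshold
               for inst in split_into_instances(email, n_parts))

# A spam message padded with good words (a toy good word attack).
attacked = ("free viagra winner click now "
            "meeting report thanks schedule meeting report")
legit = "thanks for the meeting report, see you at the schedule review"

print(single_instance_is_spam(attacked))  # good words outweigh the spam words
print(bag_is_spam(attacked))              # the first half still looks like spam
print(bag_is_spam(legit))
```

With these toy weights, the whole-email score of the padded message drops below the threshold, while its first half still scores as spam, so the bag rule catches it. The legitimate message has no spammy half and stays negative.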

4 comments:

Unknown said...: Hi, guy! In my opinion, you might have used "false positive" and "false negative" the wrong way around.
I appreciate these interesting posts as well as your motivation, and hope to have more discussions with you in the future.; October 23, 2008 at 12:28 AM
TU said...: Thank you :)
For the "false positive/negative" issue, here the classifier is trained to identify spam, so "positive" means "classified as spam". This is also the convention used in that paper.; October 23, 2008 at 7:24 PM
Unknown said...: Thanks for the clarification, I had misunderstood it.; October 24, 2008 at 4:56 AM
thinktank said...: Hello
This is asraful.
I am new to AI.
And I am interested in "Complex Mapping in Ontology Alignment".
So far I have found your research blog helpful.; August 7, 2010 at 11:52 PM
