A Survey of Content-based Anti-spam Email Filtering
WANG Bin, PAN Wen-feng (Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China)
The volume of junk email on the Internet has grown tremendously in the past few years and is causing serious problems. Content-based filtering is one of the mainstream technologies used so far. This paper provides an overview of the state of the art in this research field, covering benchmark corpora, evaluation methods, and filtering approaches. Many filtering approaches, including Ripper, Decision Trees, Rough Sets, Rocchio, Boosting, Bayes, kNN, SVM, and Winnow, are discussed and compared. The experimental results show that some approaches, such as Boosting, Flexible Bayes, SVM, and Winnow, can achieve very good results on research corpora. However, much more work is needed for practical use.