Spam Detection Using Web Page Content: A New Battleground

Title: Spam Detection Using Web Page Content: A New Battleground
Publication Type: Conference Paper
Year of Publication: 2011
Authors: Ribeiro MTúlio, Guerra PHCalais, Vilela L, Veloso A, Guedes D, Meira, Jr. W, Chaves MHPC, Steding-Jessen K, Hoepers C
Other Numbers: 3187
Abstract

Traditional content-based e-mail spam filtering takes into account content of e-mail messages and applies machine learning techniques to infer patterns that discriminate spams from hams. In particular, the use of content-based spam filtering unleashed an unending arms race between spammers and filter developers, given the spammers' ability to continuously change spam message content in ways that might circumvent the current filters. In this paper, we propose to expand the horizons of content-based filters by taking into consideration the content of the Web pages linked by e-mail messages. We describe a methodology for extracting pages linked by URLs in spam messages and we characterize the relationship between those pages and the messages. We then use a machine learning technique (a lazy associative classifier) to extract classification rules from the web pages that are relevant to spam detection. We demonstrate that the use of information from linked pages can nicely complement current spam classification techniques, as portrayed by SpamAssassin. Our study shows that the pages linked by spams are a very promising battleground.

Acknowledgment

This paper was partially funded by the Brazilian Agency for Industrial Development and Movimento Brasil Competitivo (MBC) through a visiting scholar fellowship.

Copyright ACM 2011. This is the author’s version of the work. It is posted here by permission of the ACM for your personal use. Not for redistribution. ACM 978-1-4503-0788-8/11/09

URL: http://www.icsi.berkeley.edu/pubs/networking/spamdetectionwebpagecontent11.pdf
Bibliographic Notes

Proceedings of the 8th Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference (CEAS 2011), Perth, Australia.

Abbreviated Authors

M. Ribeiro, P. Guerra, L. Vilela, A. Veloso, D. Guedes, W. Meira Jr., M. Chaves, K. Steding-Jessen, and C. Hoepers

ICSI Research Group

Networking and Security

ICSI Publication Type

Article in conference proceedings