Book details of 'Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification'

| Title | Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification |
| Author(s) | Jonathan Zdziarski |
| ISBN | 1593270526 |
| Language | English |
| Published | July 2005 |
| Publisher | No Starch Press |
The Virtual Bookcase Reviews of 'Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification':
Reviewer Rob Slade wrote:
The preface states that the book is for those seriously interested in
spam identification technologies, and concentrates on Bayesian and
related statistical filtering.
Part one is an introduction to spam filtering. Chapter one reviews
the history of spam, although many of the early entries are simply
annoyances or chain letters rather than the commercial or fraudulent
items considered under the banner today, and the author does not seem
to realize that 419 scams predated email by a considerable margin. A
look at the development of spam filtering (excluding Bayesian) is
presented in chapter two, along with some non-filtering approaches. Bayesian
analysis is explained in chapter three, and the statistical filtering
basis is outlined in chapter four.
The fundamental actuarial core is expanded in part two. Chapter five
covers message coding. Tokenization, chunking characters into
identifiable items, is examined in chapter six. Tricks spammers use
to evade filters, and ways to avoid falling for them, are
outlined in chapter seven. Storage and performance issues raised by
the data rules required by statistical filters are addressed in
chapter eight. Chapter nine looks at aspects of scaling to systems
supporting large numbers of users.
Part three deals with advanced concepts in statistical filtering.
Chapter ten delves into testing which, because of the individual and
adaptive nature of Bayesian filtering, presents unique challenges.
Tokenization is revisited in chapter eleven, in more advanced forms.
Markovian discrimination, with its examination of stateful entities,
is explained in chapter twelve. Because the book has noted many kinds of
features along the way, chapter thirteen explores ways to reduce the
number of items used (and the data required) while maintaining accuracy.
Collaborative rule-building with other users, groups, or systems is
reviewed in chapter fourteen.
As the preface implies, this is *not* a book for users who just want
to install POPFile (although that and other programs are explored in
an appendix). For those who are seriously involved in managing and
developing spam filtering, however, the book does provide very useful
advice, pointers, and research.
copyright Robert M. Slade, 2005