<html>

<body>

At 04:40 PM 11/6/2003, Vernon Schryver wrote:<br>

<blockquote type=cite class=cite cite>&nbsp; - keyword and other scoring

filters including so called &quot;Bayesian&quot;<br>

&nbsp;&nbsp; systems<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Except for some individuals and for them

only some of the time,<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; these have non-trivial false positive

rates.</blockquote><br>

I use an excellent open-source Bayesian filter called POPFile (see
SourceForge).&nbsp; Its long-term accuracy for classifying
messages into 25 buckets of my own invention (including e2e messages) is
displayed as follows:

<dl>

<dd><h2><b>Classification Accuracy</b></h2>

<dd>Messages classified: 75,001

<dd>Classification errors: 301<hr>


<dd>Accuracy: 99.59%<br>


<dd><table>
<tr><th>Bucket</th><th>Classification Count</th><th>False Positives</th><th>False Negatives</th></tr>
<tr><td colspan=4>...</td></tr>
<tr><td><font color="#FF0000">spam</font></td><td>51,953 (69.26%)</td><td>205</td><td>81</td></tr>
<tr><td colspan=4>...</td></tr>
</table>

</dl>It takes me about 15 seconds to scan a folder of 100 messages that
are classed as spam to detect these false positives, and the false
negatives are of course less of a problem.&nbsp; A false positive rate
of about 0.27% (205 of 75,001 messages) is quite reasonable.&nbsp; Note
that I have deliberately resisted using POPFile's whitelist capability -
I ONLY use the Bayesian learning filter.<br><br>
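As a rough check, the rates can be recomputed from the counts quoted in
the statistics above. This is only a minimal Python sketch: the variable
names are illustrative and are not part of POPFile, and the choice of
denominator (all mail versus the spam folder alone) is an assumption
about what "false positive rate" should mean here.<br>
<pre>
```python
# Counts copied from the POPFile statistics quoted above.
total_messages = 75_001
total_errors = 301
spam_classified = 51_953
spam_false_positives = 205   # legitimate mail filed under "spam"
spam_false_negatives = 81    # spam filed under some other bucket

# Overall accuracy across all buckets.
accuracy = 1 - total_errors / total_messages

# Spam false positives measured against all classified mail.
fp_share_of_all_mail = spam_false_positives / total_messages

# Spam false positives measured against the spam folder only,
# i.e. what turns up when scanning messages classed as spam.
fp_share_of_spam_folder = spam_false_positives / spam_classified
fp_per_100_spam = 100 * fp_share_of_spam_folder

print(f"accuracy:                 {accuracy:.2%}")
print(f"false positives, all mail: {fp_share_of_all_mail:.2%}")
print(f"false positives, spam box: {fp_share_of_spam_folder:.2%}")
print(f"expected per 100 scanned:  {fp_per_100_spam:.2f}")
```
</pre>
Measured against all 75,001 messages, the spam false positives come to a
little under 0.3%; measured against the spam folder alone they come to
about 0.4%, which is fewer than one message per hundred
scanned.<br><br>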

The advantage, of course, is that what I consider to be spam is a purely
personal decision, which is Joe Touch's point - it's a very bad idea to
impose a notion like &quot;solicitation&quot; as a criterion for
rejecting stuff.&nbsp; Email is by definition unsolicited, in
almost all instances.&nbsp; The Nobel Prize phone call is equally
unsolicited.&nbsp; Perhaps you don't want to get it, but I'd prefer
to have the choice to be given my Nobel, thank you.<br><br>

</body>

</html>