<html>
<body>
At 04:40 PM 11/6/2003, Vernon Schryver wrote:<br>
<blockquote type=cite class=cite cite> - keyword and other scoring
filters including so called "Bayesian"<br>
systems<br>
Except for some individuals and for them
only some of the time,<br>
these have non-trivial false positive
rates.</blockquote><br>
I use an excellent open-source Bayesian filter called POPFile (see
SourceForge). Its long-term accuracy for classifying
messages into 25 personally invented buckets (including e2e messages) is
displayed as follows:
<dl>
<dd><h2><b>Classification Accuracy</b></h2>
<dd>Messages classified: 75,001
<dd>Classification errors: 301<hr>
<dd>Accuracy: 99.59%<br>
<dd><table>
<tr><th>Bucket</th><th>Classification Count</th><th>False Positives</th><th>False Negatives</th></tr>
<tr><td>...</td><td></td><td></td><td></td></tr>
<tr><td><font color="#FF0000">spam</font></td><td>51,953 (69.26%)</td><td>205</td><td>81</td></tr>
</table>
<dd>...
</dl>It takes me about 15 seconds to scan a folder of 100 messages that
are classed as spam to detect these false positives, and the false
negatives are of course less of a problem. A false positive rate of
under 0.3% (205 of 75,001 messages) is quite reasonable. Note that I have deliberately
resisted using POPFile's whitelist capability - I ONLY use the Bayesian
learning filter.<br><br>
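For anyone who wants to check the arithmetic behind those figures, here is a minimal Python sketch. The counts are copied from the statistics above. The helper names (<code>percent</code>, <code>truncate2</code>) are made up for illustration and are not part of POPFile. Truncating rather than rounding to two decimals is my assumption about how the display arrives at 99.59% and 69.26%. The denominators for the false positive rate are likewise my choice, since the figure looks different depending on what you divide by.<br><br>

```python
import math

# Counts copied from the POPFile statistics quoted above.
messages_classified = 75_001
classification_errors = 301
spam_classified = 51_953
spam_false_positives = 205   # non-spam that landed in the spam bucket
spam_false_negatives = 81    # spam that landed in some other bucket


def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


def truncate2(value):
    """Cut a value to two decimals without rounding (assumed display behaviour)."""
    return math.floor(value * 100) / 100


accuracy = percent(messages_classified - classification_errors, messages_classified)
spam_share = percent(spam_classified, messages_classified)

# Two ways to express the spam false positives, depending on the denominator.
fp_of_all_mail = percent(spam_false_positives, messages_classified)
fp_of_spam_bucket = percent(spam_false_positives, spam_classified)

print(f"accuracy (truncated):        {truncate2(accuracy):.2f}%")
print(f"spam share (truncated):      {truncate2(spam_share):.2f}%")
print(f"false positives / all mail:  {fp_of_all_mail:.2f}%")
print(f"false positives / spam bin:  {fp_of_spam_bucket:.2f}%")
```

Measured against all mail, the spam false positives come to about 0.27%. Measured against only the messages filed as spam, they come to about 0.39%, which is roughly one false positive in every 250 messages I scan in the spam folder.<br><br>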
The advantage, of course, is that what I consider to be spam is a purely
personal decision, which is Joe Touch's point - it's a very bad idea to
impose a notion like "solicitation" as a criterion for
rejecting stuff. Email is by definition unsolicited, in
almost all instances. The Nobel Prize phone call is equally
unsolicited. Perhaps you don't want to get it, but I'd prefer
to have the choice to be given my Nobel, thank you.<br><br>
</body>
</html>