LUGOD: Linux Users' Group of Davis
 
Page last updated:
2003 Oct 14 18:41

The following is an archive of a post made to our 'vox-tech mailing list' by one of its subscribers.

[vox-tech] bogofilter question



dear fellow bogofilter users,

i'm in the process of training bogofilter.  i've decided not to do it
automatically with the -u (update) option, since the docs warn against
it.  instead, every time an email comes in, i press ^n to pass it to
bogofilter as non-spam or ^s to pass it to bogofilter as spam.  so far,
i've got about 600 spams and 600 "hams" (as the bogofilter docs call
them):

p@gabriel$ bogoutil -w ~/.bogofilter .MSG_COUNT
                       spam   good
.MSG_COUNT              559    575
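
for anyone wanting to copy the manual training setup, here is a minimal
sketch of the two commands that ^s and ^n could run.  the function names
are made up for illustration, and how you bind them to keys depends on
your mail client; the only parts taken from bogofilter itself are the -s
(register as spam) and -n (register as non-spam) options.

```shell
# hypothetical helper names; each one reads a single message on stdin.
# -s registers the message as spam, -n registers it as non-spam (ham).
train_spam() { bogofilter -s; }
train_ham()  { bogofilter -n; }

# usage, assuming the message has been saved to a file:
#   train_spam < some-message
#   train_ham  < some-message
```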

side note: i was hardly surprised that half my incoming email is spam.
according to the docs, i'm lucky since you want these numbers to be
roughly the same.
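
to keep an eye on that balance, something like the following could
summarize the bogoutil output.  this is only a sketch: msg_balance is a
made-up name, and it assumes the output keeps the layout shown above
(token name, spam count, good count).

```shell
# hypothetical helper: reads `bogoutil -w` output on stdin and prints the
# spam and good counts along with spam's share of all trained messages.
msg_balance() {
    awk '$1 == ".MSG_COUNT" {
        total = $2 + $3
        if (total > 0)
            printf "spam=%d good=%d spam_share=%.1f%%\n", $2, $3, 100 * $2 / total
    }'
}

# usage:
#   bogoutil -w ~/.bogofilter .MSG_COUNT | msg_balance
```

for the counts above this prints spam=559 good=575 spam_share=49.3%.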


now at some point, i'm going to have to edit .procmailrc to start sending
email to /dev/null if it sees:

   * ^X-Bogosity:.*Yes

which means that i'll only be training bogofilter on false negatives
(and unfortunately, false positives will be lost for good, but the whole
point is to not see spam anymore, not just to save it in some file that
i have to sort through periodically).
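
since anything matching that condition would be gone for good, it may be
worth dry-running the same test first.  a rough sketch, assuming the
messages have already been passed through bogofilter so the X-Bogosity
header exists; is_flagged_spam and the maildir path are made up for
illustration.

```shell
# hypothetical helper: succeeds if the message on stdin has a header
# matching the condition above.  like procmail's default behaviour, it
# looks only at the header block (everything before the first blank
# line) and ignores case.
is_flagged_spam() {
    sed '/^$/q' | grep -qi '^X-Bogosity:.*Yes'
}

# usage: list the messages that the rule would have discarded
#   for f in ~/Maildir/cur/*; do
#       is_flagged_spam < "$f" && echo "$f"
#   done
```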

the bogofilter docs recommend doing this at about 10,000 emails.  a
bogofilter website (run by one of the developers) says this number
should be more like 20,000.

that's absurd.  i've only seen a false positive once or twice, when
i first started to use bogofilter.  false negatives are rare.  maybe one
or two a week.

are there any more experienced bogofilter people out there who have
thought about this issue?  if so, what was your conclusion?  i can't see
doing this much past 1000 emails in each ham/spam bin.


lastly, the docs recommend not sharing databases with other people,
because the whole point is to tailor bogofilter to the type of spam and
ham that arrives in YOUR inbox, not other people's inboxes.  otherwise,
you might as well use a shared, one-size-fits-all service like spamcop.
i suspect the docs may overstate this claim.  we all get offered XXX
videos, penis enlargements and international bank transfers.  but then
again, i'm still vaguely a bogofilter newbie, so i'd like some guidance
from anybody who has actually tried sharing a database.

thanks,
pete

-- 
GPG Instructions: http://www.dirac.org/linux/gpg
GPG Fingerprint: B9F1 6CF3 47C4 7CD8 D33E 70A9 A3B9 1945 67EA 951D


