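Conditional probability is the likelihood of an outcome occurring given that another outcome has already occurred. SpamAssassin's Bayesian filter applies this idea to estimate how likely a message is to be spam given the tokens it contains, and the tool used to train it is sa-learn. In default usage, sa-learn takes a directory of spam or ham (non-spam) emails and adds their tokens to the Bayes database.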
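As a rough illustration of the conditional-probability idea, the sketch below uses Bayes' rule to estimate P(spam | token) from per-class token counts. It assumes equal prior probabilities for spam and ham. The function name and the counts are hypothetical. SpamAssassin's own Bayes engine combines token probabilities with a more elaborate scheme, so this is not its exact formula.

```python
def spam_probability(spam_count: int, ham_count: int,
                     n_spam: int, n_ham: int) -> float:
    """Estimate P(spam | token) via Bayes' rule, assuming P(spam) == P(ham).

    spam_count / ham_count: learnt messages of each class containing the token.
    n_spam / n_ham: total messages learnt of each class (must be > 0).
    """
    # P(token | spam) and P(token | ham), estimated from relative frequencies
    p_token_given_spam = spam_count / n_spam
    p_token_given_ham = ham_count / n_ham
    total = p_token_given_spam + p_token_given_ham
    if total == 0:
        return 0.5  # token never seen: no evidence either way
    # With equal priors, the priors cancel out of Bayes' rule
    return p_token_given_spam / total


# A token seen in 40 of 200 spams but only 2 of 200 hams
print(round(spam_probability(40, 2, 200, 200), 3))  # → 0.952
```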
A token is a word or short sequence of characters extracted from a message. The Bayes database records how often each token appears in spam and in ham. You can run sa-learn manually, but it is better to add it to a cron job so the database is updated routinely.
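This minimal sketch shows what "adding tokens to the database" means, using a toy tokenizer and an in-memory count table. Everything here is a hypothetical simplification, including the `\w+` word split and the `TokenCounts` class. SpamAssassin's real tokenizer also handles headers, URLs and many other details.

```python
import re
from collections import Counter


def tokenize(text: str) -> set[str]:
    """Split a message into lowercase word tokens (a toy tokenizer).

    Returns a set, so each token counts at most once per message.
    """
    return set(re.findall(r"\w+", text.lower()))


class TokenCounts:
    """Hypothetical in-memory stand-in for a Bayes token database."""

    def __init__(self) -> None:
        self.spam = Counter()  # token -> number of spam messages containing it
        self.ham = Counter()   # token -> number of ham messages containing it
        self.n_spam = 0
        self.n_ham = 0

    def learn(self, text: str, is_spam: bool) -> None:
        tokens = tokenize(text)
        if is_spam:
            self.spam.update(tokens)
            self.n_spam += 1
        else:
            self.ham.update(tokens)
            self.n_ham += 1


db = TokenCounts()
db.learn("Cheap pills, buy now!", is_spam=True)
db.learn("Lunch meeting moved to noon", is_spam=False)
print(db.spam["cheap"], db.ham["cheap"])  # → 1 0
```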
sa-learn ignores emails it has already processed, so re-running it over the same folder does not add extra weight to the same tokens. To learn from a folder of emails, run sa-learn --spam followed by a folder of spam, or sa-learn --ham followed by a folder of ham. This is the common method for mail stored in the Maildir format. Because one run processes whole folders, it is also a quick way to update the Bayes database for many users. See the sa-learn man page for additional options.
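A cron job could run a script along these lines. This is a sketch that assumes hypothetical Maildir paths. It learns each folder with --no-sync and then runs sa-learn --sync once at the end. The sa-learn man page suggests this approach as the faster way to handle batch runs. Only the --spam, --ham, --no-sync and --sync flags are used.

```python
import subprocess


def learn_commands(spam_dirs: list[str], ham_dirs: list[str]) -> list[list[str]]:
    """Build sa-learn invocations for a batch training run.

    Each folder is learnt with --no-sync (fast, writes to the journal),
    then a single --sync merges the journal into the database.
    """
    cmds = [["sa-learn", "--no-sync", "--spam", d] for d in spam_dirs]
    cmds += [["sa-learn", "--no-sync", "--ham", d] for d in ham_dirs]
    cmds.append(["sa-learn", "--sync"])
    return cmds


def run_training(spam_dirs: list[str], ham_dirs: list[str]) -> None:
    # check=True raises CalledProcessError at the first failing sa-learn call
    for cmd in learn_commands(spam_dirs, ham_dirs):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical Maildir paths: print the commands instead of running them
    for cmd in learn_commands(["/home/alice/Maildir/.Spam/cur"],
                              ["/home/alice/Maildir/cur"]):
        print(" ".join(cmd))
```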
You may also use curly braces in the path to match one of several possible folder names. Several sources of spam and ham emails are available to download online, and there are also SpamAssassin database backups you can restore to get started. Public spam data is helpful for getting started, but it may not be specific to your use case.
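The hypothetical helper below previews which folders a brace pattern such as .{Junk,Spam} covers. It expands csh-style {a,b} groups in the same order bash does. It is only a preview tool, not the code sa-learn or your shell uses. If the pattern is left unquoted, bash or csh expands it before sa-learn sees it.

```python
def expand_braces(pattern: str) -> list[str]:
    """Expand csh-style {a,b} groups, including nested ones.

    Unlike bash, a single-item group such as {x} is also expanded.
    Unbalanced braces are returned literally.
    """
    start = pattern.find("{")
    if start == -1:
        return [pattern]
    # Find the brace that closes the first group, tracking nesting depth
    depth = 0
    for end in range(start, len(pattern)):
        if pattern[end] == "{":
            depth += 1
        elif pattern[end] == "}":
            depth -= 1
            if depth == 0:
                break
    else:
        return [pattern]  # no matching close brace
    # Split the group's contents on commas that are not inside a nested group
    options, current, depth = [], "", 0
    for ch in pattern[start + 1:end]:
        if ch == "," and depth == 0:
            options.append(current)
            current = ""
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        current += ch
    options.append(current)
    prefix, suffix = pattern[:start], pattern[end + 1:]
    # Recurse so later (and nested) groups are expanded too
    results = []
    for opt in options:
        results.extend(expand_braces(prefix + opt + suffix))
    return results


for path in expand_braces("/home/alice/Maildir/.{Junk,Spam}/{cur,new}"):
    print(path)
```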
This is my preferred data source for training because it provides an initial database you can restore. Every day, the service also archives newly received spam that you can use for training.
An archive of spam received since
If --no-sync is specified, sa-learn learns to the journal file instead of writing directly to the database. Note that --sync and --no-sync can be specified on the same command line, which is slightly confusing. In that case, --no-sync is ignored because there is no learn operation.
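A toy class can model the journal behaviour. Learning with no_sync only appends to a pending journal, and sync() merges the journal into the main counts and then discards it. The class is hypothetical and says nothing about SpamAssassin's on-disk formats.

```python
from collections import Counter


class JournaledDB:
    """Toy model of learning to a journal and syncing later (hypothetical)."""

    def __init__(self) -> None:
        self.db = Counter()  # token counts already merged into the database
        self.journal = []    # pending tokens written by no-sync learning

    def learn(self, tokens: list[str], no_sync: bool) -> None:
        if no_sync:
            self.journal.extend(tokens)  # cheap append; merge is deferred
        else:
            self.db.update(tokens)       # write straight to the database

    def sync(self) -> None:
        # Merge pending journal entries, then discard the journal
        self.db.update(self.journal)
        self.journal.clear()


j = JournaledDB()
j.learn(["cheap", "pills"], no_sync=True)
print(j.db["cheap"])  # → 0 (still only in the journal)
j.sync()
print(j.db["cheap"])  # → 1
```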
The -L/--local option, which disables network access while learning, is currently ignored because current versions of SpamAssassin do not perform network access while learning. Future versions may. The --dbpath path option specifies the location of the Bayes files to use. There are now multiple backend storage modules available for storing users' Bayesian data.
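As such, you might want to migrate from one backend to another. The basic procedure has three steps. Back up the existing data with sa-learn --backup, switch the backend in the configuration (the bayes_store_module option), and then restore the backup with sa-learn --restore.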
Note that if you have individual user databases, you will have to perform a similar procedure for each one of them. Unless you select the user with the -u option (intended for virtual users with an SQL backend), you must run sa-learn as the user whose database you are restoring.
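Based on the sa-learn man page, the migration can be sketched as an ordered list of shell commands for each database. The backup file name is an assumption, and the --clear step is optional. The backend switch is a manual edit of bayes_store_module (plus any options the new module needs), so it appears here only as a comment line.

```python
def migration_steps(backup_file: str = "backup.txt") -> list[str]:
    """Ordered shell commands for moving one Bayes database to a new backend.

    Run once per database (for per-user databases, as each user).
    The backend switch itself is a manual configuration edit.
    """
    return [
        "sa-learn --sync",                     # flush outstanding journal entries
        f"sa-learn --backup > {backup_file}",  # dump Bayes data to a text file
        "sa-learn --clear",                    # optional: remove the old database
        "# edit config: set bayes_store_module (and any backend options)",
        f"sa-learn --restore {backup_file}",   # load the backup into the new backend
    ]


# Printed one per line, the steps form a shell script (the edit step is a comment)
print("\n".join(migration_steps()))
```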
The Bayesian learning filter, introduced in the SpamAssassin 2.x series, is quite powerful. However, it stays disabled until enough messages have been learnt, so make sure your users know they need to train it. Then let SpamAssassin proceed, learning as it goes. Learning a single message with --no-sync is handy for binding to a key in your mail user agent. It is very fast, because all the time-consuming work is deferred until you run sa-learn with the --sync option. Learning filters require training to be effective; if you don't train them, they won't work.
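For the mail-client key binding, a small wrapper could pipe the current message to sa-learn on standard input with --no-sync. This sketch assumes that sa-learn is on the PATH and that it reads the message from stdin when no folder is given. The function names are hypothetical.

```python
import subprocess


def single_message_command(as_spam: bool) -> list[str]:
    """sa-learn arguments for learning one message read from stdin."""
    # --no-sync keeps the call fast; run `sa-learn --sync` later
    return ["sa-learn", "--no-sync", "--spam" if as_spam else "--ham"]


def learn_single_message(raw_message: bytes, as_spam: bool) -> None:
    # The raw message bytes go to stdin; check=True raises on a non-zero exit
    subprocess.run(single_message_command(as_spam), input=raw_message, check=True)
```

A key binding would pass the piped message to learn_single_message. A periodic sa-learn --sync, for example from the same cron job, would then fold the journal into the database.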
In addition, you need to train them with new messages regularly to keep them up-to-date, or their data will become stale and impact accuracy. You need to train with both spam and ham mails.
Training with one type of mail alone will not have any effect. Note that if your mail folders contain things like forwarded spam or discussions of spam-catching rules, learning from them will cause trouble, so avoid scanning those messages if possible.
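The sketch below checks whether the filter has learnt enough of both kinds of mail. It parses `sa-learn --dump magic` output and compares nspam and nham with 200, the default value of both bayes_min_spam_num and bayes_min_ham_num (your configuration may override these). The expected column layout is an assumption based on typical output, so compare it with your own output before relying on it. The sample counts are made up.

```python
import re

# Default values of bayes_min_spam_num / bayes_min_ham_num
MIN_SPAM = 200
MIN_HAM = 200


def parse_magic(dump: str) -> dict[str, int]:
    """Pull counters out of `sa-learn --dump magic` output.

    Assumes lines shaped like
      0.000          0        129          0  non-token data: nspam
    where the third column holds the value.
    """
    counts = {}
    for line in dump.splitlines():
        m = re.match(r"\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (.+)$", line)
        if m:
            counts[m.group(2).strip()] = int(m.group(1))
    return counts


def bayes_ready(counts: dict[str, int]) -> bool:
    # Bayes scoring stays off until BOTH thresholds are met
    return counts.get("nspam", 0) >= MIN_SPAM and counts.get("nham", 0) >= MIN_HAM


sample = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0        129          0  non-token data: nspam
0.000          0        455          0  non-token data: nham
"""
counts = parse_magic(sample)
print(counts["nspam"], counts["nham"], bayes_ready(counts))  # → 129 455 False
```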