Anti-spam techniques


Various anti-spam techniques are used to prevent email spam.
No technique is a complete solution to the spam problem, and each has trade-offs between incorrectly rejecting legitimate email and failing to reject all spam, along with the associated costs in time and effort and the cost of wrongfully obstructing good mail.
Anti-spam techniques can be broken into four broad categories: those that require actions by individuals, those that can be automated by email administrators, those that can be automated by email senders and those employed by researchers and law enforcement officials.

End-user techniques

There are a number of techniques that individuals use to restrict the availability of their email addresses, with the goal of reducing their chance of receiving spam.

Discretion

Sharing an email address only among a limited group of correspondents is one way to limit the chance that the address will be "harvested" and targeted by spam. Similarly, when forwarding messages to a number of recipients who don't know one another, recipient addresses can be put in the "bcc: field" so that each recipient does not get a list of the other recipients' email addresses.

Address munging

Email addresses posted on webpages, Usenet or chat rooms are vulnerable to email address harvesting. Address munging is the practice of disguising an email address to prevent it from being automatically collected in this way, while still allowing a human reader to reconstruct the original: an email address such as "no-one@example.com" might be written as "no-one at example dot com", for instance. A related technique is to display all or part of the email address as an image, or as jumbled text with the order of characters restored using CSS.
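The "at"/"dot" substitution described above can be sketched in a few lines of Python. This is only an illustration; the function names munge and unmunge are hypothetical, and real harvesters may well recognise such a simple pattern.

```python
def munge(address: str) -> str:
    """Rewrite user@host.tld as 'user at host dot tld'."""
    local, _, domain = address.partition("@")
    return f"{local} at {domain.replace('.', ' dot ')}"


def unmunge(text: str) -> str:
    """Reverse munge(), as a human reader would do mentally."""
    # Only the first " at " separates the local part from the domain.
    return text.replace(" at ", "@", 1).replace(" dot ", ".")


print(munge("no-one@example.com"))  # no-one at example dot com
```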

Avoid responding to spam

A common piece of advice is not to reply to spam messages, as spammers may simply regard responses as confirmation that an email address is valid. Similarly, many spam messages contain web links or addresses which the user is directed to follow to be removed from the spammer's mailing list – and these should be treated as dangerous. In any case, sender addresses are often forged in spam messages, so that responding to spam may result in failed deliveries – or may reach completely innocent third parties.

Contact forms

Businesses and individuals sometimes avoid publicising an email address by asking for contact to come via a "contact form" on a webpage – which then typically forwards the information via email. Such forms, however, are sometimes inconvenient to users, as they are not able to use their preferred email client, risk entering a faulty reply address, and are typically not notified about delivery problems. Further, contact forms have the drawback that they require a website with the appropriate technology.
In some cases contact forms also send the message to the email address given by the user. This allows the contact form to be used for sending spam, which may incur email deliverability problems from the site once the spam is reported and the sending IP is blacklisted.

Disable HTML in email

Many modern mail programs incorporate web browser functionality, such as the display of HTML, URLs, and images.
Avoiding or disabling this feature does not help avoid spam. It may, however, be useful to avoid some problems if a user opens a spam message: offensive images, being tracked by web bugs, being targeted by JavaScript or attacks upon security vulnerabilities in the HTML renderer. Mail clients which do not automatically download and display HTML, images or attachments have fewer risks, as do clients that have been configured to not display these by default.

Disposable email addresses

An email user may sometimes need to give an address to a site without complete assurance that the site owner will not use it for sending spam. One way to mitigate the risk is to provide a disposable email address — an address which the user can disable or abandon which forwards email to a real account. A number of services provide disposable address forwarding. Addresses can be manually disabled, can expire after a given time interval, or can expire after a certain number of messages have been forwarded.
Disposable email addresses can be used by users to track whether a site owner has disclosed an address, or had a security breach.

Ham passwords

Systems that use "ham passwords" ask unrecognised senders to include in their email a password that demonstrates that the email message is a "ham" message. Typically the email address and ham password would be described on a web page, and the ham password would be included in the subject line of an email message. Ham passwords are often combined with filtering systems which let through only those messages that have identified themselves as "ham".

Reporting spam

Tracking down a spammer's ISP and reporting the offense can lead to the spammer's service being terminated and criminal prosecution. Unfortunately, it can be difficult to track down the spammer, and while there are some online tools such as SpamCop and Network Abuse Clearinghouse to assist, they are not always accurate. Historically, reporting spam in this way has not played a large part in abating spam, since the spammers simply move their operation to another URL, ISP or network of IP addresses.
In many countries consumers may also forward unwanted and deceptive commercial email to the authorities, e.g. in the US to the email address maintained by the US Federal Trade Commission, or similar agencies in other countries.

Automated techniques for email administrators

There are now a large number of applications, appliances, services, and software systems that email administrators can use to reduce the load of spam on their systems and mailboxes. In general these attempt to reject the majority of spam email outright at the SMTP connection stage. If they do accept a message, they will typically then analyze the content further – and may decide to "quarantine" any categorised as spam.

Authentication

A number of systems have been developed that allow domain name owners to identify email as authorized. Many of these systems use the DNS to list sites authorized to send email on their behalf. After many other proposals, SPF, DKIM and DMARC are all now widely supported with growing adoption. While not directly attacking spam, these systems make it much harder to spoof addresses, a common technique of spammers - but also used in phishing, and other types of fraud via email.

Challenge/response systems

A method which may be used by internet service providers, by specialized services or enterprises to combat spam is to require unknown senders to pass various tests before their messages are delivered. These strategies are termed "challenge/response systems".

Checksum-based filtering

Checksum-based filtering exploits the fact that spam messages are sent in bulk, that is, that they will be identical apart from small variations. Checksum-based filters strip out everything that might vary between messages, reduce what remains to a checksum, and look that checksum up in a database such as the Distributed Checksum Clearinghouse, which collects the checksums of messages that email recipients consider to be spam; if the checksum is in the database, the message is likely to be spam. To avoid being detected in this way, spammers will sometimes insert unique invisible gibberish known as hashbusters into the middle of each of their messages, to make each message have a unique checksum.
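As a rough illustration of the idea, the following Python sketch normalises a message body and hashes what remains. The normalisation rule (lower-casing and keeping only letters) and the in-memory set known_spam are assumptions made for this example; they are not the algorithm or database used by the Distributed Checksum Clearinghouse.

```python
import hashlib
import re


def fuzzy_checksum(body: str) -> str:
    """Strip the parts likely to vary between copies, then hash the rest."""
    # Assumed normalisation: lower-case, keep letters only, collapse the rest.
    text = re.sub(r"[^a-z]+", " ", body.lower()).strip()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical stand-in for a shared checksum database.
known_spam = {fuzzy_checksum("Buy CHEAP pills now!!! Order #12345")}


def looks_like_known_spam(body: str) -> bool:
    return fuzzy_checksum(body) in known_spam
```

A copy of the message with a different order number still matches, while a copy carrying extra hashbuster words yields a different checksum and slips through, which is the weakness the paragraph above describes.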

Country-based filtering

Some email servers expect to never communicate with particular countries from which they receive a great deal of spam. Therefore, they use country-based filtering – a technique that blocks email from certain countries. This technique is based on country of origin determined by the sender's IP address rather than any trait of the sender.

DNS-based blacklists

There are a large number of free and commercial DNS-based blacklists, or DNSBLs, which allow a mail server to quickly look up the IP address of an incoming mail connection and reject it if it is listed there. Administrators can choose from scores of DNSBLs, each of which reflects different policies: some list sites known to emit spam; others list open mail relays or proxies; others list ISPs known to support spam.
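A DNSBL lookup for an IPv4 address is an ordinary DNS query for a name built by reversing the address's octets and appending the list's zone. The Python sketch below shows that construction; the zone dnsbl.example.org is a placeholder rather than a real list, and the resolve parameter is an assumption added so the lookup can be exercised without network access.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to query, e.g. 192.0.2.99 -> 99.2.0.192.<zone>."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str, resolve=socket.gethostbyname) -> bool:
    """Return True if the query name resolves, i.e. the address is listed."""
    try:
        resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # Resolution failure (including "no such name") is treated as
        # "not listed" in this sketch.
        return False
    return True
```

Treating every resolution failure as "not listed" is a simplification; a production mail server would normally distinguish a missing record from a DNS timeout.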

URL filtering

Most spam and phishing messages contain a URL that they entice victims into clicking on. Thus, a popular technique since the early 2000s consists of extracting URLs from messages and looking them up in databases such as Spamhaus' Domain Block List, SURBL, and URIBL.
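The extraction and lookup steps might be sketched as follows in Python. The regular expression is deliberately simple, and the blocklist is a plain local set standing in for a remote database; real services such as those named above are queried over DNS and apply their own domain-matching rules.

```python
import re
from urllib.parse import urlsplit

# Simplified pattern: http(s) URLs up to the next whitespace, quote or bracket.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def extract_hosts(body: str) -> set:
    """Return the set of host names found in URLs in the message body."""
    hosts = set()
    for url in URL_RE.findall(body):
        try:
            host = urlsplit(url).hostname
        except ValueError:  # malformed URL, e.g. a broken IPv6 literal
            continue
        if host:
            hosts.add(host)
    return hosts


def listed_hosts(body: str, blocklist: set) -> set:
    """Return hosts that are on the blocklist or are subdomains of an entry."""
    hits = set()
    for host in extract_hosts(body):
        labels = host.split(".")
        # Candidate names: the host itself and each parent with >= 2 labels.
        candidates = {".".join(labels[i:]) for i in range(len(labels) - 1)}
        if candidates & blocklist:
            hits.add(host)
    return hits
```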

Strict enforcement of RFC standards

Many spammers use poorly written software or are unable to comply with the standards because they do not have legitimate control of the computer they are using to send spam. By setting tighter limits on the deviation from RFC standards that the MTA will accept, a mail administrator can reduce spam significantly - but this also runs the risk of rejecting mail from older or poorly written or configured servers.
Greeting delay – A sending server is required to wait until it has received the SMTP greeting banner before it sends any data. A deliberate pause can be introduced by receiving servers to allow them to detect and deny any spam-sending applications that do not wait to receive this banner.
Temporary rejection – The greylisting technique is built on the fact that the SMTP protocol allows for temporary rejection of incoming messages. Greylisting temporarily rejects all messages from unknown senders or mail servers – using the standard 4xx error codes. All compliant MTAs will proceed to retry delivery later, but many spammers and spambots will not. The downside is that all legitimate messages from first-time senders will experience a delay in delivery.
HELO/EHLO checking – RFC 5321 says that an SMTP server "MAY verify that the domain name argument in the EHLO command actually corresponds to the IP address of the client. However, if the verification fails, the server MUST NOT refuse to accept a message on that basis." Systems can, however, be configured to
Invalid pipelining – Several SMTP commands are allowed to be placed in one network packet and "pipelined". For example, if an email is sent with a CC: header, several SMTP "RCPT TO" commands might be placed in a single packet instead of one packet per "RCPT TO" command. The SMTP protocol, however, requires that errors be checked and everything is synchronized at certain points. Many spammers will send everything in a single packet since they do not care about errors and it is more efficient. Some MTAs will detect this invalid pipelining and reject email sent this way.
Nolisting – The email servers for any given domain are specified in a prioritized list, via the MX records. The nolisting technique is simply the adding of an MX record pointing to a non-existent server as the "primary" – which means that an initial mail contact will always fail. Many spam sources do not retry on failure, so the spammer will move on to the next victim; legitimate email servers should retry the next higher numbered MX, and normal email will be delivered with only a brief delay.
Quit detection – An SMTP connection should always be closed with a QUIT command. Many spammers skip this step because their spam has already been sent and taking the time to properly close the connection takes time and bandwidth. Some MTAs are capable of detecting whether or not the connection is closed correctly and use this as a measure of how trustworthy the other system is.
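The temporary-rejection (greylisting) technique in the list above can be sketched as a small Python class. Keying on the triplet of client IP, envelope sender and envelope recipient is a common choice but is an assumption here, as are the class name, the five-minute default delay and the exact reply strings.

```python
import time


class Greylist:
    """Temporarily reject unknown (ip, sender, recipient) triplets."""

    def __init__(self, min_delay: float = 300.0, clock=time.time):
        self.min_delay = min_delay  # seconds a new triplet must wait
        self.clock = clock          # injectable for testing
        self.first_seen = {}        # triplet -> time of first attempt

    def check(self, client_ip: str, mail_from: str, rcpt_to: str) -> str:
        key = (client_ip, mail_from.lower(), rcpt_to.lower())
        now = self.clock()
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.min_delay:
            return "250 OK"
        # 4xx tells a compliant MTA to queue the message and retry later.
        return "451 4.7.1 Greylisted, please try again later"
```

A compliant sender that retries after the delay is accepted, while software that never retries is never accepted. A real deployment would also expire old entries and whitelist known senders.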

Honeypots

Another approach is simply creating an imitation MTA that gives the appearance of being an open mail relay, or an imitation TCP/IP proxy server that gives the appearance of being an open proxy. Spammers who probe systems for open relays and proxies will find such a host and attempt to send mail through it, wasting their time and resources, and potentially, revealing information about themselves and the origin of the spam they are sending to the entity that operates the honeypot. Such a system may simply discard the spam attempts, submit them to DNSBLs, or store them for analysis by the entity operating the honeypot that may enable identification of the spammer for blocking.

Hybrid filtering

Hybrid filters such as Policyd-weight use some or all of the various tests for spam and assign a numerical score to each test. Each message is scanned for these patterns, and the applicable scores are tallied up. If the total is above a fixed value, the message is rejected or flagged as spam. By ensuring that no single spam test by itself can flag a message as spam, the false positive rate can be greatly reduced.
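The scoring scheme can be illustrated with a short Python sketch. The test names, weights and threshold below are invented for this example and are not taken from any particular product; the only property carried over from the text is that no single weight reaches the threshold on its own.

```python
import re

# (name, weight, predicate) triples; all values here are illustrative.
TESTS = [
    ("SUBJECT_ALL_CAPS", 1.5, lambda m: m["subject"].isupper()),
    ("MENTIONS_VIAGRA", 2.5,
     lambda m: re.search(r"viagra", m["body"], re.IGNORECASE) is not None),
    ("MISSING_DATE", 1.0, lambda m: not m.get("date")),
    ("MANY_EXCLAMATIONS", 1.0, lambda m: m["body"].count("!") >= 3),
]
THRESHOLD = 5.0  # higher than any single weight above


def score(message: dict):
    """Return (total score, names of the tests that fired)."""
    hits = [(name, weight) for name, weight, test in TESTS if test(message)]
    return sum(w for _, w in hits), [n for n, _ in hits]


def is_spam(message: dict) -> bool:
    return score(message)[0] >= THRESHOLD
```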

Outbound spam protection

Outbound spam protection involves scanning email traffic as it exits a network, identifying spam messages and then taking an action such as blocking the message or shutting off the source of the traffic. While the primary impact of spam is on spam recipients, sending networks also experience financial costs, such as wasted bandwidth, and the risk of having their IP addresses blocked by receiving networks.
Outbound spam protection not only stops spam, but also lets system administrators track down spam sources on their network and remediate them – for example, clearing malware from machines which have become infected with a virus or are participating in a botnet.

PTR/reverse DNS checks

The PTR DNS records in the reverse DNS can be used for a number of things, including:
Content filtering techniques rely on the specification of lists of words or regular expressions disallowed in mail messages. Thus, if a site receives spam advertising "herbal Viagra", the administrator might place this phrase in the filter configuration. The mail server would then reject any message containing the phrase.
Header filtering looks at the header of the email, which contains information about the origin, destination and content of the message. Although spammers will often spoof fields in the header in order to hide their identity, or to try to make the email look more legitimate than it is, many of these spoofing methods can be detected, and any violation of the standard on how the header is to be formed can also serve as a basis for rejecting the message.

SMTP callback verification

Since a large percentage of spam has forged and invalid sender addresses, some spam can be detected by checking that this "from" address is valid. A mail server can try to verify the sender address by making an SMTP connection back to the mail exchanger for the address, as if it was creating a bounce, but stopping just before any email is sent.
Callback verification has various drawbacks. First, since nearly all spam has forged return addresses, nearly all callbacks go to innocent third-party mail servers that are unrelated to the spam. Second, when the spammer uses a trap address as the sender's address, a receiving MTA that tries to make the callback using the trap address in a MAIL FROM command will have its own IP address blacklisted. Finally, the standard VRFY and EXPN commands used to verify an address have been so exploited by spammers that few mail administrators enable them, leaving the receiving SMTP server no effective way to validate the sender's email address.

SMTP proxy

SMTP proxies allow spam to be combated in real time, combining controls on sender behavior, giving legitimate users immediate feedback, and eliminating the need for quarantine.

Spamtrapping

Spamtrapping is the seeding of an email address so that spammers can find it, but normal users cannot. If the email address is used, then the sender must be a spammer and is blacklisted.
As an example, if the email address "spamtrap@example.org" is placed in the source HTML of a web site in a way that it isn't displayed on the web page, human visitors to the website would not see it. Spammers, on the other hand, use web page scrapers and bots to harvest email addresses from HTML source code, so they would find this address. When the spammer later sends to the address, the spamtrap knows this is highly likely to be a spammer and can take appropriate action.

Statistical content filtering

Statistical, or Bayesian, filtering once set up requires no administrative maintenance per se: instead, users mark messages as spam or nonspam and the filtering software learns from these judgements. Thus, it is matched to the end user's needs, and as long as users consistently mark/tag the emails, can respond quickly to changes in spam content. Statistical filters typically also look at message headers, considering not just the content but also peculiarities of the transport mechanism of the email.
Software programs that implement statistical filtering include Bogofilter, DSPAM, SpamBayes, ASSP, CRM114, the email programs Mozilla and Mozilla Thunderbird, Mailwasher, and later revisions of SpamAssassin.
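The learning step can be illustrated with a minimal naive Bayes classifier in Python. This is a textbook-style sketch with add-one smoothing, not the algorithm of any of the programs listed above; the class name, tokenizer and smoothing constants are all assumptions.

```python
import math
import re
from collections import Counter


class NaiveBayesFilter:
    """Tiny word-frequency classifier trained on user-labelled messages."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text: str):
        return re.findall(r"[a-z']+", text.lower())

    def train(self, text: str, label: str) -> None:
        """Record a message the user marked as 'spam' or 'ham'."""
        self.words[label].update(self.tokenize(text))
        self.messages[label] += 1

    def spam_probability(self, text: str) -> float:
        vocab = len(set(self.words["spam"]) | set(self.words["ham"]))
        total = sum(self.messages.values())
        log_score = {}
        for label in ("spam", "ham"):
            n_words = sum(self.words[label].values())
            # Smoothed prior, then smoothed per-word likelihoods.
            logp = math.log((self.messages[label] + 1) / (total + 2))
            for word in self.tokenize(text):
                logp += math.log(
                    (self.words[label][word] + 1) / (n_words + vocab + 1)
                )
            log_score[label] = logp
        # Convert the log-odds to a probability; clamp to avoid overflow.
        diff = min(log_score["ham"] - log_score["spam"], 700.0)
        return 1.0 / (1.0 + math.exp(diff))
```

Each call to train corresponds to a user marking a message, so the filter adapts as spam content changes without any administrator-maintained word list.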

Tarpits

A tarpit is any server software which intentionally responds extremely slowly to client commands. By running a tarpit which treats acceptable mail normally and known spam slowly or which appears to be an open mail relay, a site can slow down the rate at which spammers can inject messages into the mail facility. Depending on the server and internet speed, a tarpit can slow an attack by a factor of around 500. Many systems will simply disconnect if the server doesn't respond quickly, which will eliminate the spam. However, a few legitimate email systems will also not deal correctly with these delays. The fundamental idea is to slow the attack so that the perpetrator has to waste time without any significant success.
An organization can successfully deploy a tarpit if it is able to define the range of addresses, protocols, and ports for deception. The process involves a router passing the supported traffic to the appropriate server while those sent by other contacts are sent to the tarpit. Examples of tarpits include the Labrea tarpit, Honeyd, SMTP tarpits, and IP-level tarpits.

Automated techniques for email senders

There are a variety of techniques that email senders use to try to make sure that they do not send spam. Failure to control the amount of spam sent, as judged by email receivers, can often cause even legitimate email to be blocked and for the sender to be put on DNSBLs.

Background checks on new users and customers

Since spammers' accounts are frequently disabled due to violations of abuse policies, they are constantly trying to create new accounts. Due to the damage done to an ISP's reputation when it is the source of spam, many ISPs and web email providers use CAPTCHAs on new accounts to verify that it is a real human registering the account, and not an automated spamming system. They can also verify that credit cards are not stolen before accepting new customers, check the Spamhaus Project ROKSO list, and do other background checks.

Confirmed opt-in for mailing lists

A malicious person can easily attempt to subscribe another user to a mailing list — to harass them, or to make the company or organisation appear to be spamming. To prevent this, all modern mailing list management programs support "confirmed opt-in" by default. Whenever an email address is presented for subscription to the list, the software will send a confirmation message to that address. The confirmation message contains no advertising content, so it is not construed to be spam itself, and the address is not added to the live mail list unless the recipient responds to the confirmation message.
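The confirmation flow can be sketched in Python as follows. The class and method names are hypothetical, and the step of actually emailing the token to the address is left out; the point is only that an address joins the live list after the token sent to it comes back.

```python
import secrets


class MailingList:
    """Confirmed opt-in: addresses join only after echoing a secret token."""

    def __init__(self):
        self.pending = {}         # token -> address awaiting confirmation
        self.subscribers = set()  # the live list

    def request_subscription(self, address: str) -> str:
        """Record a request; the returned token would be emailed to address."""
        token = secrets.token_urlsafe(16)  # unguessable by a third party
        self.pending[token] = address.lower()
        return token

    def confirm(self, token: str) -> bool:
        """Called when the recipient replies or follows the confirmation link."""
        address = self.pending.pop(token, None)
        if address is None:
            return False  # unknown or already-used token
        self.subscribers.add(address)
        return True
```

Because only the owner of the mailbox sees the token, a malicious third party who submits someone else's address cannot complete the subscription.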

Egress spam filtering

Email senders typically now do the same type of anti-spam checks on email coming from their users and customers as for inward email coming from the rest of the Internet. This protects their reputation, which could otherwise be harmed in the case of infection by spam-sending malware.

Limit email backscatter

If a receiving server initially fully accepts an email, and only later determines that the message is spam or is addressed to a non-existent recipient, it will generate a bounce message back to the supposed sender. However, if the sender information on the incoming email was forged to be that of an unrelated third party, then this bounce message is backscatter spam. For this reason it is generally preferable for most rejection of incoming email to happen during the SMTP connection stage, with a 5xx error code, while the sending server is still connected. In this case the sending server will report the problem to the real sender cleanly.

Port 25 blocking

Firewalls and routers can be programmed to not allow SMTP traffic from machines on the network that are not supposed to run Mail Transfer Agents or send email. This practice is somewhat controversial when ISPs block home users, especially if the ISPs do not allow the blocking to be turned off upon request. Email can still be sent from these computers to designated smart hosts via port 25 and to other smart hosts via the email submission port 587.

Port 25 interception

All port 25 traffic can be intercepted and directed to a mail server that enforces rate limiting and egress spam filtering. This is commonly done in hotels, but it can cause email privacy problems, as well as making it impossible to use STARTTLS and SMTP-AUTH if the port 587 submission port isn't used.

Rate limiting

Machines that suddenly start sending lots of email may well have become zombie computers. By limiting the rate that email can be sent around what is typical for the computer in question, legitimate email can still be sent, but large spam runs can be slowed down until manual investigation can be done.
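One simple way to implement such a limit is a sliding-window counter per sending machine, sketched below in Python. The class name and parameters are illustrative; in practice the limit would be chosen from what is typical for the computer in question, as the paragraph above notes.

```python
import time
from collections import defaultdict, deque


class SendRateLimiter:
    """Allow at most max_messages per sender within a sliding time window."""

    def __init__(self, max_messages: int, window_seconds: float,
                 clock=time.monotonic):
        self.max_messages = max_messages
        self.window = window_seconds
        self.clock = clock               # injectable for testing
        self.sent = defaultdict(deque)   # sender -> timestamps of recent mail

    def allow(self, sender: str) -> bool:
        now = self.clock()
        history = self.sent[sender]
        # Forget messages that have fallen out of the window.
        while history and now - history[0] >= self.window:
            history.popleft()
        if len(history) >= self.max_messages:
            return False  # over the limit: defer and flag for investigation
        history.append(now)
        return True
```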

Spam report feedback loops

By monitoring spam reports from sources such as SpamCop, AOL's feedback loop, Network Abuse Clearinghouse and the domain's abuse@ mailbox, ISPs can often learn of problems before they seriously damage the ISP's reputation and cause its mail servers to be blacklisted.

FROM field control

Both malicious software and human spam senders often use forged FROM addresses when sending spam messages. Control may be enforced on SMTP servers to ensure senders can only use their correct email address in the FROM field of outgoing messages. In an email users database each user has a record with an email address. The SMTP server must check if the email address in the FROM field of an outgoing message is the same address that belongs to the user's credentials, supplied for SMTP authentication. If the FROM field is forged, an SMTP error will be returned to the email client.
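The check described above might look like the following Python sketch. The USERS table, the function name and the reply strings are assumptions for illustration; a real server would consult its own user database and apply its own reply codes.

```python
from email.utils import parseaddr

# Hypothetical user database: SMTP login name -> the address that user owns.
USERS = {
    "alice": "alice@example.com",
    "bob": "bob@example.com",
}


def check_from(authenticated_user: str, from_header: str) -> str:
    """Compare the From address with the one owned by the logged-in user."""
    _, address = parseaddr(from_header)  # handles 'Name <addr>' forms
    allowed = USERS.get(authenticated_user)
    if allowed is not None and address.lower() == allowed.lower():
        return "250 OK"
    # Forged or mismatched From: refuse the message with a permanent error.
    return "550 5.7.1 Sender address rejected: not owned by authenticated user"
```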

Strong AUP and TOS agreements

Most ISPs and webmail providers have either an Acceptable Use Policy or a Terms of Service agreement that discourages spammers from using their system and allows the spammer to be terminated quickly for violations.

Legal measures

From 2000 onwards, many countries enacted specific legislation to criminalize spamming, and appropriate legislation and enforcement can have a significant impact on spamming activity. Where legislation provides specific text that bulk emailers must include, this also makes "legitimate" bulk email easier to identify.
Increasingly, anti-spam efforts have led to co-ordination between law enforcement, researchers, major consumer financial service companies and Internet service providers in monitoring and tracking email spam, identity theft and phishing activities and gathering evidence for criminal cases.
Analysis of the sites being spamvertised by a given piece of spam can often be followed up with domain registrars with good results.

New solutions and ongoing research

Several approaches have been proposed to improve the email system.

Cost-based systems

Since spamming is facilitated by the fact that large volumes of email are very inexpensive to send, one proposed set of solutions would require that senders pay some cost in order to send email, making it prohibitively expensive for spammers. Anti-spam activist Daniel Balsam attempts to make spamming less profitable by bringing lawsuits against spammers.

Machine-learning-based systems

Artificial intelligence techniques, such as artificial neural network algorithms and Bayesian filters, can be deployed for filtering spam emails. These methods use probabilistic methods to train the networks, such as examination of the concentration or frequency of words seen in spam versus legitimate email contents.

Other techniques

Channel email is a new proposal for sending email that attempts to distribute anti-spam activities by forcing verification when the first email is sent for new contacts.

Research conferences

Spam is the subject of several research conferences, including: