Anti-Spam Techniques In PHP - Part 2

This tutorial provides a few simple techniques for protecting yourself and your web site from spammers.
Provided by Quentin Zervaas

Introduction

This short series of articles provides a few simple techniques for protecting yourself and your web site from spammers.

It does this from two perspectives: protecting yourself, and protecting your web site.

This is the second article in the series, and covers the issue of protecting your web site from spammers. In this article we cover a number of ways of overcoming spammers, including:

1. CAPTCHA
2. Manual approval (moderation)
3. Text filters (blocking URLs)
4. Email validation

Protecting your web site

Introduction To CAPTCHA

This is the main method we're going to look at in this article for preventing spam. CAPTCHA is an acronym that stands for Completely Automated Public Turing test to tell Computers and Humans Apart.

You've probably seen one of these tests before - it's basically an image that contains distorted text (and potentially a background image) to prevent automated tools (such as spam bots) from registering at a web site or posting comments. Users are required to enter the text that appears in the image in order to proceed past the form they are completing.

How you can use CAPTCHA

You can use CAPTCHA in your web site in exactly the same way as I described above. All you need to do is add a CAPTCHA image and a text input to the form you're trying to protect. This may be used in the following situations:

  • Protecting a registration form to stop automated registrations on your site
  • Protecting your publicly-editable Wiki from being overrun by automated posts
  • Protecting your blog from comments posted by spam bots

How this is implemented

Implementing CAPTCHA using PHP is fairly straightforward. We will be doing this using PEAR's Text_CAPTCHA class and PHP's sessions. The basic algorithm goes something like this:

  • In the form:
    • Generate a phrase for the user to enter, and write it to the session
    • Place a HTML img tag to call the CAPTCHA script (which we will create)
    • Place a text input box where the user can input the phrase
  • Create the script referenced in the image tag, which:
    • Creates the Text_CAPTCHA object
    • Fetches the phrase from the user's session
    • Outputs the image
  • In the form processor:
    • Check that the submitted value matches the value in the session

Implementing A PHP CAPTCHA Solution

We will now implement the actual PHP code to do this.

The first thing that needs to be done is to install the Text_CAPTCHA class. At time of writing, this class also depends on Text_Password and Image_Text, so if you don't already have these they must also be installed.

In order to install these via the Linux command line, you should use commands similar to the following:

Listing 1. Commands used to install PEAR's Text_CAPTCHA.
# pear install -f Text_CAPTCHA
# pear install -f Image_Text

The captcha.php file

This is the script that generates the CAPTCHA image. This script assumes that a phrase has already been set in the session.

Additionally, you must have a TrueType font present in the same directory as captcha.php. This is the font used to write the secret phrase. If you use a Windows computer, you can find a bunch of these in your C:\Windows\Fonts directory.

The captcha.php file looks like this, from start to end:

Listing 2. captcha.php
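<?php
require_once('Text/CAPTCHA.php');

// fetch the phrase that the form stored in the session
session_start();
$phrase = isset($_SESSION['captcha']) ? $_SESSION['captcha'] : 'Error';

// the font file must be in the same directory as this script
$options = array('font_size' => 24, 'font_file' => 'georgia.ttf');

// create a 120x60 image containing the phrase
$cap = Text_CAPTCHA::factory('Image');
$cap->init(120, 60, $phrase, $options);

header('Content-type: image/png');
echo $cap->getCAPTCHAAsPNG();
?>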

Note that this doesn't do any error checking, so you may want to improve this. Additionally, the text used if no phrase was found is "Error", so if this is the case, then this text will appear in the image, and the user will probably never be able to complete submission of the form.

Now, here's the PHP code for the form and form processor. Note that the whole thing has been over-simplified, and real form-processing would have much more to it than this.

Listing 3. listing-3.php
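<?php
session_start();

if(isset($_POST['process'])){
    // the phrase must already be in the session from displaying the form
    if(!isset($_SESSION['captcha']))
        die('Form accessed incorrectly');

    // treat a missing or non-string field as an empty answer
    $entered = (isset($_POST['captcha']) && is_string($_POST['captcha'])) ? $_POST['captcha'] : '';

    // strict comparison, so numeric-looking phrases are compared as text
    if($entered === $_SESSION['captcha']){
        die('CAPTCHA text matched! Phrase was '.$_SESSION['captcha']);
    }else{
        die('CAPTCHA text did not match. Phrase was '.$_SESSION['captcha'].
            ', you entered '.htmlspecialchars($entered));
    }
}else{
    // generate a new CAPTCHA phrase
    $_SESSION['captcha'] = substr(md5(uniqid()), 0, 4);
}
?>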
<html>
    <head>
        <title>CAPTCHA Demo</title>
    </head>
    <body>
        <form method="post" action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>">
            <img src="captcha.php" /><br />
            Enter phrase: <input type="text" name="captcha" /><br />

            <input type="submit" name="process" value="Submit" />
        </form>
    </body>
</html>

In this code, we're just generating a random string of text using MD5 and uniqid() for our phrase. This has a lot of scope for improvement or change, although this will do the trick.
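As one possible improvement, here is a minimal sketch of a phrase generator that avoids characters which are easily confused in a distorted image. The function name generateCaptchaPhrase() and the alphabet are my own assumptions for illustration and are not part of Text_CAPTCHA. It assumes PHP 7 or later, since it relies on random_int().

```php
<?php
/**
 * Hypothetical helper: builds a random phrase from characters that are
 * hard to confuse in a distorted image (no 0/o, 1/l/i).
 * Assumes PHP 7 or later for random_int().
 */
function generateCaptchaPhrase($length = 5)
{
    $alphabet = 'abcdefghjkmnpqrstuvwxyz23456789';
    $max = strlen($alphabet) - 1;

    $phrase = '';
    for ($i = 0; $i < $length; $i++) {
        // random_int() draws from a cryptographically secure source
        $phrase .= $alphabet[random_int(0, $max)];
    }

    return $phrase;
}
```

To try it, you could assign the result of generateCaptchaPhrase() to $_SESSION['captcha'] in place of the MD5-based line in the form script.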

That's all there is to it! The images generated by Text_CAPTCHA are somewhat straightforward. There are other implementations on the Internet for creating CAPTCHA images, but essentially they all do the same thing and the algorithm I've provided above will work with all of them.
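Whichever implementation you choose, the final check stays the same, so it can be worth wrapping in a small function. The following is only a sketch under my own assumptions: checkCaptcha() is a hypothetical name, and ignoring case and discarding the phrase after one attempt are design choices rather than anything Text_CAPTCHA requires.

```php
<?php
/**
 * Hypothetical helper: compares the submitted text with the stored phrase
 * and discards the phrase so that each one can only be tried once.
 * $session is passed by reference; pass $_SESSION in a real script.
 */
function checkCaptcha(array &$session, $submitted)
{
    if (!isset($session['captcha']) || !is_string($submitted)) {
        return false;
    }

    $expected = $session['captcha'];
    unset($session['captcha']); // a failed guess needs a fresh phrase

    // ignore case and surrounding whitespace in what the user typed
    return strtolower(trim($submitted)) === strtolower($expected);
}
```

Discarding the phrase after each attempt means a bot cannot keep guessing against the same image. The form needs to generate a new phrase each time it is displayed, which the demo form already does.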

Drawbacks Of CAPTCHA

While CAPTCHA is very useful and widely adopted across the Internet, there are some drawbacks to using it.

Probably the biggest issue is accessibility. People who are vision impaired may have great difficulty using your CAPTCHA forms, so you should provide alternatives. At minimum, you should offer a description of the CAPTCHA and how it works, plus a contact form people can use to reach you so that you can complete the form for them.

In fact, even people who are not vision impaired may have difficulty using CAPTCHA. Sometimes the generated images are just really hard to read. So make sure the images are random (e.g. even if the phrase stays the same, the noise image and/or text placement changes). Also indicate to the user that they can refresh the page so the image is recreated and possibly easier to read.

One last thing to be aware of is that CAPTCHA is not totally foolproof. People have written bots that do OCR (Optical Character Recognition) in order to foil these tests. Obviously, the more complex the CAPTCHA image becomes, the harder it is to do text recognition on. The UC Berkeley Computer Vision Group has published some information on this under the title Breaking a Visual CAPTCHA.

Other Methods Of Spam Prevention

We've had a look at CAPTCHA to prevent spam, but it's not the only way. There are some other ways to help combat this, but be aware that this list is by no means exhaustive. In fact, it may be a hybrid of these methods that works best for you.

Manual Approval / Moderation

The way this works is that for every person who signs up (or for every comment that is posted, etc.), a trusted person (e.g. the site administrator) checks the new account (or comment) and manually validates it.

The biggest drawback with this method is that there's a lot more work involved. It's important to spend a bit more time in the development of the approval system in order to make your life easier down the track:

  • Have a script in your administration section which lists all the items awaiting approval. Each item could have an Approve/Delete checkbox next to it so the items can be processed in bulk.
  • Email each new item to yourself when it is submitted, with a single link for each item that you can click to approve or delete it.

From an implementation point-of-view, all this really requires is a single extra field in your SQL table which indicates whether or not an item is approved. You may also want to track the date/time items were approved, but this depends on your requirements.
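If you email yourself approve/delete links, they should be hard for anybody else to guess. Here is a minimal sketch of one way to do that, by signing each link with a secret key. All of the function names, the query parameter names and the $secret value are assumptions for illustration. It assumes PHP 5.6 or later for hash_equals().

```php
<?php
/**
 * Hypothetical helpers for one-click moderation links.
 * $secret is a private key known only to the server.
 */
function moderationToken($itemId, $action, $secret)
{
    // sign the item and the action so neither can be altered in the link
    return hash_hmac('sha256', $itemId . ':' . $action, $secret);
}

function moderationLink($baseUrl, $itemId, $action, $secret)
{
    return $baseUrl . '?' . http_build_query(array(
        'id'     => $itemId,
        'action' => $action,
        'token'  => moderationToken($itemId, $action, $secret),
    ));
}

function isValidModerationRequest(array $query, $secret)
{
    if (!isset($query['id'], $query['action'], $query['token'])
        || !is_scalar($query['id'])
        || !is_string($query['token'])
        || !in_array($query['action'], array('approve', 'delete'), true)) {
        return false;
    }

    $expected = moderationToken($query['id'], $query['action'], $secret);

    // hash_equals() compares the two strings in constant time
    return hash_equals($expected, $query['token']);
}
```

The script that receives the link would pass $_GET to isValidModerationRequest() and only then update the approved field. You may still want that script to sit behind your administration login.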

Text Filters

If you're trying to protect your blog or Wiki from unwanted spam, you can use a text filter. This basically checks for "spammy" words and blocks the post if they are found. WordPress has built-in functionality for this, and as such they have a nice list of common spam words that you can check for.

Just be aware that people may create posts that legitimately use these words - so make sure you know your topic.

Blog comment spam is useless unless it has links in it, since the spammers are aiming to improve their backlinks for Search Engine Optimization purposes. As such, you could also automatically approve posts that have no links in them, but put posts that have links in an approval queue (this is a hybrid of this method and the previous method).
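The hybrid approach can be sketched in a few lines. This is only an illustration: classifyComment() is a hypothetical function, the word list is something you would maintain yourself, and the URL pattern is deliberately simple.

```php
<?php
/**
 * Hypothetical classifier: returns 'blocked', 'pending' or 'approved'.
 * $spamWords is a list of words you maintain yourself.
 */
function classifyComment($text, array $spamWords)
{
    foreach ($spamWords as $word) {
        // \b stops "pill" from matching inside "pillow"; the i flag ignores case
        if (preg_match('/\b' . preg_quote($word, '/') . '\b/i', $text)) {
            return 'blocked';
        }
    }

    // anything that looks like a URL goes to the approval queue
    if (preg_match('~(https?://|www\.)\S+~i', $text)) {
        return 'pending';
    }

    return 'approved';
}
```

For example, with a word list of array('casino', 'lottery'), a comment containing "http://example.com" but none of the listed words would come back as 'pending' and could be stored with its approved field unset.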

Email Validation

The final method I'll briefly cover is to use email validation. This is a similar idea to CAPTCHA but works somewhat differently. This method involves the user entering their email address with their comment submission or account registration. Then, in order for the account to be approved, they must click a link that is sent to their email address. This ensures they are using a valid email address.

Note that this method is probably very easy for spammers to defeat with a bot, but at least they'll need a working email address to do so. Just make sure you then make them validate their new email address if they ever want to change it.
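The link you send needs to contain a value that can't be guessed. Here is a minimal sketch of generating and checking such a value. The function names are hypothetical, and storing only a hash of the token is my own design choice. It assumes PHP 7 or later for random_bytes().

```php
<?php
/**
 * Hypothetical helpers for email confirmation.
 * The raw token goes in the emailed link; only its hash is stored.
 * Assumes PHP 7 or later for random_bytes().
 */
function createEmailToken()
{
    $token = bin2hex(random_bytes(16)); // 32 hex characters

    return array(
        'token' => $token,                 // send this to the user
        'hash'  => hash('sha256', $token), // store this with the account
    );
}

function isEmailTokenValid($submittedToken, $storedHash)
{
    if (!is_string($submittedToken) || !is_string($storedHash)) {
        return false;
    }

    // hash_equals() compares the two strings in constant time
    return hash_equals($storedHash, hash('sha256', $submittedToken));
}
```

When the user clicks the link, you would look up the stored hash for their account, pass it to isEmailTokenValid() along with the token from the link, and mark the address as validated if it returns true.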

Conclusion

In this article, we covered a few ways of protecting your web site from spammers. The methods were covered fairly briefly, but hopefully give you a few ideas for stopping (or at least minimizing) spam on your web site.

CAPTCHA is a very effective method that is widely used, and you should strongly consider it as a solution to the spamming problem - just make sure you're aware of its drawbacks.
