How to Protect Email Addresses on Web Sites

May 23, 2010

You probably know by now that the bad email guys out there get many of the email addresses they use by harvesting them from sites on the Web. "Bots" constantly roam the Web, reading Web sites and looking for anything that resembles an email address. The simplest attack looks for MailTo links in the HTML code of a page, but, importantly, plain-text email addresses like noreply@fastie.com will be found as well. It's too bad that MailTo links are so easily harvested, because live email links on a Web page are a great convenience to site visitors.
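To make the threat concrete, here is a minimal sketch of what a naive harvester might do with a page it has downloaded. The regular expression and the `harvest` function are hypothetical simplifications; real bots vary widely. Note that the plain-text address is found just as easily as the MailTo link.

```javascript
// A naive, hypothetical harvester: scan raw HTML for anything shaped like an email address.
// The pattern is a deliberate simplification of what real bots use.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function harvest(html) {
  // Collect every match, then drop duplicates (a MailTo link often repeats its address as link text).
  const found = html.match(EMAIL_PATTERN) || [];
  return [...new Set(found)];
}

const page =
  '<p>Write to <a href="mailto:sales@example.com">sales@example.com</a> ' +
  "or to support@example.com for help.</p>";

console.log(harvest(page)); // [ 'sales@example.com', 'support@example.com' ]
```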

One way to thwart the bots is to use a bit of JavaScript to create the email address in the DOM. Because only the JavaScript code appears in the page source, the bots cannot "see" the address.
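The idea can be checked with the same kind of naive scan. In this sketch the downloaded source contains only a script call and its pieces, so the pattern finds nothing; the address exists only after the script runs. The `assemble` helper and the `write` call in the source string are hypothetical stand-ins for whatever script a page actually uses.

```javascript
// Same simplified pattern a harvester might use.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// What a bot downloads: script text in which no address appears.
const source = "<script>write(assemble('widgets', 'example', 'com'));</script>";

// What the browser computes when it runs that script (hypothetical helper).
function assemble(name, domain, tld) {
  return [name, domain + "." + tld].join("@");
}

console.log(source.match(EMAIL_PATTERN)); // null
console.log(assemble("widgets", "example", "com").match(EMAIL_PATTERN)); // [ 'widgets@example.com' ]
```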

The first solution I saw that used this technique was created by Dan Benjamin, formerly of Automatic Labs and HiveWare. He created an online application called Enkoder. Enter an email address and Enkoder will create a chunk of JavaScript that contains an encrypted form of that email address. Plop this down in your HTML at the right spot and you're in business. If you look at the code, you'll see that no bot is likely to "dekode" this slug.
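Enkoder's actual scheme is not described here. As a loose illustration of the general idea only (an encoded string embedded in the page, decoded by script at run time), here is a toy sketch that shifts character codes. The `encode`/`decode` names and the shift of 3 are arbitrary assumptions, and a fixed shift like this is simple obfuscation, not real encryption.

```javascript
// Toy obfuscation in the spirit of, but NOT identical to, Enkoder.
// Each character code is shifted by a fixed, arbitrary amount.
const SHIFT = 3;

function encode(text) {
  return Array.from(text, (ch) => String.fromCharCode(ch.charCodeAt(0) + SHIFT)).join("");
}

function decode(encoded) {
  return Array.from(encoded, (ch) => String.fromCharCode(ch.charCodeAt(0) - SHIFT)).join("");
}

// Only the encoded form would be embedded in the page; decode() would run in the browser.
const encoded = encode("widgets@example.com");
console.log(encoded);         // the scrambled form, with no "@" in it
console.log(decode(encoded)); // widgets@example.com
```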

I learned about Enkoder from my old friend Brian Livingston, who among other things was the editor of the superb Web site and newsletter WindowsSecrets.com. He wrote the eBook "Spam-Proof Your E-Mail Address," which I own and can recommend.

I like the Enkoder approach but the code is big and ugly, making a mess of the HTML in your pages. Also, you would need to run Enkoder for each email address you wanted to Enkode, then drop in that unique script. I wanted something simpler, easier to read and remember, and with less overhead. I wrote "Nobots" to meet my needs.

Let's take a look at the code.

// nobots.js
// Written by Will Fastie, 19 Aug 2005
// Rewritten for Global Abatement 07 Jul 2008
// Revised for generality 10 Jul 2008
 
/*global WF */
WF = {};
 
// My Domain - Convert a code into a domain name or return the code
WF.myd = function (mycode) {
    var dn = 'fastie';
    var tldc = '.com';
    var d;
    switch (mycode) {
    case "fc":
        d = dn + tldc;
        break;
    case "gm":
        d = 'gmail' + tldc;
        break;
    case "ao":
        d = 'aol' + tldc;
        break;
    case "yh":
        d = 'yahoo' + tldc;
        break;
    default:
        d = mycode;
        break;
    }
    return d;
};
 
// Construct Address - create an email address out of the pieces
WF.constructaddr = function (ename, edomain) {
    var atsign = "@";
    var addr = ename + atsign + edomain;
    return addr;
};
 
// NoBots - write a properly constructed MailTo link into the DOM
WF.nb = function (ename, edomaincode, edisplay, esubj) {
    var subj = "?subject=";
    var addr = WF.constructaddr(ename, WF.myd(edomaincode));
    var display = addr;
    var atag;
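    // Use the supplied display text, if any; otherwise show the address itself
    if (((arguments.length === 4) || (arguments.length === 3)) && (edisplay !== "")) {
        display = edisplay;
    }
    // Append the optional subject, URL-encoded so spaces and "&" survive in the link
    if ((arguments.length === 4) && (esubj !== "")) {
        addr = addr + subj + encodeURIComponent(esubj);
    }
    // Build the complete anchor tag, including its closing </a>
    atag = '<a href="mailto:' + addr + '">' + display + '</a>';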
    document.write(atag);
    return null;
};

Now let's take a look at the sections of the code. One of the less familiar constructs may be this:

/*global WF */
WF = {};

Over time, I have become a tiny bit more capable with JavaScript. This led me to the excellent teachings of Douglas Crockford, who advocates something called global abatement. He correctly points out that with a lot of JavaScript modules loading on the same page, eventually there is bound to be a conflict. So rather than defining a function with the global identifier nb and risking a collision, a single global, in my case my initials WF, is created and all the internal functions are defined in terms of it. Thus the main function in my routine is called WF.nb, which is much less likely to collide with anything. Naturally, longer global names reduce the risk even further; I probably should use FASTIE or WILLF or FSYS.
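A minimal sketch of the pattern, assuming two unrelated scripts on the same page that both want a function called `nb`. `WF` is the namespace used in nobots; `OTHER` is a hypothetical second library. The `|| {}` guard is a common refinement (not in the nobots listing) that reuses an existing namespace object instead of overwriting it if the file happens to be loaded twice.

```javascript
// Each script hangs its functions off one global object instead of many globals.
var WF = WF || {};       // nobots' namespace; reuse it if it already exists
var OTHER = OTHER || {}; // a hypothetical second library on the same page

WF.nb = function () { return "nobots link writer"; };
OTHER.nb = function () { return "some other library's nb"; };

// Both functions coexist; neither overwrote the other.
console.log(WF.nb());    // nobots link writer
console.log(OTHER.nb()); // some other library's nb
```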

The main function WF.nb is called with up to four arguments, the last two optional:

ename: the name portion of an email address, such as widgets.
edomaincode: a code denoting which domain name should be used. If the code does not match one of the predefined codes, it is assumed to be a full domain name instead.
edisplay: (optional) the text to be displayed as the clickable link. If this argument is not provided, or is an empty string, the email address is used as the link text.
esubj: (optional) the text to be used for the subject line of the message. If this argument is not provided, or is an empty string, no subject is added.
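To see how the optional arguments behave without a browser, here is a sketch of a DOM-free variant that returns the tag instead of passing it to `document.write`. The name `nbTag` is hypothetical, it takes a full domain rather than a domain code, and it URL-encodes the subject with `encodeURIComponent` so that spaces and `&` are safe in the link.

```javascript
// DOM-free sketch of WF.nb's argument handling: returns the <a> tag as a string.
// nbTag is a hypothetical name; it takes a full domain instead of a domain code.
function nbTag(ename, edomain, edisplay, esubj) {
  var addr = ename + "@" + edomain;
  var href = "mailto:" + addr;
  // The display text defaults to the address itself when omitted or empty.
  var display = edisplay ? edisplay : addr;
  // The subject is optional; encode it so it is valid inside the URL.
  if (esubj) {
    href += "?subject=" + encodeURIComponent(esubj);
  }
  return '<a href="' + href + '">' + display + "</a>";
}

console.log(nbTag("widgets", "example.com"));
// <a href="mailto:widgets@example.com">widgets@example.com</a>
console.log(nbTag("widgets", "example.com", "Contact us"));
// <a href="mailto:widgets@example.com">Contact us</a>
console.log(nbTag("widgets", "example.com", "", "Price list"));
// <a href="mailto:widgets@example.com?subject=Price%20list">widgets@example.com</a>
```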

The two helper functions provide assistance with the construction of the email address. WF.constructaddr assembles the pieces. WF.myd converts the supplied edomaincode into a domain name.
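`WF.myd` uses a `switch`; a lookup table is an equivalent way to express the same mapping and is arguably easier to extend. This is a hypothetical variant, not the nobots code; the codes and domains mirror the ones in `WF.myd`.

```javascript
// Table-driven variant of WF.myd: map short codes to domain names.
// Unknown codes fall through unchanged, just as in WF.myd's default case.
var TLD = ".com";
var DOMAINS = {
  fc: "fastie" + TLD,
  gm: "gmail" + TLD,
  ao: "aol" + TLD,
  yh: "yahoo" + TLD
};

function myd(code) {
  // hasOwnProperty avoids matching inherited keys such as "toString".
  return Object.prototype.hasOwnProperty.call(DOMAINS, code) ? DOMAINS[code] : code;
}

console.log(myd("gm"));          // gmail.com
console.log(myd("example.org")); // example.org
```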

The code represents a certain amount of overkill. I have gone to some lengths never to let any portion of the code even remotely resemble an email address. The @ symbol never appears in a string that could be parsed as an email address, and even the domain names are obfuscated by the WF.myd routine.
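The same caution can be pushed one step further. As a sketch (an assumption, not what nobots does), the at sign can be produced from its character code, 64, so the function's source contains no `@` at all; a check over the function's own text confirms it.

```javascript
// Build the at sign from its character code so no literal one appears in the function.
function constructaddr(ename, edomain) {
  var atsign = String.fromCharCode(64); // 64 is the character code of the at sign
  return ename + atsign + edomain;
}

// The function's source text contains no at sign...
console.log(constructaddr.toString().indexOf(String.fromCharCode(64))); // -1
// ...but its result is a normal address.
console.log(constructaddr("widgets", "example.com")); // widgets@example.com
```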

If a bot were sophisticated enough, it could parse the entire JavaScript file looking for my function names and then act on that information. That's why I recommend you change the global name. If your name is Jane Q. Public, substitute JQP for WF in the code. You can also modify the myd routine so that it selects the domain name in a different way. But even if you change nothing except the domain names of interest, bots probably are not going to be that sophisticated; they are mostly looking for low-hanging fruit.
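Following that advice, a personalized copy might start like the sketch below. `JQP` is the Jane Q. Public namespace from the example; storing domains reversed and flipping them at run time is just one hypothetical way for `myd` to select the domain differently, and the `pq` code is arbitrary.

```javascript
// A renamed namespace: JQP instead of WF.
var JQP = {};

// Hypothetical alternative to WF.myd: domains are stored reversed and flipped at run time.
JQP.myd = function (mycode) {
  var reversed = { pq: "gro.elpmaxe" }; // "pq" is an arbitrary code
  if (!Object.prototype.hasOwnProperty.call(reversed, mycode)) {
    return mycode; // unknown codes pass through, as in the original WF.myd
  }
  return reversed[mycode].split("").reverse().join("");
};

console.log(JQP.myd("pq"));        // example.org
console.log(JQP.myd("gmail.com")); // gmail.com
```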

Does It Work?

For five years after I wrote this code, my site included a page that contained a secret email address protected by my code. The page was entirely visible to search engines and therefore to bots.

The secret email address never received any email.

That doesn't prove anything, of course. My test was therefore a bit more sophisticated. During the first year that the secret page was online, it also contained an email address that was not protected. I received spam at the unprotected address within three months. I killed off that email address and tried another; it was captured as well.

That's still not proof positive, but it is a pretty good indicator, good enough that I now provide my nobots code to all my clients as part of my basic Web site package and use it to protect every exposed email address on a site. I'm happy to do so and happy to provide the code here; spam is a scourge.

I do not believe that a bot will find anything it can parse into a valid email address in nobots. If you think I'm wrong, let me know. On the other hand, if you think this code is the greatest thing since sliced bread, let me know that too.

Tags: Email, Programming, Web
