If you are the administrator of an online forum, a wiki or any website which accepts user data, you probably know this feeling: bots are a pain and fighting them looks like an endless loop! Your websites are constantly scanned by bots which try to create fake accounts and then pages filled with spam. There are thousands (millions?) of computers on the Internet scanning community websites like yours, often just to boost the ranking of some website in search engines. Nothing dangerous, but very annoying for your regular visitors. Lots of techniques exist to fight bots and reduce their chances of creating accounts, but it is a recurring problem: they become more and more sophisticated and tend to escape classic checks. Plenty of protections are available.
Some examples:
- Use domain or IP address blacklists?
→ Forget this, it is totally unmanageable.
- Use a blacklist of User-Agents?
→ They use the UAs of well-known browsers (same problem as above).
- Check the HTTP Referer?
→ They follow the regular registration path and do not access the registration page directly.
- CAPTCHA?
→ That remains the most common way to reduce spam, but…
In the world of CAPTCHAs, the Rolls-Royce remains reCAPTCHA (developed by Google), even if it has already been reported as broken several times. True or false? From personal experience, I have already seen in my logs accounts created by bypassing this test! There are tons of other CAPTCHA implementations:
Alternatives to text recognition exist, like the Microsoft project ASIRRA, which asks the user to identify cats amongst dogs. Others ask to solve simple mathematical expressions (they are broken too) or to answer a simple question like “What’s the color of a banana?“.
A few days ago, I was dealing with a huge increase of fake accounts created on one of the BruCON wikis. Some bots successfully bypassed the CAPTCHA system in place. I asked for some help on Twitter and received an interesting reply from lcx_at. I investigated his suggestion and implemented it. The proposed technique is to define a hidden form field and check with ModSecurity (or any other WAF) whether it contains data.
Honestly, I’m not a big fan of WAFs (“Web Application Firewalls“). Why? They are often seen by developers as the ultimate protection and reinforce the idea that developers don’t have to care about security (“No worries, we have a WAF!“). For me, a WAF is useful in very specific cases:
- Protect legacy applications which cannot easily be moved/upgraded to a new platform
- Temporarily protect websites against new threats (while developers fix their code or a vendor makes a patch available)
- Reduce security costs when maintaining the applications to be protected would require a huge amount of money.
And also to protect against bots! Let’s see how…
First, in your registration form, add a new input field and hide it with CSS. Be sure to use “{display:none}” so you don’t break the page design:
<style type="text/css">
div .AntiBot {display:none;}
</style>

<td><div class="AntiBot">
  <input name="AntiBot_RmBo9X20Yo" type="text">
</div></td>
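Note that smarter bots sometimes parse the CSS and skip fields hidden with “display:none”. A possible refinement, purely a sketch of my own and not part of the original setup, is to move the field off-screen instead and add attributes that keep browsers and assistive technology away from it:

<style type="text/css">
/* Move the honeypot off-screen instead of hiding it,
   in case a bot checks for display:none */
div .AntiBot {position:absolute; left:-5000px;}
</style>

<td><div class="AntiBot" aria-hidden="true">
  <!-- tabindex="-1" keeps the field out of keyboard navigation,
       autocomplete="off" stops browsers from pre-filling it -->
  <input name="AntiBot_RmBo9X20Yo" type="text" tabindex="-1" autocomplete="off">
</div></td>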
Assign a unique name to your input field; this will make your life easier for detection and reporting. Then create a ModSecurity rule which blocks all POST requests that have a value in the hidden field:
SecRule ARGS:AntiBot_RmBo9X20Yo "(\S+)" \
    "auditlog,deny,log,msg:'Denied user creation by a bot'"
The rule says: in a POST request, inspect the argument called “AntiBot_RmBo9X20Yo” and, if it contains any character, deny the request and log it with the provided message. More details about ModSecurity rules are available here.
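By the way, if you run a recent ModSecurity (2.7 or later), every rule must also carry a unique “id” action, and it helps to pin the rule to phase 2 so that the POST body has been parsed when the check runs. Here is a hedged equivalent of the rule above (the id value 1000001 is just an arbitrary example, pick one that is free in your configuration):

# Phase 2 runs after the request body has been parsed,
# so the POST arguments are available for inspection.
SecRule ARGS:AntiBot_RmBo9X20Yo "(\S+)" \
    "id:1000001,phase:2,deny,status:403,log,auditlog, \
    msg:'Denied user creation by a bot'"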
A regular user (a human) will not see the hidden field and will leave it empty. On the other hand, some (intelligent) bots parse the HTML code and automatically fill in all the fields they detect. Here is an example of a request performed by such a bot. It submitted our field with the value “Create+account“:
--9924cd21-C--
wpCaptchaWord=xxx&wpCaptchaId=440890724&wpName=xxx& \
wpPassword=3xxx&wpRetype=xxx&wpEmail=&wpRealName=& \
wpRemember=1&AntiBot_RmBo9X20Yo=Create+account& \
wpCreateaccount=Create+account& \
wpCreateaccountToken=e36684c69e31b655b56a00f3254f48cf

--9924cd21-F--
HTTP/1.1 403 Forbidden
Vary: Accept-Encoding
Content-Length: 211
Keep-Alive: timeout=15, max=256
Connection: Keep-Alive
Content-Type: text/html; charset=iso-8859-1
Result: access denied with an HTTP/403 error! Since implementing this check, I have blocked an average of two requests per day. This is of course not bullet-proof, but if it can reduce the bot traffic by a few percent, I’m already happy.
This post is also a good opportunity to demonstrate that a WAF is not meant to be deployed with just its standard configuration. It can be a wonderful tool, but it requires fine-tuning and customization…
It looks interesting but, as you say, it is not bulletproof, since an attacker could spam manually by filling out the forms many times, or write a custom program for the attacked site which fills in all fields except the antibot one.

Hi Luis,
Thank you for your comment. You’re right: a motivated attacker could bypass this control. The goal here is to kick out stupid bots and reduce the noise.
Note also that the post is five years old 😉
Great article. I have implemented this and the first results are very promising.
Thank you
Thanks for sharing!
Actually I’ve set it up to use User-Agent filtering as well. I still get about 10 spam comments a month, but that’s about it. Even though many of them use a standard User-Agent, there are subtleties that let you filter out most of them.
Take a look at my ruleset: http://www.flameeyes.eu/projects/modsec