URL Filtering with Squid

Besides my digital life, I’m also the happy father of two young girls. The elder one is already ten years old and is gradually discovering the “Wonderful Internet”. Being an Infosec guy, it sounds logical to me to implement some safeguards.

First, leave the technical stuff aside and talk! Some security awareness is always good. The first lesson was to learn how to use a password and not share it with her sister. It must be seen as a game! Both have their own account on the family laptop. Then, talk with your children and explain, using simple words, what the Internet is and what they can find there: the best as well as the worst! Communication is the key. Countermeasures will not be effective if you don’t explain the reasons behind them. I also have two golden rules:

  • Do not let the children use the computer in a closed room. Shoulder surfing is good in this case.
  • If they find something “strange”, encourage them to report it to you and not to be afraid to ask.

Unfortunately, it won’t take long before the children experiment and try to break the rules. The risk of finding nasty stuff is high. So, how do you protect them and keep an eye on their online activity? Let’s use open source software! I’m not a big fan of commercial parental control solutions for several reasons. Why pay if you can build something for free? We have more than one computer at home, not always running Windows. Finally, I can integrate the alerts into my personal SIEM (read “OSSEC” ;-)).

Disclaimer: this setup will protect your children from inappropriate content but it also has limitations. It can be used as a starting point in a corporate environment but it must be hardened. It could easily be bypassed by experienced people.

Step one, installation of Squid. Squid is a popular caching proxy that supports multiple protocols. It is available as a package on many Linux distributions and the installation is pretty straightforward. The out-of-the-box configuration will work for most environments. But one big choice must be made: how will you tell the browser to use the proxy? In corporate environments, you can use a GPO or a proxy PAC file, but at home? You can manually configure the proxy settings in the browser, but the children could disable them once they start exploring the menus. Do not underestimate them! My choice was to use Squid as a transparent proxy. At the firewall level, all web traffic (as well as some other protocols) is redirected to the Squid box. My firewall is based on pf. The traffic redirection is done via a simple rule:

  ...
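  proxy_protocols = '{80, 443, 8080, 8000, 21}'
  int_if="fxp0"
  # Redirect the listed TCP ports arriving on the internal interface to Squid (port 3128)
  rdr pass on $int_if inet proto tcp to any port $proxy_protocols -> $int_if port 3128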
  ...

Step two, installation of SquidGuard. Squid has a nice feature called “URL rewriting”. You can pass all URLs processed by Squid to an external program.

  url_rewrite_program /usr/local/bin/squidGuard -c /usr/local/squidGuard/squidGuard.conf

SquidGuard works like this. All URLs are passed to SquidGuard which compares them against a database. If one of them matches a blacklist condition, a default URL is returned (which will display a warning or any other protection). SquidGuard is easy to compile and install, just follow the INSTALL file.
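To make the mechanism more concrete, here is a minimal Python sketch of what a URL rewriter does with each request. It assumes the classic helper line format (`URL client_ip/fqdn ident method`) and the classic reply convention (a replacement URL, or an empty line to leave the request untouched); recent Squid releases use a different reply syntax. The blocked set, the `BLOCK_URL` value and the function names are hypothetical, and this is not how SquidGuard itself is implemented.

```python
from urllib.parse import urlsplit

# Hypothetical values, for illustration only.
BLOCKED_DOMAINS = {"playboy.com"}
BLOCK_URL = "http://proxy.home/block.php"


def is_blocked(host, blocked=BLOCKED_DOMAINS):
    """Return True if the host or one of its parent domains is listed."""
    labels = host.lower().rstrip(".").split(".")
    # For "www.example.com", check "www.example.com", "example.com", "com".
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))


def rewrite(line):
    """Map one request line received from Squid to a reply line.

    Returns the block page URL for a listed host, or "" (no change).
    """
    fields = line.split()
    if not fields:
        return ""
    host = urlsplit(fields[0]).hostname or ""
    return BLOCK_URL if is_blocked(host) else ""
```

A real helper would read such lines from standard input in a loop and print each reply immediately (flushing the output), since Squid keeps the helper process running and waits for one answer per request.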

Step three, set up your policy. To protect effectively, SquidGuard must rely on a strong database. Two approaches are available: working with a whitelist or with a blacklist. A whitelist is easier to build but very restrictive. Don’t forget that modern web services are often split across multiple platforms, so maintaining such a list quickly becomes a nightmare. A blacklist contains prohibited URLs. In this case, there is always a risk that new sites are not listed yet. My choice was to implement a blacklist. Here again, commercial lists are for sale on the Internet but let’s try to keep the solution free of charge. After some investigation, I found a nice blacklist maintained by the University of Toulouse, France. The blacklist contains the following categories of “bad” sites:

Category            Referenced sites
adult               996483
agressif            340
audio-video         1934
blog                423
cleaning            158
dangerous_material  38
drogue              901
financial           76
forums              203
gambling            717
hacking             293
mobile-phone        35
phishing            63516
publicite           1301
radio               150
redirector          51399
strict_redirector   51183
strong_redirector   51183
tricheur            35
warez               701
webmail             86
games               8443
mixed_adult         107
filehosting         732
reaffected          8
sexual_education    13
shopping            137
dating              3111
marketingware       180
astrology           25
sect                144
celebrity           642
manga               596
child               17
malware             234609
press               38
chat                210
remote-control      14

Another big advantage of this selection of sites: it contains a lot of French websites (my daughters speak French). Once you have downloaded the blacklists and compiled them (to speed up the lookups), it’s time to create your policy. SquidGuard is powerful and can restrict or allow access based on the time, the source IP address or the authenticated user. In my home setup, the proxy is transparent and IP addresses are assigned via DHCP. By using fixed leases for the trusted computers, it’s possible to allow all the traffic from their IP addresses. Here is a sample of my configuration:

  #
  # CONFIG FILE FOR SQUIDGUARD
  #
  dbhome /data/squid/squidGuard/db
  logdir /data/squid/log
  dest porn {
    domainlist      porn/domains
    urllist         porn/urls
    log             blocked.log
  }
  # --- All other categories are defined the same way as "porn" ---
  src trusted {
    ip 192.168.254.1-192.168.254.10
  }
  acl {
    trusted {
      pass all
    }
    default {
      pass !porn !agressif !astrology !celebrity !chat !child !dangerous_material \
           !dating !drugs !filehosting !financial !forums !gambling !games !hacking \
           !malware !manga !marketingware !mixed_adult !mobile-phone !phishing \
           !publicite !reaffected !redirector !remote-control !sect !sexual_education \
           !strict_redirector !strong_redirector !tricheur !warez !webmail all
      redirect http://proxy.home/block.php?clientaddr=%a&targetgroup=%t&url=%u
    }
  }

This configuration allows unrestricted access from the first 10 IP addresses of the subnet. All other users (dynamic IPs) have all the listed categories restricted. Blocked websites are logged to the “blocked.log” file. I don’t use authentication for two reasons: first, it’s not supported by Squid in transparent mode. Second, I don’t want a second authentication step on the computer. When a URL is rejected by Squid/SquidGuard, the following page is displayed in the browser:

[Screenshot: the “Blocked Site” page]

To serve this page, a small Apache instance must be available in your network.
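The `block.php` script itself is not shown here. As a rough idea of what such a page has to do, the following Python sketch parses the query string built by the `redirect` line (`clientaddr`, `targetgroup`, `url`) and renders a small HTML page. The function name and the page wording are hypothetical. The sketch also assumes that the blocked URL is appended as-is, without URL-encoding, which is why everything after `url=` is taken verbatim.

```python
from html import escape
from urllib.parse import parse_qs


def render_block_page(query_string):
    """Build the HTML shown to the user from the redirect query string."""
    # The blocked URL is the last parameter and may itself contain "&",
    # so split it off before parsing the other parameters.
    head, _, blocked_url = query_string.partition("url=")
    params = parse_qs(head)
    client = params.get("clientaddr", ["unknown"])[0]
    category = params.get("targetgroup", ["unknown"])[0]
    # Escape every value: the blocked URL is untrusted input.
    return (
        "<html><body><h1>Blocked Site</h1>"
        f"<p>Client: {escape(client)}</p>"
        f"<p>Category: {escape(category)}</p>"
        f"<p>URL: {escape(blocked_url or 'unknown')}</p>"
        "</body></html>"
    )
```

The `targetgroup` parameter is what tells you which category triggered the block, so displaying it on the page makes false positives much easier to investigate.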

Step four, set up the alerts. Squid can now prevent my kids from accessing nasty content but I don’t have time to keep an eye on the generated logfile. Why not use OSSEC to notify me when a URL has been blocked? Here is an example of an event generated by SquidGuard:

2011-01-17 17:33:53 [9039] Request(default/porn/-) http://playboy.com/ 192.168.254.240/- - GET REDIRECT

Add the new logfile to the list of files monitored by the OSSEC agent and create a new alert:

  <!-- SquidGuard Alert -->
  <rule id="100026" level="7">
    <regex>^\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d [\d+] Request\.</regex>
    <description>Unauthorized URL blocked by SquidGuard</description>
  </rule>
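If you want to post-process the same log outside of OSSEC (for a weekly summary, for example), the event format is easy to parse. The Python sketch below is based only on the single sample line shown earlier: the meaning given to the fields inside `Request(...)` (matching rule, then category) is an assumption, and the function and group names are hypothetical. Note that it uses Python's regex syntax, which differs from the syntax used in OSSEC rules.

```python
import re

# Pattern derived from the sample SquidGuard event in this article.
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"\[(?P<pid>\d+)\] "
    r"Request\((?P<acl>[^/]*)/(?P<category>[^/]*)/[^)]*\) "
    r"(?P<url>\S+) (?P<client>[^/\s]+)/"
)


def parse_blocked(line):
    """Return the fields of a blocked-request line as a dict, or None."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None
```

Calling `parse_blocked()` on each line of “blocked.log” gives you the client address and the category for every rejected request; lines that do not look like a request event simply return `None`.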

This solution will keep your children away from most bad sites until… they find other ways to access online resources. Don’t underestimate them! Are you ready to play the cat & mouse game? 😉

7 comments

  1. Hi,
    just like to let you know that squidGuard has no active development nor support, and the faster alternative is ufdbGuard.
    ufdbGuard also comes with ufdbhttpd to serve the “access denied” messages so there is no need for Apache.
    Marcus

  2. Hi, I am new to Apache and Squid. I have configured Squid and SquidGuard and it is working fine. My problem is identifying the category that caused Squid to block a site. Can you help me with the Apache page that you used for the redirection?

  3. Right! Like said in the article:

    “First, let the technical stuff aside and talk! … Then, discuss with your children and explain, using simple words, what the Internet is and what they can find: the best as the worst! Communication is the key.”

    Anyway, I’d like to keep my children away from some content. Not that I want to protect them from real life. But there are really shocking pictures which must be kept away from them…

  4. I don’t believe in children and it’s obviously not my business how you raise yours, but I used to be a child a long time ago…

    Do you really think that trying to shelter kids from the “evil world of the internet” is a good idea? Wouldn’t it be more effective to show them what the evil is, and then let them make their own decisions on what they want to watch and what not? Trying to keep growing kids away from porn is like trying to keep software engineers away from IRC: frustration for all involved.

    Inevitably, the filter will miss bits. Or will filter bits it shouldn’t.

  5. I agree, OpenDNS also provides a web-filtering feature. I tested it a long time ago.
    The first issue is the use of IP addresses instead of FQDNs: SquidGuard can also reject URLs based on IP addresses. You cannot redirect to your own page and (maybe the most important) I like my freedom! 🙂 I like the freedom to add/remove sites by myself. Example: to prevent my kids from visiting the latest Hannah Montana website 🙂

    Thanks to OSSEC, I have a history of all the events and can be alerted in real time.

    But indeed, OpenDNS is much easier to set up and does not require extra software/hardware components on your network.

  6. Did you think about using OpenDNS? It also has an option to block defined categories of sites, even in the free plan. It’s probably easier to set up for most users, and can be faster. Are there disadvantages of OpenDNS compared to the Squid solution?
