URL Filtering with Squid

Next to my digital life, I’m also the happy father of two young girls. The older one is already ten years old and is steadily discovering the “Wonderful Internet”. Being an infosec guy, it seems logical for me to implement some safeguards.

First, set the technical stuff aside and talk! Some security awareness is always good. The first lesson was learning how to use a password and not to share it with her sister. It must feel like a game! Both girls have their own account on the family laptop. Then, talk with your children and explain, in simple words, what the Internet is and what they can find there: the best as well as the worst! Communication is the key. Countermeasures will not be effective if you don’t explain the reasons behind them. I also have two golden rules:

Do not let the children use the computer in a closed room. Shoulder surfing is good in this case.
If they find something “strange”, encourage them to report it to you and not to be afraid to “ask”.

Unfortunately, it won’t take long before the children experiment and try to break the rules. The risk of finding nasty stuff is high. So, how do you protect them and keep an eye on their online activity? Let’s use open source software! I’m not a big fan of commercial parental control solutions, for several reasons. Why pay if you can build something for free? We have more than one computer at home, and they do not all run Windows. Finally, I can integrate the alerts into my personal SIEM (read “OSSEC” ;-)).

Disclaimer: this setup will protect your children from inappropriate content, but it also has limitations. It can be used as a starting point in a corporate environment, but it must be hardened. It can easily be bypassed by experienced people.

Step one, installation of Squid. Squid is a widely used caching proxy that supports multiple protocols. It is available as a package on many Linux distributions, and the installation is pretty straightforward. The out-of-the-box configuration will work for most environments. But one big choice must be made: how will you tell the browser to use the proxy? In corporate environments, you can use a GPO or a proxy PAC file, but at home? You can manually configure the proxy settings in the browser, but they could be disabled once the kids start exploring the menus. Do not underestimate them! My choice was to use Squid as a transparent proxy. At the firewall level, all web traffic (as well as some other protocols) is redirected to the Squid box. My firewall is based on pf. The traffic redirection is done via a simple rule:

  ...
  proxy_protocols = '{80, 443, 8080, 8000, 21}'
  int="fxp0"
  rdr pass on $int inet proto tcp to any port $proxy_protocols -> $int port 3128
  ...

Step two, installation of SquidGuard. Squid has a nice feature called “URL rewriting”. You can pass all URLs processed by Squid to an external program.

  url_rewrite_program /usr/local/bin/squidGuard -c /usr/local/squidGuard/squidGuard.conf

SquidGuard works like this. All URLs are passed to SquidGuard which compares them against a database. If one of them matches a blacklist condition, a default URL is returned (which will display a warning or any other protection). SquidGuard is easy to compile and install, just follow the INSTALL file.
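
To make the lookup idea concrete, here is a minimal Python sketch of the same decision: extract the host from a URL, check it (and its parent domains) against a blacklist, and return a block page on a match. It is not SquidGuard's implementation and it does not speak Squid's helper protocol; the domain names in `BLOCKED_DOMAINS` are made up for illustration, and the block page URL is the one used later in this setup.

```python
from urllib.parse import urlsplit

# Hypothetical blacklist entries, for illustration only.
BLOCKED_DOMAINS = {"bad.example", "ads.example.net"}
# Block page URL, as used in the redirect rule of this setup.
BLOCK_PAGE = "http://proxy.home/block.php"


def is_blocked(url, blocked_domains=BLOCKED_DOMAINS):
    """Return True if the URL's host, or any parent domain, is blacklisted."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # For "a.b.c", check "a.b.c", then "b.c", then "c".
    return any(".".join(labels[i:]) in blocked_domains for i in range(len(labels)))


def rewrite(url):
    """Return the block page for blacklisted URLs, the original URL otherwise."""
    return BLOCK_PAGE if is_blocked(url) else url
```

A real blacklist holds hundreds of thousands of entries, which is why SquidGuard compiles its lists into a database instead of scanning text files on every request.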

Step three, set up your policy. For effective protection, SquidGuard must rely on a strong database. Two approaches are available: working with a whitelist or a blacklist. The first one is easier to build but very restrictive. Don’t forget that modern web services are often split across multiple platforms, so maintaining such a list quickly becomes a nightmare. A blacklist contains prohibited URLs. In this case, there is always a risk that new sites are not listed yet. My choice was to implement a blacklist. Here again, commercial lists are for sale on the Internet, but let’s try to keep the solution free of charge. After some investigation, I found a nice blacklist maintained by the University of Toulouse, France. The blacklist contains the following categories of “bad” sites:

Category	Referenced Sites
adult	996483
agressif	340
audio-video	1934
blog	423
cleaning	158
dangerous_material	38
drogue	901
financial	76
forums	203
gambling	717
hacking	293
mobile-phone	35
phishing	63516
publicite	1301
radio	150
redirector	51399
strict_redirector	51183
strong_redirector	51183
tricheur	35
warez	701
webmail	86
games	8443
mixed_adult	107
filehosting	732
reaffected	8
sexual_education	13
shopping	137
dating	3111
marketingware	180
astrology	25
sect	144
celebrity	642
manga	596
child	17
malware	234609
press	38
chat	210
remote-control	14

Another big advantage of this selection of sites: it contains a lot of French websites (my daughters speak French). Once you have downloaded the blacklists and compiled them (to speed up the lookups), it’s time to create your policy. SquidGuard is powerful and can restrict or allow access based on the time, the source IP address or the authenticated user. In my home setup, the proxy is transparent and IP addresses are assigned via DHCP. By using fixed leases for the trusted machines, it’s possible to allow all the traffic from those IP addresses. Here is a sample of my configuration:

  #
  # CONFIG FILE FOR SQUIDGUARD
  #

  dbhome /data/squid/squidGuard/db
  logdir /data/squid/log
  dest porn {
    domainlist      porn/domains
    urllist         porn/urls
    log             blocked.log
  }
  # --- All categories are configured as "porn" ---
  src trusted {
    ip 192.168.254.1-192.168.254.10
  }
  acl {
    trusted {
      pass all
    }
    default {
      pass !porn !agressif !astrology !celebrity !chat !child !dangerous_material \
           !dating !drugs !filehosting !financial !forums !gambling !games !hacking \
           !malware !manga !marketingware !mixed_adult !mobile-phone !phishing \
           !publicite !reaffected !redirector !remote-control !sect !sexual_education \
           !strict_redirector !strong_redirector !tricheur !warez !webmail all
      redirect http://proxy.home/block.php?clientaddr=%a&targetgroup=%t&url=%u
    }
  }
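
Read as a decision procedure, the acl block above amounts to roughly the following. This is only a Python sketch of the logic, not how SquidGuard evaluates its configuration. The address range comes from the `src trusted` block; `FILTERED` is a shortened, assumed subset of the categories listed in the `default` rule.

```python
import ipaddress

# Range taken from the "src trusted" block above.
TRUSTED_FIRST = ipaddress.ip_address("192.168.254.1")
TRUSTED_LAST = ipaddress.ip_address("192.168.254.10")

# Assumption: a shortened subset of the categories negated in the default rule.
FILTERED = {"porn", "gambling", "malware", "phishing"}


def decide(client_ip, category):
    """Return "pass" or "redirect" for a request.

    `category` is the blacklist category the URL matched, or None if it
    matched nothing.
    """
    addr = ipaddress.ip_address(client_ip)
    # Trusted sources: "pass all".
    if TRUSTED_FIRST <= addr <= TRUSTED_LAST:
        return "pass"
    # Everyone else: redirect if the URL falls in a filtered category.
    return "redirect" if category in FILTERED else "pass"
```

For example, `decide("192.168.254.240", "porn")` gives `"redirect"`, while the same URL requested from `192.168.254.5` passes.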

This configuration allows unrestricted access from the first 10 IP addresses of the subnet. All other users (dynamic IPs) have all the listed categories restricted. Blocked websites are logged to the “blocked.log” file. I don’t use authentication for two reasons: first, it’s not supported by Squid in transparent mode. Second, I don’t want a second authentication on top of the computer login. When a URL is rejected by Squid/SquidGuard, the browser is redirected to a block page.

To serve this page, a small Apache instance must be available in your network.
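
The script behind the block page is not shown in this post. As a stand-in, here is a hypothetical Python sketch of what such a page could do with the `clientaddr`, `targetgroup` and `url` parameters from the redirect rule. It swaps Apache and PHP for Python's standard `http.server` module, purely for illustration; the page layout and the `BlockPageHandler` name are assumptions.

```python
import html
from http.server import BaseHTTPRequestHandler
from urllib.parse import parse_qs, urlsplit


def render_block_page(query_string):
    """Build a small HTML warning from the redirect's query string."""
    params = parse_qs(query_string)

    def field(name):
        # parse_qs returns lists; take the first value or a placeholder,
        # and escape it so a hostile URL cannot inject markup.
        return html.escape(params.get(name, ["unknown"])[0])

    return (
        "<html><body><h1>Access denied</h1>"
        f"<p>Client: {field('clientaddr')}</p>"
        f"<p>Category: {field('targetgroup')}</p>"
        f"<p>URL: {field('url')}</p>"
        "</body></html>"
    )


class BlockPageHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with the block page."""

    def do_GET(self):
        body = render_block_page(urlsplit(self.path).query).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

The handler could be served with `http.server.HTTPServer`. One caveat: a blocked URL that itself contains `&` will be cut short, because the query string is split on that character.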

Step four, set up the alerts. Squid can now prevent my kids from accessing nasty content, but I don’t have time to keep an eye on the generated logfile. Why not use OSSEC to notify me when a URL has been blocked? Here is an example of an event generated by SquidGuard:

2011-01-17 17:33:53 [9039] Request(default/porn/-) http://playboy.com/ 192.168.254.240/- - GET REDIRECT
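
To see what such a line carries, here is a small Python sketch that splits it into named fields. The field names (`acl`, `category`, `client` and so on) are my reading of the sample line above, not an official description of the SquidGuard log format, so treat them as assumptions.

```python
import re

# Pattern written against the sample line above; field names are assumptions.
LOG_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"\[(?P<pid>\d+)\] Request\((?P<acl>[^/]*)/(?P<category>[^/]*)/[^)]*\) "
    r"(?P<url>\S+) (?P<client>[^/\s]+)/\S+ \S+ (?P<method>\S+) (?P<action>\S+)$"
)


def parse_line(line):
    """Return a dict of fields for a blocked-request line, or None."""
    match = LOG_RE.match(line.strip())
    return match.groupdict() if match else None
```

Applied to the sample event, this yields the client address `192.168.254.240`, the category `porn` and the requested URL, which is the information worth putting in an alert.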

Add the new logfile to the list of files monitored by the OSSEC agent and create a new rule:

  <!-- SquidGuard Alert -->
  <rule id="100026" level="7">
    <regex>^\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d [\d+] Request\.</regex>
    <description>Unauthorized URL blocked by SquidGuard</description>
  </rule>

This solution will keep your children away from most bad sites until… they find other ways to access online resources. Don’t underestimate them! Are you ready to play the cat-and-mouse game? 😉

7 comments

Marcus says:

June 28, 2013 at 00:02

Hi,
just like to let you know that squidGuard has no active development nor support, and the faster alternative is ufdbGuard.
ufdbGuard also comes with ufdbhttpd to serve the “access denied” messages so there is no need for Apache.
Marcus
Trevor Mathray says:

March 6, 2012 at 14:02

Hi, I am new to Apache and Squid. I have configured Squid and SquidGuard and it is working fine. My problem is identifying the category under which Squid is blocking a site. Can you help me with the Apache program that you used for the redirection?
Xavier says:

January 20, 2011 at 18:52

Right! Like said in the article:

“First, set the technical stuff aside and talk! … Then, talk with your children and explain, in simple words, what the Internet is and what they can find there: the best as well as the worst! Communication is the key.”

Anyway, I’d like to keep my children away from some content. Not that I want to protect them from real life. But there are really shocking pictures which must be kept away from them…
trouble says:

January 20, 2011 at 11:57

I don’t believe in children and it’s obviously not my business how you raise yours, but I used to be a child a long time ago…

Do you really think that trying to shelter kids from the “evil world of the internet” is a good idea? Wouldn’t it be more effective to show them what the evil is, and then let them make their own decisions on what they want to watch and what not? Trying to keep growing kids away from porn is like trying to keep software engineers away from IRC: frustration for all involved.

Inevitably, the filter will miss bits. Or will filter bits it shouldn’t.
Xavier says:

January 19, 2011 at 12:54

I agree, OpenDNS also provides a web-filtering feature. I tested it a long time ago.
The first issue is the use of IP addresses instead of FQDNs: SquidGuard can reject URLs based on IP addresses. Also, with OpenDNS you cannot redirect to your own page and (maybe most important) I like my freedom! 🙂 I like the freedom to add/remove sites myself. Example: to prevent my kids from visiting the latest Hannah Montana website 🙂

Thanks to OSSEC, I have a history of all the events and can be alerted in real time.

But indeed, OpenDNS is much easier to set up and does not require extra software/hardware components on your network.
rozie says:

January 19, 2011 at 09:04

Have you thought about using OpenDNS? It also has an option to block defined categories of sites, even in the free plan. It’s probably easier to set up for most users, and can be faster. And what are the disadvantages of OpenDNS compared to the Squid solution?
Kris says:

January 19, 2011 at 07:36

OpenDNS is also a good solution.
