Adding Data Leakage Protection into Apache

Data leakage is a major risk for many organizations today. As more and more data is stored in digital form, it’s easy to copy it or send it outside the security perimeter. Leaked data can have a major impact on the business: loss of revenue, loss of confidentiality or loss of credibility with customers, shareholders or the media.

A common vector for data leaks is web traffic through the “evil” port 80! Every day, we read news about misconfigured websites that allow access to unprotected resources, or about data made available by hackers (once it is on the web server, it can easily be retrieved).

I was looking for a solution to reduce the risk of data leakage for a specific website running on Apache. The solution must be able to address two requirements:

  • To mitigate the problem on the fly: when confidential information can be accessed through the web server, it’s already too late. And in practice, it’s impossible to be available 24×7 to react to an incident.
  • To generate a notification in real time: at the same time as the mitigation is performed, alerts must be sent to and processed by a monitoring tool. The goal is to be notified as soon as possible when an event occurs.

I browsed the web to find my dream solution and found an old module (released in 2004) called mod_replace. It was developed as a complement to mod_proxy (when using Apache as a reverse proxy). Its behavior is simple: mod_replace replaces text strings based on regular expressions.

Based on the code written by Science Computing, I created my mod_dlp module with the following changes:

  • The code has been fixed to compile successfully with recent Apache versions (it did not compile anymore due to changes in the Apache API).
  • Unneeded functions (which originally rewrote HTTP headers) have been removed.
  • Support for Syslog has been added.

Honestly, the core of the module is the same as the original one, and all the credit must go to the original developers. The major change is the Syslog support. The original module did not generate logs (or only in debug mode). Using the Apache log file for real-time notifications is not suitable: flat files are not processed in real time, and parsing them consumes a lot of resources. Syslog is the best solution (easy to implement and manage).

How does the module work? Once successfully loaded by Apache, three new commands are available:

  • DLPCreateFilter <filtername> [options]
  • DLPDefinePattern <filtername> <pattern> <newstring>
  • DLPSyslogNotify <facility> <level>

The filter name is always a mandatory argument; it is used to build the repository of filters. DLPCreateFilter accepts two options: “CaseIgnore”, which searches for patterns case-insensitively, and “intype=xxx”, which restricts the search to content of the specified MIME type (example: intype=text/html).

DLPDefinePattern is the principal command: it creates pairs of patterns and replacement strings. Patterns must be written as Perl-compatible regular expressions (they are parsed by the PCRE library).
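To make the two options and the pattern pairs concrete, here is a minimal Python sketch of how a filter could be modeled. It is not the module’s C code. Several things are assumptions:

  • The names DlpFilter, define_pattern and apply are hypothetical.
  • Python’s re module stands in for PCRE.
  • Replacements are treated as literal text and applied in definition order.
  • “intype” is compared with the response’s media type, ignoring parameters such as charset.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DlpFilter:
    """Hypothetical model of one DLPCreateFilter and its DLPDefinePattern pairs."""
    name: str
    case_ignore: bool = False        # stands in for the "CaseIgnore" option
    in_type: Optional[str] = None    # stands in for "intype=..."; None = every MIME type
    rules: List[Tuple[re.Pattern, str]] = field(default_factory=list)

    def define_pattern(self, pattern: str, new_string: str) -> None:
        """Register one pattern/replacement pair (like DLPDefinePattern)."""
        flags = re.IGNORECASE if self.case_ignore else 0
        self.rules.append((re.compile(pattern, flags), new_string))

    def apply(self, body: str, content_type: str) -> str:
        """Return the body with every pattern replaced, if the MIME type qualifies."""
        if self.in_type is not None:
            # Compare only the media type, ignoring parameters such as charset.
            media_type = content_type.split(";", 1)[0].strip().lower()
            if media_type != self.in_type.lower():
                return body
        for regex, new_string in self.rules:
            # Assumption: the replacement is inserted literally (no back-references).
            body = regex.sub(lambda _match: new_string, body)
        return body
```

For example, DlpFilter("secret-filter", case_ignore=True, in_type="text/html") with the pattern "top secret" would rewrite “TOP SECRET” in HTML pages but leave plain-text responses untouched.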

The third command, DLPSyslogNotify, is optional. If specified (with a Syslog facility and level), messages will be sent to the local Syslog daemon.

Let’s start with an example. To prevent the Apache server from sending credit card numbers in HTTP responses, let’s create a filter that replaces all credit card numbers with a warning message.

In the main httpd.conf, add the following commands:

DLPCreateFilter cc-filter CaseIgnore
DLPDefinePattern cc-filter \
        "\d{4}[\- ]?\d{4}[\- ]?\d{2}[\- ]?\d{2}[\- ]?\d{1,4}" \
        "<font color=\"red\">Prohibited-Content-Removed</font>"
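The same pattern can be tried outside Apache. This Python sketch assumes that Python’s re module behaves like PCRE for this particular expression, which only uses \d, character classes and {m,n} quantifiers. CaseIgnore has no effect on digits, so it is omitted. The helper name mask_card_numbers is hypothetical.

```python
import re

# The pattern from the DLPDefinePattern example, as a Python raw string.
CC_PATTERN = re.compile(r"\d{4}[\- ]?\d{4}[\- ]?\d{2}[\- ]?\d{2}[\- ]?\d{1,4}")
REPLACEMENT = '<font color="red">Prohibited-Content-Removed</font>'


def mask_card_numbers(body: str) -> str:
    """Replace every match of the example pattern with the warning message."""
    return CC_PATTERN.sub(REPLACEMENT, body)
```

Counting the quantifiers (4 + 4 + 2 + 2 + 1 to 4), the expression matches runs of 13 to 16 digits, optionally separated by dashes or spaces.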

To be notified when a credit card number is replaced, add the command:

DLPSyslogNotify LOG_LOCAL5 LOG_INFO
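For reference, here is roughly what emitting a notification with facility LOG_LOCAL5 and level LOG_INFO looks like from a program. The sketch uses Python’s standard syslog module rather than the module’s own C code. The function names are hypothetical. The ident “httpd” and the message layout are copied from the sample alert shown further down in this article.

```python
import syslog


def build_alert(filter_name: str, matched: str, request_line: str) -> str:
    """Format an alert like the sample message in this article (hypothetical helper)."""
    return f'ALERT: Filter {filter_name} matched "{matched}". Request: {request_line}'


def send_alert(message: str) -> None:
    """Send the alert to the local Syslog daemon as LOG_LOCAL5 / LOG_INFO."""
    # ident="httpd" mirrors the program name in the sample log line (an assumption).
    syslog.openlog(ident="httpd", facility=syslog.LOG_LOCAL5)
    syslog.syslog(syslog.LOG_INFO, message)
```

Calling send_alert(build_alert("cc-filter", "1234-1234-1234-1234", "GET /index.php?cc=1234-1234-1234-1234 HTTP/1.0")) would hand the message to the local daemon. The daemon’s own rules then decide where local5 messages end up.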

The module is based upon Apache’s filter mechanism (Output Filter). In the main configuration, in directories or in the virtual host(s) you want to protect, add the following command:

SetOutputFilter cc-filter

For performance reasons, enable the filter only where it is needed. Finally, restart the Apache server. If the new configuration is parsed correctly, the filter will be operational immediately.

A live demo is available here: enter a credit card number that matches the regular expression and check the result. Note that the same replacement is performed if you look at the page source. At the same time, a Syslog message is generated:

Jan 15 22:24:29 webserver httpd: ALERT: Filter cc-filter matched "1234-1234-1234-1234". Request: GET /index.php?cc=1234-1234-1234-1234 HTTP/1.0
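On the receiving side, a monitoring tool only needs to extract a few fields from such a line. Here is a minimal Python sketch, assuming the message layout is exactly the one in the sample above. The regex and the parse_alert name are hypothetical and not part of mod_dlp.

```python
import re
from typing import Dict, Optional

# Matches the "ALERT: Filter ... matched ... Request: ..." part of the Syslog line.
ALERT_RE = re.compile(
    r'ALERT: Filter (?P<filter>\S+) matched "(?P<matched>[^"]*)"\. Request: (?P<request>.*)$'
)


def parse_alert(line: str) -> Optional[Dict[str, str]]:
    """Return the filter name, matched text and request line, or None if not an alert."""
    match = ALERT_RE.search(line)
    return match.groupdict() if match else None
```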

By default, all MIME types are filtered (unless you use the “intype=” option described above). This is useful for filtering data embedded in other document formats. Check out this second demo: your browser will open an RTF document containing the same credit card number. Another one: imagine a web server with a readable UNIX password file. It’s possible to mask all details like here.

Important remark: only data sent in clear text will be parsed successfully. If the server sends compressed content to the browser, it won’t work.
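Compressed traffic defeats the filter because the regular expression runs on the bytes passing through Apache. This Python sketch is an illustration, not module code. It shows that the card number is found in the plain body but not in its gzip-compressed form.

```python
import gzip
import re

# Same credit card pattern as in the configuration example, as a bytes regex.
CC_PATTERN = re.compile(rb"\d{4}[\- ]?\d{4}[\- ]?\d{2}[\- ]?\d{2}[\- ]?\d{1,4}")

body = b"<p>Card: 1234-1234-1234-1234</p>"
compressed = gzip.compress(body, mtime=0)  # mtime=0 keeps the output reproducible

# The clear-text body contains the number, so a filter could replace it...
found_in_plain = CC_PATTERN.search(body) is not None
# ...but the compressed bytes no longer contain the digit sequence.
found_in_compressed = CC_PATTERN.search(compressed) is not None

print(found_in_plain, found_in_compressed)
```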

If you need more details on how the module performs its job, have a look at the original documentation of mod_replace, which perfectly describes the internals of the module.

To test it yourself, feel free to download the source code. On your Apache server, use the following command:

apxs [-I alternate_include_dir] -i -a -c mod_dlp.c

This command compiles the module, installs it and activates it by modifying your httpd.conf.

To conclude, this module is certainly not perfect. Your data will be protected only if a correct regular expression matches it. Defining and maintaining the filters is a pain, but it works! If you have good ideas or comments, contact me, or feel free to fork your own version of the module.
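As an illustration of that limitation, the example credit card pattern both misses some numbers and over-matches others. Here is a short Python sketch; the helper name would_be_masked is hypothetical, and Python’s re stands in for PCRE.

```python
import re

# The credit card pattern from the configuration example.
CC_PATTERN = re.compile(r"\d{4}[\- ]?\d{4}[\- ]?\d{2}[\- ]?\d{2}[\- ]?\d{1,4}")


def would_be_masked(text: str) -> bool:
    """Return True if the example filter would replace something in text."""
    return CC_PATTERN.search(text) is not None


# A card number written with dots is not matched, so it would leak.
print(would_be_masked("1234.5678.9012.3456"))
# A 13-digit order number is matched, although it is not a card number.
print(would_be_masked("Order 4006381333931"))
```

Every separator or length variant that matters has to be anticipated in the expression itself.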
