Keep an Eye on your Data using OpenDLP

A new tool, OpenDLP, was released today (version 0.1) on code.google.com. “DLP” stands for “Data Loss Prevention” or “Data Leak Protection”: a buzzword! Even if the problem is real and critical for some organizations, my opinion is the following: instead of spending money on expensive solutions (and DLP solutions ARE expensive!), spend some money on deploying proper work environments for your team members!

Often, data losses are caused by motivated people just trying to do their job! Example: John has an important meeting tomorrow at nine. Due to regular server problems in the afternoon, he was not able to complete his presentation. What will John think? “No problem, I’ll send my data to my personal Gmail account and finish my presentation tonight from home.” Bzzzzz, wrong! Another example: due to a poor reporting engine in the sales tool, a manager exports a database of customers to his own laptop to process it using a self-made Excel sheet. Again, wrong!

Let’s focus now on OpenDLP, which can help you implement basic scanning of the files lying on your organization’s workstations and servers. As defined by the author: “OpenDLP is a free and open source, agent-based, centrally-managed, massively distributable data loss prevention tool released under the GPL”. It looks interesting!

OpenDLP Architecture

OpenDLP runs on a central Apache web server. The code is written in Perl and stores its data in a MySQL database. The philosophy is to deploy an agent, along with a configuration, to Windows hosts. This is achieved using Samba and Windows credentials. The agent, installed as a service, performs the scan, collects the results, uploads them to the server and is finally removed. This process is fully transparent to the Windows user and non-intrusive. Because the agent is installed as a service, it survives reboots. Once it is installed, simple communication is available between the central server and the agents (example: to pause a running scan). A simple system has also been implemented to mark some matches as false positives.

OpenDLP is released under the GPL and all the source code is provided. I quickly reviewed the agent sources:

  • The agent scans the following storage types: floppy, thumb drive, flash card reader, HDD, flash drive, CD-ROM and RAM disk.
  • Whitelists and blacklists are available to prevent some files from being scanned.
  • Filters based on file extensions.
  • libcurl is used to communicate with the server (using SSL, good!)
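
To make the blacklist and extension filters more concrete, here is a minimal Python sketch of that kind of pre-scan check. It is not taken from the agent's code: the names `SKIPPED_EXTENSIONS`, `BLACKLISTED_DIRS` and `should_scan`, as well as the example values, are assumptions chosen only for illustration.

```python
from pathlib import PureWindowsPath

# Hypothetical policy values, chosen only for illustration.
SKIPPED_EXTENSIONS = {".exe", ".dll", ".iso"}
BLACKLISTED_DIRS = [PureWindowsPath(r"C:\Windows")]


def should_scan(path_str):
    """Return True if the file passes the extension filter and the blacklist."""
    path = PureWindowsPath(path_str)
    # Extension filter: compare case-insensitively, as Windows does.
    if path.suffix.lower() in SKIPPED_EXTENSIONS:
        return False
    # Directory blacklist: skip anything located under a blacklisted folder.
    return not any(blocked in path.parents for blocked in BLACKLISTED_DIRS)
```

With these sample values, a spreadsheet in a user profile would be scanned, while an installer or anything under `C:\Windows` would be skipped.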

Once finished, the scan results can be reviewed via the WebGUI and exported as XML for further processing. The search for breaches is performed via Perl regular expressions. The default set of regexps is very small but gives a good idea of the “power” of regexps:

Credit_Card_Track_1:(\\D|^)\\%?[Bb]\\d{13,19}\\^[\\-\\/\\.\\w\\s]{2,26}\\^[0-9][0-9][01][0-9][0-9]{3}
Credit_Card_Track_2:(\\D|^)\\;\\d{13,19}\\=(\\d{3}|)(\\d{4}|\\=)
Credit_Card_Track_Data:[1-9][0-9]{2}\\-[0-9]{2}\\-[0-9]{4}\\^\\d
Mastercard:(\\D|^)5[1-5][0-9]{2}(\\ |\\-|)[0-9]{4}(\\ |\\-|)[0-9]{4}(\\ |\\-|)[0-9]{4}(\\D|$)
Visa:(\\D|^)4[0-9]{3}(\\ |\\-|)[0-9]{4}(\\ |\\-|)[0-9]{4}(\\ |\\-|)[0-9]{4}(\\D|$)
AMEX:(\\D|^)(34|37)[0-9]{2}(\\ |\\-|)[0-9]{6}(\\ |\\-|)[0-9]{5}(\\D|$)
Diners_Club_1:(\\D|^)30[0-5][0-9](\\ |\\-|)[0-9]{6}(\\ |\\-|)[0-9]{4}(\\D|$)
Diners_Club_2:(\\D|^)(36|38)[0-9]{2}(\\ |\\-|)[0-9]{6}(\\ |\\-|)[0-9]{4}(\\D|$)
Discover:(\\D|^)6011(\\ |\\-|)[0-9]{4}(\\ |\\-|)[0-9]{4}(\\ |\\-|)[0-9]{4}(\\D|$)
JCB_1:(\\D|^)3[0-9]{3}(\\ |\\-|)[0-9]{4}(\\ |\\-|)[0-9]{4}(\\ |\\-|)[0-9]{4}(\\D|$)
JCB_2:(\\D|^)(2131|1800)[0-9]{11}(\\D|$)
Social_Security_Number_dashes:(\\D|^)[0-9]{3}\\-[0-9]{2}\\-[0-9]{4}(\\D|$)
Social_Security_Number_spaces:(\\D|^)[0-9]{3}\\ [0-9]{2}\\ [0-9]{4}(\\D|$)
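
To see how such rules behave, the following Python sketch loads two of the entries above and applies them to a few strings. It is only an approximation: OpenDLP uses Perl regular expressions, and the sketch assumes that the doubled backslashes are an escaping layer that collapses to single backslashes before compilation. The helper names `load_rules` and `scan` are hypothetical.

```python
import re

# Two entries copied from the default list above, in "name:pattern" form.
RAW_RULES = [
    r"Visa:(\\D|^)4[0-9]{3}(\\ |\\-|)[0-9]{4}(\\ |\\-|)[0-9]{4}(\\ |\\-|)[0-9]{4}(\\D|$)",
    r"Social_Security_Number_dashes:(\\D|^)[0-9]{3}\\-[0-9]{2}\\-[0-9]{4}(\\D|$)",
]


def load_rules(lines):
    """Split each 'name:pattern' line and compile the pattern."""
    rules = {}
    for line in lines:
        name, _, pattern = line.partition(":")
        # Assumption: a doubled backslash stands for a single one.
        rules[name] = re.compile(pattern.replace("\\\\", "\\"))
    return rules


def scan(text, rules):
    """Return the sorted names of all rules that match somewhere in text."""
    return sorted(name for name, rx in rules.items() if rx.search(text))


rules = load_rules(RAW_RULES)
print(scan("card 4111 1111 1111 1111 on file", rules))
print(scan("SSN 123-45-6789", rules))
```

Note that these patterns only check the shape of a number: any 16-digit string starting with 4 is reported as a Visa card, which is why a way to mark false positives matters.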

If the project grows, a good idea would be to build a repository of regular expressions shared between users. I did not test OpenDLP across a lot of agents, but it seems to do a good job! There is at least one big missing feature (especially for me): logging! It could be a killer feature to receive events in a third-party tool like OSSEC, or maybe to automatically export the results as XML and process them with a specific OSSEC parser?

The developer already has a list of future enhancements. Among others, I’m expecting:

  • ZIP support in the agent, to read Office 2007 and OpenOffice files
  • Support for Microsoft Word and OpenOffice formats
  • A sniffer mode listening for outbound sensitive data
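
ZIP support matters because Office 2007 and OpenOffice documents are ZIP archives containing XML parts, so a plain-text scan of the file on disk sees only compressed bytes. The sketch below shows the general idea in Python; it is not the agent's implementation, and the function names `scan_zip_container` and `make_sample` are hypothetical. The sample archive only imitates the layout of a .docx file.

```python
import io
import re
import zipfile

# Same shape as the Social_Security_Number_dashes rule from the default set.
SSN = re.compile(r"(\D|^)[0-9]{3}\-[0-9]{2}\-[0-9]{4}(\D|$)")


def scan_zip_container(data, pattern=SSN):
    """Return the names of archive members whose content matches pattern."""
    hits = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            # Decompress each member and treat it as text.
            text = archive.read(name).decode("utf-8", errors="ignore")
            if pattern.search(text):
                hits.append(name)
    return hits


def make_sample():
    """Build a tiny in-memory archive that imitates a .docx layout."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("word/document.xml", "<w:t>SSN 123-45-6789</w:t>")
        archive.writestr("docProps/app.xml", "<Pages>1</Pages>")
    return buf.getvalue()


print(scan_zip_container(make_sample()))
```

In real documents a value can be split across several XML elements, so a scanner would probably need to strip the markup before matching; the sketch ignores that.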

Definitely a nice tool to follow!

8 comments

  1. Benefit:

    Performs checks on potential credit card numbers (and social security numbers) or any other string of text on endpoints (laptops) using very specific regular expressions. Can read plain text only, including inside Office 2007 files, OpenOffice files and ZIP files.

    Limitations:

    1. Scans stored data on endpoints only. Unproven on Windows servers or clusters so one must test very carefully.

    2. Unable to scan non-plain-text or some compressed file formats, including current versions of Office (the .XXXx XML formats) or databases.

    3. No advanced content analysis — regex only, which limits the types of scannable content.

    4. Requires NetBIOS… which some environments ban.

    5. The code is a work in progress and may be a bit messy… which some consider a security concern.

    History:

    OpenDLP (version 0.1) was released in April 2010 and popularized through dozens of blog posts. This open source software runs on a central Apache/MySQL web server and is designed only for endpoint leakage. The last popular update was in August 2010: a VirtualBox VM with OpenDLP 0.2.2. Since May 2010, most development has centered on scanning ZIP files without locking or crashing.

    Conclusion:

    One might add this to laptops so an organization can watch and catch people copying a customers folder locally. Thus this is a narrow implementation of DLP. In summary, this release is too soon to deploy in any production capacity, but definitely worth checking out in a lab over 90 days.

    To produce a report with open source software will require 90+ days in testing then production environments. In my opinion as a previous compliance officer, I would budget $22,800 for the open source project with an expected return of 85%.

  2. I released version 0.2 a couple of days ago that reads inside ZIP files (including Office 2007 and OpenOffice files).

    Version 0.3 will likely include support for looking at databases if you have appropriate credentials. I will probably start with Microsoft SQL server and MySQL, then add more databases in future versions.

  3. While I recognize that this is very early in development, commercial DLP solutions provide much more than just local discovery. Since the issue of data loss is very complex (data types and leakage vectors), an open source solution must be able to address this all.

    Features of commercial DLP products beyond local discovery:

    * Detection capabilities beyond regular expressions. Fingerprinting of database elements and unstructured data, content/context analysis.
    * Network gateway monitoring
    * Blocking of Web/FTP/HTTPS (integration w/ proxy devices via ICAP)
    * Blocking of SMTP (MTA)
    * Device control for endpoints to prevent leakage
    * Content awareness on the endpoint (not just regular expressions)

    –and all this among many other features!

    Not an easy road ahead to really provide an open source alternative.

  4. Very nice project! It definitely fills a large gap in the existing Open Source security offerings. Just a crazy thought but it might make a great component to add into another project like OSSIM to give a more complete end to end Security platform. Also may be able to take advantage of code already in OSSIM to more rapidly deploy new features. FYI I am not working for the OSSIM guys just use the tool and see this as a good fit. Thoughts, comments, etc. let me know.

    Keep up the great work and if I can do anything to help let me know!!

    Thanks –

    Brad

  5. You have to keep in mind that a DLP solution is also a tool to help identify the “broken business processes” you use as examples. A centralized solution can help track and prioritize those issues and help direct remediation resources to those that will have the greatest impact. OpenDLP will certainly find a place among the commercial offerings if it is actively developed.
