Tag Archives: Ossec

Tracking Processes/Malware Using OSSEC

For a while now, malware has been at the front of the security stage and the situation is unlikely to change in the coming months. When I give presentations about malware, I always like to report two interesting statistics in my slides. They come from the 2012 Verizon DBIR: in 66% of investigated incidents, detection was a matter of months or even more, and 69% of data breaches were discovered by third parties. The problem of malware can be addressed at two levels: infection & detection. To protect against infection, more and more solutions are provided by security vendors and some perform quite well, but they don't fully protect you. To contain the malware, the detection process is also critical. If you can't prevent some malware from being installed, at least try to detect it as soon as possible. To track malicious activity, there is no magic: you have to search for what's abnormal, to look for stuff occurring below the radar. Malware tries to remain stealthy but it has to perform some actions like altering the operating system and contacting its C&C. To detect such activity, OSSEC is a wonderful tool. I already blogged about a way to detect malicious DNS traffic with OSSEC and the help of online domain blacklists like malwaredomains.com.

Read More →

Keep an Eye on Your Amazon Cloud with OSSEC

The Amazon conference "re:Invent" is taking place in Las Vegas at the moment. For a while now, I've been using the Amazon cloud services (EC2), mainly to run lab and research systems. Amongst the multiple announcements they already made during the conference, one caught my attention: "CloudTrail". Everything has already been said about the pros & cons of cloud computing, but one of the cons is particularly frustrating if, like me, you like to know what's happening and to keep an eye on your infrastructure (mainly from a security point of view): who's doing what, when and from where with your cloud resources? CloudTrail can help you to increase your visibility and is described by Amazon as follows:

CloudTrail provides increased visibility into AWS user activity that occurs within an AWS account and allows you to track changes that were made to AWS resources. CloudTrail makes it easier for customers to demonstrate compliance with internal policies or regulatory standards.

As explained in the Amazon blog post, once enabled, CloudTrail will generate files with events in a specific S3 bucket (that you configure during the setup). Those files will be available like any other data. What about grabbing the files at regular intervals and creating a local logfile that could be processed by a third-party tool like… OSSEC?

Generated events are stored as JSON data in gzipped files. I wrote a small Python script which downloads these files and generates a flat file:

$ ./getawslog.py -h
Usage: getawslog.py [options]

Options:
  --version             show program's version number and exit  
  -h, --help            show this help message and exit
  -b LOGBUCKET, --bucket=LOGBUCKET
                        Specify the S3 bucket containing AWS logs
  -d, --debug           Increase verbosity
  -l LOGFILE, --log=LOGFILE
                        Local log file
  -j, --json            Reformat JSON message (default: raw)
  -D, --delete          Delete processed files from the AWS S3 bucket

$ ./getawslog.py -b xxxxxx -l foo.log -d -j -D
+++ Debug mode on
+++ Connecting to Amazon S3
+++ Found new log: xxxxxxxxxxxx_CloudTrail_us-east-1_20131114T1325Z_xxx.json.gz
+++ Found new log: xxxxxxxxxxxx_CloudTrail_us-east-1_20131114T1330Z_xxx.json.gz
+++ Found new log: xxxxxxxxxxxx_CloudTrail_us-east-1_20131114T1335Z_xxx.json.gz
+++ Found new log: xxxxxxxxxxxx_CloudTrail_us-east-1_20131115T0745Z_xxx.json.gz
+++ Found new log: xxxxxxxxxxxx_CloudTrail_us-east-1_20131115T0745Z_xxx.json.gz
+++ Found new log: xxxxxxxxxxxx_CloudTrail_us-east-1_20131115T0750Z_xxx.json.gz

By default, the script will just append the JSON data to the specified file. If you use the "-j" switch, it will parse the received events and store them in a format that is much more convenient for further processing by OSSEC (using "item":"value" pairs). Here is an example of a parsed event:

"eventVersion":"1.0","eventTime":"2013-11-15T07:55:53Z", "requestParameters":"{u'instancesSet': {u'items': [{u'instanceId': u'i-415f473b'}]}}","responseElements":"{u'instancesSet': {u'items': [{u'instanceId': u'i-415f473b', u'currentState': {u'code': 32, u'name': u'shutting-down'}, u'previousState': {u'code': 16, u'name': u'running'}}]}}","awsRegion":"us-east-1","eventName":"TerminateInstances","userIdentity":"{u'principalId': u'xxxxxxxxxxxx', u'accessKeyId': u'xxxxxxxxxxxxxxxxxxxx', u'sessionContext': {u'attributes': {u'creationDate': u'2013-11-15T07:48:03Z', u'mfaAuthenticated': u'false'}}, u'type': u'Root', u'arn': u'arn:aws:iam::xxxxxxxxxxxx:root', u'accountId': u'xxxxxxxxxxxx'}","eventSource":"ec2.amazonaws.com","userAgent":"EC2ConsoleBackend","sourceIPAddress":"xxx.xxx.xxx.xxx"

Within OSSEC, create a new decoder which extracts the information you find relevant. Here is mine:

<decoder name="cloudtrail">
 <prematch>^"eventVersion":"\d.\d"</prematch>
 <regex>"awsRegion":"(\S+)"\.+"eventName":"(\S+)"\.+"sourceIPAddress":"(\d+.\d+.\d+.\d+)"$</regex>
 <order>data,action,srcip</order>
</decoder>

And here is how OSSEC decodes the event:

**Phase 1: Completed pre-decoding.
 full event: '"eventVersion":"1.0","eventTime":"2013-11-15T07:55:53Z","requestParameters":"{u'instancesSet': {u'items': [{u'instanceId': u'i-415f473b'}]}}","responseElements":"{u'instancesSet': {u'items': [{u'instanceId': u'i-415f473b', u'currentState': {u'code': 32, u'name': u'shutting-down'}, u'previousState': {u'code': 16, u'name': u'running'}}]}}","awsRegion":"us-east-1","eventName":"TerminateInstances","userIdentity":"{u'principalId': u'xxxxxxxxxxxx', u'accessKeyId': u'xxxxxxxxxxxxxxxxxxxx', u'sessionContext': {u'attributes': {u'creationDate': u'2013-11-15T07:48:03Z', u'mfaAuthenticated': u'false'}}, u'type': u'Root', u'arn': u'arn:aws:iam::xxxxxxxxxxxx:root', u'accountId': u'xxxxxxxxxxxx'}","eventSource":"ec2.amazonaws.com","userAgent":"EC2ConsoleBackend","sourceIPAddress":"xxx.xxx.xxx.xxx"'
 hostname: 'boogey'
 program_name: '(null)'
 log: '"eventVersion":"1.0","eventTime":"2013-11-15T07:55:53Z","requestParameters":"{u'instancesSet': {u'items': [{u'instanceId': u'i-415f473b'}]}}","responseElements":"{u'instancesSet': {u'items': [{u'instanceId': u'i-415f473b', u'currentState': {u'code': 32, u'name': u'shutting-down'}, u'previousState': {u'code': 16, u'name': u'running'}}]}}","awsRegion":"us-east-1","eventName":"TerminateInstances","userIdentity":"{u'principalId': u'xxxxxxxxxxxx', u'accessKeyId': u'xxxxxxxxxxxxxxxxxxxx', u'sessionContext': {u'attributes': {u'creationDate': u'2013-11-15T07:48:03Z', u'mfaAuthenticated': u'false'}}, u'type': u'Root', u'arn': u'arn:aws:iam::xxxxxxxxxxxx:root', u'accountId': u'xxxxxxxxxxxx'}","eventSource":"ec2.amazonaws.com","userAgent":"EC2ConsoleBackend","sourceIPAddress":"xxx.xxx.xxx.xxx"'
**Phase 2: Completed decoding.
 decoder: 'cloudtrail'
 extra_data: 'us-east-1'
 action: 'TerminateInstances'
 srcip: 'xxx.xxx.xxx.xxx'

So easy! Schedule the script via a cron job to automatically grab new events, and happy logging! The CloudTrail service is still in beta and is not (yet) available everywhere (e.g. not in the EU region) but seems to be working quite well. My script is available here.

 

Review: Instant OSSEC Host-Based Intrusion Detection System

The guys from Packt Publishing asked me to review a new book from their "Instant" collection: "OSSEC Host-Based Intrusion Detection". This collection offers books of fewer than 100 pages covering many topics; the goal is to get straight to the point. OSSEC being one of my favorite applications, I could not miss this opportunity! The book's author is Brad Lhotsky, a major contributor to the OSSEC community. Amongst the list of reviewers, we find JB Cheng, the OSSEC project manager responsible for OSSEC releases. It is a guarantee of quality for the book!

Read More →

Improving File Integrity Monitoring with OSSEC

FIM or "File Integrity Monitoring" can be defined as the process of validating the integrity of operating system and application files, using a verification method based on a hashing algorithm like MD5 or SHA1 and then comparing the current file state with a baseline. A hash will allow the detection of changes to the file content, but other information can be checked too: owner, permissions, modification time. Implementing file integrity monitoring is a very good way to detect compromised servers. Not only operating system files can be monitored (/etc on UNIX, the registry on Windows, shared libraries, etc) but also applications (monitoring your index.php or index.html can reveal a defaced website).
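
To illustrate the principle (a minimal Python sketch only, not how OSSEC implements it internally), a baseline comparison boils down to this:

import hashlib, os

def snapshot(path):
    # Collect the attributes a FIM tool typically baselines for a file
    st = os.stat(path)
    sha1 = hashlib.sha1(open(path, 'rb').read()).hexdigest()
    return {'sha1': sha1, 'owner': st.st_uid, 'group': st.st_gid,
            'perms': oct(st.st_mode), 'size': st.st_size}

def has_changed(path, baseline):
    # Compare the current state against a trusted baseline snapshot
    return snapshot(path) != baseline

baseline = snapshot('/etc/passwd')   # the baseline must be stored in a safe place!
# ... later ...
if has_changed('/etc/passwd', baseline):
    print('WARNING: /etc/passwd was modified')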

During its implementation, a file integrity monitoring project may face two common issues:

  • The baseline against which the current file state is compared must of course be trusted. To achieve this, it must be stored in a safe place where an attacker cannot find it and cannot alter it!
  • The process must be fine-tuned to react only to important changes, otherwise there are two risks: the really suspicious changes will be hidden in a massive flow of false positives, and the people in charge of the control could miss interesting changes.

There are plenty of tools which implement FIM, commercial as well as free. My choice went to OSSEC a while ago. My regular followers know that I have already posted a lot of articles about it. I also contributed to the project with a patch to add geolocalization to alerts. This time, I wrote another patch to improve the file integrity monitoring feature of OSSEC.

Read More →

Malicious DNS Traffic: Detection is Good, Proactivity is Better

It looks like our beloved DNS protocol is again the center of interest for some security $VENDORS. For a while now, I have seen the expression "DNS Firewall" used more and more in papers or presentations. It's not a new buzz… The DNS protocol is well known to be an excellent vector of infection and/or data exfiltration. But what is a "DNS Firewall" or "Strong DNS Resolver"?

Read More →

Howto: Distributed Splunk Architecture

Implementing a good log management solution is not an easy task! If your organisation decides (should I add "finally"?) to deploy "tools" to manage your huge amount of logs, it's a very good step forward, but it must be properly addressed. Devices and applications have plenty of ways to generate logs. They could send SNMP traps, Syslog messages, write to a flat file, write to a SQL database or even send smoke signals (thanks to our best friends the developers). It's definitely not an out-of-the-box solution that can simply be deployed. Please, do NOT trust $VENDORS who argue that their killing-top-notch-solution will be installed in a few days and collect everything for you! Before trying to extract the gold from your logs, you must correctly collect the events. This means, first of all: do not lose any of them. It's a good opportunity to recall Murphy's law here: the lost event will always be the one which contained the most critical piece of information! In most cases, a log management solution will be installed on top of an existing architecture. This involves several constraints:

  • From a security point of view, firewalls will for sure block flows used by the tools. Their policy must be adapted. The same applies to the applications or devices.
  • From a performance point of view, the tools can’t have a negative impact on the “business” traffic.
  • From a compliance point of view, the events must be properly handled with respect to confidentiality, integrity and availability (you know, the well-known CIA principle).
  • From a human point of view (maybe the most important), you will have to fight with other teams and ask them to change the way they work. Be social! ;-)

To achieve those requirements, or at least try to reach them, your tools must be deployed in a distributed architecture. By "distributed", I mean using multiple software components deployed in multiple places in your infrastructure. The primary reason for this is to collect the events as close as possible to their original source. If you do this, you will be able to respect the CIA principle and:

  • To control the resources used to process and centralise them
  • To get rid of the multiple proprietary or open protocols
  • To control their correct processing from A to Z.

For those who are regular readers of my blog, you know that I’m a big fan of OSSEC. This solution implements a distributed architecture with agents installed on multiple collection points to grab and centralise the logs:

OSSEC Schema

OSSEC is great but lacks a good web interface to search for events and generate reports. Lots of people interconnect their OSSEC server with a Splunk instance. There is a very good integration between both products using a dedicated Splunk app. Usually, Splunk is deployed on the OSSEC server itself. The classic way to let Splunk collect OSSEC events is to configure a new Syslog destination for alerts like this (in your ossec.conf file):

<syslog_output>
<server>10.10.10.10</server>
<port>10001</port>
</syslog_output>

This configuration block will send alerts (only!) to Splunk via Syslog messages sent to 10.10.10.10:10001 (where Splunk will listen for them). Note that the latest OSSEC version (2.7) can write native Splunk events over UDP. Personally, I don't like this way of forwarding events because UDP remains unreliable and only OSSEC alerts are forwarded. I prefer to process the OSSEC files using the file monitor feature of Splunk:

[monitor:///data/ossec/logs]
whitelist=\.log$

But what if you have multiple OSSEC servers across multiple locations? Splunk also has a solution for this, called the "Universal Forwarder". Basically, this is a light Splunk instance which is installed without any console. Its goal is just to collect events in their native format and forward them to a central Splunk instance (the "Indexer"):

Splunk Schema

If you have experience with ArcSight products, you can compare the Splunk Indexer with the ArcSight Logger and the Universal Forwarder with the SmartConnector. The configuration is pretty straightforward. Let's assume that you already have a Splunk server running. In your $SPLUNK_HOME/etc/system/local/inputs.conf, create a new input:

[splunktcp-ssl:10002]
 disabled = false
 sourcetype = tcp-10002
 queue = indexQueue

[SSL]
 password = xxxxxxxx
 rootCA = $SPLUNK_HOME/etc/auth/cacert.pem
 serverCert = $SPLUNK_HOME/etc/auth/server.pem

Restart Splunk and it will now bind to port 10002 and wait for incoming traffic. Note that you can use the provided certificate or your own. It's of course recommended to encrypt the traffic over SSL! Now install a Universal Forwarder. Like the regular Splunk, packages are available for most modern operating systems. Let's play with Ubuntu:

# dpkg -i splunkforwarder-5.0.1-143156-linux-2.6-intel.deb

Configuration can be achieved via the command line but it’s very easy to do it directly by editing the *.conf files. Configure your Indexer in the $SPLUNK_HOME/etc/system/local/outputs.conf:

[tcpout]
 defaultGroup = splunkssl

[tcpout:splunkssl]
 server = splunk.index.tld:10002
 sslVerifyServerCert = false
 sslCertPath = $SPLUNK_HOME/etc/auth/server.pem
 sslPassword = xxxxxxxx
 sslRootCAPath = $SPLUNK_HOME/etc/auth/cacert.pem

The Universal Forwarder inputs.conf file is a normal one. Just define all your sources there and start the process; it will start forwarding all the collected events to the Indexer. This is a quick example which demonstrates how to improve your log collection process. The Universal Forwarder will take care of the collected events, send them safely to your central Splunk instance (compressed, encrypted) and queue them in case of an outage.

A final note: don't ask me to compare Splunk, OSSEC or ArcSight. I'm not promoting a tool. I just gave you an example of how to deploy a tool, whatever your choice is ;-)

MySQL Attacks Self-Detection

Injection

I'm currently attending the Hashdays security conference in Lucerne (Switzerland). Yesterday I attended a first round of talks (the management session). Amongst all the interesting presentations, Alexander Kornbrust got my attention with his topic: "Self-Defending Databases". Alexander explained how databases can be configured to detect suspicious queries and prevent attacks. Great ideas, but there was only one negative point for me: only Oracle databases were covered. It sounds logical though; in 2008, Oracle was first (70%) in terms of database deployments, followed by Microsoft SQL Server (68%) and MySQL (50%). I did not find more recent numbers but the top 3 should remain the same. Alexander gave me the idea to investigate how to do the same with MySQL.

Read More →

Attackers Geolocation in OSSEC

If you follow my blog on a regular basis, you probably already know that I'm a big fan of OSSEC. I'm using it to monitor all my personal systems (servers, labs, websites, etc). Being a day-to-day user, I always have new ideas to extend the product, by using 3rd party tools or by adding features. One of the missing features (at least for me) is the lack of information when an alert is generated. Tracking the attackers' source IP addresses is very nice. Example: OSSEC can trigger active-response scripts to blacklist them for a short period, but how can we get more "visibility" into those addresses? When you think about how to give more visibility to IP addresses, you immediately think: geolocation! I already posted an article about the power of geolocation (link) to map alerts onto a Google map (example: a brute-force attack). This is very interesting but requires manual actions. As the IP addresses are already known by OSSEC (saved in the variable "srcip"), why not let OSSEC do the job for us in real time?

The problem solved by geolocation is the following: how do you convert IP addresses into coordinates (longitude/latitude) first and then map them to a country and/or city? MaxMind has the solution. This company maintains a database of all assigned IP addresses (over 99.5% accurate on a country level and 78% on a city level for the US within a 40 kilometer radius – source: MaxMind) with their mapping to geographic locations. Note that they provide two databases: IPv4 and IPv6. They provide APIs for several languages, including C. Good news: OSSEC is written in C! So, I wrote a patch which performs a geolocation lookup on the attacker's "srcip" and displays the location of the IP address. Let's go…

Step 1: Install the MaxMind GeoIP API

This is as easy as any other open source tool/library. Download the source code and install it. It should not be a problem on classic Linux flavors.

# wget http://www.maxmind.com/download/geoip/api/c/GeoIP-1.4.8.tar.gz
# tar xzvf GeoIP-1.4.8.tar.gz
# cd GeoIP-1.4.8
# ./configure
# make
# make install

By default, everything will be installed under /usr/local but you are free to change the various paths via the configure script. The API provides C include files and (dynamic/static) libraries, as usual.

Step 2: Install or recompile OSSEC with GeoIP localization

If you already have an OSSEC instance running, you can simply apply my patch to the original source tree. Take care if you have already made personal changes! I created the patch from a standard 2.6 tarball. The GeoIP feature is not built by default: you have to enable it explicitly (like the MySQL support).

# wget http://www.ossec.net/files/ossec-hids-2.6.tar.gz
# tar xzvf ossec-hids-2.6.tar.gz
# wget http://blog.rootshell.be/wp-content/uploads/2012/06/ossec-geoip.patch
# cd ossec-hids-2.6
# patch -p1 < ../ossec-geoip.patch
# cd src
# make setgeoip
# cd ..
# ./install.sh

Step 3: Install the MaxMind GeoIP DBs

MaxMind provides different versions of the databases. I'm using GeoLite City, the free version which provides city-level precision. This is more than enough for me. The databases are not provided with the API; they must be installed manually:

# cd /var/ossec/etc
# wget http://geolite.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz
# gzip -d GeoLiteCity.dat.gz
# wget http://geolite.maxmind.com/download/geoip/database/GeoLiteCityv6-beta/GeoLiteCityv6.dat.gz
# gzip -d GeoLiteCityv6.dat.gz

I suggest you download both databases (v4 & v6), and don't forget that the databases are regularly updated! It's worth setting up a small cron job to install new versions and keep the results accurate.
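
To quickly sanity-check the downloaded databases, you can perform a lookup from Python with the third-party pygeoip module (completely independent of the C patch, just a convenient test, assuming pygeoip is installed):

import pygeoip

# Open the same database that OSSEC will use and look up any public IPv4 address
gi = pygeoip.GeoIP('/var/ossec/etc/GeoLiteCity.dat')
record = gi.record_by_addr('8.8.8.8')
print('%s / %s' % (record['country_code'], record['city']))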

Step 4: Fix the OSSEC configuration files

Once the patch is installed, new parameters are available to set up the GeoIP environment. First, in the ossec.conf "global" section, define the paths to your databases:

<global>
  <geoip_db_path>/etc/GeoLiteCity.dat</geoip_db_path>
  <geoip6_db_path>/etc/GeoLiteCityv6.dat</geoip6_db_path>
</global>

In the “alerts” section, activate the GeoIP feature:

<alerts>
  <use_geoip>yes</use_geoip>
</alerts>

Finally, in the internal_options.conf, enable (or disable) the display of GeoIP information in the notification emails:

# Maild GeoIP support (0=disabled, 1=enabled)
maild.geoip=1

How does it work? The OSSEC process which performs the GeoIP lookups is "ossec-analysisd". When an alert must be logged and a "srcip" has been decoded, the IP address is passed to a new function, GeoIPLookup(), which calls the MaxMind API and returns a string with the geolocation data. The data is added to the alert text. The second component which has been patched is "ossec-maild", which parses the alerts and sends emails with the GeoIP data too (if enabled).

During the configuration, keep in mind that "ossec-analysisd" runs chrooted (in the main OSSEC directory), so don't forget to adapt the paths to the GeoIP databases: they must be defined relative to the chroot environment.

Here are some examples of alerts with GeoIP data enabled:

** Alert 1338899194.500996: - apache,invalid_request,
2012 Jun 05 14:26:34 (xxx) x.x.x.x->/var/log/apache/error_log
Rule: 30115 (level 5) -> 'Invalid URI (bad client request).'
Src IP: 94.45.169.133
Src Location: RU,Moscow City,Moscow
[Tue Jun 05 14:26:34 2012] [error] [client 94.45.169.133] [deleted]

** Alert 1338901319.507426: - syslog,postfix,spam,
2012 Jun 05 15:01:59 (xxx) x.x.x.x->/var/log/syslog
Rule: 3303 (level 5) -> 'Sender domain is not found (450: Requested mail action not taken).'
Src IP: 188.142.66.168
Src Location: NL,Zuid-Holland,Alphen
Jun  5 15:01:43 xxx postfix/smtpd[7397]: NOQUEUE: reject: [deleted]

Received From: (xxx) x.x.x.x->/var/log/apache/access_log
Rule: 100106 fired (level 15) -> "PHP CGI-bin vulnerability attempt."
Src Location: IL,Tel Aviv,Tel Aviv-yafo
Portion of the log(s):

77.126.209.178 - - [05/Jun/2012:16:42:07 +0200] [deleted]

Note that GeoIP lookups will be successful only for alerts which have a valid "srcip" field! In all other cases, the returned location will be "(null)". What about the impact on performance? I have been using this patch in production for a few days and did not notice any performance degradation. The GeoIP lookup is performed only once and parsed later by ossec-maild (I have an average of 2,000 alerts logged per day).

My OSSEC server is compiled with support for MySQL and I did not detect any incompatibility between MySQL and my patch. At this time, the geolocation data is not sent to the MySQL alert table, but this could be done easily: storing the latitude/longitude could be helpful to map attacks in real time using a 3rd party tool.

My patch is available here. Feel free to download it, use it and maybe improve it. Comments and suggestions will be appreciated. Final disclaimer: I won't be responsible if you break your current OSSEC setup…

Monitor your Monitoring Tools

We (and I'm fully part of the "we") deploy and use plenty of security monitoring tools daily. As our beloved data is often spread across complex infrastructures or simply across multiple physical locations, we have to collect interesting information and bring it to a central place for further analysis. That's called "log management". Based on the collected events, you can generate alerts and build reports. Nice! But… if systems and applications generate [hundreds|thousands|millions] of events, those events are processed by the same kind of hardware running some piece of software. Hardware may fail (network outage, power outage, disk crash) and software has bugs (plenty of them).

Read More →

Monitoring pastebin.com within your SIEM

Pastebin Cat

(Source: pastebin.com)

For those who (still) don't know pastebin.com, it's a website mainly for developers. Its purpose is very simple: you can "paste" text on the website to share it with other developers, friends, etc. You paste it, optionally define an expiration date and whether it's public or private data, and you are good. But for a while now, this online service has been used more and more to post "sensitive" information like passwords or email lists. By "sensitive", I mean "stolen" or "leaked" data. Indeed, pastebin.com allows anybody to use its services without any authentication, and it's easy to remain completely anonymous (if you submit data via proxy chains, Tor or any other tool which takes care of your privacy).

In big organizations, marketing departments or agencies learned how to use social networks a long time ago. They can follow what is being said about their products and marketing campaigns. In my opinion, it is equally important to follow what's posted about your organization on pastebin.com! Many people are looking for interesting data on pastebin.com from an offensive point of view. Let's see how this can also benefit the defensive side.

For me, pastebin.com has become an important source of information and I keep an eye on it every day. But, due to the huge amount of information posted every minute, it is impossible to process it manually. Of course, you can search for some keywords, but it's totally inefficient. At first, I grabbed and processed some HTML content using the classic UNIX tools. Later, I found a nice Python script developed by Xavier Garcia: python.py. It continuously checks for data leaks on pastebin.com using regular expressions. I kept it running for a while on a Linux box and it did quite a good job, but I needed more! Xavier's script prints the found "pasties" to the console. It is possible to dump the detected pasties by sending a signal to the process. Not always easy. That's why I decided to go a step further and write my own script! The principle remains the same as the Python script (why re-invent the wheel?) but I added two features that I found interesting:

  • It must run as a daemon (fully detached from the console) and be started at boot time.
  • It must write its findings to a log file.

The next step sounds logical: if you have a log file, why not process it automatically? Let's monitor pastebin.com within your SIEM! If information about you is found on pastebin.com, it could be very interesting to be notified (a great added value for your DLP processes). My script generates Syslog messages and (optionally) CEF ("Common Event Format") events which can be processed directly by an ArcSight infrastructure. Syslog messages can be processed by any SIEM or log management solution like OSSEC (see below). It is now possible to completely automate the process of detecting potentially sensitive leaked data and to generate alerts on specific conditions.
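
The core of the idea is simple. Here is a rough Python equivalent of the matching loop, just to illustrate the principle (the actual pastemon.pl is written in Perl and does much more: daemonizing, throttling, CEF output, etc; the patterns below are hypothetical examples):

import re, random, syslog, time, urllib2

# Example patterns; the real script loads them from the file given via --regex
REGEXES = [re.compile(r'@company\.com'), re.compile(r'-----BEGIN RSA PRIVATE KEY-----')]

syslog.openlog('pastemon-sketch', 0, syslog.LOG_DAEMON)

def check_pastie(pastie_id):
    # Fetch the raw pastie and count the matches of every regular expression
    raw = urllib2.urlopen('http://pastebin.com/raw.php?i=%s' % pastie_id).read()
    hits = [(r.pattern, len(r.findall(raw))) for r in REGEXES if r.search(raw)]
    if hits:
        syslog.syslog(syslog.LOG_INFO, 'Found in http://pastebin.com/raw.php?i=%s : %s' %
                      (pastie_id, ' -- '.join('%s (%d times)' % h for h in hits)))
    time.sleep(random.randint(1, 5))   # pause 1-5 seconds between requests to stay polite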

First, install the script on a Linux machine. The requirements are light: a Perl interpreter with a few modules (normally all of them are already installed on recent distributions) and web connectivity to http://pastebin.com:80. If you are behind a proxy, you can define the following environment variable; it will be used by the script:

  # export HTTP_PROXY=http://proxy.company.com:8080

The script can be started with some useful options:

  Usage: ./pastemon.pl --regex=filepath [--facility=daemon ] [--ignore-case][--debug] [--help]
                       [--cef-destination=fqdn|ip] [--cef-port=<1-65535>] [--cef-severity=<1-10>]
  Where:
  --cef-destination : Send CEF events to the specified destination (ArcSight)
  --cef-port        : UDP port used by the CEF receiver (default: 514)
  --cef-severity    : Generate CEF events with the specified priority (default: 3)
  --debug           : Enable debug mode (verbose - do not detach)
  --facility        : Syslog facility to send events to (default: daemon)
  --help            : What you're reading now.
  --ignore-case     : Perform case insensitive search
  --regex           : Configuration file with regular expressions (send SIGUSR1 to reload)

Once running, the script scans for newly uploaded pasties and searches for interesting content using regular expressions. There is no limit on the number of regular expressions (they are defined in a text file). In order not to disturb the pastebin.com webmasters, the script waits a random number of seconds (between 1 and 5) between GET requests. There is only one mandatory parameter, '--regex', which gives the text file with all the regular expressions to use (one per line). If one of the regular expressions matches, the following information will be sent to the local Syslog daemon:

  Jan 16 14:43:24 lab1 pastemon.pl[29947]: Sending CEF events to 127.0.0.1:514 (severity 10)
  Jan 16 14:43:24 lab1 pastemon.pl[29947]: Loaded 17 regular expressions from /data/src/pastemon/pastemon.conf
  Jan 16 14:43:24 lab1 pastemon.pl[29947]: Running with PID 29948
  <time flies>
  Jan 16 15:57:48 lab1 pastemon.pl[29948]: Found in http://pastebin.com/raw.php?i=hXYg93Qy : CREATE TABLE (9 times) -- phpMyAdmin SQL Dump (1 times)

All matching regular expressions are listed with their number of occurrences. This can be easily processed by OSSEC using the following decoder:

  <decoder name="pastemon">
    <program_name>^pastemon.pl</program_name>
  </decoder>

  <decoder name="pastemon-alert">
    <parent>pastemon</parent>
    <regex>Found in http://pastebin.com/raw.php?i=\.+ : (\.+) \(</regex>
    <order>data</order>
  </decoder>

The first matched regular expression is stored in the OSSEC "data" variable to be used as a condition in rules. Here is an example: rule #100204 will trigger an alert if yahoo.com email addresses are leaked on pastebin.com. (Note: this regular expression must be defined in the script configuration file!)

  <rule id="100203" level="0">
    <decoded_as>pastemon</decoded_as>
    <description>Data found on pastebin.com.</description>
  </rule>

  <rule id="100204" level="7">
    <if_sid>100203</if_sid>
    <description>Detected yahoo.com email addresses on pastebin.com!</description>
    <extra_data>@yahoo\.com$</extra_data>
  </rule>

If you have an ArcSight infrastructure, you can enable CEF event support. The same event as above will be sent to the configured CEF destination and port:

<29>Jan 16 15:57:48 CEF:0|blog.rootshell.be|pastemon.pl|v1.0|regex-match|One or more regex matched|10|request=http://pastebin.com/raw.php?i=hXYg93Qy destinationDnsDomain=pastebin.com msg=Interesting data has been found on pastebin.com.
cs0=CREATE TABLE cs0Label=Regex0Name cn0=9 cn0Label=Regex0Count cs1=-- phpMyAdmin SQL Dump cs1Label=Regex1Name cn1=1 cn1Label=Regex1Count
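
Building such a CEF event is basically string formatting followed by a UDP send. Here is a rough Python illustration of the format (not the actual Perl implementation; the header fields are copied from the example above and the syslog priority/timestamp prefix is omitted for brevity):

import socket

def send_cef(dest, port, url, matches):
    # "matches" is a list of (regex, count) tuples, mapped to the csN/cnN extensions
    ext = ['request=%s destinationDnsDomain=pastebin.com' % url,
           'msg=Interesting data has been found on pastebin.com.']
    for i, (regex, count) in enumerate(matches):
        ext.append('cs%d=%s cs%dLabel=Regex%dName' % (i, regex, i, i))
        ext.append('cn%d=%d cn%dLabel=Regex%dCount' % (i, count, i, i))
    event = 'CEF:0|blog.rootshell.be|pastemon.pl|v1.0|regex-match|One or more regex matched|10|' + ' '.join(ext)
    socket.socket(socket.AF_INET, socket.SOCK_DGRAM).sendto(event, (dest, port))

send_cef('127.0.0.1', 514, 'http://pastebin.com/raw.php?i=hXYg93Qy',
         [('CREATE TABLE', 9), ('-- phpMyAdmin SQL Dump', 1)])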

To process the CEF events on ArcSight's side, configure a new SmartConnector with a UDP CEF receiver, and the events should be correctly parsed:

Parsed pastemon.pl events


That looks great! But the next question is: “What to look for on pastebin.com?“. Well, it depends on you… Based on your organization or business, there are things that you can’t miss. Here is a list of useful regular expressions that I often use:

RegEx                                                                  Purpose
---------------------------------------------------------------------  -----------------------------------
company\.com                                                           Your company domain name
@company\.com                                                          Corporate e-mail addresses
CompanyName                                                            Company name
MyFirstName MyLastName                                                 Your full name
@xme                                                                   Twitter account
192.168.[1-3].[0-255]                                                  IP addresses ranges
anonbelgium                                                            Hackers groups
#lulz                                                                  Trending Twitter hashtags
#anonymous
#antisec
-----BEGIN RSA PRIVATE KEY-----                                        Interesting data!
-----BEGIN DSA PRIVATE KEY-----
-----BEGIN CERTIFICATE-----
-- MySQL dump                                                          Interesting dumps!
belgium                                                                My country
city                                                                   My city
((4\d{3})|(5[1-5]\d{2})|(6011))-?\d{4}-?\d{4}-?\d{4}|3[4,7]\d{13}      Credit cards

If you have interesting regular expressions or ideas, feel free to share!

Source is available here. As usual, this is provided “as is” without any warranty. Happy monitoring!