Monitoring pastebin.com within your SIEM

January 17, 2012 Security, Software, Websites 57 comments

For those who (still) don’t know pastebin.com, it’s a website mainly for developers. Its purpose is very simple: you can “paste” text on the website to share it with other developers, friends, etc. You paste it, optionally define an expiration date, choose whether the data is public or private, and you are good to go. For a while now, however, this online service has been used more and more to post “sensitive” information such as passwords or email lists. By “sensitive”, I mean “stolen” or “leaked” data. Indeed, pastebin.com allows anybody to use its services without any authentication, so it’s easy to remain completely anonymous (if you submit data via proxy chains, Tor or any other tool which takes care of your privacy).

In big organizations, marketing departments or agencies learned long ago how to use social networks. They can follow what is being said about their products and marketing campaigns. In my opinion, it is equally important to follow what’s posted about your organization on pastebin.com! Many people are looking for interesting data on pastebin.com from an offensive point of view. Let’s see how this can also benefit the defensive side.

For me, pastebin.com has become an important source of information and I keep an eye on it every day. But, due to the huge amount of information posted every minute, it is impossible to process it manually. Of course, you can search for some keywords, but that’s totally inefficient. At first, I grabbed and processed some HTML content using the classic UNIX tools. Later, I found a nice Python script developed by Xavier Garcia: python.py. It continuously checks for data leaks on pastebin.com using regular expressions. I kept it running for a while on a Linux box and it did quite a good job, but I needed more! Xavier’s script sends the “pasties” it finds to the console. It is possible to dump the detected pasties by sending a signal to the process, which is not always easy. That’s why I decided to go a step further and write my own script! The principle remains the same as in the Python script (why reinvent the wheel?), but I added two features that I found interesting:

It must run as a daemon (fully detached from the console) and be started at boot time.
It must write its findings to a log file.

The next step sounds logical: if you have a log file, why not process it automatically? Let’s monitor pastebin.com within your SIEM! If information of interest to you is posted on pastebin.com, it could be very useful to be notified (a great added value for your DLP processes). My script generates Syslog messages and (optionally) CEF (“Common Event Format”) events which can be processed directly by an ArcSight infrastructure. Syslog messages can be processed by any SIEM or log management solution like OSSEC (see below). It is now possible to completely automate the process of detecting potentially sensitive leaked data and to generate alerts on specific conditions.

First, install the script on a Linux machine. Requirements are light: a Perl interpreter with a few modules (normally all of them are already installed on recent distributions) and web connectivity to http://pastebin.com:80. If you are behind a proxy, you can define the following environment variable, which will be used by the script:

  # export HTTP_PROXY=http://proxy.company.com:8080

The script can be started with some useful options:

  Usage: ./pastemon.pl --regex=filepath [--facility=daemon] [--ignore-case] [--debug] [--help]
                       [--cef-destination=fqdn|ip] [--cef-port=<1-65535>] [--cef-severity=<1-10>]
  Where:
  --cef-destination : Send CEF events to the specified destination (ArcSight)
  --cef-port        : UDP port used by the CEF receiver (default: 514)
  --cef-severity    : Generate CEF events with the specified priority (default: 3)
  --debug           : Enable debug mode (verbose - do not detach)
  --facility        : Syslog facility to send events to (default: daemon)
  --help            : What you're reading now.
  --ignore-case     : Perform case insensitive search
  --regex           : Configuration file with regular expressions (send SIGUSR1 to reload)
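
The "send SIGUSR1 to reload" behaviour of --regex can be illustrated with a small sketch. This is not the Perl code from pastemon.pl; it is a Python approximation, and every function name in it (request_reload, load_patterns, current_patterns, install_reload_handler) is hypothetical. The handler only raises a flag, and the main loop reloads the file when it sees the flag.

```python
import re
import signal

reload_requested = False  # raised by the signal handler, consumed by the main loop


def request_reload(signum, frame):
    """Signal handler: only raise a flag, the real work happens in the main loop."""
    global reload_requested
    reload_requested = True


def load_patterns(path, ignore_case=False):
    """Compile one regular expression per non-empty line of the file at `path`."""
    flags = re.IGNORECASE if ignore_case else 0
    with open(path, encoding="utf-8") as handle:
        return [re.compile(line.strip(), flags) for line in handle if line.strip()]


def current_patterns(path, patterns, ignore_case=False):
    """Return freshly loaded patterns if a reload was requested, else the old list."""
    global reload_requested
    if reload_requested:
        reload_requested = False
        return load_patterns(path, ignore_case)
    return patterns


def install_reload_handler():
    """Register the handler; SIGUSR1 only exists on UNIX-like systems."""
    if hasattr(signal, "SIGUSR1"):
        signal.signal(signal.SIGUSR1, request_reload)


if __name__ == "__main__":
    install_reload_handler()
```

A long-running loop would call current_patterns() before each scan, so that a "kill -USR1 <pid>" takes effect on the next iteration without restarting the daemon.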

Once running, the script scans for newly uploaded pasties and searches them for interesting content using regular expressions. There is no limit on the number of regular expressions (defined in a text file). To avoid disturbing the pastebin.com webmasters, the script waits a random number of seconds (between 1 and 5) between GET requests. There is only one mandatory parameter, ‘--regex’, which gives the text file containing all the regular expressions to use (one per line). If one of the regular expressions matches, the following information will be sent to the local Syslog daemon:

  Jan 16 14:43:24 lab1 pastemon.pl[29947]: Sending CEF events to 127.0.0.1:514 (severity 10)
  Jan 16 14:43:24 lab1 pastemon.pl[29947]: Loaded 17 regular expressions from /data/src/pastemon/pastemon.conf
  Jan 16 14:43:24 lab1 pastemon.pl[29947]: Running with PID 29948
  <time flies>
  Jan 16 15:57:48 lab1 pastemon.pl[29948]: Found in http://pastebin.com/raw.php?i=hXYg93Qy : CREATE TABLE (9 times) -- phpMyAdmin SQL Dump (1 times)

All matching regular expressions are listed with their number of occurrences. This can be easily processed by OSSEC using the following decoder:

  <decoder name="pastemon">
    <program_name>^pastemon.pl</program_name>
  </decoder>

  <decoder name="pastemon-alert">
    <parent>pastemon</parent>
    <regex>Found in http://pastebin.com/raw.php?i=\.+ : (\.+) \(</regex>
    <order>data</order>
  </decoder>

The first regular expression is stored in the OSSEC “data” variable to be used as a condition in rules. Here is an example: rule #100204 will trigger an alert if some yahoo.com email addresses are leaked on pastebin.com. (Note: this regular expression must be defined in the script configuration file!)

  <rule id="100203" level="0">
    <decoded_as>pastemon</decoded_as>
    <description>Data found on pastebin.com.</description>
  </rule>

  <rule id="100204" level="7">
    <if_sid>100203</if_sid>
    <description>Detected yahoo.com email addresses on pastebin.com!</description>
    <extra_data>@yahoo\.com$</extra_data>
  </rule>
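
To check what the decoder and the rule are meant to extract without touching an OSSEC installation, their logic can be approximated in Python. This is only a rough sketch: OSSEC uses its own regex dialect, so the two patterns below are hand-translated approximations, the function names are hypothetical, and the second sample message in the usage section is invented for illustration.

```python
import re

# Approximation of the "pastemon-alert" decoder: capture the first regex name,
# i.e. the text between " : " and the first " (".
DECODER = re.compile(r"Found in http://pastebin\.com/raw\.php\?i=\S+ : (.+?) \(")

# Approximation of the <extra_data> condition of rule 100204.
RULE_100204 = re.compile(r"@yahoo\.com$")


def decode(message):
    """Return the value that would land in the "data" field, or None if no match."""
    match = DECODER.search(message)
    return match.group(1) if match else None


def triggers_yahoo_alert(message):
    """True if the decoded data satisfies the rule 100204 condition."""
    data = decode(message)
    return data is not None and RULE_100204.search(data) is not None


if __name__ == "__main__":
    sample = ("Found in http://pastebin.com/raw.php?i=hXYg93Qy : "
              "CREATE TABLE (9 times) -- phpMyAdmin SQL Dump (1 times)")
    print(decode(sample))                # data field of the sample event
    print(triggers_yahoo_alert(sample))  # no yahoo.com regex matched here
```

Note that only the first matching regular expression is captured, so a rule like 100204 will not fire when the yahoo.com expression appears second in the message.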

If you have an ArcSight infrastructure, you can enable the CEF events support. The same event as above will be sent to the configured CEF destination and port:

<29>Jan 16 15:57:48 CEF:0|blog.rootshell.be|pastemon.pl|v1.0|regex-match|One or more regex matched|10|request=http://pastebin.com/raw.php?i=hXYg93Qy destinationDnsDomain=pastebin.com msg=Interesting data has been found on pastebin.com.
cs0=CREATE TABLE cs0Label=Regex0Name cn0=9 cn0Label=Regex0Count cs1=-- phpMyAdmin SQL Dump cs1Label=Regex1Name cn1=1 cn1Label=Regex1Count
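
For readers who want to produce a similar event from their own tooling, here is a minimal Python sketch of how such a CEF line could be assembled and sent over UDP. It is not taken from pastemon.pl. Assumptions: the function names are hypothetical, the custom fields are numbered from 1 and capped at six (following the remarks about the CEF dictionary in the comments below, whereas the sample above starts at 0), and "=" is escaped in extension values as the CEF format requires, so the request field differs slightly from the sample.

```python
import socket

HEADER_FIELDS = ("blog.rootshell.be", "pastemon.pl", "v1.0",
                 "regex-match", "One or more regex matched")


def escape_header(value):
    """CEF header fields: escape backslash and pipe."""
    return value.replace("\\", "\\\\").replace("|", "\\|")


def escape_extension(value):
    """CEF extension values: escape backslash and equal sign."""
    return value.replace("\\", "\\\\").replace("=", "\\=")


def build_cef(url, matches, severity=3):
    """Build a CEF event from a pastie URL and a list of (regex name, count)."""
    header = ["CEF:0"] + [escape_header(f) for f in HEADER_FIELDS] + [str(severity)]
    extension = [
        f"request={escape_extension(url)}",
        "destinationDnsDomain=pastebin.com",
        "msg=Interesting data has been found on pastebin.com.",
    ]
    # Assumed cap of six custom string/number pairs, numbered from 1.
    for index, (name, count) in enumerate(matches[:6], start=1):
        extension += [
            f"cs{index}={escape_extension(name)}",
            f"cs{index}Label=Regex{index}Name",
            f"cn{index}={count}",
            f"cn{index}Label=Regex{index}Count",
        ]
    return "|".join(header) + "|" + " ".join(extension)


def send_cef(event, host, port=514, priority=29):
    """Send the event as one UDP datagram; <29> is facility daemon, severity notice."""
    payload = f"<{priority}>{event}".encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

A call such as send_cef(build_cef(url, matches, severity=10), "127.0.0.1") would mirror the --cef-destination, --cef-port and --cef-severity options described earlier.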

To process the CEF events on ArcSight’s side, configure a new SmartConnector, a new UDP CEF receiver and the events should be correctly parsed:

Figure: parsed pastemon.pl events.

That looks great! But the next question is: “What to look for on pastebin.com?“. Well, it depends on you… Based on your organization or business, there are things that you can’t miss. Here is a list of useful regular expressions that I often use:

RegEx                                                                  Purpose
---------------------------------------------------------------------  -----------------------------------
company\.com                                                           Your company domain name
@company\.com                                                          Corporate e-mail addresses
CompanyName                                                            Company name
MyFirstName MyLastName                                                 Your full name
@xme                                                                   Twitter account
192\.168\.[1-3]\.\d{1,3}                                               IP address ranges
anonbelgium                                                            Hackers groups
#lulz                                                                  Trending Twitter hashtags
#anonymous
#antisec
-----BEGIN RSA PRIVATE KEY-----                                        Interesting data!
-----BEGIN DSA PRIVATE KEY-----
-----BEGIN CERTIFICATE-----
-- MySQL dump                                                          Interesting dumps!
belgium                                                                My country
city                                                                   My city
((4\d{3})|(5[1-5]\d{2})|(6011))-?\d{4}-?\d{4}-?\d{4}|3[47]\d{13}       Credit cards
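
Before putting expressions like these into the configuration file, it helps to try them against a few sample strings. The sketch below does this in Python, whose syntax is close enough to Perl's for these patterns. The helper name and the sample strings are hypothetical, and the IP and credit card patterns are written here in a form I am confident about: dots escaped, \d{1,3} for the last octet (which also accepts values above 255) and [47] without a comma.

```python
import re

CANDIDATES = {
    "credit card": r"((4\d{3})|(5[1-5]\d{2})|(6011))-?\d{4}-?\d{4}-?\d{4}|3[47]\d{13}",
    "private key": r"-----BEGIN RSA PRIVATE KEY-----",
    "ip range": r"192\.168\.[1-3]\.\d{1,3}",
}


def matching_labels(text, candidates=CANDIDATES, ignore_case=False):
    """Return the sorted labels of all candidate expressions found in `text`."""
    flags = re.IGNORECASE if ignore_case else 0
    return sorted(
        label for label, expression in candidates.items()
        if re.search(expression, text, flags)
    )


if __name__ == "__main__":
    for sample in (
        "visa 4111-1111-1111-1111",
        "host 192.168.2.17 is up",
        "host 192.168.9.17 is up",
        "-----BEGIN RSA PRIVATE KEY-----",
    ):
        print(sample, "->", matching_labels(sample))
```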

If you have interesting regular expressions or ideas, feel free to share!

Source is available here. As usual, this is provided “as is” without any warranty. Happy monitoring!

57 comments

Robert says:

January 30, 2015 at 11:14

You can check if your data have been leaked here : http://privacy.is-lost.org
Heiko says:

January 27, 2014 at 13:43

Hi,

has anyone experienced pastemon eating up memory resulting in swap being filled up? Or leaving zombie processes?
h2 says:

September 4, 2013 at 18:51

Hi Xavier, I have a few questions.
Does this still work as is? It seems a change to the URL parsing in the source code is needed:

"$p = 'http://pastebin.com/raw.php?i=' . $p;" for "$p = 'http://pastebin.com/' . $p;"

In parallel, I created a paste on pastebin.com and searched for it with the site’s own search tool, but it was not found.

In the comment about disabling proxies, I can’t see what needs to be removed or commented out. Please, can you tell me what to remove so that proxies are not used and the script connects directly?

“Xavier
9 months ago
To completely disable proxies, just remove or comment the following line in your pastemon.conf file:
…”
Enrique says:

July 25, 2013 at 17:08

Hello Xavier
When I run the script in debug mode displays the following message

DBI::db=HASH(0x201a878)->disconnect invalidates 1 active statement handle (either destroy statement handles or call finish on them before disconnecting) at ./pastemon.pl line 760.

any idea?

Thanks
Jamie says:

July 19, 2013 at 20:22

This looks great, but I only want it to send to ArcSight via CEF and cant get past the SMTP server error.
Xavier says:

July 17, 2013 at 08:17

Hi,
Because this is the purpose of the --dump feature 😉
Filipe says:

July 16, 2013 at 20:47

Hi, the dump function is dumping all data and not only the matched pastie.

Thanks!
Rick Elangbaum says:

June 24, 2013 at 07:10

Hi Xavier,
I’m running the script on a VM and trying to forward the CEF output to a Logger running on another VM on the same host. The problem is that I am only getting “cannot fetch pastebin.com:500 ….” and “disabled unreliable proxy” messages in the syslog. Please let me know what I am doing wrong.
Frank says:

May 10, 2013 at 10:34

Ok. Thanks 🙂
Xavier says:

May 9, 2013 at 15:24

Hi Frank,
There must be a bug somewhere. I’ll have a look at it.
Frank says:

May 9, 2013 at 13:54

Hi, I’m testing your pastemon and everything is OK, but I have a question: in the database, when a keyword is found, the column “matched” is blank. How can I search the database for a pastie that matched a regular expression? Thanks
Xavier says:

April 12, 2013 at 17:37

No problem! Enjoy pastemon!
trynreadme says:

April 12, 2013 at 00:42

Apologies for the previous post, I should have read a few more lines down. As for the WordPress issue, it stands. Thanks.
trynreadme says:

April 12, 2013 at 00:08

I’m testing out pastemon for production use and my end goal is to have it post to a WP site, but while running in debug I keep receiving the error: ‘WordPress configuration disabled: WordPress::XMLRPC not installed’. ‘xmlrpc.php’ is active and available on my site, I do not have ‘http://’ or ‘/’ on my URL within the .conf, I have triple checked my username and password, and verified the category is available within my WP site. Can you offer any help in this area? Is there a dependency I’m missing?

Also when using the default proxies.conf I’m constantly receiving: “Cannot fetch http://www.pastebin.com: 500 Can’t connect to ip.ad.dre.ss:port (timeout)
+++ Disabled unreliable proxy http://ip.ad.dre.ss:port (956 active proxies)”

Any help would be greatly appreciated.
TIA.
phocean says:

January 12, 2013 at 17:20

Fantastic work, very useful! Thank you.
Xavier says:

November 20, 2012 at 08:42

To completely disable proxies, just remove or comment the following line in your pastemon.conf file:
…
fil says:

November 20, 2012 at 00:55

Thank you for your fast response. It seems like i have my wires crossed, but i dont understand how i can disable the support of proxies. Any advice? Thanks in advance!
Xavier says:

November 19, 2012 at 12:43

To use proxied connections, you must provide a list of proxies (format is [IP|FQDN]:port, one per line). Proxies will be selected randomly and removed if not available. It’s up to you to build a reliable list of proxies…
fil says:

November 18, 2012 at 11:23

Hi Xavier,
much appreciation for your effort in this script from Austria.
I just updated to the newest version, but the proxy list doesn’t work with the provided entries. I get a lot of log entries saying “Disabled unreliable proxy…”.
Any suggestions?
Xavier says:

October 15, 2012 at 12:13

Thank you for your contribution! Your patch has been added in the source code.
Corey says:

October 5, 2012 at 03:24

Xavier,

Thank you very much.

I’d suggest the following patch to fix the input sanitization that seems to have gotten mangled and to add the ability to specify multiple email recipients.

http://pastebin.com/bQenPqzL
Heiko says:

September 24, 2012 at 13:50

Andriy, Perl supports the SOCKS protocol. I have installed a tor client locally and use http_proxy=socks://127.0.0.1:9050.

Heiko
Bkay says:

May 26, 2012 at 13:59

本田技研工業株式会社
本田
ソニー株式会社
ソニー

you can try with these
Xavier says:

May 17, 2012 at 08:38

Hello Benny,
Output supports non-roman characters:
open(DUMP, ">:encoding(UTF-8)", "$dumpDir/$pastie.raw")
The regex file is opened and processed as a regular file. Not tested honestly! Do you have some example? I will test.
Bkay says:

May 17, 2012 at 08:25

For the file with regular expressions, can you use non-roman characters? Umlaut? Cyrillic? Arabic? etc??
Xavier says:

April 14, 2012 at 15:00

Tor support is on my todo list!
Andriy says:

April 10, 2012 at 15:18

Hi! Very useful tool!

Does it have Tor support?
I’ve been having trouble with the proxies I use, because they often go offline.
Xavier says:

February 27, 2012 at 19:48

I also noticed that a few days ago… Code has been updated!
AfterShell.com says:

February 27, 2012 at 17:00

Hi Xavier,

I noticed that the source code of pastebin.com/archive has been changed…that means that no pastie can be fetched.

Regards,
AfterShell.com
Aftershell.com says:

February 27, 2012 at 15:53

Hi guys,

If you have problems behind a proxy and “export” doesn’t work, you can also replace the following line:
$ua->env_proxy;

with:
# $ua->env_proxy;
$ua->proxy(['http'], 'http://proxy.company.com:port/');

It is used twice in the script. 😉

Regards,
AfterShell.com
Xavier says:

February 17, 2012 at 12:50

Glad to hear! Enjoy!
Geoffrey says:

February 17, 2012 at 12:33

Many Thanks Xavier! I’ve modified the script to monitor our sensitive data posted on pastebin. OSSEC is once again my best friend!!!

Geoffrey
Xavier says:

February 15, 2012 at 16:37

This may be my fault… Are you sure you are using HTTP_PROXY (upper case)? This is important on case-sensitive systems like UNIX. I tested again here and it works!
CS says:

February 14, 2012 at 10:39

@Xavier
Yes Xavier I tried this command, with and without quotes, but nothing worked. 🙁
Xavier says:

February 13, 2012 at 14:55

CS,
How do you define your proxy?
$ export http_proxy="http://my.proxy.com:3128"
CS says:

February 13, 2012 at 11:34

The only prob I have is that it doesn’t work behind a proxy. I used the export command but the script doesn’t use the proxy trying to connect to pastebin.
Sens0r says:

February 6, 2012 at 16:58

Nice tool. One thing that might help others spare a huge amount of time:
if you use --debug the tool is _NOT_ writing to any logfile.

Regards
Heiko says:

January 31, 2012 at 09:40

Thanks for your effort, Xavier! And indeed since the index is initialized with 1, the Device Custom parameters come in.
Xavier says:

January 30, 2012 at 17:16

Indeed, the CEF dictionary mentions 1-6 custom fields. I fixed this in the script (this will be available after the next commit). Thank you for your tests!
Xavier says:

January 30, 2012 at 16:18

Hmm… I don’t have this issue!?
In the meantime, I’ll make the timeout configurable…
Heiko says:

January 30, 2012 at 15:39

I think I know why I did not see the matched expressions:

Jan 30 14:18:37 CEF:0|blog.rootshell.be|pastemon.pl|v1.3|regex-match|One or more regex matched|3|request=http://pastebin.com/raw.php?i=wcE9Au01 destinationDnsDomain=pastebin.com msg=Interesting data has been found on pastebin.com. cs0=vodafone cs0Label=Regex0Name cn0=1 cn0Label=Regex0Count

It starts indexing at 0 not at 1. I changed line 384 to “my $i = 1;”. Hope it works now…
Heiko says:

January 30, 2012 at 09:42

I can confirm Josh’s issue, I also receive some “it seems you are requesting a little bit too much from Pastebin”. I now doubled the wait timers (i.e. random(3)*2 and random(5)*2) and I am curious to see if it persists…

Heiko
Heiko says:

January 30, 2012 at 09:34

Hi Xavier,

thanks for picking this up.

What I mean with the CEF comment is that I cannot see the matching regexes in ArcSight. Now I reviewed your script and can see that it puts the matches in the event. I now configured ArcSight to preserve the raw event so that I can see what the script actually submits to see what happens…

Regards,
Heiko
Xavier says:

January 28, 2012 at 11:53

Hello Heiko,

Thank you for the suggestion/report. I just committed release 1.3 of my script:
– You can now define your own PID file (--pidfile)
– A sample of the data can be printed (--sample)

About your comment on the CEF event, the matching regexes and their counts are already reported using deviceCustomStringX and deviceCustomIntegerX. Or did I misunderstand your remark? Feel free to give me more details.
Heiko says:

January 25, 2012 at 17:57

Hi,

great idea! I implemented the script, forwarding CEF events to ArcSight. I’m curious to see what it catches.

However, I came across two issues:

1. The script tries to write the daemon’s pid to /var/run. Because the script runs as a normal user, this does not work (at least on my machine). I changed this in the script to /tmp, but I would prefer if it was either default or configurable.

2. It would be great to have the matched pattern in the CEF event, e.g. in Message or in requestContext. Then one could tell at a glance from the ArcSight console what was found where.

3. Another idea would be to put a piece of the pastie, e.g. the line containing the matched pattern, in a CEF field like deviceCustomString1

Regards,
Heiko
Xavier says:

January 24, 2012 at 09:27

Hello Nicolas,
Thanks for the idea! I just published a new version of the script which implements this feature.
You can now define rules like “regex1 _EXCLUDE_ regex2”. This could help to get rid of false positives. A good example is looking for countries: If you look for “belgium”, there is a good chance that you will catch HTML code with list of countries. Using “belgium _EXCLUDE_ belize” (Belize is the next country in alphabetical order), you won’t be notified.
Xavier says:

January 24, 2012 at 09:05

Josh,
It’s strange… I’m monitoring pasties constantly for days and no issue here!?
Josh says:

January 23, 2012 at 17:29

So I’ve been trying this and it appears that I’ve gotten my IP address blocked on Pastebin. I guess trying it every 30 seconds was a bit overkill.
Xavier says:

January 20, 2012 at 23:22

Hi Sertan,
That’s why I linked the script with OSSEC! I prefer receiving emails in a unified format from ONE tool instead of being flooded by thousands of scripts output. Thanks for sharing your script too!

PS: Your idea to fake the User-Agent is good btw!
Sertan Kolat says:

January 20, 2012 at 17:41

Xavier thanks for sharing the tool.

Sending a CEF event to your SIEM is cool, but I would also recommend adding a mail alert functionality similar to http://pastebin.com/V4LgG9Wr

Also from my experience, though not realtime, 2-3 minutes polling interval is good for fetching all recent pasties.
Xavier says:

January 20, 2012 at 16:16

Adrian,
Good remark! I just uploaded v1.1, which now has a ‘--dump’ option.
You can specify a directory where pasties matching a regex will be saved (raw). This will allow you to check pasties which expired. Thank you for your comment!
Nicolas says:

January 19, 2012 at 21:07

Excellent tool, one recommendation only, to add the ability to exclude words. For example posts that contain the word ‘summer’ but not the ones with the word ‘house’ in the same content.
Adrian says:

January 19, 2012 at 12:54

Forgive me if I’ve misunderstood, but does this script download and store the matches it finds? If so, can you make it clear in the article where they are stored; if not, can you add such functionality? It seems to me that this script is only any good if the paste doesn’t have a time limit.
