For those who (still) don’t know pastebin.com, it’s a website mainly for developers. Its purpose is very simple: you can “paste” text on the website to share it with other developers, friends, etc. You paste it, optionally define an expiration date and whether the data is public or private, and you are good to go. But for some time now, this online service has increasingly been used to post “sensitive” information like passwords or email lists. By “sensitive”, I mean “stolen” or “leaked” data. Indeed, pastebin.com allows anybody to use its services without any authentication, so it’s easy to remain completely anonymous (if you submit data via proxy chains, Tor or any other tool which takes care of your privacy).
In big organizations, marketing departments and agencies learned long ago how to use social networks. They can follow what is being said about their products and marketing campaigns. In my opinion, it is equally important to follow what’s posted about your organization on pastebin.com! Many people look for interesting data on pastebin.com from an offensive point of view. Let’s see how this can also benefit the defensive side.
For me, pastebin.com has become an important source of information and I keep an eye on it every day. But due to the huge amount of information posted every minute, it is impossible to process it manually. Of course, you can search for some keywords, but that is totally inefficient. At first, I grabbed and processed some HTML content using the classic UNIX tools. Later, I found a nice Python script developed by Xavier Garcia: python.py. It continuously checks for data leaks on pastebin.com using regular expressions. I kept it running for a while on a Linux box and it did quite a good job, but I needed more! Xavier’s script sends the found “pasties” to the console. It is possible to dump the detected pasties by sending a signal to the process, but that is not always easy. That’s why I decided to go a step further and write my own script! The principle remains the same as the Python script (why re-invent the wheel?), but I added two features that I found interesting:
- It must run as a daemon (fully detached from the console) and be started at boot time.
- It must write its findings to a log file.
The next step sounds logical: if you have a log file, why not process it automatically? Let’s monitor pastebin.com within your SIEM! If information about your organization is posted on pastebin.com, being notified could be very interesting (a great added value for your DLP processes). My script generates Syslog messages and (optionally) CEF (“Common Event Format”) events which can be processed directly by an ArcSight infrastructure. Syslog messages can be processed by any SIEM or log management solution like OSSEC (see below). It is now possible to completely automate the process of detecting potentially sensitive leaked data and to generate alerts on specific conditions.
First, install the script on a Linux machine. Requirements are light: a Perl interpreter with a few modules is required (normally all of them are already installed on recent distributions) and HTTP connectivity to http://pastebin.com:80. If you are behind a proxy, you can define the following environment variable; it will be used by the script:
# export HTTP_PROXY=http://proxy.company.com:8080
The script can be started with some useful options:
Usage: ./pastemon.pl --regex=filepath [--facility=daemon] [--ignore-case] [--debug] [--help]
                     [--cef-destination=fqdn|ip] [--cef-port=<1-65535>] [--cef-severity=<1-10>]
Where:
  --cef-destination : Send CEF events to the specified destination (ArcSight)
  --cef-port        : UDP port used by the CEF receiver (default: 514)
  --cef-severity    : Generate CEF events with the specified priority (default: 3)
  --debug           : Enable debug mode (verbose - do not detach)
  --facility        : Syslog facility to send events to (default: daemon)
  --help            : What you're reading now.
  --ignore-case     : Perform case insensitive search
  --regex           : Configuration file with regular expressions (send SIGUSR1 to reload)
Once running, the script scans for newly uploaded pasties and searches them for interesting content using regular expressions. There is no limit on the number of regular expressions (defined in a text file). To avoid disturbing the pastebin.com webmasters, the script waits a random number of seconds (between 1 and 5) between GET requests. There is only one mandatory parameter, ‘--regex’, which specifies the text file containing all the regular expressions to use (one per line). If one of the regular expressions matches, the following information is sent to the local Syslog daemon:
Jan 16 14:43:24 lab1 pastemon.pl[29947]: Sending CEF events to 127.0.0.1:514 (severity 10)
Jan 16 14:43:24 lab1 pastemon.pl[29947]: Loaded 17 regular expressions from /data/src/pastemon/pastemon.conf
Jan 16 14:43:24 lab1 pastemon.pl[29947]: Running with PID 29948
<time flies>
Jan 16 15:57:48 lab1 pastemon.pl[29948]: Found in http://pastebin.com/raw.php?i=hXYg93Qy : CREATE TABLE (9 times) -- phpMyAdmin SQL Dump (1 times)
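The matching step described above can be sketched in a few lines. Below is a minimal Python illustration (pastemon.pl itself is written in Perl). It assumes one regular expression per non-blank line and counts every occurrence of each one. The function names are hypothetical and are not taken from the script.

```python
import random
import re
import time


def load_patterns(lines, ignore_case=False):
    """Compile one regular expression per non-blank line of the config file."""
    flags = re.IGNORECASE if ignore_case else 0
    # Lines starting with '#' are NOT treated as comments: '#lulz' is a valid pattern.
    return [(line.strip(), re.compile(line.strip(), flags))
            for line in lines if line.strip()]


def scan(text, patterns):
    """Return (pattern, occurrences) for every pattern found at least once."""
    hits = []
    for source, rx in patterns:
        count = sum(1 for _ in rx.finditer(text))
        if count:
            hits.append((source, count))
    return hits


def format_hit(url, hits):
    """Render hits in the 'Found in <url> : <regex> (<n> times) ...' style."""
    return "Found in %s : %s" % (url, " ".join("%s (%d times)" % h for h in hits))


def polite_pause(low=1, high=5):
    """Wait a random number of whole seconds (1-5 by default) between GET requests."""
    time.sleep(random.randint(low, high))


if __name__ == "__main__":
    pats = load_patterns(["CREATE TABLE", "-- phpMyAdmin SQL Dump", ""])
    dump = "-- phpMyAdmin SQL Dump\nCREATE TABLE a;\nCREATE TABLE b;"
    print(format_hit("http://pastebin.com/raw.php?i=XXXX", scan(dump, pats)))
    # Found in http://pastebin.com/raw.php?i=XXXX : CREATE TABLE (2 times) -- phpMyAdmin SQL Dump (1 times)
```

Hits are reported in the order of the configuration file, which keeps the output stable from one run to the next.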
All matching regular expressions are listed with their number of occurrences. This can be easily processed by OSSEC using the following decoder:
<decoder name="pastemon">
  <program_name>^pastemon.pl</program_name>
</decoder>

<decoder name="pastemon-alert">
  <parent>pastemon</parent>
  <regex>Found in http://pastebin.com/raw.php?i=\.+ : (\.+) \(</regex>
  <order>data</order>
</decoder>
The first regular expression listed is stored in the OSSEC “data” field so that it can be used as a condition in rules. Here is an example: rule 100204 triggers an alert if yahoo.com email addresses are leaked on pastebin.com (rule 100203, at level 0, only groups the pastemon events). (Note: this regular expression must also be defined in the script configuration file!)
<rule id="100203" level="0">
  <decoded_as>pastemon</decoded_as>
  <description>Data found on pastebin.com.</description>
</rule>
<rule id="100204" level="7">
  <if_sid>100203</if_sid>
  <description>Detected yahoo.com email addresses on pastebin.com!</description>
  <extra_data>@yahoo\.com$</extra_data>
</rule>
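A SIEM without an OSSEC-style decoder can also split the same message back into its URL and (regex, count) pairs. Below is a minimal Python sketch. It assumes the exact "Found in <url> : <regex> (<n> times) ..." format of the sample log line. A regex whose own text contains "(N times)" would confuse it. parse_found() is a hypothetical helper, not part of pastemon.

```python
import re

# The URL is a single token; each hit is "<regex> (<n> times)", separated by spaces.
LINE_RE = re.compile(r"Found in (\S+) : (.+)$")
HIT_RE = re.compile(r"\s*(.+?) \((\d+) times\)")


def parse_found(message):
    """Split a pastemon 'Found in' message into (url, [(regex, count), ...])."""
    m = LINE_RE.search(message)
    if not m:
        return None  # not a match report (e.g. a startup message)
    url, rest = m.groups()
    hits = [(h.group(1), int(h.group(2))) for h in HIT_RE.finditer(rest)]
    return url, hits


if __name__ == "__main__":
    line = ("Jan 16 15:57:48 lab1 pastemon.pl[29948]: Found in "
            "http://pastebin.com/raw.php?i=hXYg93Qy : CREATE TABLE (9 times) "
            "-- phpMyAdmin SQL Dump (1 times)")
    print(parse_found(line))
```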
If you have an ArcSight infrastructure, you can enable the CEF events support. The same event as above will be sent to the configured CEF destination and port:
<29>Jan 16 15:57:48 CEF:0|blog.rootshell.be|pastemon.pl|v1.0|regex-match|One or more regex matched|10|request=http://pastebin.com/raw.php?i=hXYg93Qy destinationDnsDomain=pastebin.com msg=Interesting data has been found on pastebin.com. cs0=CREATE TABLE cs0Label=Regex0Name cn0=9 cn0Label=Regex0Count cs1=-- phpMyAdmin SQL Dump cs1Label=Regex1Name cn1=1 cn1Label=Regex1Count
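If you build your own CEF sender, here is a minimal Python sketch of how such an event could be assembled. It is an illustration, not the script's code, and it differs from the sample in two ways:

- It numbers the custom fields from 1. The CEF dictionary defines cs1 to cs6 but only cn1 to cn3, so the sketch keeps at most three regex/count pairs.
- It escapes "=" and "\" in extension values, as the CEF format requires. As a result, the "=" in the pastie URL appears as "\=".

The <29> prefix is computed as facility daemon (3) * 8 + severity notice (5). The Regex1Name/Regex1Count labels mirror the sample, and the timestamp is left out.

```python
def cef_escape_header(value):
    """Escape backslashes and pipes in CEF header fields."""
    return str(value).replace("\\", "\\\\").replace("|", "\\|")


def cef_escape_ext(value):
    """Escape backslashes, equals signs and line breaks in CEF extension values."""
    return (str(value).replace("\\", "\\\\").replace("=", "\\=")
            .replace("\r", "\\r").replace("\n", "\\n"))


def build_cef(url, hits, severity=3, version="v1.0", facility=3, level=5):
    """Build a CEF event, prefixed with the Syslog priority, for one pastie.

    hits is a list of (regex, count) pairs. Custom fields are numbered from 1
    and CEF only defines cn1-cn3, so at most three pairs are reported.
    """
    pri = facility * 8 + level  # daemon (3) * 8 + notice (5) = 29
    header = "|".join(cef_escape_header(v) for v in (
        "blog.rootshell.be", "pastemon.pl", version,
        "regex-match", "One or more regex matched", severity))
    ext = ["request=" + cef_escape_ext(url),
           "destinationDnsDomain=pastebin.com",
           "msg=" + cef_escape_ext("Interesting data has been found on pastebin.com.")]
    for i, (regex, count) in enumerate(hits[:3], start=1):
        ext.append("cs%d=%s cs%dLabel=Regex%dName" % (i, cef_escape_ext(regex), i, i))
        ext.append("cn%d=%d cn%dLabel=Regex%dCount" % (i, count, i, i))
    return "<%d>CEF:0|%s|%s" % (pri, header, " ".join(ext))


if __name__ == "__main__":
    print(build_cef("http://pastebin.com/raw.php?i=hXYg93Qy",
                    [("CREATE TABLE", 9), ("-- phpMyAdmin SQL Dump", 1)], severity=10))
```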
To process the CEF events on ArcSight’s side, configure a new SmartConnector with a UDP CEF receiver; the events should then be correctly parsed.
That looks great! But the next question is: “What to look for on pastebin.com?” Well, it’s up to you… Based on your organization or business, there are things that you can’t miss. Here is a list of useful regular expressions that I often use:
RegEx                                                               Purpose
------------------------------------------------------------------  ---------------------------
company\.com                                                        Your company domain name
@company\.com                                                       Corporate e-mail addresses
CompanyName                                                         Company name
MyFirstName MyLastName                                              Your full name
@xme                                                                Twitter account
192\.168\.[1-3]\.\d{1,3}                                            IP address ranges
anonbelgium                                                         Hacker groups
#lulz                                                               Trending Twitter hashtags
#anonymous
#antisec
-----BEGIN RSA PRIVATE KEY-----                                     Interesting data!
-----BEGIN DSA PRIVATE KEY-----
-----BEGIN CERTIFICATE-----
-- MySQL dump                                                       Interesting dumps!
belgium                                                             My country
city                                                                My city
((4\d{3})|(5[1-5]\d{2})|(6011))-?\d{4}-?\d{4}-?\d{4}|3[47]\d{13}    Credit cards
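A word of caution when writing such patterns:

- A character class matches a single character, so [0-255] matches only "0", "1", "2" or "5".
- Unescaped dots match any character.
- [4,7] also matches a comma.

The quick Python check below illustrates these pitfalls (Python's re treats these constructs the same way as Perl's regex engine). The stricter OCTET pattern is an illustrative assumption, not part of pastemon.pl.

```python
import re

# Naive version: unescaped dots match any character, and [0-255] is a
# character class matching ONE character among 0, 1, 2 and 5.
WRONG = re.compile(r"192.168.[1-3].[0-255]")

# Stricter version for 192.168.1.0 - 192.168.3.255: escaped dots and a
# last octet restricted to 0-255.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
RANGE = re.compile(r"\b192\.168\.[1-3]\." + OCTET + r"\b")

# [4,7] would also accept a comma; [47] accepts only 4 or 7.
AMEX = re.compile(r"\b3[47]\d{13}\b")

for sample in ("192.168.2.200", "192x168-1_5", "192.168.1.256"):
    print(sample, bool(WRONG.search(sample)), bool(RANGE.search(sample)))
# 192.168.2.200 True True
# 192x168-1_5 True False
# 192.168.1.256 True False
```

The naive pattern "matches" all three samples because a search only needs a prefix such as 192.168.2.2. Escaping the dots and bounding the last octet removes those false positives.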
If you have interesting regular expressions or ideas, feel free to share!
Source is available here. As usual, this is provided “as is” without any warranty. Happy monitoring!
You can check if your data has been leaked here: http://privacy.is-lost.org
Hi,
Has anyone experienced pastemon eating up memory until swap fills up? Or leaving zombie processes?
Hi Xavier, I have a few questions.
Does this actually work? Did you need to apply a change to the source code, such as the URL parsing, replacing
"$p = 'http://pastebin.com/raw.php?i=' . $p;" with "$p = 'http://pastebin.com/' . $p;"?
In parallel, I created a paste on pastebin.com and searched for it with the site's own search tool, but it was not found.
In your comment about disabling proxies, I can't see what needs to be removed or commented out. Could you please tell me what to remove so that no proxies are used and connections go out directly?
“Xavier
9 months ago
To completely disable proxies, just remove or comment the following line in your pastemon.conf file:
…”
Hello Xavier
When I run the script in debug mode, it displays the following message:
DBI::db=HASH(0x201a878)->disconnect invalidates 1 active statement handle (either destroy statement handles or call finish on them before disconnecting) at ./pastemon.pl line 760.
Any idea?
Thanks
This looks great, but I only want it to send to ArcSight via CEF, and I can't get past the SMTP server error.
Hi,
Because this is the purpose of the --dump feature 😉
Hi, the dump function is dumping all data and not only the matched pastie.
Thanks!
Hi Xavier,
I'm running the script on a VM and trying to forward the CEF output to a Logger running on another VM on the same host. The problem is that I am only getting “cannot fetch pastebin.com:500 ….” and “disabled unreliable proxy” messages in the syslog. Please let me know what I am doing wrong.
Ok. Thanks 🙂
Hi Frank,
There must be a bug somewhere. I’ll have a look at it.
Hi, I'm testing your pastemon and everything is OK, but I have a question: in the database, when a keyword is found, the “matched” column is blank. How can I search the database for a pastie that matched a given regular expression? Thanks
No problem! Enjoy pastemon!
Apologies for my previous post, I should have read a few more lines down. As for the WordPress issue, it still stands. Thanks.
I’m testing out pastemon for production use and my end goal is to have it post to a WP site, but while running in debug I keep receiving the error: ‘WordPress configuration disabled: WordPress::XMLRPC not installed’. ‘xmlrpc.php’ is active and available on my site, I do not have ‘http://’ or ‘/’ on my URL within the .conf, I have triple checked my username and password, and verified the category is available within my WP site. Can you offer any help in this area? Is there a dependency I’m missing?
Also when using the default proxies.conf I’m constantly receiving: “Cannot fetch http://www.pastebin.com: 500 Can’t connect to ip.ad.dre.ss:port (timeout)
+++ Disabled unreliable proxy http://ip.ad.dre.ss:port (956 active proxies)”
Any help would be greatly appreciated.
TIA.
Fantastic work, very useful! Thank you.
To completely disable proxies, just remove or comment the following line in your pastemon.conf file:
Thank you for your fast response. It seems like I have my wires crossed, but I don't understand how I can disable proxy support. Any advice? Thanks in advance!
To use proxied connections, you must provide a list of proxies (format is [IP|FQDN]:port, one per line). Proxies will be selected randomly and removed if not available. It’s up to you to build a reliable list of proxies…
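The behaviour described here could look like the minimal Python sketch below: one [IP|FQDN]:port entry per line, random selection, and removal of failing proxies. ProxyPool and load_proxies() are hypothetical names. Returning None to mean "connect directly" once the list is exhausted is an assumption, not necessarily what pastemon.pl does.

```python
import random
import re

# One proxy per line: "IP:port" or "FQDN:port".
PROXY_RE = re.compile(r"^([A-Za-z0-9.-]+):(\d{1,5})$")


def load_proxies(lines):
    """Return 'http://host:port' URLs for well-formed lines; skip the rest."""
    proxies = []
    for line in lines:
        m = PROXY_RE.match(line.strip())
        if m and 0 < int(m.group(2)) < 65536:
            proxies.append("http://%s:%s" % m.groups())
    return proxies


class ProxyPool:
    """Pick proxies at random and drop the ones that fail (hypothetical helper)."""

    def __init__(self, proxies):
        self.proxies = list(proxies)

    def pick(self):
        # Assumption: None means "no proxy left, connect directly".
        return random.choice(self.proxies) if self.proxies else None

    def disable(self, proxy):
        """Remove an unreliable proxy, e.g. after a connection timeout."""
        if proxy in self.proxies:
            self.proxies.remove(proxy)
```

Because failing proxies are removed permanently, a list of mostly dead public proxies shrinks quickly. This is consistent with the many "Disabled unreliable proxy" messages reported in the comments.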
Hi Xavier,
Many thanks from Austria for your effort on this script.
I just updated to the newest version, but the proxy list doesn’t work with the provided entries. I get a lot of log entries saying “Disabled unreliable proxy…”.
Any suggestions?
Thank you for your contribution! Your patch has been added in the source code.
Xavier,
Thank you very much.
I’d suggest the following patch to fix the input sanitization that seems to have gotten mangled and to add the ability to specify multiple email recipients.
http://pastebin.com/bQenPqzL
Andriy, Perl supports the SOCKS protocol. I have installed a tor client locally and use http_proxy=socks://127.0.0.1:9050.
Heiko
本田技研工業株式会社
本田
ソニー株式会社
ソニー
You can try with these.
Hello Benny,
Output supports non-roman characters:
open(DUMP, ">:encoding(UTF-8)", "$dumpDir/$pastie.raw")
The regex file is opened and processed as a regular file. Honestly, I haven't tested it! Do you have some examples? I will test them.
For the file with regular expressions, can you use non-roman characters? Umlaut? Cyrillic? Arabic? etc??
Tor support is on my todo list!
Hi! Very useful tool!
Does it support Tor?
I’ve been having trouble with the proxies I use, because they often go offline.
I also noticed that a few days ago… Code has been updated!
Hi Xavier,
I noticed that the source code of pastebin.com/archive has changed… which means that no pastie can be fetched.
Regards,
AfterShell.com
Hi guys,
if you have problems behind a proxy and “export” doesn’t work you can also replace the following line:
$ua->env_proxy;
with:
# $ua->env_proxy;
$ua->proxy(['http'], 'http://proxy.company.com:port/');
It is used twice in the script. 😉
Regards,
AfterShell.com
Glad to hear! Enjoy!
Many Thanks Xavier! I’ve modified the script to monitor our sensitive data posted on pastebin. OSSEC is once again my best friend!!!
Geoffrey
This may be my fault… Are you sure you are using HTTP_PROXY (upper case)? This is important on case-sensitive systems like UNIX. I tested again here and it works!
@Xavier
Yes Xavier I tried this command, with and without quotes, but nothing worked. 🙁
CS,
How do you define your proxy?
$ export http_proxy="http://my.proxy.com:3128"
The only prob I have is that it doesn’t work behind a proxy. I used the export command but the script doesn’t use the proxy trying to connect to pastebin.
Nice tool. One thing that might help others spare a huge amount of time:
If you use --debug, the tool does _NOT_ write to any log file.
Regards
Thanks for your effort, Xavier! And indeed since the index is initialized with 1, the Device Custom parameters come in.
Indeed, the CEF dictionary mentions 1-6 custom fields. I fixed this in the script (this will be available after the next commit). Thank you for your tests!
Hmm… I don’t have this issue!?
In the meantime, I’ll make the timeout configurable…
I think I know why I did not see the matched expressions:
Jan 30 14:18:37 CEF:0|blog.rootshell.be|pastemon.pl|v1.3|regex-match|One or more regex matched|3|request=http://pastebin.com/raw.php?i=wcE9Au01 destinationDnsDomain=pastebin.com msg=Interesting data has been found on pastebin.com. cs0=vodafone cs0Label=Regex0Name cn0=1 cn0Label=Regex0Count
It starts indexing at 0 not at 1. I changed line 384 to “my $i = 1;”. Hope it works now…
I can confirm Josh’s issue, I also receive some “it seems you are requesting a little bit too much from Pastebin”. I now doubled the wait timers (i.e. random(3)*2 and random(5)*2) and I am curious to see if it persists…
Heiko
Hi Xavier,
thanks for picking this up.
What I mean with the CEF comment is that I cannot see the matching regexes in ArcSight. Now I reviewed your script and can see that it puts the matches in the event. I now configured ArcSight to preserve the raw event so that I can see what the script actually submits to see what happens…
Regards,
Heiko
Hello Heiko,
Thank you for the suggestion/report. I just committed release 1.3 of my script:
– You can now define your own PID file (--pidfile)
– A sample of the data can be printed (--sample)
About your comment on the CEF event: the matching regexes and their counts are already reported using deviceCustomStringX and deviceCustomNumberX. Or did I misunderstand your remark? Feel free to give me more details.
Hi,
great idea! I implemented the script, forwarding CEF events to ArcSight. I’m curious to see what it catches.
However, I came across a few issues:
1. The script tries to write the daemon’s pid to /var/run. Because the script runs as a normal user, this does not work (at least on my machine). I changed this in the script to /tmp, but I would prefer if it was either default or configurable.
2. It would be great to have the matched pattern in the CEF event, e.g. in Message or in requestContext. Then one could tell at a glance from the ArcSight console what was found where.
3. Another idea would be to put a piece of the pastie, e.g. the line containing the matched pattern, in a CEF field like deviceCustomString1
Regards,
Heiko
Hello Nicolas,
Thanks for the idea! I just published a new version of the script which implements this feature.
You can now define rules like “regex1 _EXCLUDE_ regex2”. This could help to get rid of false positives. A good example is looking for countries: if you look for “belgium”, there is a good chance that you will catch HTML code with lists of countries. Using “belgium _EXCLUDE_ belize” (Belize is the next country in alphabetical order), you won’t be notified.
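The semantics described here can be sketched in Python: the include pattern must match, and the exclude pattern must not appear anywhere in the same pastie. parse_rule() and rule_matches() are hypothetical helpers. Splitting on " _EXCLUDE_ " with surrounding spaces is an assumption based on the example rule.

```python
import re

SEPARATOR = " _EXCLUDE_ "


def parse_rule(line, flags=0):
    """Split 'regex1 _EXCLUDE_ regex2' into (include, exclude-or-None)."""
    include, sep, exclude = line.strip().partition(SEPARATOR)
    return (re.compile(include, flags),
            re.compile(exclude, flags) if sep else None)


def rule_matches(rule, text):
    """Count include hits, but return 0 if the exclude pattern appears anywhere."""
    include, exclude = rule
    if exclude is not None and exclude.search(text):
        return 0
    return sum(1 for _ in include.finditer(text))


if __name__ == "__main__":
    rule = parse_rule("belgium _EXCLUDE_ belize", re.IGNORECASE)
    print(rule_matches(rule, "Hacked sites in Belgium"))
    print(rule_matches(rule, "<option>Belgium</option><option>Belize</option>"))
```

Checking the exclude pattern against the whole pastie, rather than only the matching line, is what removes the country-list false positives described above.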
Josh,
It’s strange… I’ve been monitoring pasties constantly for days with no issue here!?
So I’ve been trying this, and it appears that my IP address got blocked by Pastebin. I guess polling every 30 seconds was a bit overkill.
Hi Sertan,
That’s why I linked the script with OSSEC! I prefer receiving emails in a unified format from ONE tool instead of being flooded by thousands of scripts output. Thanks for sharing your script too!
PS: Your idea to fake the User-Agent is good btw!
Xavier thanks for sharing the tool.
Sending a CEF event to your SIEM is cool, but I would also recommend adding a mail alert functionality similar to http://pastebin.com/V4LgG9Wr
Also from my experience, though not realtime, 2-3 minutes polling interval is good for fetching all recent pasties.
Adrian,
Good remark! I just uploaded v1.1, which now has a ‘--dump’ option.
You can specify a directory where pasties matching a regex will be saved (raw). This will allow you to check pasties which expired. Thank you for your comment!
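Below is a minimal Python sketch of such a dump step. It assumes one file per pastie, named <pastie_id>.raw and written in UTF-8, as the open(DUMP, ">:encoding(UTF-8)", "$dumpDir/$pastie.raw") call quoted in the comments suggests. dump_pastie() is a hypothetical helper. Stripping non-alphanumeric characters from the ID is an added safety assumption.

```python
import os


def dump_pastie(dump_dir, pastie_id, text):
    """Save a matching pastie as <dump_dir>/<pastie_id>.raw in UTF-8; return the path."""
    # Keep only ASCII letters and digits so a crafted ID cannot escape dump_dir.
    safe_id = "".join(c for c in pastie_id if c.isascii() and c.isalnum())
    if not safe_id:
        raise ValueError("empty pastie ID")
    os.makedirs(dump_dir, exist_ok=True)
    path = os.path.join(dump_dir, safe_id + ".raw")
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return path
```

Writing in UTF-8 explicitly keeps non-Latin content (Japanese company names, for example) intact, whatever the system locale.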
Excellent tool! Just one recommendation: add the ability to exclude words. For example, match posts that contain the word ‘summer’ but not the ones with the word ‘house’ in the same content.
Forgive me if I’ve misunderstood, but does this script download and store the matches it finds? If so, can you make it clear in the article where they are stored; if not, can you add such functionality? It seems to me that this script is only any good if the paste doesn’t have a time limit.