Searching for Sensitive Data Using URL Shorteners

URL Shorteners
(Source: alaev.info)

URL shorteners are online services that reduce the length of URLs. Web applications are increasingly complex, and their URLs can carry many parameters such as page names and session IDs. At the same time, we use services that limit message size (like Twitter) and devices (like smartphones) on which typing long text is not handy. Shortened URLs are built from a short, randomized string of alphanumeric characters and are resolved by the URL shortener website: when you access one, it redirects you to the real URL (commonly with HTTP status code 301). Example: by accessing http://bit.ly/bQNokR, you’ll be redirected to this blog. Simple and powerful. This is great!

So simple that such services can also be used by the bad guys to distribute malicious URLs behind pseudo-safe addresses. Fortunately, some URL shorteners offer a preview of the original URL before redirecting you, but it’s not automatic. More and more applications are able to handle short URLs and resolve them for you. On bit.ly, a good security tip is to append a “+” sign to the short URL. You’ll be sent to the service’s website first, where you can read some useful information about the URL.

Another security issue is the protocol specified in the URL. The classic ones are “HTTP” and “FTP”, but some URL shorteners are very lax and accept a wide range of protocols. For example, tinyurl.com accepts the following URLs:

  • smb://share/dir/file.exe
  • file:///c:/temp/virus.exe

During the last edition of hack.lu, Saumil Shah demonstrated how to use such a service coupled with VLC to pwn a browser.
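
A scheme check of the kind a stricter shortener (or a cautious client) could apply can be sketched as follows. This is only an illustration: the function name is_safe_scheme and the whitelist (http, https, ftp) are assumptions, not the behaviour of any real service.

```shell
#!/bin/bash
# Hypothetical helper: accept a URL only if its scheme is on a small whitelist.
is_safe_scheme() {
    # Everything before the first ":" is the scheme; compare it case-insensitively.
    local scheme
    scheme=$(printf '%s' "${1%%:*}" | tr 'A-Z' 'a-z')
    case "$scheme" in
        http|https|ftp) return 0 ;;
        *)              return 1 ;;
    esac
}

# Offline demo with the URLs mentioned in the text.
for url in "http://bit.ly/bQNokR" "smb://share/dir/file.exe" "file:///c:/temp/virus.exe"; do
    if is_safe_scheme "$url"; then
        echo "accept: $url"
    else
        echo "reject: $url"
    fi
done
```

A whitelist is used rather than a blacklist because the set of dangerous schemes is open-ended, while the set of schemes a shortener really needs is small.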

Another approach is to search the URL shorteners’ history for interesting stuff. With some services, once you have created a short URL, it remains available permanently! This is the case with bit.ly. Here is an extract from their help section:

Can I delete a bit.ly link?
We believe that being a legitimate shortening service means offering permanent URLs. Our users can feel confident that the bit.ly links they create don’t unexpectedly disappear or expire.

How do I remove or archive links from my bit.ly history page?
You can remove a bit.ly link from your history by opening the Options drop down menu and selecting the Archive link from the choices there (the other choices are Share, Copy, and Edit). Remember that links archived from your public history are still permanently functional and will always redirect to their original destination.

Does bit.ly ever re-use links?
No. Each link bit.ly issues is unique and will not be re-used, so you can be confident users will arrive at the correct site. Some URL shorteners issue links that are a character shorter than bit.ly. But those links get re-used because there is an upper limit to the number of character combinations.

I wrote a small shell script to grab long URLs from the bit.ly service. Here is the script:

  #!/bin/bash
  # Proxies are listed in a text file, one per line (IP:PORT)
  PROXIES=proxy.list
  LOG=/tmp/wget.$$
  while true
  do
      # "mktemp -u" only prints a random 6-character name; no file is created
      URL=$(mktemp -u XXXXXX)
      # Pick a random line (1..RANGE) and export it so that wget really uses it
      RANGE=$(wc -l < "$PROXIES")
      n=$(( RANDOM % RANGE + 1 ))
      export http_proxy=$(sed -n "${n}p" "$PROXIES")
      echo "Testing $URL via $http_proxy ..."
      # Do not follow the redirect; discard any body, keep wget's log
      wget --max-redirect=0 -O /dev/null -o "$LOG" "http://bit.ly/$URL"
      # If a "Location:" header is returned, we have a valid URL
      REDIRECT=$(grep '^Location:' "$LOG" | awk '{ print $2; }')
      if [ "$REDIRECT" != "" ]; then
          echo "$URL -> $REDIRECT" >>results.log
      fi
  done

To reduce the risk of being blacklisted, I sent the requests through randomly chosen open proxies. The UNIX command “mktemp” is a nice tool to generate random strings. To test whether a bit.ly short URL is valid, just check whether you received an HTTP code 301 (“Moved Permanently”). The original URL is given in the “Location” header.
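
The validity test described above (a 301 status plus a “Location” header) can be sketched as a small filter that reads raw response headers on standard input. The function name extract_redirect is hypothetical, and the sample headers in the demo are made up for illustration; they are not a captured bit.ly response.

```shell
#!/bin/bash
# Hypothetical helper: read HTTP response headers on stdin and print the
# redirect target, but only if the status code is 301 and a Location
# header is present. Prints nothing otherwise.
extract_redirect() {
    awk '
        { sub(/\r$/, "") }                        # strip the CR of CRLF line endings
        NR == 1 { if ($2 != "301") exit; next }   # status line, e.g. "HTTP/1.1 301 ..."
        tolower($1) == "location:" { print $2; exit }
    '
}

# Offline demo on made-up headers.
printf 'HTTP/1.1 301 Moved Permanently\r\nLocation: http://example.com/page\r\n\r\n' |
    extract_redirect
```

With network access, the same filter could be fed by a HEAD request, for example: curl -s -I "http://bit.ly/$URL" | extract_redirect. Header names are compared case-insensitively because servers and proxies do not all use the same capitalization.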

I let the script run for a few days and collected more than 150,000 URLs shortened by bit.ly. Here are the top 20 “shortened” websites and the top 10 TLDs:

  Website               Hits               TLD        Hits
  --------              ----               -----      ----
  formspring.me         4974               .com       101319
  www.facebook.com      4740               .me          6794
  fun140.com            3690               .net         5327
  twitter.com           3298               .jp          5181
  feedproxy.google.com  2841               .org         4363
  www.google.com        2495               .de          2861
  www.youtube.com       1780               .uk          2458
  apps.facebook.com     1598               .br          1895
  bit.ly                1549               .ly          1702
  build.nimblebuy.com   1543               .info        1602
  myloc.me              1149
  foursquare.com         875
  www.cbfeed.com         874
  www.amazon.com         780
  news.google.com        672
  friends.myspace.com    671
  chatter.com            635
  links.assetize.com     580
  www.etsy.com           565
  share.groups.im        524
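
Rankings like the ones in the table above can be produced with standard text tools. The following is a minimal sketch, not necessarily how the figures above were computed; it assumes every line of results.log has the form written by the script (short code, “->”, long URL), and the function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers; assume log lines look like: bQNokR -> http://host/path

# Print the host part of every long URL in the log, one per line.
log_hosts() {
    # Field 3 is the long URL; splitting it on "/" leaves the host in field 3.
    awk '{ print $3 }' "$1" | awk -F/ '{ print $3 }'
}

# Top N hosts by number of occurrences (N defaults to 20).
top_hosts() {
    log_hosts "$1" | sort | uniq -c | sort -rn | head -n "${2:-20}"
}

# Top N TLDs (N defaults to 10); the TLD is taken as the last dot-separated label.
top_tlds() {
    log_hosts "$1" | awk -F. '{ print "." $NF }' | sort | uniq -c | sort -rn | head -n "${2:-10}"
}

# Only runs when a log file is actually present.
if [ -f results.log ]; then
    top_hosts results.log
    top_tlds results.log
fi
```

This is deliberately naive: hosts that include a port or credentials are counted as they appear, and IP-based URLs end up with a numeric “TLD”, so such entries would need to be filtered out first.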

Here are some facts about the URLs I grabbed:

  • First fact: social networks are on top! (Is it really a surprise?)
  • 0.48% of the shortened URLs are based on IP addresses instead of FQDNs.
  • People use URL shorteners inside organizations: I found URLs based on RFC 1918 (private) addresses.
  • URLs pointing to the loopback interface!? 🙂
  • 0.47% of the shortened URLs contained the word “sex”.

Interesting information found:

  • Private documents stored on scribd.com.
  • Lots of business documents (Word, Excel, PowerPoint, PDF, …)
  • Political content (propaganda images)
  • Exploit attempts
  • Administration interfaces
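
The IP-based findings in the list of facts (URLs using RFC 1918 private addresses or the loopback interface) can be spotted with a simple classifier applied to the host part of each long URL. This is a sketch under assumptions: the function name classify_host and its output labels are hypothetical, only IPv4 is handled, and octets are not checked against the 0-255 range.

```shell
#!/bin/bash
# Hypothetical helper: classify the host part of a URL as
# "name", "loopback", "private" (RFC 1918) or "public".
ipv4_re='^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'

classify_host() {
    if ! [[ $1 =~ $ipv4_re ]]; then
        echo name        # not a dotted-quad IPv4 address
        return
    fi
    case "$1" in
        127.*)                                  echo loopback ;;  # 127.0.0.0/8
        10.*)                                   echo private ;;   # 10.0.0.0/8
        172.1[6-9].*|172.2[0-9].*|172.3[01].*)  echo private ;;   # 172.16.0.0/12
        192.168.*)                              echo private ;;   # 192.168.0.0/16
        *)                                      echo public ;;
    esac
}

# Offline demo.
for host in www.facebook.com 8.8.8.8 192.168.1.10 127.0.0.1; do
    echo "$host: $(classify_host "$host")"
done
```

The dotted-quad test comes first so that a host name that merely starts with “10.” or “127.” is not mistaken for an address.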

To conclude, use URL shorteners carefully. As a user, never trust a short URL. As a creator, be careful about what you share. This is NOT a way to hide your data and, once posted, your URL remains valid forever (unless you remove it manually, which people often don’t do).

8 comments

  1. Could you please add one more to the list?

    http://kfc.io

    There are plenty of advanced options such as link passwords, expiry date, custom names, folders, number of uses, private or public.

  2. No problem here if you specify the right template:

    $ $ mktemp foo.XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    foo.ZhlBP1EUEBBdtgQ0kHw33SqUkBQsIzPcLjCcgBi
    $

  3. Nice post, I wish you had shared some pointers on how to sort through all the URLs to come up with your stats. Also, if you wanted to generate a 12-character random alphanumeric string like ab120b4599ba, I don’t think mktemp will work. Any idea what should be used?

  4. Good work!!! But why not use curl? It is faster and isn’t hard-disk intensive.
    Example:
    REDIRECT=`curl -A "Searching Shorteners" -0 -x $http_proxy -I http://bit.ly/$URL 2> /dev/null | grep Location: | awk '{ print $2; }'`

  5. A nice one! 😉 It must point to plenty of interesting stuffs. But I’m not a lucky owner of a “.[mil|gov|fed.us|si.edu]” email address… Just for the fun, I tried to find credentials on bugmenot.com but none found. Why? 😉
