– a Python Tweets Grabber

For me, Twitter is not only a social network; it’s also a tool I use daily to track and exchange information security news with a large worldwide community of infosec professionals. For a while now, Twitter has been my main source of information. When you rely on a service like Twitter to collect information, you need the right tools to handle the huge (and constantly increasing) amount of data. I’m using classic Twitter clients on my computers and mobile devices, but they are not powerful enough. Standard options such as notifications alert you when a specific Tweet is posted, but we often can’t be disturbed all the time (e.g. while working at a customer’s premises or in a meeting). And when you come back to your timeline, most Twitter clients can’t easily handle thousands of Tweets waiting to be reviewed. In short, I need something else! When you have a lot of data to index, Elasticsearch immediately comes to mind (along with the associated tools that make up the ELK suite).

Logstash ships with a Twitter input, but it only allows searching for specific keywords. That is enough if you’re looking for a way to index Tweets posted about an event, a product or a user. In my case, I needed more, so I started to develop my own Python script. Basically, it uses the Twitter API combined with the Python Twitter wrapper. The script grabs Tweets via the following methods:

  • GetHomeTimeline
  • GetSearch

The first method fetches a collection of the most recent Tweets and retweets posted by the authenticated user and the users they follow. The second one simply returns search results for a given keyword. By combining the two, I grab all my Tweets, those of all the people I follow, and Tweets matching interesting keywords or hashtags about specific topics (conferences, specific products, etc.).
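
For illustration, here is a minimal sketch of one polling round that combines the two calls. It assumes an `api` object exposing python-twitter’s `GetHomeTimeline(count=..., since_id=...)` and `GetSearch(term=..., count=..., since_id=...)`; the `poll_once` helper, the per-source `since_ids` bookkeeping and the fixed `delay` between requests are hypothetical choices, not necessarily how the actual script works.

```python
import time


def poll_once(api, keywords, since_ids, delay=5.0, sleep=time.sleep):
    """Hypothetical helper: one polling round over the home timeline and searches.

    `api` is assumed to expose python-twitter's GetHomeTimeline/GetSearch.
    `since_ids` maps each source ("home" or a keyword) to the highest Tweet id
    already seen, so every request only asks for newer Tweets.
    """
    sources = [("home", lambda sid: api.GetHomeTimeline(count=200, since_id=sid))]
    for kw in keywords:
        # Bind kw now; a plain closure would only see the last keyword.
        sources.append((kw, lambda sid, kw=kw: api.GetSearch(term=kw, count=100, since_id=sid)))

    new = {}
    for i, (name, fetch) in enumerate(sources):
        if i:
            sleep(delay)  # space requests out to stay below the API rate limits
        for status in fetch(since_ids.get(name)):
            new[status.id] = status  # a Tweet may match several sources: keep one copy
            since_ids[name] = max(since_ids.get(name, 0), status.id)

    # Oldest first, so the console output reads like `tail -f`.
    return [new[tid] for tid in sorted(new)]
```

With python-twitter, `api` would be `twitter.Api(consumer_key=..., consumer_secret=..., access_token_key=..., access_token_secret=...)`, whose keyword arguments match the credential names in the configuration file. Persisting `since_ids` between runs (for example in the status file) would let a restarted script resume where it stopped; that use of the status file is an assumption.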

By default, grabbed Tweets are displayed on the console, like a “tail -f” command. Searched keywords are highlighted in one color, and other terms can be displayed in different colors based on regular expressions:

Twitter Console Feed
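
The highlighting could look like the sketch below, which wraps matches in standard ANSI SGR escape sequences. The `colorize` and `build_rules` names, and the mapping of the configuration’s `color`, `regex` and `keywords` entries to (pattern, color) pairs, are assumptions made for illustration.

```python
import re

# Standard ANSI SGR foreground color codes.
ANSI = {"red": "31", "green": "32", "yellow": "33", "blue": "34", "magenta": "35", "cyan": "36"}
RESET = "\033[0m"


def build_rules(keywords, keyword_color, regex_rules):
    """Hypothetical helper: keywords are literal strings, so escape them; regex rules are kept as-is."""
    return [(re.escape(k), keyword_color) for k in keywords] + list(regex_rules)


def colorize(text, rules):
    """Wrap every match of each (pattern, color) rule in ANSI color codes."""
    if not rules:
        return text
    # One alternation with a named group per rule: the text is scanned once,
    # so a rule can never match inside an escape code inserted by another rule.
    combined = re.compile(
        "|".join("(?P<r%d>%s)" % (i, pattern) for i, (pattern, _) in enumerate(rules)),
        re.IGNORECASE,
    )

    def paint(m):
        for i, (_, color) in enumerate(rules):
            if m.group("r%d" % i) is not None:
                return "\033[%sm%s%s" % (ANSI[color], m.group(0), RESET)
        return m.group(0)

    return combined.sub(paint, text)
```

For example, `print(colorize(tweet_text, build_rules(["drone"], "red", [("foo", "blue")])))` would show “drone” in red and “foo” in blue on a terminal that understands ANSI codes.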

Requests to the Twitter API are spaced out so as not to exceed the number of requests allowed per rate-limit window (see Twitter’s API rate limits) and to avoid having the IP address temporarily banned. The configuration is done via a Python config file with easy-to-understand sections:

consumer_key: xxxx
consumer_secret: xxxx
access_token_key: xxxx
access_token_secret: xxxx
status_file: /var/run/tweetsniff.status
color: red
regex: foo
color: blue
keywords: drone
index: twitter
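
The section headers do not appear in the excerpt above, so the exact layout is not visible. As a sketch only, assuming hypothetical section names (`[twitter]`, `[regex]`, `[keywords]`, `[elasticsearch]`), a hypothetical `url` key and comma-separated keywords, Python’s standard `configparser` (which accepts `key: value` lines) could load such a file and enable indexing only when an Elasticsearch section exists:

```python
import configparser

# Hypothetical layout: section names, the `url` key and the comma separator are assumptions.
SAMPLE = """
[twitter]
consumer_key: xxxx
consumer_secret: xxxx
access_token_key: xxxx
access_token_secret: xxxx
status_file: /var/run/tweetsniff.status

[regex]
regex: foo
color: blue

[keywords]
keywords: drone
color: red

[elasticsearch]
url: http://localhost:9200
index: twitter
"""

CREDENTIALS = ("consumer_key", "consumer_secret", "access_token_key", "access_token_secret")


def load_settings(cfg):
    """Turn a parsed ConfigParser into a plain dict of settings."""
    raw_keywords = cfg.get("keywords", "keywords", fallback="")
    settings = {
        "credentials": {k: cfg.get("twitter", k) for k in CREDENTIALS},
        "status_file": cfg.get("twitter", "status_file", fallback=None),
        "keywords": [k.strip() for k in raw_keywords.split(",") if k.strip()],
        "keyword_color": cfg.get("keywords", "color", fallback="red"),
        "regex_rules": [],
        "es": None,  # indexing stays disabled unless an [elasticsearch] section exists
    }
    if cfg.has_section("regex"):
        settings["regex_rules"].append(
            (cfg.get("regex", "regex"), cfg.get("regex", "color", fallback="blue")))
    if cfg.has_section("elasticsearch"):
        settings["es"] = {
            "url": cfg.get("elasticsearch", "url", fallback="http://localhost:9200"),
            "index": cfg.get("elasticsearch", "index", fallback="twitter"),  # default from the post
        }
    return settings


# Regexes may contain '%', so interpolation is disabled.
cfg = configparser.ConfigParser(interpolation=None)
cfg.read_string(SAMPLE)  # for a real file: cfg.read(path)
settings = load_settings(cfg)
```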

If an “Elasticsearch” section is defined, grabbed Tweets are sent to the specified instance and indexed in the provided index (default: twitter). Once that’s done, just use the powerful Kibana interface to design your personal dashboard! Mine contains the following panels:

  • A timeline of indexed Tweets
  • Top-20 active users
  • Top-20 active hashtags
  • A 1-day trend indicator
  • My timeline
Twitter Dashboard
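
Indexing a grabbed Tweet comes down to one call to Elasticsearch’s REST document API. The sketch below is an illustration, not the script’s own code: the document field names are hypothetical, the Tweet attributes (`id`, `text`, `user.screen_name`, `created_at_in_seconds`, `hashtags`, `user_mentions`, `retweeted_status`) are assumed to follow python-twitter’s `Status` model, and the `PUT /<index>/_doc/<id>` form assumes a recent Elasticsearch release.

```python
import json
import urllib.request
from datetime import datetime, timezone


def tweet_to_doc(status):
    """Flatten a python-twitter-like Status into a JSON document (field names hypothetical)."""
    return {
        "id": status.id,
        "user": status.user.screen_name,
        "text": getattr(status, "full_text", None) or status.text,
        "@timestamp": datetime.fromtimestamp(status.created_at_in_seconds, tz=timezone.utc).isoformat(),
        "hashtags": [h.text for h in (status.hashtags or [])],
        "mentions": [u.screen_name for u in (status.user_mentions or [])],
        "retweet": status.retweeted_status is not None,
    }


def index_request(base_url, index, doc):
    """Build PUT <base_url>/<index>/_doc/<id>; reusing the Tweet id makes re-indexing idempotent."""
    url = "%s/%s/_doc/%s" % (base_url.rstrip("/"), index, doc["id"])
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(req, timeout=10):
    """Send the request and return Elasticsearch's JSON reply."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

`send(index_request("http://localhost:9200", "twitter", tweet_to_doc(status)))` would then index one Tweet. Older Elasticsearch releases expect a mapping type name in place of `_doc` in the URL.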

Once Tweets are indexed, it’s a piece of cake to track the people you follow (when they are active on Twitter, what they retweeted, who referenced them and which hashtags they used). Here is an example with my friend @r00tbsd:

@r00tbsd's Dashboard
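
Under the hood, panels like these are Elasticsearch aggregations. As a rough sketch (the field names `user`, `mentions`, `hashtags` and `@timestamp` are assumptions and would need `keyword`/`date` mappings; `dashboard_query` is a hypothetical helper), the search body behind a timeline, top-20 users and top-20 hashtags view, optionally restricted to one account, could be built like this:

```python
def dashboard_query(screen_name=None, top=20, interval="1h"):
    """Hypothetical helper: search body for timeline / top users / top hashtags panels."""
    query = {"match_all": {}}
    if screen_name:
        # Tweets posted by the account or mentioning it (field names assumed).
        query = {"bool": {
            "should": [{"term": {"user": screen_name}}, {"term": {"mentions": screen_name}}],
            "minimum_should_match": 1,
        }}
    return {
        "size": 0,  # only the aggregations are needed, not the matching Tweets
        "query": query,
        "aggs": {
            "timeline": {"date_histogram": {"field": "@timestamp", "fixed_interval": interval}},
            "top_users": {"terms": {"field": "user", "size": top}},
            "top_hashtags": {"terms": {"field": "hashtags", "size": top}},
        },
    }
```

The body would be POSTed as JSON to `/<index>/_search` (for example `/twitter/_search`). `fixed_interval` requires Elasticsearch 7.2 or later; older versions use `interval` instead.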

Searching a backlog of Tweets becomes quick and extremely convenient! The script is at an early stage of development, and I would like to add more features in the coming days/weeks. Feel free to test it and to share your ideas. My next idea is to collect shortened URLs and analyze them for suspicious activity.


  1. Hi Xavier. Does twittersniff require the CEF and ElasticSearch packages to be installed? Do they need any specific configuration?

  2. Thanks a lot for the reply, Xavier,

    I’m going to try and use your streamer and if I get good results and am able to advance in my journey, I’ll be sure to give you a ring and share what I was able to produce with you. I’m looking for a way not only to index and retrieve results, but also to enhance searches with user input/validation.

    Best Regards,
    Bernardo Rodrigues

  3. Hi Bernardo,
    My ‘pastemon’ tool is still working (I’m using it 24×7). It can generate CEF events to feed ArcSight boxes. My ‘twittermon’ tool is clearly outdated. I have not touched the code for … I can’t remember! I’d suggest using my ‘tweetsniff’ and adding a feature to export Tweets in CEF!

  4. Hi Xavier,

    A few years back you did some work on integrating OSINT from sources such as Twitter and PasteBin into ArcSight. I’ve recently been trying to use the tool from back then but it seems that due to some updates to the API it no longer works.

    Could this tool be used to perform that integration? And do you still use a tool to pull that information into ArcSight?

    Best Regards,
    Bernardo Rodrigues
