CuckooMX: Automating Email Attachment Scanning with Cuckoo

Today, classic anti-virus protections are not reliable enough to protect against modern malware. To better understand malicious code and, if possible, block it, it’s best to execute it in a safe environment and analyze its behaviour. Does it create new processes or files? Does it open outbound connections to suspicious domains or IP addresses? Does it implement hooks? This method of performing malware analysis in a sandbox is more and more common. As usual, there are vendors providing nice (but often very expensive) solutions as well as free (open source) alternatives. The most popular of the latter is called Cuckoo. I won’t explain in detail what Cuckoo is and how it works. The project maintainer (Claudio Guarnieri) gave a great presentation during the last Hack in the Box in Amsterdam. His slides are available here. Of course, I’m a Cuckoo user! I use malwr.com but I also have my local Cuckoo instance running on my MacBook with my own guest images.

The method looks sexy, but the day-to-day usage of sandboxes remains a pain! You need to grab a copy of the malware, transfer it to the sandbox, execute it, wait (!) and interpret the results. We need more automation! Today, email remains a key attack vector for distributing malware, which is also spread using documents (PDF, Office, Flash), as explained in this Sophos blog post “The Rise of Document-based Malware“. Yes, I’m a lazy guy and I would like to have all documents passing through my MTA automatically analyzed by Cuckoo. There are commercial solutions which achieve this. I’m currently playing with some at my job but they are really expensive. Why not try to do the same with free software? That’s the purpose of this project called “CuckooMX“.

The principle is easy: every mail relayed by an MTA is first sent to Cuckoo for analysis. If a suspicious file is detected, the mail remains in quarantine until the results have been reviewed by a “security analyst” (read: a human). If considered “safe”, the mail is re-injected into the flow to reach its final destination. The figure below gives a global overview of the solution:

CuckooMX Architecture

There are a few operations to perform:

  • Capture the mail flow at MTA level
  • Extract MIME attachments
  • If interesting ones are found (like PDF files, Zip archives, executable files or Office documents), submit them to Cuckoo
  • If Cuckoo reports the data to be safe, forward the mail back to the MTA
  • Otherwise, more investigation must be performed in the quarantine
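
To make the extraction step more concrete, here is a minimal sketch in Python (CuckooMX itself is a Perl script, so this is only an illustration and not the project's code). It walks the MIME parts of a raw message and keeps those whose type is in a list of interesting types. The INTERESTING_TYPES set, the function names and the sample message are assumptions made for the example.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Hypothetical list of MIME types worth sending to the sandbox; adapt to taste.
INTERESTING_TYPES = {
    "application/pdf",
    "application/zip",
    "application/msword",
    "application/x-msdownload",
}


def interesting_attachments(raw_mail: bytes) -> list:
    """Return (filename, MIME type, payload) for each interesting MIME part."""
    msg = message_from_bytes(raw_mail, policy=policy.default)
    found = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        ctype = part.get_content_type()
        if ctype not in INTERESTING_TYPES:
            continue
        payload = part.get_payload(decode=True) or b""
        found.append((part.get_filename() or "unnamed", ctype, payload))
    return found


def build_sample() -> bytes:
    """Build a small test message with one PDF and one PNG attachment."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["To"] = "xavier@example.com"
    msg["Subject"] = "Report"
    msg.set_content("See attached.")
    msg.add_attachment(b"%PDF-1.4 fake", maintype="application",
                       subtype="pdf", filename="report.pdf")
    msg.add_attachment(b"\x89PNG fake", maintype="image",
                       subtype="png", filename="logo.png")
    return msg.as_bytes()


if __name__ == "__main__":
    for name, ctype, payload in interesting_attachments(build_sample()):
        print(name, ctype, len(payload))
```

Only the PDF is returned here; the text body and the PNG are skipped because their types are not in the list.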

The following process must be implemented:

CuckooMX Flow

At this time, we are facing a big “issue”: the current version of Cuckoo (0.3.2) cannot easily be configured to flag a piece of code as malicious or safe. Another Cuckoo user wrote a patch to add YARA support to Cuckoo. It works well, but a more interesting system will be implemented in the next Cuckoo release (0.4), which is expected soon. Signatures can then be implemented as Python classes to easily categorize the malware. Here is an example (copied from Claudio’s slides):

class CreatesExe(Signature):
    name = "creates_exe"
    description = "Creates a Windows executable on the filesystem"
    severity = 2

    def run(self, results):
        # Walk the files touched during the analysis.
        for file_name in results["behavior"]["summary"]["files"]:
            if file_name.endswith(".exe"):
                # Record the matching file and flag the sample.
                self.data.append({"file_name": file_name})
                return True
        return False

Just a few words about YARA. The goal of this project is to categorize malware based on textual or binary patterns found in samples of each family. Here is an example of a YARA signature:

rule Worm_VBS_Uaper_B
{
strings:
  $a0 = { 466f72204f353d3120546f204f332e41646472657373456e74726965732e436f756e74 }
  $a1 = { 536574204f363d4f332e41646472657373456e7472696573284f3529 }
  $a2 = { 4966204f353d31205468656e }
  $a3 = { 4f342e4243433d4f362e41646472657373 }
  $a4 = { 456c7365 }
  $a5 = { 4f342e4243433d4f342e424343202620223b20222026204f362e41646472657373 }

condition:
  $a0 and $a1 and $a2 and $a3 and $a4 and $a5
}
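
The hex strings in this rule are simply fragments of VBScript source. The short Python sketch below decodes them and mimics the rule's condition (all strings present) with plain substring tests. It is a simplified stand-in to show what the rule looks for, not a replacement for the YARA engine; the function names are made up for the example.

```python
# Hex strings copied from the rule above.
PATTERNS = {
    "$a0": "466f72204f353d3120546f204f332e41646472657373456e74726965732e436f756e74",
    "$a1": "536574204f363d4f332e41646472657373456e7472696573284f3529",
    "$a2": "4966204f353d31205468656e",
    "$a3": "4f342e4243433d4f362e41646472657373",
    "$a4": "456c7365",
    "$a5": "4f342e4243433d4f342e424343202620223b20222026204f362e41646472657373",
}


def decode_patterns(patterns: dict) -> dict:
    """Turn each hex string into the raw bytes it represents."""
    return {name: bytes.fromhex(value) for name, value in patterns.items()}


def matches_all(data: bytes, patterns: dict) -> bool:
    """Mimic the rule's condition: every pattern must appear in the data."""
    return all(raw in data for raw in decode_patterns(patterns).values())


if __name__ == "__main__":
    for name, raw in decode_patterns(PATTERNS).items():
        print(name, raw.decode("ascii"))
```

For instance, $a2 decodes to "If O5=1 Then" and $a4 to "Else": the rule matches a worm that loops over address book entries to fill the BCC field of a mail.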

With the new Cuckoo version, it will be easy to create powerful signatures based on:

  • Network behaviour (DNS requests, IP addresses)
  • File system operations
  • Registry operations
  • System calls

In the meantime, CuckooMX submits attachments AND re-injects the mail immediately into the normal flow. There is NO protection against malicious code at the moment! Be warned!

My mail relay is based on an Ubuntu server and Postfix (the MTA installed by default). CuckooMX is a Perl script which integrates into Postfix and submits data to Cuckoo. How does it work? Postfix is a powerful open source mail server which offers many ways to add features for filtering emails. One of them is called the “After-Queue Content Filter” (more information about this method is available here). To implement the filter, change your master.cf file as shown below:

# ====================================================================
# service type  private unpriv  chroot  wakeup  maxproc command + args
#               (yes)   (yes)   (yes)   (never) (100)
# ====================================================================
smtp      inet  n       -       -       -       -       smtpd
        -o content_filter=cuckoomx
[...]
cuckoomx  unix  -       n       n       -       -       pipe
        user=cuckoo argv=/data/cuckoo/cuckoomx.pl -f ${sender} ${recipient}

The first line (smtp) references a new content filter called “cuckoomx“. The filter itself is defined at the end of the file, with information about its execution (user, arguments). If required, adapt the user and the Perl script path to match your environment. I suggest using your existing Cuckoo user to avoid file access problems. Once done, restart Postfix. Edit the Perl script and change the location of the configuration file (“cuckoomx.conf“) on line 58. The last step is to create/adapt the configuration file. The syntax is very simple:

<!--
  CuckooMX Configuration File
//-->
<cuckoomx>
  <core>
    <outputdir>/data/cuckoo/quarantine</outputdir>
    <process-zip>yes</process-zip>
  </core>
  <cuckoo>
    <basedir>/data/cuckoo</basedir>
    <db>/data/cuckoo/db/cuckoo.db</db>
    <guest>Cuckoo1</guest>
  </cuckoo>
  <logging>
     <syslogfacility>mail</syslogfacility>
     <sendmailpath>/usr/sbin/sendmail</sendmailpath>
     <notify>xavier@example.com</notify>
  </logging>
  <ignore>
     <mime-type>text/plain</mime-type>
     <mime-type>text/html</mime-type>
     <mime-type>image/jpeg</mime-type>
     <mime-type>image/png</mime-type>
     <mime-type>text/x-patch</mime-type>
     <mime-type>application/pkcs7-signature</mime-type>
     <mime-type>video/x-ms-wmv</mime-type>
   </ignore>
</cuckoomx>

The most important parameters that must reflect your setup are:

  • <basedir> is the base directory of your Cuckoo instance
  • <db> is the full path to your Cuckoo SQLite database
  • <guest> is the VirtualBox guest to use to analyze malware
  • <sendmailpath> is the full path to your Postfix sendmail binary (to re-inject safe emails in the SMTP flow)
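
To show how these parameters fit together, here is a small Python sketch (again, the real script is written in Perl and may work differently). It reads a trimmed copy of the configuration with the standard library XML parser and builds the sendmail command line that a filter could use to re-inject a message. The function names and the dictionary keys are invented for the example; the -i and -f options are the usual sendmail ones.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the configuration file shown above.
SAMPLE_CONFIG = """<cuckoomx>
  <core>
    <outputdir>/data/cuckoo/quarantine</outputdir>
    <process-zip>yes</process-zip>
  </core>
  <cuckoo>
    <basedir>/data/cuckoo</basedir>
    <db>/data/cuckoo/db/cuckoo.db</db>
    <guest>Cuckoo1</guest>
  </cuckoo>
  <logging>
    <sendmailpath>/usr/sbin/sendmail</sendmailpath>
  </logging>
  <ignore>
    <mime-type>text/plain</mime-type>
    <mime-type>image/png</mime-type>
  </ignore>
</cuckoomx>"""


def load_config(xml_text: str) -> dict:
    """Flatten the XML configuration into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return {
        "outputdir": root.findtext("core/outputdir"),
        "process_zip": root.findtext("core/process-zip", default="no").lower() == "yes",
        "basedir": root.findtext("cuckoo/basedir"),
        "db": root.findtext("cuckoo/db"),
        "guest": root.findtext("cuckoo/guest"),
        "sendmailpath": root.findtext("logging/sendmailpath"),
        "ignore": {node.text for node in root.findall("ignore/mime-type")},
    }


def reinject_command(config: dict, sender: str, recipients: list) -> list:
    """Build the argv used to hand a message back to the MTA.

    -i: a line with a single dot does not end the message
    -f: envelope sender
    """
    return [config["sendmailpath"], "-i", "-f", sender, "--", *recipients]


if __name__ == "__main__":
    cfg = load_config(SAMPLE_CONFIG)
    print(reinject_command(cfg, "status@dhl.com", ["xavier@example.com"]))
```

The message itself would then be written to the standard input of that command, for example with subprocess.run(cmd, input=raw_mail).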

To avoid a flood of submissions with unsupported files, feel free to create your own ignore list with MIME types you’re not interested in. A best practice is to place this filter behind your classic anti-spam and anti-virus solutions (to reduce the load as much as possible). Keep in mind that using sandboxes may require a lot of system resources. The Perl script requires some Perl CPAN modules:

  • Archive::Extract
  • DBI
  • Digest::MD5
  • File::Path
  • MIME::Parser
  • Sys::Syslog
  • XML::XPath

From now on, every mail received by the script is parsed and MIME attachments are extracted into a quarantine directory. If a Zip archive is detected, its files are extracted and submitted to Cuckoo! For each interesting file, the MD5 digest is generated and compared to Cuckoo’s DB to avoid duplicates. All information is sent to Syslog:

Jun 18 23:03:39 cuckoomx cuckoomx[9293]: Processing mail from: "DHL Inc." <status@dhl.com> (DHL Package delivery report)
Jun 18 23:03:39 cuckoomx cuckoomx[9293]: Dumped: "/data/cuckoo/in/9293/msg-9293-1.txt" (text/plain)
Jun 18 23:03:39 cuckoomx cuckoomx[9293]: Dumped: "/data/cuckoo/in/9293/msg-9293-2.txt" (text/plain)
Jun 18 23:03:39 cuckoomx cuckoomx[9293]: Dumped: "/data/cuckoo/in/9293/msg-9293-3.html" (text/html)
Jun 18 23:03:39 cuckoomx cuckoomx[9293]: Dumped: "/data/cuckoo/in/9293/DHL report.zip" (application/zip)
Jun 18 23:03:39 cuckoomx cuckoomx[9293]: Files to process: 1
Jun 18 23:03:39 cuckoomx cuckoomx[9293]: "/data/cuckoo/in/9293/DHL report.exe" already scanned (MD5: d68a6a9c37d000989224abe1b2c5160c)
Jun 18 23:03:39 cuckoomx postfix/pipe[9292]: BFC72441BDC: to=<xavier@example.com>, relay=cuckoomx, delay=0.72, delays=0.38/0/0/0.34, dsn=2.0.0, status=sent (delivered via cuckoomx service)
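
The "already scanned" line in this log comes from the MD5 check. A rough Python sketch of that logic could look like the code below: unpack the Zip archive, hash each file and only queue files whose digest is not known yet. The queue table and its two columns are hypothetical and only serve the example; the real Cuckoo schema is different and should be checked before reusing any of this.

```python
import hashlib
import io
import sqlite3
import zipfile


def extract_zip_members(data: bytes) -> dict:
    """Return {filename: content} for every regular file in a Zip archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return {info.filename: archive.read(info)
                for info in archive.infolist() if not info.is_dir()}


def already_scanned(conn: sqlite3.Connection, md5: str) -> bool:
    """Check the (hypothetical) queue table for a known digest."""
    row = conn.execute("SELECT 1 FROM queue WHERE md5 = ? LIMIT 1", (md5,)).fetchone()
    return row is not None


def submit_if_new(conn: sqlite3.Connection, name: str, payload: bytes) -> bool:
    """Queue the file unless its MD5 is already known; return True if queued."""
    digest = hashlib.md5(payload).hexdigest()
    if already_scanned(conn, digest):
        return False
    conn.execute("INSERT INTO queue (md5, target) VALUES (?, ?)", (digest, name))
    return True


def demo():
    """Submit the same archive twice against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE queue (md5 TEXT, target TEXT)")  # hypothetical schema
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("DHL report.exe", b"MZ fake sample")
    members = extract_zip_members(buffer.getvalue())
    first = [submit_if_new(conn, n, p) for n, p in members.items()]
    second = [submit_if_new(conn, n, p) for n, p in members.items()]
    return first, second


if __name__ == "__main__":
    print(demo())
```

The first pass queues the file, the second one skips it because the digest is already in the table.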

The rest of the operations are standard Cuckoo. Files are submitted directly into the SQLite database and processed. What’s next? I’m now waiting for the next release. I’m writing a daemon which will monitor the results of analyses (again via the SQLite DB). Once the results are generated, it will search for known signatures in the output files and decide what to do. The last step will be the interface allowing the security analyst to accept or reject the mail.

The CuckooMX project is already available on github.com. Feel free to test it and report ideas and comments. Everything is welcome!

 

19 comments

  1. Hi;
    This is still working. The only problem is that the extracted file has 600 privileges. It doesn’t mean anything if I change the mode setting. By the way my system is Debian 10 Buster and postfix 3x and Perl5.3.30.

  2. Hi,
    To be honest, this project has been dead for a while and I don’t know if it will work with the latest Cuckoo release. Feel free to try and let me know!

  3. Does this feature work in the latest Cuckoo version, 2.0.6? Can you please explain what changes must be made to the configuration files for the latest version?

  4. Looks awesome! Does your updated version that works with 0.4 consider whether attachments are good/bad and decide whether or not to deliver them? If not, any hints on how I would go about implementing that?

    Thanks!

  5. Hello Michael,
    Why Perl? Just because I like it! Less experience with Python…
    I’m now waiting for the official 0.4 version to continue this project.

  6. @Xavier nice work on CuckooMX, any particular reason why choosing Perl over Python? I was thinking about automating email client interactivity and copy the email over to a mailbox which the Cuckoo user/MUA then opens and clicks on everything… Wouldn’t that be neat too? That way you might catch any non-attachment attacks too, like bugs in the organisation’s MUA-product.

  7. Excellent !
    Thanks for sharing this.

    That’s funny because yesterday I saw Claudio Guarnieri’s HITB slides and when I read “it can be customized to do whatever you want and it can be integrated in larger threat intelligence frameworks” I obviously thought that smart people would take a look at the integration with commonly used mail and proxy products. A few minutes later I saw your post…

  8. Hello Alojzy,
    The goal is of course to scan only suspicious files (.exe, .zip, .pdf and Office files). My development setup runs on a dual-core with 4GB of memory. Cuckoo can also be fine-tuned (size of the VM, timeouts). The performance is not a very big issue (IMHO). Mails being scanned will be queued and stored in the quarantine. Is it really a problem if you receive a mail with some delay?

  9. Fascinating project! I just have one question regarding the performance – how many e-mails per day\hour do you receive and process with cuckoo and on what kind of equipment?
    Good luck and I’m waiting for more!

  10. I’m in the process of experimenting with Cuckoo in our environment, to up our detection and mitigation capabilities. Email attachments are an ever-present threat and a non-commercial solution would be ideal. In the past I’ve been aware of a tool called vortex, being used for similar purposes (inline analysis). You may want to take a look at it as you build out your tools.

    http://sourceforge.net/projects/vortex-ids/
    http://smusec.blogspot.com/2010/03/vortex-howto-series-network.html
