Real-time email scanning with YARA

June 16, 2018
YARA from VirusTotal is a powerful tool for identifying and classifying malware. In this post we demonstrate how it can be used to scan email in real time with the NoSpaceships open-source yaraka project.


The NoSpaceships yaraka project is built on top of the open-source Haraka SMTP server. It scans every email sent to it in real time and logs any rule matches it finds.

Organizations can codify their own security intelligence, or intelligence from external sources, into YARA rules for use within yaraka. YARA rules can identify simple strings, such as email addresses or DNS domains, patterns expressed as regular expressions, and much more complex content using the comprehensive language provided by YARA.

In this post we first provide a quick overview of yaraka. Then we demonstrate how it can be built and installed on dedicated scanner hosts. Finally, we demonstrate yaraka in action.


Yaraka is delivered as an open-source code repository hosted on GitHub. It includes a Makefile that builds an RPM package, which is then used to install yaraka on one or more hosts. Yaraka is designed to be run in production, to scan all emails sent and received by an organization.

Instead of sitting in front of an email service and actively blocking email, yaraka sits alongside, passively monitoring for potential threats and providing notification of them. This approach does not require a change in architecture and minimises the service availability risk that comes with using an in-line solution.

Upon receipt of an email, yaraka first scans the entire email as is, from top to bottom. This includes all headers, and if the email is a multipart message with attachments, those attachments are still encoded at this point. Following this, the email is decomposed into its constituent parts, and each part is scanned using the same rule set. At this stage any encoded content, e.g. a base64 encoded attachment, is decoded before it is scanned.
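
The two passes can be illustrated with a short Python sketch. This is not yaraka's code: the scan function below is a hypothetical stand-in that does a plain substring search instead of running YARA, and the rule name, addresses and file name are made up for the example. It only shows why the second pass matters: content inside a base64 encoded attachment is invisible to the raw scan and visible once the part is decoded.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def scan(data: bytes) -> list:
    """Hypothetical stand-in for a YARA scan: substring search only."""
    rules = {"hello_world": b"Hello, World!"}  # made-up rule set
    return [name for name, needle in rules.items() if needle in data]


def scan_email(raw: bytes) -> list:
    """Scan the raw message, then each decoded part, with the same rules."""
    matches = set(scan(raw))  # pass 1: headers plus still-encoded bodies
    msg = message_from_bytes(raw, policy=policy.default)
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no payload of their own
        payload = part.get_payload(decode=True)  # undoes base64 etc.
        if payload:
            matches.update(scan(payload))  # pass 2: decoded content
    return sorted(matches)


def build_example() -> bytes:
    """Build a multipart email whose attachment is base64 encoded."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["To"] = "recipient@example.com"
    msg["Subject"] = "Example email with attachment"
    msg.set_content("See attachment.")
    msg.add_attachment(b"Hello, World!", maintype="application",
                       subtype="octet-stream", filename="hello.bin")
    return msg.as_bytes()


if __name__ == "__main__":
    raw = build_example()
    print("raw pass only:", scan(raw))
    print("raw + decoded parts:", scan_email(raw))
```

In this sketch the raw pass finds nothing because the attachment bytes appear only in their base64 form, while the per-part pass matches the decoded attachment.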

If at least one rule matches, yaraka will log an alert message to the /var/log/maillog file via the local Syslog. This message will include a JSON object detailing the matched rules, and a few attributes of the scanned email. These events can be dispatched to a SIEM or alerting framework if desired.

Yaraka is documented in detail in the file under the project’s repository on GitHub.


Before yaraka can be installed, you must first build its RPM package. The Installation section in the project’s file documents this in detail, but in short, on a 64-bit CentOS 7 host use the following commands to build the RPM:

# epel-release is required for the yara and yara-devel packages.  If you
# have compiled and installed YARA already then the first two commands can
# be skipped.
sudo yum -y install epel-release
sudo yum -y install yara yara-devel
sudo yum -y install git gcc-c++ rpm-build

git clone
cd yaraka

The resulting RPM will be named yaraka-smtp-x.x.x-1.x86_64.rpm and located under the dist/x.x.x directory (where x.x.x is the version built).

This RPM can be used to deploy yaraka to production yaraka hosts.


Transfer the yaraka RPM to the scanner host, and then run the following commands to install it (where x.x.x is the version built):

sudo useradd -m yaraka
sudo systemctl stop postfix
sudo systemctl disable postfix

# epel-release is required for the yara package.  If you have compiled and
# installed YARA already then the first two commands can be skipped.
sudo yum -y install epel-release
sudo yum -y install yara
sudo rpm -i yaraka-smtp-x.x.x-1.x86_64.rpm

Following this, the yaraka-smtp service will be installed, enabled and started. It runs as the yaraka user and listens on TCP port 25.

NOTE If a local firewall is utilised, it will likely require an update to permit inbound SMTP connections on TCP port 25.


Once yaraka is installed you can configure your email service to forward emails to be scanned to the yaraka host. There are various ways this can be achieved.

Some mail frameworks will allow you to forward a copy of each email to another email address, others will allow you to dynamically add a BCC recipient, and some will allow you to archive emails via another mail server using SMTP.

Here at NoSpaceships we utilise the Microsoft O365 infrastructure to host our email service. To integrate yaraka we performed the following steps:

  • Deploy an Azure instance using CentOS 7 and a fixed public IP address
  • Configure and assign an Azure Network Security Group to the Azure instance and only permit SMTP connections from the Exchange Online Protection IP address ranges to TCP port 25 on the Azure instance
  • Create an O365 Exchange Mail Connector to forward all email for the sub-domain to the Azure instance’s public IP address
  • Create an O365 Exchange Mail Rule to dynamically add the BCC recipient to all inbound email

Following this setup, a copy of all emails sent to all NoSpaceships email addresses, including shared mailboxes and aliases, is forwarded over to the yaraka host for scanning.


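Following installation, yaraka will have a default example rule which simply looks for the string Hello, World! anywhere in the email. We will create a new rule that matches on the MD5 hash of a file attached to an email. This works because each decoded part of the email is scanned on its own, so the hash of the scanned data for an attachment is the hash of the attached file.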

We start by creating the file yaraka.txt with the following contents:

yaraka example

Let’s identify the MD5 for this file (here we used PowerShell on Windows):

Get-FileHash yaraka.txt -Algorithm MD5

For us this gives the MD5 F8E125ABD56F3094A64F08A576096FE6. Using the lower-case version of this MD5 we append the following YARA rule to the /opt/yaraka-smtp/config/rules.yara file on the yaraka host:

import "hash"

rule yaraka_txt {
    condition:
        hash.md5(0, filesize) == "f8e125abd56f3094a64f08a576096fe6"
}
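
If PowerShell is not to hand, the digest can be computed elsewhere and turned into a rule of the same shape. The following Python sketch assumes nothing about yaraka itself. The helper names md5_hex and md5_rule are hypothetical, and the rendered rule still needs the import "hash" line to be present once at the top of the rules file.

```python
import hashlib
from pathlib import Path


def md5_hex(path) -> str:
    """Return the lower-case hex MD5 of a file's exact bytes."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def md5_rule(rule_name: str, digest: str) -> str:
    """Render a YARA rule that matches scanned data with this MD5."""
    return (
        f"rule {rule_name} {{\n"
        f"    condition:\n"
        f'        hash.md5(0, filesize) == "{digest.lower()}"\n'
        f"}}\n"
    )


if __name__ == "__main__":
    # Assumes yaraka.txt exists in the current directory.
    print(md5_rule("yaraka_txt", md5_hex("yaraka.txt")))
```

The digest depends on the exact bytes of the file, so a trailing newline or a different text encoding will produce a different value from the one shown in this post.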

We then restart the yaraka-smtp service for the new rules to take effect:

sudo systemctl restart yaraka-smtp

Finally, we send an email to one of our O365 email addresses using another email provider, in our case Google Mail, and we attach our example file.

While doing this, we monitored the /var/log/maillog file and saw the rule match almost immediately:

sudo tail -f /var/log/maillog | grep yaraka_txt
Jun  8 20:34:53 www haraka[111630]: [ALERT] [649C7F7A-0FCD-4153-9D43-E3B219CDC9DD.1] [core] YARA rule match: {"message_id":"<>","subject":"Example email with attachment","mail_from":"<>","rules":["yaraka_txt"]}
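
Alert lines like this one can be turned back into structured data before being passed to a SIEM or alerting framework. The Python sketch below is an illustration only. It assumes the log line ends with the JSON object, prefixed by the text "YARA rule match: " exactly as in the sample above. The function name parse_alert is hypothetical, and other yaraka versions may log different fields.

```python
import json
import re

# Assumes the JSON object is the last thing on the line, as in the sample.
ALERT_RE = re.compile(r"YARA rule match: (\{.*\})\s*$")


def parse_alert(line: str):
    """Return the alert as a dict, or None if the line is not an alert."""
    found = ALERT_RE.search(line)
    if found is None:
        return None
    return json.loads(found.group(1))


if __name__ == "__main__":
    import sys

    # Example: sudo tail -f /var/log/maillog | python3 parse_alerts.py
    for log_line in sys.stdin:
        alert = parse_alert(log_line)
        if alert is not None:
            print(alert["rules"], alert.get("subject"))
```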

The following summarises the email flow:

  • The Google Mail servers forwarded the email directly to the O365 email infrastructure, using the DNS MX records for the destination email domain
  • O365 added the BCC recipient to the email
  • O365 used the mail connector we configured for the domain and forwarded the BCC’d copy of the email to the configured yaraka host’s IP address
  • O365 delivered the email to all original recipients
  • The yaraka host scanned the email as it was received and matched a rule
  • The yaraka host logged a message to the local Syslog detailing the rule match

During this process yaraka attempts to be frugal in what it logs so that it does not leave personal information lying around on disk. Additionally, it does not store the email on disk, not even temporarily. Everything is operated on in memory.

What’s Next?

The example YARA rule given in this post is very basic and was simply used to demonstrate how easy it is to use yaraka for real-time email scanning, and how an organization can configure yaraka to utilise IoCs with YARA-based signatures.

We are looking to enhance yaraka in the following areas:

  • Include more content in the YARA rule match alert written to Syslog
  • Ad-hoc file scanning - via an HTTPS service
  • Recursive content scanning - i.e. extract archives and decode artefacts from certain content types and scan those independently
  • Support extensible plugins - to add custom content handling and inspection

If you have any questions or feedback about this article or the yaraka project, please contact us.

We have released the yaraka project under the terms of the MIT license. Yaraka is designed to be used in a production environment, and we will provide free support on a best-effort basis for this project.