YARA is an important piece in the defender’s chess set. Depending on how you play the game, you can think of YARA as either a bishop or a rook: a powerful weapon in the hands of a threat hunter, or a layer that makes the defender’s wall a bigger obstacle to overcome.
However, chess is a game that takes a lifetime to master. Each defeat is a lesson. And what separates a grand master from the beginner are the experiences accumulated by every wrong move they’ve made. Grand masters have lost more games than the beginners have played. Losing is simply a part of the game.
Unfortunately, security defenders don’t have that luxury. Losing as a defender can mean a devastating blow to the organization they are in charge of protecting.
Unlike chess players, security professionals have many powerful aids at their side while playing the game. These can turn them into grand masters with the ability to see five moves ahead. YARA is just that kind of an aid. But, like chess, it does take time and patience to master.
Hunting with YARA
One way to think about YARA is as a binary data query language. With the advent of large data lakes hosted in security clouds, the term has expanded to big data binary query language. YARA is an expressive way to apply regex matches to raw object content and its associated metadata. Regardless of whether the data lake objects are structured or not, YARA is the answer when it comes to searching through them.
Hunting with YARA means designing proactive big data queries that expose attacks before they have the chance to reach the organization. While this might sound counterintuitive to some, in chess, the game isn’t over when you form a castle. The fight just continues outside the walls, and its aim is to prevent its fall. How effective your pieces are outside the confines of the castle walls matters just as much as how sturdy that wall is.
There are two types of big data queries that YARA enables: continuous and retrospective. The former anticipates future activity, while the latter confirms that the activity has already happened. Depending on the aim of the hunt, any combination of the two can be applied. And the rule itself typically doesn’t need to be adjusted to be effective for either of the two.
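The two query types can be pictured with a minimal Python sketch. It is an illustration only, not how any platform implements hunting: the rule is reduced to a plain predicate over bytes, and the names `retro_hunt`, `continuous_hunt`, `MARKER` and the sample data are all hypothetical.

```python
# Toy stand-in for a YARA rule: a predicate over raw object content.
MARKER = b"MARK"  # hypothetical byte pattern, for illustration only


def rule(data):
    return MARKER in data


def retro_hunt(rule, corpus):
    """Retrospective: evaluate the rule against objects that already exist."""
    return [name for name, data in corpus.items() if rule(data)]


def continuous_hunt(rule, feed, notify):
    """Continuous: evaluate the same rule against each object as it arrives."""
    for name, data in feed:
        if rule(data):
            notify(name)


# The same unmodified rule serves both modes.
corpus = {"old1": b"..MARK..", "old2": b"clean"}
past_matches = retro_hunt(rule, corpus)

alerts = []
incoming = iter([("new1", b"clean"), ("new2", b"xxMARKxx")])
continuous_hunt(rule, incoming, alerts.append)
```

The point of the sketch is that only the data source changes between the two modes; the rule stays the same.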
Hunting rules themselves can be either strict or loose. Strict rules are more appropriate when tracking the activity of a known attacker. On the other hand, the loose ones are a perfect way to discover new threats and existing threat variants. However, the looser the rule the more work it takes to go through the matches it produces. There’s a fine balance between the number of conditions the rule must satisfy and the number of matches it ultimately produces. That’s where a big data lake aimed at security research can help the defenders.
While developing new YARA rules, big data lakes like TitaniumCloud can be used to quickly check whether the matches a rule produces are the desired ones. Let’s say, for example, that we’re developing a rule to detect an unknown packer we intend to reverse engineer and create an unpacking method for. The first step in this journey is to acquire the necessary samples that are packed with said packer. And the best way to get those is to write a YARA rule that detects it.
ReversingLabs A1000 - Portable Executable visualization view
Starting from a single sample in our possession, a simple YARA rule is written. Just like a hypothesis, this rule is validated quickly against the Titanium Platform’s YARA retro-hunt capability. While the rule is being checked against a large data corpus, the matches are returned in real time, as they occur. That provides an opportunity to review results immediately and iterate on the rule if necessary.
This is where Titanium Platform’s YARA rule version tracking helps rule developers. Each rule iteration can be verified, and the results it produces can be compared against expectations. Should an iteration fail to meet the mark, the rule can be reverted at any time to a previous version that worked better.
ReversingLabs A1000 - YARA rule editor history view
Quick rule iteration can help to fine-tune a loose hunting rule. After the initial rule starts returning matches, which is close to immediately, they can be fetched from the cloud into the private platform instance.
However, closer inspection of the results shows some unwanted matches, because the byte pattern is being matched anywhere within the object.
That can be revised, while the rule is still running, so that the pattern matches only at the entry point of the Portable Executable. Additionally, the rule is changed to require that the code section be named “protect”.
Unfortunately, there was a typo in this new rule iteration, as the section should be named “.protect”. Since the locally downloaded objects no longer match the rule, as they were meant to, the error is caught immediately. After it is fixed, another run of cloud hunting can begin. The scan completes within an hour and yields the samples required to start working on the unpacker.
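The three rule iterations described above can be emulated with a short Python sketch. The rule text itself is not reproduced in this article, so everything here is an assumption made for illustration: the stub bytes in `STUB`, the helper names, and the choice to check the section that contains the entry point. In YARA terms, the second and third iterations correspond roughly to anchoring the string with `at pe.entry_point` and comparing a section name from the `pe` module.

```python
import struct

# Hypothetical packer stub bytes (x86 `pushad; call $+5`), for illustration only.
STUB = bytes.fromhex("60e800000000")


def entry_point_info(data):
    """Return (entry point file offset, name of the section holding it) or None."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)          # e_lfanew
    if data[pe_off:pe_off + 4] != b"PE\0\0":
        return None
    (section_count,) = struct.unpack_from("<H", data, pe_off + 6)
    (optional_size,) = struct.unpack_from("<H", data, pe_off + 20)
    optional_off = pe_off + 24
    (entry_rva,) = struct.unpack_from("<I", data, optional_off + 16)
    table_off = optional_off + optional_size
    for index in range(section_count):
        off = table_off + 40 * index                          # 40-byte section headers
        name = data[off:off + 8].rstrip(b"\0")
        vsize, vaddr, raw_size, raw_ptr = struct.unpack_from("<IIII", data, off + 8)
        if vaddr <= entry_rva < vaddr + max(vsize, raw_size):
            return entry_rva - vaddr + raw_ptr, name
    return None


def _stub_at_entry(data, section_name):
    try:
        info = entry_point_info(data)
    except struct.error:                                      # truncated headers
        return False
    if info is None:
        return False
    offset, name = info
    return name == section_name and data.startswith(STUB, offset)


def rule_v1(data):
    """First iteration: the byte pattern may match anywhere in the object."""
    return STUB in data


def rule_v2(data):
    """Second iteration: anchored at the entry point, but with the section-name typo."""
    return _stub_at_entry(data, b"protect")


def rule_v3(data):
    """Third iteration: anchored at the entry point, section name corrected."""
    return _stub_at_entry(data, b".protect")
```

Running the three functions over the locally downloaded matches of the first iteration makes the typo visible at once: objects that `rule_v1` matched and that should still match stop matching under `rule_v2`, and match again under `rule_v3`.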
While this kind of quick iteration is imperative for rapid rule development, equally important is the fact that the match history persists even when the rule is changed. That way previous matches, like the rule itself, can be studied and used to improve the hunting rule further.
ReversingLabs A1000 - YARA match history select view
Historical matches make catching a typo, like the one in the second iteration of the rule, easy. Rule validation is as easy as selecting the previous version and reprocessing the available files. That points out any files that are expected to match the rule but don’t.
ReversingLabs A1000 - YARA match history reanalysis
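The validation step amounts to a regression check against the match history. A minimal sketch of that idea follows; the function name `missed_matches`, the toy hashes and the toy byte markers are hypothetical, and the rules are again reduced to plain predicates.

```python
def missed_matches(history, rule, samples):
    """Return hashes a previous rule version matched that the given rule misses."""
    return sorted(h for h in history if not rule(samples[h]))


# Toy objects keyed by a made-up hash; content markers are illustrative only.
samples = {
    "h1": b"STUB|.protect",
    "h2": b"STUB|.protect",
    "h3": b"STUB|.text",
}
history = {"h1", "h2"}  # matches kept from an earlier, known-good version


def typo_version(data):
    return data.startswith(b"STUB") and data.endswith(b"|protect")


def fixed_version(data):
    return data.startswith(b"STUB") and data.endswith(b"|.protect")


regressions_with_typo = missed_matches(history, typo_version, samples)
regressions_after_fix = missed_matches(history, fixed_version, samples)
```

An empty result means the new version still covers everything the earlier version found; any hash in the list is a sample worth inspecting before the rule is deployed.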
Regardless of the type of rule being developed this workflow is a prerequisite for creating and validating good YARA rules. Creating good ones takes time, and running bad ones generates operational cost while providing no protection benefits.
Since the intent is to create an unpacker for this unknown executable format, it is a good idea to continuously monitor the rule for matches. While that could be done by logging into Titanium Platform every so often, there is a better way. Every YARA rule set can be subscribed to, so that an email notification is sent whenever there’s a match. This is especially useful for rules that only occasionally yield a few, but very important, matches.
ReversingLabs A1000 - YARA match subscription view
Searching with YARA
YARA is a big data query language that can easily be combined with another technique, hunting with advanced search. There’s a separation of duties between these two hunting approaches. While YARA specializes in matching object content, advanced search is a metadata enrichment and correlation language. Things that are simple to express with one can be difficult with the other, and vice versa. That’s why they complement each other well.
It isn’t uncommon to start prototyping YARA rules via advanced search queries. Or the opposite, to create a crude YARA rule and filter down its results using search.
Given the previous example, where the Portable Executable section name was added in a later rule iteration, it is easy to demonstrate how advanced search would have helped there.
Since the YARA rule includes a user-defined tag, packed, it is easy to find the matching files with advanced search. Expanding the search query with the forgotten section name achieves the same effect as iterating on the rule. And there’s one thing advanced search is capable of that YARA is not: it can easily filter out all the malicious files. Since the packer we’re interested in is predominantly used by clean software, we can simply filter out all packed files that were infected by viruses. Such files are commonly encountered in security research clouds and are a nuisance that pollutes the desired match results.
ReversingLabs A1000 - Advanced search with YARA tags, file metadata and classification
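The combination of a YARA tag, a file property and a classification filter can be sketched in plain Python. This does not reproduce the platform’s search syntax, which is not shown in the text; the record layout, field names and the `search` helper are hypothetical stand-ins for the metadata a search index would hold.

```python
# Hypothetical metadata records; a real index would hold many more fields.
records = [
    {"sha1": "a1", "tags": {"packed"}, "sections": [".protect", ".rsrc"],
     "classification": "goodware"},
    {"sha1": "b2", "tags": {"packed"}, "sections": [".protect"],
     "classification": "malicious"},   # packed file infected by a virus
    {"sha1": "c3", "tags": {"packed"}, "sections": [".text"],
     "classification": "goodware"},    # matched the loose rule, wrong section
    {"sha1": "d4", "tags": set(), "sections": [".protect"],
     "classification": "goodware"},    # never matched the YARA rule
]


def search(records, tag, section, exclude_classification):
    """Keep records carrying the YARA tag and section name, minus one classification."""
    return [
        r["sha1"]
        for r in records
        if tag in r["tags"]
        and section in r["sections"]
        and r["classification"] != exclude_classification
    ]


clean_packed = search(records, tag="packed", section=".protect",
                      exclude_classification="malicious")
```

The classification filter is the part that a content-matching rule cannot express on its own, since the verdict is not part of the raw object content.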
Since search results return instantly, it is easy to refine the matches with any number of metadata filters, including ones that are not part of the raw object contents, such as extracted properties, timestamps, download locations, classification, similarity and even associated threat actors. More than 150 keywords and over 500 automatically applied tags are available for just that purpose. At that point, the only limit to combining these queries is the imagination.
Protecting with YARA
Another way to think about YARA is as a pattern-based detection engine. Pattern matching is the most accurate way to detect a threat. But the accuracy of such detections is proportional to the rigidity of the rule. The more accurate the rule, the less resilient to changes it is. Likewise, the more accurate the rule, the fewer unwanted detections it generates. The key to writing good detection rules is finding the balance between the two.
In terms of protection, YARA rules can play a big role in determining the type of malware that was detected. Accurate pattern matching is a great way to refine heuristic detections into more exact ones. In such systems, heuristic detections act proactively while signatures reinforce their decisions and help to prioritize response.
Titanium Platform is a solution that is based upon this principle. In addition to the dozen detection technologies it employs to detect threats, its classification can also be extended through YARA rules. More so than any other security platform, it allows the teams that use it to integrate their best YARA detection rules natively into its classification logic. That extends beyond simple detection to include naming and scoring threats.
Such native integration results in detections that look as if they were produced by the platform itself. YARA rules are therefore first class classification citizens that are available in all features of the platform. Their results can be seen everywhere from reporting to alerting and advanced search.
ReversingLabs A1000 - Threat detection YARA rule syntax
Any existing rule can quickly be converted into a threat detection one. Extending the list of tags with the keyword malicious signals the intent to classify matched objects. Such matches will influence the platform’s classification logic and convict any sample they match.
While that is a good way to classify objects, there is a better option. Adding the tc_detection tag unlocks the ability to name detected threats. Fully qualified threat names include a threat type, a family name and a severity. All of these are configured within the rule and are applied as a malicious classification to the objects that match it.
ReversingLabs A1000 - Threat detection with a YARA rule
Deploying rules within Titanium Platform comes with a huge benefit: every YARA rule is matched against every object discovered during static decomposition. Aside from expanding the coverage of deployed rules, this allows rule developers to focus on identifying malicious functionality with less concern about the packaging it comes in. Deploying any existing YARA rule through the automated static decomposition engine amplifies its success rate by having it applied to the objects extracted from over 400 supported formats.
While the above is a relatively simple example of this, it demonstrates the approach Titanium Platform takes when using YARA rules to classify embedded objects.
Titanium Platform threat detection rules are designed to detect malicious functionality that is often hidden behind layers of packing. YARA rules included with the platform are no exception.
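The effect of matching rules against every extracted object can be illustrated with a small Python sketch. It is not the platform’s decomposition engine: it unpacks only ZIP containers using the standard library, the rule is reduced to a byte-pattern check, and `iter_objects`, `scan` and the depth limit are hypothetical names and choices.

```python
import io
import zipfile


def iter_objects(data, name="<root>", depth=0, max_depth=5):
    """Yield (path, bytes) for an object and everything extracted from it."""
    yield name, data
    if depth >= max_depth or not zipfile.is_zipfile(io.BytesIO(data)):
        return
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            child = archive.read(info)
            # Recurse so that nested containers are unpacked as well.
            yield from iter_objects(child, f"{name}/{info.filename}",
                                    depth + 1, max_depth)


def scan(data, pattern):
    """Return the paths of all objects, at any nesting level, containing the pattern."""
    return [path for path, blob in iter_objects(data) if pattern in blob]
```

A pattern that is invisible in the compressed outer container becomes matchable once the inner objects are extracted, which is why the rule author can concentrate on the payload rather than on its packaging.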
Detecting a threat is a team effort, in more ways than one. Multiple technologies participate as equals to decide whether the object is a threat, and what kind of threat it is. YARA matches matter just as much as the other ways to convict an object. The ultimate decision is based on the outcome that poses the greatest threat to the organization, while taking into consideration the accuracy of each classification technology and the quality of its naming.
ReversingLabs A1000 - Multiple detections on a single object
Explainable Machine Learning is an advanced heuristic detection technique. Because it is a heuristic, the predictive detections it produces are limited to the threat type. While that granularity is sufficient to plan the response, having a second opinion to confirm the findings is considered a better analysis outcome. A YARA rule provides that confidence, as the accurate pattern match it generates is the confirmation that machine learning was correct. And, as YARA detection is natively integrated into Titanium Platform, it also helps to name the detected threat. That way a detection that would have otherwise been named Ransomware.Heuristic gets refined into Ransomware.DenizKizi. And that is a great reason to natively integrate all threat detection YARA rules.
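How several detections might be reconciled into one verdict can be sketched as follows. The platform’s actual decision logic is not described here, so this is an assumption-laden illustration: the `Detection` fields, the severity scale, the accuracy values and the tie-breaking order are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Detection:
    source: str             # e.g. "machine_learning" or "yara"
    threat_type: str        # e.g. "Ransomware"
    family: Optional[str]   # None when only the threat type is known
    severity: int           # hypothetical scale, higher is worse
    accuracy: float         # hypothetical 0..1 trust in the technology

    @property
    def name(self):
        # A heuristic detection knows the type only, so it gets a generic name.
        return f"{self.threat_type}.{self.family or 'Heuristic'}"


def final_verdict(detections):
    """Prefer the greatest threat, then the more accurate and better-named result."""
    return max(
        detections,
        key=lambda d: (d.severity, d.accuracy, d.family is not None),
    )


ml = Detection("machine_learning", "Ransomware", None, severity=5, accuracy=0.90)
yara = Detection("yara", "Ransomware", "DenizKizi", severity=5, accuracy=0.99)
verdict = final_verdict([ml, yara])
```

With both technologies agreeing on the threat type, the pattern match supplies the family name, so the heuristic name is refined rather than replaced by a conflicting one.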
Deploying YARA rules
Protection rules are only effective if they are applied at the boundaries they are designed for. Titanium Platform is a highly scalable content processing solution that can ingest any modern data source. Storage, network, email, mobile or web application uploads, and even the software supply chain are highly complex entry points into the organization. Effective defender teams monitor and secure these data exchange points while enabling critical business processes. That’s why it is critical to be able to continuously deploy new protection measures to fight the ever-evolving threat landscape.
However, that’s where a typical security organization encounters a huge problem. Security solution fragmentation is an obstacle to efficiency. Parts of the security stack support YARA rules, and do that well, while others either do not, or are running legacy YARA engine versions. That severely limits the available protection options. Even ingestion of trusted, security community developed, YARA protection and hunting rules becomes a hard problem to solve.
Titanium Platform not only solves these challenges, by centralizing data processing, but is also designed with internally developed YARA rules in mind. It is the best place to design new YARA rules, evaluate rule efficiency, and deploy rules to protect the organization. Using the central manager, YARA rule deployment becomes a problem of the past, as the rules get automatically synced to all data processing nodes. And with close integration into the classification system every rule can extend the platform detection capabilities. That way security teams can focus on emerging threats, those specific to their organization, and on closing existing security detection gaps.
Open source YARA rules
While this blog outlines more advanced defense strategies for those with existing YARA rule deployments, those without any haven’t been forgotten. There’s no better time to start using YARA within your organization than today. The existing, vibrant community of threat defenders is more than ever looking at YARA as a common way to share threat detection knowledge. Deploying YARA can start with a single, high-confidence threat detection rule.
To that end, ReversingLabs is making a sizable contribution to threat defender toolboxes by open sourcing its threat detection YARA rules. The initial public release is composed of more than a hundred rules built to detect various Windows and Linux malware families. When deployed, they detect a multitude of malware downloaders, viruses, trojans, exploits and ransomware.
By deploying these rules, defenders can protect their organization from the following threats: Infostealer.MultigrainPOS, Ransomware.WannaCry, Ransomware.MedusaLocker, Ransomware.Kovter, Ransomware.Ryuk, Ransomware.GandCrab, Ransomware.Crysis, Trojan.TrickBot, Trojan.Emotet, Trojan.Dridex, Exploit.CVE-2020-0601 and others.
ReversingLabs A1000 - Threat detection via open source YARA rules
These YARA rules are built with the goal of providing zero false positive detections. To achieve this goal, and ensure their quality, they are put through rigorous testing in the ReversingLabs cloud, which consists of over 10B unique binaries. Only the rules that meet these strict criteria are considered for publication.
As threat detection rules, these YARA rules attribute each detection to both a malware type and its family, or variety. With such results, defenders can quickly pivot from a malware detection event to threat response. Knowing that a YARA rule has detected ransomware with a high degree of precision can mean the difference between a prevented attack and one that slips by because it was left waiting for an investigation to elevate its importance.
After the initial publication in the ReversingLabs GitHub repository, the team plans to continue using this delivery mechanism to provide defenders with updated rules that detect the latest threats.
Leveling up your YARA game can start by deploying ReversingLabs open source rules.
Watch our videos:
- Whiteboard video: Identifying File Content with YARA Rules
- How To video: How to Hunt for Threats Using YARA Rules