Evaluating YARA Rules for macOS Malware Hunting in Spectra Analyze

With a constantly evolving OSX malware domain, it is important to write clear, specific, and accurate YARA rules.

Executive Summary

  • Malware targeting macOS systems tends to be overlooked. Hunting for Mac malware within your organization, especially with the Spectra Analyze YARA feature, can easily boost your security posture.
  • High quality YARA rules are specific, readable, and properly structured – whether they are human or machine-generated.

In our previous installments in this series, we’ve covered Windows-oriented threats. Increasingly, organizations have at least some portion of their users on macOS hardware. For too long, those of us in defensive security have written off fortifications against Mac malware on the grounds that its lower prevalence makes it less of a threat.

While macOS malware is less widespread than malware targeting the Windows operating system, the ability to identify, detect, and classify old and new threats alike is increasingly important. YARA rules provide a versatile solution to this problem by allowing security practitioners to target specific OSX threats of interest. However, not all YARA rules are equally useful: their quality depends on how they are written and used by researchers.

Here's how YARA rules work, what constitutes a good YARA rule, how machine learning (ML) and artificial intelligence (AI) are impacting YARA, and most importantly, why RL Spectra Analyze’s YARA feature is a game-changer for researchers.

Some Background on YARA

YARA (a recursive acronym, usually expanded as “YARA: Another Recursive Acronym” or “Yet Another Ridiculous Acronym”) is a tool threat hunters use to identify and classify malware. Each rule holds a description of the malware along with patterns that are present within it, which are used to match samples. These patterns include strings and binary code that are unique to the malware. YARA rules allow malware researchers to accurately categorize samples into malware families and reduce the number of false positive results when analyzing files.

A good YARA rule is precise and readable, with clear metadata, unique strings, and specific conditions that minimize false positives and accurately match the intended malware. There are many publicly available YARA rule repositories that security teams looking to hunt Mac malware can readily use. These rules may be human-written or machine-generated, and each kind has its own pros and cons.

What makes a good YARA rule?

In general, a good YARA rule is specific, readable, and uses the following structure:

Metadata

This section includes information like the author, the malware it targets, a short description, the date, any references, any hashes, and other general information pertinent to the rule or the malware it targets. 

Strings

This section holds identifiers for the malware the YARA rule pertains to. Some of these identifiers should be unique to the malware, but it is not incorrect to include other, more generic strings that help identify other parts of the malware. These identifiers can be literal strings that are unique to the malware, but we are not limited to string literals. They can also be hexadecimal byte patterns or regular expressions. Each string is defined with a “$” before the variable name, and then assigned a value.

A good rule is also readable; therefore, the string values should also be as human-readable as possible. For example, if a string can be represented in standard format, avoid leaving it in hexadecimal. Additionally, for hexadecimal representations of ASCII characters, it is good practice to leave the ASCII translation in a comment near the definition of the string variable. These provisions and similar ones preserve the readability of the rule.
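The ASCII-comment practice is easy to automate. The Python sketch below is a minimal illustration, not part of YARA or Spectra Analyze. The helper names `hex_to_ascii` and `annotated_hex_string` are hypothetical. It takes a hex byte string (here one from the XProtect Bundlore rule discussed later in this article) and emits the string definition with its ASCII translation as a trailing comment.

```python
def hex_to_ascii(hex_bytes: str) -> str:
    """Decode space-separated hex bytes to ASCII text.

    Raises UnicodeDecodeError if any byte is outside the ASCII range,
    which is a hint that the pattern should stay in hexadecimal.
    """
    return bytes.fromhex(hex_bytes).decode("ascii")


def annotated_hex_string(name: str, hex_bytes: str) -> str:
    """Format a YARA hex string definition with an ASCII comment."""
    return f'${name} = {{ {hex_bytes} }} // "{hex_to_ascii(hex_bytes)}"'


print(annotated_hex_string("a4", "2E 74 6D 70 6D 61"))
# prints: $a4 = { 2E 74 6D 70 6D 61 } // ".tmpma"
```

If the decoded text is fully printable, it is usually simpler still to define it as a plain text string in the rule rather than as hex.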

Figure 1: A YARA rule containing all three key elements: Metadata, Strings and Condition.

Condition

This section defines how the previously defined strings must combine for the rule to return a match. This is the meat of the rule: the previous two sections are technically optional (as in, you can write a rule without them), but this one is mandatory. Conditions should be ordered for efficiency. This means that a good rule would have broader conditions first, with more specific ones following. What is considered broad or specific depends on the malware the rule is detecting.

Conditions can be as simple as checking for a unique string, or as complex as multi-branch and/or statements. A good practice in writing conditions is to start by checking the header of the file, and end with false positive checks; these would be conditions to rule out any goodware that passed the previous conditions. Additionally, checking the filesize is a generally good condition to include. Further conditions should be specific enough to prevent false positives, but not so specific that certain malware files go unnoticed. 

Comparing Autogenerated and Human-written Rules

YARA rules are usually written by threat hunters and researchers. However, in recent years, autogenerated rules have become commonplace. Each of these rule categories has its own pros and cons depending on the use case.

Autogenerated YARA rules are great because of how quickly they can be created. Tools for creating YARA rules will typically ask for a malware sample and create a rule based on it. These tools look for common sequences (static components of the given malware sample) and record them as patterns. These sequences include:

  • Strings
  • Opcodes
  • Byte sequences
  • File types/size
  • Hashes

Autogenerated rules will typically have simpler logical statements than human-written ones as they are usually just looking to match the patterns kept as strings. This approach can be practical in situations where there is a high volume of samples to classify, or there is a need to work with or classify known malware. It is important to note that threat actors can easily obfuscate static malware features, so YARA rules that rely solely on these can only be used for a limited amount of time. However, since autogenerated rules can be created quickly, they are still a useful tool in threat hunting. 
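To make the pattern-mining idea concrete, here is a toy Python sketch. It assumes a deliberately simple approach: collect every fixed-length byte sequence (n-gram) shared by all malware samples, then discard any that also occur in known goodware. This is not how yara-signator or any specific generator works. The function names and sample bytes are made up for illustration.

```python
def ngrams(data: bytes, n: int) -> set:
    """Return every distinct n-byte window in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def candidate_sequences(malware_samples, goodware_samples, n=8):
    """Byte sequences present in every malware sample and in no goodware."""
    if not malware_samples:
        return []
    shared = set.intersection(*(ngrams(s, n) for s in malware_samples))
    for sample in goodware_samples:
        shared -= ngrams(sample, n)
    return sorted(shared)


# Fabricated stand-ins for real files.
m1 = b"\x00\x01HEADER" + b"XORLOOP!" + b"tail-one"
m2 = b"other" + b"XORLOOP!" + b"HEADER"
good = b"HEADER plain app"

print(candidate_sequences([m1, m2], [good], n=6))
# prints: [b'ORLOOP', b'RLOOP!', b'XORLOO']
```

The shared `HEADER` bytes are dropped because the goodware sample contains them too, which mirrors why generators score candidate sequences against a clean corpus before emitting them as rule strings.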

On the other hand, human-written YARA rules tend to hinge on insights gleaned from manual analysis. This makes human-written rules more appealing for threat hunting, as the higher attention to detail allows these rules to hunt for malware that has not yet been classified or that lacks distinct indicators of compromise. Human-written rules typically include dynamic features of malware. These features might include:

  • Runtime
  • Number of API calls
  • File system changes
  • Registry changes

These dynamic characteristics, combined with threat hunters’ contextual understanding of the malware scheme being targeted, allow human-written rules to remain useful for longer periods of time, and to be used for identifying new threats or anomalies in previously known threats.

As an example, we can compare two rules publicly available on Malpedia for the OSX malware OceanLotus. The first is an autogenerated rule created by Malpedia’s yara-signator:

rule osx_oceanlotus_auto {

    meta:
        author = "Felix Bilstein - yara-signator at cocacoding dot com"
        date = "2020-10-14"
        version = "1"
        description = "autogenerated rule brought to you by yara-signator"
        tool = "yara-signator v0.5.0"
        signator_config = "callsandjumps;datarefs;binvalue"
        malpedia_reference = "https://malpedia.caad.fkie.fraunhofer.de/details/osx.oceanlotus"
        malpedia_rule_date = "20201014"
        malpedia_hash = "a7e3bd57eaf12bf3ea29a863c041091ba3af9ac9"
        malpedia_version = "20201014"
        malpedia_license = "CC BY-SA 4.0"
        malpedia_sharing = "TLP:WHITE"

    strings:
        $sequence_0 = { 48 8b85f0feffff 48 8d78e8 48 3b3d???????? 7417 }
            // n = 7, score = 200
            //   48                   | dec                 eax
            //   8b85f0feffff         | mov                 eax, dword ptr [ebp - 0x110]
            //   48                   | dec                 eax
            //   8d78e8               | lea                 edi, [eax - 0x18]
            //   48                   | dec                 eax
            //   3b3d????????         |                     
            //   7417                 | je                  0x19

        $sequence_1 = { 8b85b8fdffff 48 8d78e8 48 3b3d???????? }
            // n = 5, score = 200
            //   8b85b8fdffff         | mov                 eax, dword ptr [ebp - 0x248]
            //   48                   | dec                 eax
            //   8d78e8               | lea                 edi, [eax - 0x18]
            //   48                   | dec                 eax
            //   3b3d????????         |                     

        $sequence_2 = { 48 8b85b8fdffff 48 8d78e8 48 }
            // n = 5, score = 200
            //   48                   | dec                 eax
            //   8b85b8fdffff         | mov                 eax, dword ptr [ebp - 0x248]
            //   48                   | dec                 eax
            //   8d78e8               | lea                 edi, [eax - 0x18]
            //   48                   | dec                 eax

        $sequence_3 = { 8b85b8fdffff 48 8d78e8 48 }
            // n = 4, score = 200
            //   8b85b8fdffff         | mov                 eax, dword ptr [ebp - 0x248]
            //   48                   | dec                 eax
            //   8d78e8               | lea                 edi, [eax - 0x18]
            //   48                   | dec                 eax

        $sequence_4 = { 8b85f0feffff 48 8d78e8 48 3b3d???????? 7417 }
            // n = 6, score = 200
            //   8b85f0feffff         | mov                 eax, dword ptr [ebp - 0x110]
            //   48                   | dec                 eax
            //   8d78e8               | lea                 edi, [eax - 0x18]
            //   48                   | dec                 eax
            //   3b3d????????         |                     
            //   7417                 | je                  0x19

        $sequence_5 = { e8???????? 48 8b85f8feffff 48 8d78e8 48 3b3d???????? }
            // n = 7, score = 200
            //   e8????????           |                     
            //   48                   | dec                 eax
            //   8b85f8feffff         | mov                 eax, dword ptr [ebp - 0x108]
            //   48                   | dec                 eax
            //   8d78e8               | lea                 edi, [eax - 0x18]
            //   48                   | dec                 eax
            //   3b3d????????         |                     

        $sequence_6 = { 89de e8???????? 48 8b8508ffffff 48 8d78e8 }
            // n = 6, score = 200
            //   89de                 | mov                 esi, ebx
            //   e8????????           |                     
            //   48                   | dec                 eax
            //   8b8508ffffff         | mov                 eax, dword ptr [ebp - 0xf8]
            //   48                   | dec                 eax
            //   8d78e8               | lea                 edi, [eax - 0x18]

        $sequence_7 = { e8???????? 48 8b85f8feffff 48 8d78e8 48 }
            // n = 6, score = 200
            //   e8????????           |                     
            //   48                   | dec                 eax
            //   8b85f8feffff         | mov                 eax, dword ptr [ebp - 0x108]
            //   48                   | dec                 eax
            //   8d78e8               | lea                 edi, [eax - 0x18]
            //   48                   | dec                 eax

        $sequence_8 = { 48 89de e8???????? 48 8b8508ffffff 48 8d78e8 }
            // n = 7, score = 200
            //   48                   | dec                 eax
            //   89de                 | mov                 esi, ebx
            //   e8????????           |                     
            //   48                   | dec                 eax
            //   8b8508ffffff         | mov                 eax, dword ptr [ebp - 0xf8]
            //   48                   | dec                 eax
            //   8d78e8               | lea                 edi, [eax - 0x18]

        $sequence_9 = { 90 55 48 89e5 5d e9???????? }
            // n = 6, score = 200
            //   90                   | nop                 
            //   55                   | push                ebp
            //   48                   | dec                 eax
            //   89e5                 | mov                 ebp, esp
            //   5d                   | pop                 ebp
            //   e9????????           |                     

    condition:
        7 of them and filesize < 308528
}

The second is a human-written rule created by AlienVault Labs, targeting the XOR decryption function in the malware:

rule osx_oceanlotus_w0 {
    meta:
        author = "AlienVault Labs"
        type = "malware"
        description = "OceanLotus XOR decode function"
        source = "https://www.alienvault.com/blogs/labs-research/oceanlotus-for-os-x-an-application-bundle-pretending-to-be-an-adobe-flash-update"
        malpedia_reference = "https://malpedia.caad.fkie.fraunhofer.de/details/osx.oceanlotus"
        malpedia_version = "20170519"
        malpedia_license = "CC BY-NC-SA 4.0"
        malpedia_sharing = "TLP:WHITE"
    strings:
        $xor_decode = { 89 D2 41 8A ?? ?? [0-1] 32 0? 88 ?? FF C2 [0-1] 39 ?A [0-1] 0F 43 D? 4? FF C? 48 FF C? [0-1] FF C? 75 E3 }
    condition:
        $xor_decode
}

It is apparent that the autogenerated rule follows the guidelines laid out previously, with lots of static sequences that the rule relies extensively on. While the human-written rule also uses static features, it focuses on the decryption algorithm that is vital to the malware and is much less easily obfuscated. Both of these rules are useful, but it is helpful to see how autogenerated rules are much more focused on sequences and patterns over dynamic behaviors.
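The wildcards and jumps in `$xor_decode` are what let the human-written rule tolerate small compiler differences. As a rough aid to reading it, the Python sketch below translates a simplified subset of YARA hex-string syntax into a regular expression over bytes. It handles only full bytes, `??`, single-nibble wildcards such as `0?` or `?A`, and bounded jumps `[n-m]`. The function name is hypothetical, and the test bytes are fabricated to fit the pattern, not taken from an OceanLotus sample.

```python
import re

JUMP = re.compile(r"\[(\d+)-(\d+)\]")
HEX_DIGITS = "0123456789ABCDEF"

XOR_DECODE = ("89 D2 41 8A ?? ?? [0-1] 32 0? 88 ?? FF C2 [0-1] 39 ?A [0-1] "
              "0F 43 D? 4? FF C? 48 FF C? [0-1] FF C? 75 E3")


def nibble_options(ch: str) -> str:
    """A '?' nibble may be any hex digit; otherwise only the digit itself."""
    return HEX_DIGITS if ch == "?" else ch.upper()


def hex_pattern_to_regex(pattern: str):
    """Compile a simplified YARA hex pattern into a bytes regex."""
    parts = []
    for token in pattern.split():
        jump = JUMP.fullmatch(token)
        if jump:
            # [n-m]: skip between n and m arbitrary bytes.
            parts.append(b".{%d,%d}" % (int(jump.group(1)), int(jump.group(2))))
        elif len(token) == 2 and all(c in HEX_DIGITS + "?" for c in token.upper()):
            # Expand the token into every byte value it can stand for.
            values = [int(hi + lo, 16)
                      for hi in nibble_options(token[0])
                      for lo in nibble_options(token[1])]
            parts.append(b"[" + b"".join(b"\\x%02x" % v for v in values) + b"]")
        else:
            raise ValueError(f"unsupported token: {token!r}")
    return re.compile(b"".join(parts), re.DOTALL)


regex = hex_pattern_to_regex(XOR_DECODE)
sample = bytes.fromhex("89 D2 41 8A 04 10 32 07 88 01 FF C2 39 CA "
                       "0F 43 D6 48 FF C7 48 FF C1 FF C8 75 E3")
print(bool(regex.search(b"\x90\x90" + sample)))
```

Each wildcard corresponds to a register or operand choice that can vary between builds, while the fixed bytes pin down the structure of the loop itself.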

Hunting Bundlore with Spectra Analyze

Spectra Analyze includes a built-in YARA feature that allows us to run rules against the entirety of the ReversingLabs sample corpus – consisting of over 422 billion searchable samples. This allows us to create groupings of samples that fit specific criteria. To demonstrate this capability, we’ll evaluate a third-party rule to gather samples of a macOS malware family called Bundlore.

private rule Macho
{
    meta:
        description = "private rule to match Mach-O binaries"
    condition:
        uint32(0) == 0xfeedface or uint32(0) == 0xcefaedfe or uint32(0) == 0xfeedfacf or uint32(0) == 0xcffaedfe or uint32(0) == 0xcafebabe or uint32(0) == 0xbebafeca
}

rule OSX_Bundlore_A
{
    meta:
        description = "OSX.Bundlore.A"

    strings:
        $a1 = { 4F 66 66 65 72 73 49 6E 73 74 61 6C 6C 53 63 72 69 70 74 55 72 6C }
        $a2 = { 53 6F 66 74 77 61 72 65 49 6E 73 74 61 6C 6C 53 63 72 69 70 74 55 72 6C }
        $a3 = { 63 6F 6D 2E 67 6F 6F 67 6C 65 2E 43 68 72 6F 6D 65 }
        $a4 = { 2E 74 6D 70 6D 61 }
        $a5 = { 50 6C 65 61 73 65 20 77 61 69 74 20 77 68 69 6C 65 20 79 6F 75 72 20 73 6F 66 74 77 61 72 65 20 69 73 20 62 65 69 6E 67 20 69 6E 73 74 61 6C 6C 65 64 2E 2E 2E }

    condition:
        filesize < 500000 and Macho and 4 of ($a*)
}

Figure 1. Two Apple XProtect YARA rules 

Above, you can see two YARA rules that work together to hunt the Bundlore family of malware. We’ve taken these from Apple’s public XProtect GitHub repository. XProtect is the built-in file protection system on Mac devices.

The first rule, called Macho, simply looks for the file signature that identifies Mach-O files. Running this rule in conjunction with a more specific rule allows us to narrow our sample set to only the applicable files (i.e., those that can actually run on a macOS machine). The second of this pair of rules, OSX_Bundlore_A, includes several hex strings targeting various features of Bundlore malware. If we convert these strings to ASCII, we see they represent the following:

  • OffersInstallScriptUrl
  • SoftwareInstallScriptUrl
  • com.google.Chrome
  • .tmpma
  • Please wait while your software is being installed...

The rule requires at least four of these strings to be present to return a match. We can see in this list an application bundle identifier (com.google.Chrome); function names; a string the malware may print; and a file extension. The hope in using these strings in a YARA rule is that they are specific to Bundlore and therefore unlikely to be present in other samples or goodware.
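To see how the two rules combine, the Python sketch below re-implements their logic over an in-memory buffer. It is an illustration of the condition only, not a substitute for running YARA. The function names are hypothetical, and the test buffer is fabricated. It relies on YARA's `uint32(0)` reading the first four bytes as a little-endian integer.

```python
import struct

# Magic values checked by the private Macho rule.
MACHO_MAGICS = {0xFEEDFACE, 0xCEFAEDFE, 0xFEEDFACF,
                0xCFFAEDFE, 0xCAFEBABE, 0xBEBAFECA}

# $a1..$a5 from OSX_Bundlore_A, copied as hex.
A_STRINGS = [bytes.fromhex(h) for h in (
    "4F 66 66 65 72 73 49 6E 73 74 61 6C 6C 53 63 72 69 70 74 55 72 6C",
    "53 6F 66 74 77 61 72 65 49 6E 73 74 61 6C 6C 53 63 72 69 70 74 55 72 6C",
    "63 6F 6D 2E 67 6F 6F 67 6C 65 2E 43 68 72 6F 6D 65",
    "2E 74 6D 70 6D 61",
    "50 6C 65 61 73 65 20 77 61 69 74 20 77 68 69 6C 65 20 79 6F 75 72 20 "
    "73 6F 66 74 77 61 72 65 20 69 73 20 62 65 69 6E 67 20 69 6E 73 74 61 "
    "6C 6C 65 64 2E 2E 2E",
)]


def is_macho(data: bytes) -> bool:
    """Equivalent of the Macho rule: little-endian uint32 at offset 0."""
    return len(data) >= 4 and struct.unpack_from("<I", data, 0)[0] in MACHO_MAGICS


def matches_bundlore_a(data: bytes) -> bool:
    """filesize < 500000 and Macho and 4 of ($a*)."""
    hits = sum(1 for s in A_STRINGS if s in data)
    return len(data) < 500000 and is_macho(data) and hits >= 4


for s in A_STRINGS:
    print(s.decode("ascii"))

# Fabricated buffer: 64-bit Mach-O magic followed by four of the strings.
fake = struct.pack("<I", 0xFEEDFACF) + b"\x00".join(A_STRINGS[:4])
print(matches_bundlore_a(fake))
```

Because the size and header checks are cheap, they filter out most files before any string counting matters, which is the ordering recommended earlier for conditions.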

To access the YARA hunting feature that allows us to search the ReversingLabs file corpus for samples meeting our specified conditions, Spectra Analyze customers select the “Yara” option in the main toolbar (Figure 2). Spectra Core has many built-in YARA rules that can be seen by filtering the “Owner” field by “Spectra Core (Embedded).”

Figure 2. Yara landing page as seen in Spectra Analyze

To add a new rule, we click the “Add Ruleset” dropdown menu on the right, and are shown the option to add a new ruleset manually, upload a file containing our rules, or to import rules from an online source. For this exercise, we’ll select the first option and add our new ruleset manually, as seen below.

Figure 3. YARA rule editor as seen in Spectra Analyze, with above XProtect rules being added to a new ruleset.

Before running the rule we have, let’s confirm there are samples in Spectra Analyze via a simple Advanced Search threatname query.

Figure 4. Simple Advanced Search in Spectra Analyze, returning Bundlore samples.

If we search for Mach-O samples categorized as Bundlore by threatname going back to 2020, we see there are 181 samples to work with. There may be additional samples categorized under another threat name, or it could be that some of these samples do not fit the criteria specified in the Bundlore rule we analyzed in the previous section. Further analysis will tell.

Figure 5. Steps in Spectra Analyze to run a YARA cloud retrohunt.

Spectra Analyze has multiple options for running YARA rules. You can run the ruleset in the cloud, which will match any cloud samples from the moment the ruleset is enabled onward. If you do not enable this option, rules will run only against locally uploaded samples. For this exercise, however, we are interested in historic samples matching our ruleset, so we want to run a Cloud Retrohunt (Figure 5).

Figure 6. Spectra Analyze Cloud Retrohunt results for XProtect Bundlore rule

After the retrohunt completes, you can view the results (Figure 6). It has returned only eight samples and interestingly, the “first seen” dates on these samples are all eight or nine years ago. This suggests that the strings utilized by the XProtect rule we used in this hunt are no longer being used by the malware and may not have been present in Bundlore samples for some time now. If we were to expand this hunting exercise, it would prompt us to research more recently reported campaigns and try to gather fresh samples so that we could develop a rule for these new samples. 

We’ll dive deeper into this kind of YARA rule development in a later installment of our RL Spectra Analyze In Action series.

Conclusion 

YARA rules are a quintessential part of the threat hunting realm, allowing analysts to easily identify and classify malware samples. With a constantly evolving OSX malware domain, it is important to write clear, specific, and accurate YARA rules. Understanding what makes a YARA rule accurate — specific strings, fair conditions, and appropriate false positive checks — allows us to create more applicable rules.

The ideal approach to using YARA combines autogenerated and human-written rules, applying each where it fits best: threat detection and threat hunting, respectively. There is no one clear-cut way to develop YARA rules, but by staying mindful of good YARA practices, we can continue to create rules that work.

Spectra Analyze’s YARA feature allows us to both retrohunt through RL’s immense file corpus, and be alerted when new samples are available. Whether you are looking to run third-party rules or to develop your own for specific hunting purposes, Spectra Analyze is easy and intuitive to use.

Up Next…

In our next installment in this series, we’ll dive into YARA rule development using Bundlore as our case study.
