RL Blog

Products & Technology | April 4, 2024

ReversingLabs Hashing Algorithm

Here's what you need to know about RL's predictive malware detection.


Traditional hash algorithms (e.g., MD5, SHA-1) are an essential tool for security applications. Although commonly used for allowlisting and blocklisting, traditional hashes have significant drawbacks for detecting malware. First, a malicious file must be seen before its hash can be created, so polymorphic attacks are not detectable. Second, hashes are fragile: malware authors can make inconsequential changes to a file to avoid detection.
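The fragility of bit-level hashes is easy to see with a minimal Python sketch. The byte string below is a made-up stand-in for a file, not a real sample; flipping one bit in it yields a completely different SHA-1 digest.

```python
import hashlib

# Made-up stand-in for a file's bytes (an assumption for illustration only).
original = b"MZ" + b"\x90" * 62 + b"payload-bytes"

# Flip a single bit in one padding byte. In a real program, a change like
# this would often leave behavior untouched.
patched = bytearray(original)
patched[10] ^= 0x01
modified = bytes(patched)

sha1_original = hashlib.sha1(original).hexdigest()
sha1_modified = hashlib.sha1(modified).hexdigest()

# Count how many of the 40 hex characters differ between the two digests.
differing = sum(a != b for a, b in zip(sha1_original, sha1_modified))

print(sha1_original)
print(sha1_modified)
print(f"{differing} of 40 hex characters differ")
```

A blocklist keyed on the first digest would not match the second file, even though the two differ by a single bit.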

ReversingLabs Hashing Algorithm (“RHA”) addresses these issues by intelligently hashing a file’s features rather than its bits. Files have the same RHA hash when they are functionally similar. This makes RHA orders of magnitude better than traditional hashes for malware detection. One RHA hash can potentially identify thousands of functionally similar malware files, even though each has a unique SHA-1 hash. Further, RHA will detect a new and unknown malware variant because it is functionally similar to known malware.

RHA also improves on traditional similarity algorithms, such as imphash, ssdeep, and TLSH, by providing identification automatically through pre-defined threat matching. With traditional similarity algorithms, users have to create the filesets they want to match against; with RHA, users get threat matching directly because ReversingLabs has already done that work. Keep reading to see the results of our proprietary algorithm in a real-world example.

How RHA Works

RHA enables correlation of files based on functional features. These attributes include format-specific header information, file layout, and functional file information (e.g., code and data relationships). RHA calculates functional similarity at four “precision levels,” 25%, 50%, 75%, and 100%, each based on an increasing number of attributes. The precision level represents the degree to which a file is functionally similar to another file. A higher precision level will match fewer files, but the matched files will have more functional similarity.

Diagram showing five buildings with red-shaded floors indicating data transfer; data moves from buildings at 25% capacity to those at 75% capacity over time

RHA can be applied to any executable file format. First, format-specific features are abstracted into categories such as structure, layout, content, symbols, functionality, and relationships. Then, algorithms are implemented to evaluate the attributes of each category for similarity at each precision level. Algorithms will vary for each format but usually entail data sorting and simplification. The algorithms calculate a hash for each precision level so that functionally related files fall into the same hash group.
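The general idea of hashing sorted, simplified features at several precision levels can be illustrated with a toy Python sketch. This is not the RHA algorithm, which is proprietary. The feature names, the assignment of features to levels, and the function `toy_feature_hash` are all hypothetical and chosen only for illustration.

```python
import hashlib

# Hypothetical mapping of precision level to the features it considers.
# Each higher level adds attributes, so it matches fewer files.
LEVELS = {
    25: ["format", "section_names"],
    50: ["format", "section_names", "imports"],
    75: ["format", "section_names", "imports", "entry_section"],
    100: ["format", "section_names", "imports", "entry_section",
          "code_block_sizes"],
}


def simplify(value):
    """Sort and de-duplicate list-like features so ordering noise is ignored."""
    if isinstance(value, (list, tuple, set)):
        return sorted({str(v).lower() for v in value})
    return str(value).lower()


def toy_feature_hash(features, level):
    """Hash the simplified features selected for one precision level."""
    parts = [f"{name}={simplify(features[name])}" for name in LEVELS[level]]
    return hashlib.sha1("|".join(parts).encode("utf-8")).hexdigest()


# Two made-up feature sets: same structure and imports in a different
# order, but different code block sizes.
sample_a = {
    "format": "PE32",
    "section_names": [".text", ".data", ".rsrc"],
    "imports": ["CreateFileW", "WriteFile"],
    "entry_section": ".text",
    "code_block_sizes": [512, 1024],
}
sample_b = {
    "format": "PE32",
    "section_names": [".data", ".text", ".rsrc"],
    "imports": ["WriteFile", "CreateFileW"],
    "entry_section": ".text",
    "code_block_sizes": [512, 2048],
}

for level in LEVELS:
    same = toy_feature_hash(sample_a, level) == toy_feature_hash(sample_b, level)
    print(f"precision {level}%: {'match' if same else 'no match'}")
```

In this toy setup the two samples fall into the same hash group at the 25%, 50%, and 75% levels and separate only at 100%, where the differing code block sizes are taken into account.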

Each precision level’s hash is deterministic and tied to functional configuration. This makes precision levels distinct with no overlaps in hash lookup. This hash determinism ensures the fastest possible hash lookup times.
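Because each hash is deterministic, a lookup can be an exact-match query rather than a pairwise similarity comparison. The sketch below assumes one table per precision level and checks the most precise level first; the table layout, the `register` and `lookup` helpers, and the placeholder hash strings are hypothetical and do not describe the Spectra Core implementation.

```python
# Hypothetical index: one exact-match table per precision level.
index = {25: {}, 50: {}, 75: {}, 100: {}}


def register(level, feature_hash, classification):
    """Store a known classification under a hash at a given precision level."""
    index[level][feature_hash] = classification


def lookup(hashes_by_level):
    """Return (level, classification) for the most precise match, or None."""
    for level in sorted(index, reverse=True):
        feature_hash = hashes_by_level.get(level)
        if feature_hash in index[level]:
            return level, index[level][feature_hash]
    return None


# Placeholder hash strings, not real RHA values.
register(25, "placeholder-hash-25-x", "malicious")

print(lookup({25: "placeholder-hash-25-x", 50: "placeholder-hash-50-y"}))
print(lookup({25: "placeholder-hash-25-unknown"}))
```

Each query is a constant-time dictionary lookup per level, independent of how many files share the hash.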

Validation

The effectiveness of RHA was tested using 7.75M unique malware samples that were detected as part of the Zeus malware family by at least one antivirus vendor. The samples were processed with the algorithm at the lowest precision level, resulting in 475K unique RHA1 hashes. This effectively reduced the working malware set size by 93%.
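The reduction follows directly from the two figures above. A short Python check, using only the rounded sample and hash counts quoted in this post:

```python
total_samples = 7_750_000   # unique Zeus-labeled samples (rounded figure from the post)
unique_hashes = 475_000     # unique lowest-precision hashes (rounded figure from the post)

# Fraction of the working set that no longer needs to be tracked individually.
reduction = 1 - unique_hashes / total_samples

# Average number of samples represented by one hash.
samples_per_hash = total_samples / unique_hashes

print(f"reduction: {reduction:.1%}")
print(f"average samples per hash: {samples_per_hash:.1f}")
```

With these rounded inputs the reduction comes out just under 94%, in line with the figure quoted above, and each hash stands in for roughly 16 samples on average.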

We expected a reduction in sample uniqueness for members of the same malware family but didn’t expect the magnitude of the reduction. We analyzed the sample data to better understand why the effectiveness was so high. We started with the hashes that yielded the most matches. The following plot shows the number of unique binaries that map to a single RHA1 hash at the lowest precision level.

Number of Files Assigned to a Single RHA Hash

A line graph with a shaded area underneath, showing a steep drop at the start and a gradual decline from left to right.
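A ranking like the one plotted can be produced by counting how many samples share each hash and sorting by that count. The following Python sketch uses made-up sample IDs and hash labels; `files_per_hash` is a hypothetical helper, not part of any ReversingLabs tooling.

```python
from collections import Counter


def files_per_hash(sample_hashes):
    """Count samples per hash and return (hash, count) pairs, largest first.

    sample_hashes is an iterable of (sample_id, feature_hash) pairs.
    """
    counts = Counter(feature_hash for _, feature_hash in sample_hashes)
    return counts.most_common()


# Made-up data for illustration only.
pairs = [
    ("s1", "h_a"), ("s2", "h_a"), ("s3", "h_b"),
    ("s4", "h_a"), ("s5", "h_c"), ("s6", "h_b"),
]

ranked = files_per_hash(pairs)
for feature_hash, count in ranked:
    print(feature_hash, count)
```

Plotting the counts in this order gives the steep initial drop and long tail described above, with the first entry being the hash that covers the most files.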

The top matching RHA file sample showed that our best match wasn’t on a particular malware family, but on a packing wrapper used to mask the true attack. This was not a common off-the-shelf packer, such as UPX, but a custom packing solution developed exclusively to hide malware presence.

Since packing can obscure detections and their malware family groupings, we turned to antivirus solutions to see how they classified the top match. The following graph shows the normalized threat names for the 100K files of the most prevalent RHA hash. There was no consensus on the threat name; only one antivirus vendor classified these samples as Zeus. Since it’s clear that the packing layer interferes with proper detections, we updated the Spectra Core backend to support this custom packing solution, which we call cpFlush.

A bar graph with a steep decline at the beginning, from left to right.

Unpacking the files showed that the top match was also using multiple packing layers. The number of corrupted and incorrectly packed files was low, so we could successfully unpack 95% of the samples. Comparing the RHA of files at each layer of packing showed they remained within the same functional hash buckets. This indicates that the differences between these files were indeed minor.

RHA, even at the lowest precision level, showed no collisions with allowlisted files and was therefore safely applied to our automatic classification. The custom packer was blocklisted using its format signature, and RHA enabled us to detect multiple malware families that use it.

Conclusion

ReversingLabs’ proprietary hashing algorithm, RHA, is a critical component of RL’s Spectra Core capabilities, giving security teams a powerful way to detect present and future malware.

Learn about RL’s Malware Analysis and Threat Hunting solutions.
