ReversingLabs Hashing Algorithm

predictive malware detection

Traditional hashing algorithms (e.g., MD5, SHA-1) provide an essential tool for security applications. Although commonly used for allowlisting and blocklisting, traditional hashes have significant drawbacks for detecting malware. First, a malicious file must be seen before a hash can be created so polymorphic attacks are not detectable. Second, hashes are fragile, enabling malware authors to make inconsequential changes to files to avoid detection.

ReversingLabs Hashing Algorithm (“RHA”) addresses these issues by intelligently hashing a file’s features rather than its bits. Files have the same RHA hash when they are functionally similar. This makes RHA orders of magnitude better than traditional hashes for malware detection. One RHA hash can potentially identify thousands of functionally similar malware files, even though each has a unique SHA-1 hash. Further, RHA will detect a new and unknown malware variant because it is functionally similar to known malware.

RHA is superior to the traditional similarity algorithms, such as imphash, ssdeep, tlsh, and others, providing superior identification automatically with pre-defined threat matching. With the traditional similarity algorithms, users have to create filesets that they want to match against; with RHA, users can get threat matching directly as that work has been done by ReversingLabs for them. Keep reading to see the results of our proprietary algorithm with a real-world example.

How RHA Works

RHA enables correlation of files based on functional features. These attributes include format-specific header information, file layout, and functional file information (e.g., code and data relationships). RHA calculates functional similarity at four “precision levels,” 25%, 50%, 75%, and 100%, each based on an increasing number of attributes. Precision level represents the degree a file is functionally similar to another file. A higher precision level will match fewer files, but the files will have more functional similarity.

RHA can be applied to any executable file format. First, format-specific features are abstracted into categories such as structure, layout, content, symbols, functionality, and relationships. Then, algorithms are implemented to evaluate the attributes of each category for similarity at each precision level. Algorithms will vary for each format but usually entail data sorting and simplification. The algorithms calculate a hash for each precision level so that functionally related files fall into the same hash group.

Each precision level’s hash is deterministic and tied to functional configuration. This makes precision levels distinct with no overlaps in hash lookup. This hash determinism ensures the fastest possible hash lookup times.

Validation

The effectiveness of RHA was tested using 7.75M unique malware samples that were detected as part of the Zeus malware family by at least one antivirus vendor. The samples were processed with the algorithm at the lowest precision level, resulting in 475K unique RHA1 hashes. This effectively reduced the working malware set size by 93%.

We expected a reduction in sample uniqueness for members of the same malware family but didn’t expect the magnitude of reduction. We analyzed the sample data to understand better why the effectiveness was so high. We started with the hashes that yielded the most matches. The following plot shows the number of unique binaries that map to a single RHA1 hash at the lowest precision level.

Number of files that are assigned to a single RHA hash

The top matching RHA file sample showed that our best match wasn’t on a particular malware family, but on a packing wrapper used to mask the true attack. This was not a common off-the-shelf packer, such as UPX, but a custom packing solution developed exclusively to hide malware presence.

Since packing can obscure detections and their malware family groupings, we turned to antivirus solutions to see how they classified the top match. The following graph shows the normalized threat names for the 100k files of the most prevalent RHA hash. There wasn’t no consensus on the threat name; only one antivirus vendor classified these samples as Zeus. Since it’s clear that the packing layer interferes with proper detections, we updated the Spectra Core backend to support this custom packing solution we call cpFlush.

Threat name breakdown for the best RHA1 hash

Unpacking the files showed that the top match was also using multiple packing layers. The number of corrupted and incorrectly packed files was low, so we could successfully unpack 95% of the samples. Comparing the RHA of files at each layer of packing showed they remained within the same functional hash buckets. This indicates that the differences between these files were indeed minor.

RHA, even at the lowest precision level, showed no collisions with allowlisted files and, therefore was safely applied to our automatic classification. The custom packer was blocklisted using its format signature, and RHA enabled us to detect multiple malware families that use it.

Conclusion

ReversingLabs proprietary hashing algorithm, a.k.a. RHA, is a critical component of RL’s Spectra Core capabilities, giving security teams a powerful new way for detecting present and future malware.

Learn about RL’s Malware Analysis and Threat Hunting solutions.

hashing diagram
graph of files decreasing over time
bar graph of threat names

Spectra Assure Free Trial

Get your 14-day free trial of Spectra Assure for Software Supply Chain Security

Get Free TrialMore about Spectra Assure Free Trial
Blog
Events
About Us
Webinars
In the News
Careers
Demo Videos
Cybersecurity Glossary
Contact Us
reversinglabsReversingLabs: Home
Privacy PolicyCookiesImpressum
All rights reserved ReversingLabs © 2026
XX / TwitterLinkedInLinkedInFacebookFacebookInstagramInstagramYouTubeYouTubeblueskyBlueskyRSSRSS
Back to Top
ReversingLabs: The More Powerful, Cost-Effective Alternative to VirusTotalSee Why
Skip to main content
Contact UsSupportLoginBlogCommunity
reversinglabs
ReversingLabs: Home
Solutions
Secure Software OnboardingSecure Build & ReleaseProtect Virtual MachinesIntegrate Safe Open SourceGo Beyond the SBOM
Increase Email Threat ResilienceDetect Malware in File Shares & StorageAdvanced Malware Analysis SuiteICAP Enabled Solutions
Scalable File AnalysisHigh-Fidelity Threat IntelligenceCurated Ransomware FeedAutomate Malware Analysis Workflows
Products & Technology
Spectra Assure®Software Supply Chain SecuritySpectra DetectHigh-Speed, High-Volume, Large File AnalysisSpectra AnalyzeIn-Depth Malware Analysis & Hunting for the SOCSpectra IntelligenceAuthoritative Reputation Data & Intelligence
Spectra CoreIntegrations
Industry
Energy & UtilitiesFinanceHealthcareHigh TechPublic Sector
Partners
Become a PartnerValue-Added PartnersTechnology PartnersMarketplacesOEM Partners
Alliances
Resources
BlogContent LibraryCybersecurity GlossaryConversingLabs PodcastEvents & WebinarsLearning with ReversingLabsWeekly Insights Newsletter
Customer StoriesDemo VideosDocumentationOpenSource YARA Rules
Company
About UsLeadershipCareersSeries B Investment
EventsRL at RSAC
Press ReleasesIn the News
Pricing
Software Supply Chain SecurityMalware Analysis and Threat Hunting
Request a demo
Menu