Spectra Assure Free Trial
Get your 14-day free trial of Spectra Assure
Get Free TrialMore about Spectra Assure Free Trial
A new tool that allows AI builders and users to assess the risk posed by large language models (LLMs) has been announced by the Cloud Security Alliance (CSA). Called Risk Rubric, the tool acts as an AI leaderboard that grades LLMs from A to F across six “risk pillars:” transparency, reliability, security, privacy, safety, and reputation.
Risk Rubric’s developers said the service “eliminates AI model risk guesswork by providing instant, actionable risk grades for the most common models enterprises deploy.”
Gal Moyal, from the CTO office of Noma Security, which partnered with the CSA on the project along with Harmonic Security and Haize Labs, said the leaderboard transforms subjective security concerns into objective, comparable, A-to-F grades backed by thousands of automated tests.
Gal MoyalTeams can instantly compare models and make evidence-based trade-offs between functionality and security risk, as Fortune 100 companies are already doing to challenge developer model choices with concrete data.
Here’s how you can use Risk Rubric — and why it’s only one tool in your security tool box for managing AI risk across your organization.
Get Guide: How the Rise of AI Will Impact Supply Chain Security
Risk Rubric evaluates hundreds of leading machine-learning models through rigorous testing protocols, including more than 1,000 reliability prompts and 200 adversarial security tests, automated code scans, and comprehensive documentation reviews.
Each AI receives objective scores that range from 0 to 100 across the six risk pillars, rolling up to A-to-F letter grades that enable rapid risk assessment without requiring deep AI expertise, they explained.
John Carberry, chief marketing officer at Xcape, said Risk Rubric is helpful because it brings a standardized, data-driven approach to a previously subjective process. “For the first time, organizations have a report card to objectively evaluate AI models on critical factors like security, privacy, and safety,” he said.
John CarberryThe proliferation of LLMs has created a Wild West, where decisions are often based on performance alone. A leaderboard like this signifies the maturation of the market and the recognition that trust, security, and ethics are as important as raw capability.
Rosario Mastrogiacomo, chief strategy officer at Sphere Technology Solutions, said the main advantage of Risk Rubric is that it evaluates AI models not just on performance, but on risk as well.
Rosario MastrogiacomoThis matters because it puts risk on equal footing with speed and accuracy, forcing enterprises to weigh security and governance as core criteria when selecting models.
Erich Kron, CISO advisor at KnowBe4, said corporations are rapidly deploying AI and they need to find ways to balance risk with benefits to stay competitive.
Erich KronIt can be very difficult and expensive to dedicate resources to testing different LLMs, so having a trusted third party that can provide this information can be very helpful.
Arnault Chazareix, ML engineering manager at GitGuardian, said Risk Rubric is a low-cost way to have a good overview of the model landscape. “Is it better than what most people currently do, which is nothing? Probably,” he said.
Arnault ChazareixI think it might be a good add-on to some of the current leaderboards that are performance-centric and often skewed because they are self-reported by companies.
Noma’s Moyal said Risk Rubric can be used to make informed decisions about AI development and deployment. “Teams can select models based on risk tolerance — A-rated for high-sensitivity applications, C-rated with additional controls for lower-risk use cases,” he said. “The weighted risk pillars — security 25%, reliability 20%, privacy 20%, etc. — allow prioritization based on specific context while providing CISOs concrete metrics for board-level risk communication.”
Xcape’s Carberry asserted that a leaderboard is a powerful tool for ongoing risk management and development. “It can inform a multi-model strategy, where different models are used for different risk profiles,” he said. “For example, a high-risk application handling sensitive customer data would require a model with an A grade in security and privacy, while a low-risk internal chatbot might be acceptable with a B grade.”
John CarberryThe leaderboard can also be used to track a model’s score over time, helping developers and security teams understand if a new version has introduced new vulnerabilities or if the model’s overall risk posture is deteriorating.
Sphere Technology’s Mastrogiacomo pointed out that Risk Rubric can provide security and engineering leaders with a common reference point. “A CISO might prioritize privacy, while a developer cares more about reliability,” he said. “With Risk Rubric, both can see objective scores and trade-offs across dimensions.”
The leaderboard can also be used to embed meaningful security into LLM selection. “By breaking the scoring into pillars, the customer can get a better idea of this scoring as it relates to things that are very important to them,” KnowBe4’s Kron explained. “Some organizations may have a focus on privacy, while others may be more concerned about hallucinations, and this rubric helps to identify strengths and potential weaknesses in those areas.”
Carberry added that a leaderboard fundamentally changes the security conversation from an afterthought to a primary selection criterion. “Security teams can now set a minimum grade for procurement, ensuring that models with known vulnerabilities or poor security hygiene are never deployed,” he said.
“It provides a common language for CISOs, AI developers, and business leaders, moving the conversation from ‘Does it work?’ to ‘Is it safe and trustworthy?’” he said. “It empowers security teams to say, ‘We will not use any model that scores below a B on the security and privacy pillars.’”
While Risk Rubric is a welcome new tool in the risk management tool box, it will be a challenge to keep it relevant, Carberry said. “The biggest challenge is the dynamic nature of AI itself,” he said. “A model’s score is only valid for a moment in time, as models and their underlying data are constantly being updated.”
However, Noma’s Moyal noted that Risk Rubric is designed to accommodate change. “It performs real-time scanning upon model version releases, monthly comprehensive re-evaluations for all indexed models, and immediate assessments for community-requested additions,” he said. “The engine automatically detects model updates and triggers new assessments, ensuring scores always reflect current security posture.”
Another challenge is the need for the leaderboard’s methodology to be transparent and rigorous enough to be trusted by the entire industry. “The black-box nature of many LLMs makes it difficult to truly verify claims of security or safety,” Carberry said.
Mastrogiacomo said that there is a governance challenge to be aware of. “As rankings become influential, vendors will inevitably dispute scores. The credibility of the leaderboard will depend on its transparency and resilience to pressure.”
Carberry said there’s also the risk of adversarial gaming, where a model is optimized to get a high score on a public leaderboard without truly addressing its underlying security flaws. That kind of gaming may be difficult with Risk Rubric, though, because of its built-in red-teaming function. “The engine systematically probes 125 risk behaviors using diverse evasion techniques and jailbreak methodologies, generating thousands of adversarial prompts per model to uncover edge cases that manual testing would miss,” Moyal said.
Gal MoyalAutomated judges evaluate responses across multiple dimensions simultaneously, identifying vulnerabilities like data exfiltration paths before attackers discover them in production.
Mastrogiacomo said Risk Rubric is more than a leaderboard. “It’s a governance instrument,” he said. “By making risk visible and comparable, it sets a higher floor for the industry.”
Rosario MastrogiacomoOver time, it could pressure vendors to prioritize explainability, data safeguards, and security features the same way performance benchmarks pushed them toward faster, cheaper models. That’s a welcome shift.
Carberry said the existence of a leaderboard like Risk Rubric is a positive sign for the industry. For security teams, this isn’t just about identifying vulnerabilities, he said. It’s about getting the actionable intelligence needed to manage AI risk at scale. “This is a crucial step toward building a more secure and trustworthy AI ecosystem for everyone,” he said.
John CarberryIt signals a shift from a move-fast-and-break-things mentality to one that prioritizes responsible innovation.
One key tool to build out your AI risk strategy is by adopting Cyclone DX’s XBOM, which includes an ML bill of materials, or ML-BOM.
Dhaval Shah, senior director of product management at ReversingLabs (RL),, wrote recently that developers and security teams need comprehensive visibility into their entire AI supply chain, and the ML-BOM is key to that.
Dhaval Shah[An] ML-BOM identifies potentially malicious open-source models before they can be integrated into your products, giving your development team confidence that the ML components they’re using are safe.
And the rise of AI risk comes at a time when attacks are going after the developer ecosystem and the software supply chain more broadly. Saša Zdjelar, chief trust officer at RL, said that using advanced application security tools such as binary analysis and reproducible builds could help manage AI risk better than traditional static and dynamic application security testing (SAST and DAST) tools.
Saša ZdjelarAppSec practices such as SAST and DAST typically only apply to a small subset of internally developed systems and applications at many organizations.
New NIST guidance identifies key AI and ML challenges. Learn why RL Spectra Assure should be an essential part of your solution.
Explore RL's Spectra suite: Spectra Assure for software supply chain security, Spectra Detect for scalable file analysis, Spectra Analyze for malware analysis and threat hunting, and Spectra Intelligence for reputation data and intelligence.
Get your 14-day free trial of Spectra Assure
Get Free TrialMore about Spectra Assure Free Trial