RL Blog

AppSec & Supply Chain Security | December 14, 2023

The Hugging Face API token breach: 5 lessons learned

More than 1,500 tokens were exposed, leaving millions of AI models and datasets vulnerable. Here's what your security team can learn from the compromise.

John P. Mello Jr., freelance technology writer

Researchers from Lasso Security rattled the AI development world early in December when they discovered that more than 1,500 Hugging Face API tokens were exposed, leaving millions of users vulnerable.

Hugging Face is the GitHub for AI developers. Its open-source library hosts more than 500,000 AI models and 250,000 datasets, including pre-trained models from Meta-Llama, Bloom, and Pythia.

One of the most used features of the website is its API, which allows developers and organizations to integrate models and read, create, modify, and delete repositories or files within them. A compromise of its API could be catastrophic.

Bar Lanyado, a security researcher at Lasso Security, said in his team's analysis of the compromise that the Hugging Face API tokens are significant for organizations and that exploiting them could lead to major negative outcomes, including data breaches and the spread of malicious models that "could affect millions of users who rely on these foundational models for their applications."

The gravity of the situation cannot be overstated. With control over an organization boasting millions of downloads, we now possess the capability to manipulate existing models, potentially turning them into malicious entities.

Bar Lanyado

After the disclosure, the repository and its affected users rushed to mitigate the problem, narrowly evading a debacle. Here are five lessons learned from the breach — and some best practices for reality-checking API security in your development environment.

Learn more:
  • Secure AI development guidance: What software teams need to know
  • MFA and software supply chain security: It's no magic bullet

1. Don't store login information in public repositories

Roger Grimes, a defense evangelist for KnowBe4, said shared logins are a huge ongoing problem. "After years of telling developers not to store logon information on public repositories, they continue to do so in large numbers," he said.

Grimes said the big takeaway was that technical defenses are now a requirement.

Studies have shown that when logon information is stored in deposited code, it's only minutes before potential adversaries start to take advantage of it. While I'm a huge believer in the power of education to combat most cybersecurity problems, this is one that needs more technical defenses.

Roger Grimes

Public repositories, in an attempt to mitigate the problem, should do proactive scanning when a developer uploads code and block the storing of logon information within stored code — or at least warn the developer of the severe consequences, he said.
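The proactive scan Grimes describes can be sketched with a few regular expressions run at upload time. This is a minimal illustration in Python; the patterns are illustrative rather than exhaustive (the `hf_` prefix matches the format of Hugging Face user access tokens, and the others cover common token shapes):

```python
import re

# Illustrative token patterns for a hypothetical pre-upload scan.
SECRET_PATTERNS = {
    "huggingface_token": re.compile(r"\bhf_[A-Za-z0-9]{30,}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "generic_api_key": re.compile(
        r"(?i)api[_-]?key\s*[:=]\s*['\"][^'\"]{16,}['\"]"
    ),
}

def scan_for_secrets(source: str) -> list[tuple[str, str]]:
    """Return (pattern_name, matched_text) pairs found in source code."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(source):
            hits.append((name, match.group(0)))
    return hits
```

A repository could run such a check on every push and block the commit, or at least warn the developer, when any pattern matches.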

Given the severity of a potential breach, repositories such as GitHub have been rushing to adopt two-factor and multifactor authentication (2FA and MFA) to protect accounts. However, 2FA and MFA are not panaceas, experts warn.

2. Use multiple API keys — and rotate them

Nick Rago, a field CTO with the API security firm Salt Security, said it is good security practice to use not just one API key with third-party providers but several, each scoped to specific integration services, to minimize the impact of an exposed token. It is also a best practice to rotate keys frequently.
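One way to apply that advice is sketched below, assuming a simple in-process key store (a real deployment would use a secrets manager): each integration gets its own key, and any key past a maximum age is flagged for rotation.

```python
from datetime import datetime, timedelta, timezone

class ScopedKeyRing:
    """One key per integration, each with an age limit that forces rotation."""

    def __init__(self, max_age: timedelta = timedelta(days=30)):
        self.max_age = max_age
        self._keys = {}  # service name -> (key, issued_at)

    def set_key(self, service, key, issued_at=None):
        self._keys[service] = (key, issued_at or datetime.now(timezone.utc))

    def get_key(self, service):
        key, issued = self._keys[service]
        if datetime.now(timezone.utc) - issued > self.max_age:
            raise RuntimeError(f"Key for {service!r} is stale; rotate it")
        return key

    def stale_services(self):
        """List integrations whose keys are overdue for rotation."""
        now = datetime.now(timezone.utc)
        return [s for s, (_, t) in self._keys.items() if now - t > self.max_age]
```

Because each key is scoped to one service, revoking or rotating a leaked key disrupts only that integration rather than every third-party connection at once.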

If a third-party provider only allows public API access with static tokens, it's good to use an API gateway as an intermediary between a developer and the third-party API, Rago said.

That way, the organizations can enforce more robust API posture and authentication methods in their code, such as OAuth or MTLS.

Nick Rago

3. Be aware of third-party API usage

Rago explained that API security is not just about securing APIs that are internally developed; ensuring safe consumption and usage of leveraged third-party APIs is also critical.

Key business process today consists of API supply chain calls that consist of consumption of both internal and third-party APIs. Therefore, it is important that organizations have a good understanding of what third-party APIs are in use, their function, and the data associated with them to assess risk.

Nick Rago

Education is also important, because developers need to understand the ramifications of mishandling privileged API keys. And technologies should be in place to ensure that secrets such as static API tokens don't find their way into code and then into exposed repositories, Rago said.
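A first pass at the inventory Rago recommends can be as simple as harvesting hostnames referenced in source code. This is a sketch only: `internal_domains` is a hypothetical allow-list of first-party hosts, and a real inventory would also draw on API gateway logs and traffic analysis.

```python
import re

# Capture the hostname portion of any http(s) URL in the source text.
URL_RE = re.compile(r"https?://([A-Za-z0-9.-]+)")

def third_party_hosts(source: str,
                      internal_domains=("internal.example.com",)) -> set[str]:
    """Collect external hostnames referenced in code for an API inventory."""
    hosts = set(URL_RE.findall(source))
    return {h for h in hosts
            if not any(h.endswith(d) for d in internal_domains)}
```

The resulting host list gives a starting point for asking what each third-party API does and what data flows through it.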

4. Your AI tools need to take data handling seriously

Teresa Rothaar, a governance, risk, and compliance analyst at Keeper Security, said AI development demands the highest security protocols given the amount of sensitive data AI models need to be fed for training to generate accurate and appropriate results. That means AI data sets alone are valuable.

In addition to the danger of data poisoning — a scenario where threat actors feed AI models inaccurate or inappropriate data — threat actors may seek to steal fully trained AI models that organizations have invested thousands of work hours and millions of dollars into. Why invest your own money and time into building an AI model if you can steal another organization’s work?

Teresa Rothaar

5. AI providers need to foster trust in APIs and beyond

Karl Mattson, CISO of API security firm Noname Security, said that as large language models grow in use, they will become embedded into applications using APIs. Organizations are already using generative AI from a variety of vendors and various channels. This utilization is taking different forms, including integrating generative AI into in-house application development, incorporating it into third-party applications, or accessing it directly via API from providers such as OpenAI or Google's Bard, Mattson said.

As API attacks continue to increase on AI, organizations integrating with generative AI technologies may face the same risks and consequences. The AI industry will need to work to maintain trust by building secure API implementations and protecting third-party transactions with good security hygiene.

Karl Mattson

Best practices for securing your APIs

Tushar Kulkarni, a graduate student at Indiana University who took part in a recent RSA Conference webcast on API security, shared six measures organizations can take to secure their API implementations.

  • Don't use GUIDs/UUIDs that can be guessed by a threat actor in an intuitive way. GUIDs (globally unique identifiers) and UUIDs (universally unique identifiers) are used as identifiers for various resources or objects in APIs. During the web session, Kulkarni demonstrated how weak identifiers can be used to compromise an API.
  • Never rely on a client to filter sensitive data. It's always a good practice to allow a client to fetch only the data that is needed and nothing more.
  • Enforce a limit on how often a client can call the API endpoint. Without limits, a threat actor can carry out attacks, such as credential stuffing.
  • Lock down endpoints. Make sure all administrative endpoints validate a user's role and privileges before performing an action.
  • Avoid functions binding client-side data into code variables and later into objects in databases. Binding client-side data directly into code variables can expose the API to injection attacks, such as SQL injection or NoSQL injection. It can also expose the API to unintentional code injection vulnerabilities.
  • Enforce a strong CORS policy with custom, unguessable authorization headers. Enforcing a strong CORS policy ensures that only trusted domains are allowed to make requests to the API. That helps mitigate the risk of cross-site request forgery and other cross-origin attacks. "Enforcing a strong CORS policy is very important," Kulkarni said.
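Two of these measures, unguessable resource identifiers and per-client rate limits, can be sketched briefly. This is a minimal illustration using Python's standard library; a production API would keep rate-limit state in a shared store and enforce limits at the gateway.

```python
import secrets
import time

def new_resource_id() -> str:
    """Unguessable identifier, unlike sequential integers or predictable UUIDs."""
    return secrets.token_urlsafe(16)

class TokenBucket:
    """Per-client limit: `rate` requests refilled per second, burst of `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate, self.capacity = rate, capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill proportionally to elapsed time, then spend one token if available.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A client that exhausts its bucket gets rejected until tokens refill, which blunts brute-force attempts such as credential stuffing.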

Developers should treat all API inputs as dangerous, Kulkarni said.

You should never assume end users won't fool around with the API on their own. You should always assume that every end user is an attacker.

Tushar Kulkarni

Keep learning

  • Get up to speed on the state of software security with RL's Software Supply Chain Security Report 2026. Plus: See the webinar discussing the findings.
  • Learn why binary analysis is a must-have in the Gartner® CISO Playbook for Commercial Software Supply Chain Security.
  • Take action on securing AI/ML with our report: AI Is the Supply Chain. Plus: See RL's research on nullifAI and watch how RL discovered the novel threat.
  • Get the report: Go Beyond the SBOM. Plus: See the CycloneDX xBOM webinar.

Explore RL's Spectra suite: Spectra Assure for software supply chain security, Spectra Detect for scalable file analysis, Spectra Analyze for malware analysis and threat hunting, and Spectra Intelligence for reputation data and intelligence.

Tags: AppSec & Supply Chain Security
