Researchers from Lasso Security rattled the AI development world early in December when they discovered that more than 1,500 Hugging Face API tokens were exposed, leaving millions of users vulnerable.
Hugging Face is the GitHub for AI developers. Its open-source library hosts more than 500,000 AI models and 250,000 datasets, including pre-trained models from Meta-Llama, Bloom, and Pythia.
One of the most used features of the website is its API, which allows developers and organizations to integrate models and read, create, modify, and delete repositories or files within them. A compromise of its API could be catastrophic.
Bar Lanyado, a security researcher at Lasso Security, said in his team's analysis of the compromise that the Hugging Face API tokens are significant for organizations and that exploiting them could lead to major negative outcomes, including data breaches and the spread of malicious models that "could affect millions of users who rely on these foundational models for their applications."
Bar Lanyado: "The gravity of the situation cannot be overstated. With control over an organization boasting millions of downloads, we now possess the capability to manipulate existing models, potentially turning them into malicious entities."
After the disclosure, the repository and its affected users rushed to mitigate the problem, narrowly evading a debacle. Here are five lessons learned from the breach — and some best practices for reality-checking API security in your development environment.
Roger Grimes, a defense evangelist for KnowBe4, said shared logins are a huge ongoing problem. "After years of telling developers not to store logon information on public repositories, they continue to do so in large numbers," he said.
Grimes said the big takeaway was that technical defenses are now a requirement.
Roger Grimes: "Studies have shown that when logon information is stored in deposited code, it's only minutes before potential adversaries start to take advantage of it. While I'm a huge believer in the power of education to combat most cybersecurity problems, this is one that needs more technical defenses."
Public repositories, in an attempt to mitigate the problem, should do proactive scanning when a developer uploads code and block the storing of logon information within stored code — or at least warn the developer of the severe consequences, he said.
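The kind of upload-time scan Grimes describes can be sketched in a few lines of Python. This is a minimal illustration, not how any real repository implements it: the token pattern (an `hf_` prefix followed by 30 or more letters or digits) is an assumption about the token's shape, and `find_exposed_tokens` and `check_upload` are hypothetical names.

```python
import re

# Assumed token shape: "hf_" followed by 30+ letters or digits. This is an
# illustrative pattern, not an official specification of the token format.
TOKEN_PATTERN = re.compile(r"\bhf_[A-Za-z0-9]{30,}\b")


def find_exposed_tokens(source: str) -> list:
    """Return (line_number, redacted_token) pairs for every suspected token."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for match in TOKEN_PATTERN.finditer(line):
            token = match.group(0)
            # Redact before reporting so the scanner's output does not leak the secret.
            findings.append((line_number, token[:6] + "..."))
    return findings


def check_upload(source: str) -> bool:
    """Return True if the upload may proceed, False if it should be blocked."""
    return not find_exposed_tokens(source)
```

A repository could call `check_upload` on each incoming file and either reject the upload or show the developer a warning listing the redacted findings.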
Given how severe a breach could be, repositories such as GitHub have been rushing to adopt two-factor and multifactor authentication (2FA and MFA) to protect accounts. However, 2FA and MFA are not panaceas, experts warn.
Nick Rago, a field CTO with the API security firm Salt Security, said that it's good security practice to use not just one API key with third-party providers, but many, each focused on certain integration services to minimize impact of an exposed token. It is also a best practice to frequently rotate keys.
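Rago's two practices, one narrowly scoped key per integration and frequent rotation, could be tracked with something like the Python sketch below. The `ApiKey` record, the scope names, and the 30-day rotation window are all assumptions made for illustration; real providers define their own scopes and organizations set their own rotation policy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Assumed policy: rotate every 30 days. Pick an interval that fits your own risk tolerance.
MAX_KEY_AGE = timedelta(days=30)


@dataclass(frozen=True)
class ApiKey:
    name: str             # hypothetical label for the integration that uses this key
    scopes: frozenset     # the only operations this key is permitted to perform
    created_at: datetime  # when the key was issued


def is_allowed(key: ApiKey, operation: str) -> bool:
    """A key may only be used for the operations it was scoped to."""
    return operation in key.scopes


def keys_due_for_rotation(keys, now: datetime) -> list:
    """Return the names of keys older than the rotation window."""
    return [key.name for key in keys if now - key.created_at > MAX_KEY_AGE]
```

If the read-only inference key leaks, an attacker cannot use it to modify or delete repositories, and the rotation check bounds how long any leaked key stays valid.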
If a third-party provider only allows public API access with static tokens, it's good to use an API gateway as an intermediary between a developer and the third-party API, Rago said.
Nick Rago: "That way, the organizations can enforce more robust API posture and authentication methods in their code, such as OAuth or mTLS."
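The core of the gateway idea is a credential swap: callers authenticate to the gateway with a stronger method, and only the gateway ever holds the provider's static token. The Python sketch below shows that swap under loud assumptions. The set of accepted client tokens stands in for real OAuth token validation or an mTLS check, and `build_upstream_headers` is a hypothetical helper, not part of any gateway product.

```python
import hmac


class Unauthorized(Exception):
    """Raised when the gateway does not accept the caller's credential."""


def build_upstream_headers(client_headers: dict, valid_client_tokens: set, upstream_token: str) -> dict:
    """Check the caller's credential, then swap it for the provider's static token."""
    scheme, _, presented = client_headers.get("Authorization", "").partition(" ")
    # Stand-in for OAuth or mTLS validation: constant-time comparison against known tokens.
    accepted = scheme == "Bearer" and any(
        hmac.compare_digest(presented.encode(), known.encode())
        for known in valid_client_tokens
    )
    if not accepted:
        raise Unauthorized("client credential rejected")
    # Drop the caller's credential and attach the static token, which lives only in the gateway.
    forwarded = {k: v for k, v in client_headers.items() if k != "Authorization"}
    forwarded["Authorization"] = f"Bearer {upstream_token}"
    return forwarded
```

Because developers never see the static token, it cannot end up in their code or in a public repository.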
Rago explained that API security is not just about securing APIs that are internally developed; ensuring safe consumption and usage of leveraged third-party APIs is also critical.
Nick Rago: "Key business process today consists of API supply chain calls that consist of consumption of both internal and third-party APIs. Therefore, it is important that organizations have a good understanding of what third-party APIs are in use, their function, and the data associated with them to assess risk."
Education is also important, because developers need to understand the ramifications of mishandling privileged API keys. And technologies should be in place to ensure that secrets such as static API tokens don't find their way into code and then into exposed repositories, Rago said.
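One simple way to keep a static token out of source code is to read it from the environment at run time and fail loudly when it is absent. The Python sketch below assumes the token is supplied in a variable named `HF_TOKEN`; the variable name and the `load_api_token` helper are illustrative choices, and a secrets manager would serve the same purpose.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when the expected secret is not present in the environment."""


def load_api_token(env=os.environ, variable: str = "HF_TOKEN") -> str:
    """Read the API token from the environment instead of hard-coding it."""
    token = env.get(variable, "").strip()
    if not token:
        # Failing here is deliberate: there is no hard-coded fallback to leak.
        raise MissingSecretError(f"{variable} is not set")
    return token
```

With this pattern, the only thing committed to the repository is the name of the variable, never its value.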
Teresa Rothaar, a governance, risk, and compliance analyst at Keeper Security, said AI development demands the highest security protocols given the amount of sensitive data AI models need to be fed for training to generate accurate and appropriate results. That means AI data sets alone are valuable.
Teresa Rothaar: "In addition to the danger of data poisoning (a scenario where threat actors feed AI models inaccurate or inappropriate data), threat actors may seek to steal fully trained AI models that organizations have invested thousands of work hours and millions of dollars into. Why invest your own money and time into building an AI model if you can steal another organization’s work?"
Karl Mattson, CISO of API security firm Noname Security, said that as large language models grow in use, they will become embedded into applications using APIs. Organizations are already using generative AI from a variety of vendors and various channels. This utilization is taking different forms, including integrating generative AI into in-house application development, incorporating it into third-party applications, or accessing it directly via API from providers such as OpenAI or Google's Bard, Mattson said.
Karl Mattson: "As API attacks continue to increase on AI, organizations integrating with generative AI technologies may face the same risks and consequences. The AI industry will need to work to maintain trust by building secure API implementations and protecting third-party transactions with good security hygiene."
Tushar Kulkarni, a graduate student at Indiana University who was part of a recent RSA Conference webcast on API security, shared six measures organizations can take to secure their API implementations.
Developers should treat all API inputs as dangerous, Kulkarni said.
Tushar Kulkarni: "You should never assume end users won't fool around with the API on their own. You should always assume that every end user is an attacker."
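Treating every input as hostile usually means validating against an allowlist before any work is done. The Python sketch below shows one way to do that for a repository request. The field names, the `owner/name` identifier format, and the `validate_request` helper are assumptions for illustration; the four actions mirror the read, create, modify, and delete operations mentioned earlier.

```python
import re

# Hypothetical rule: repository IDs look like "owner/name", built from a small safe alphabet.
REPO_ID_PATTERN = re.compile(
    r"[A-Za-z0-9][A-Za-z0-9._-]{0,95}/[A-Za-z0-9][A-Za-z0-9._-]{0,95}"
)
ALLOWED_ACTIONS = {"read", "create", "modify", "delete"}
ALLOWED_FIELDS = {"repo_id", "action"}


class InvalidRequest(ValueError):
    """Raised when a request fails validation."""


def validate_request(payload) -> dict:
    """Accept only known fields with allowlisted values; reject everything else."""
    if not isinstance(payload, dict):
        raise InvalidRequest("payload must be an object")
    if set(payload) - ALLOWED_FIELDS:
        raise InvalidRequest("unexpected fields in payload")
    repo_id = payload.get("repo_id")
    action = payload.get("action")
    if not isinstance(repo_id, str) or not REPO_ID_PATTERN.fullmatch(repo_id):
        raise InvalidRequest("repo_id must look like owner/name")
    if ".." in repo_id:
        # Reject path-traversal sequences even when each character is individually allowed.
        raise InvalidRequest("repo_id must not contain '..'")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise InvalidRequest("unknown action")
    # Return a fresh dict so later code only ever sees validated fields.
    return {"repo_id": repo_id, "action": action}
```

The function rejects by default: anything not explicitly allowed, including extra fields an attacker might add, never reaches the code that acts on the request.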