Despite the risks associated with artificial intelligence (AI) coding, developers remain enthusiastic, using it to keep up with the demand to deliver software at speed. A recent GitHub survey found that 92% of U.S.-based developers are using AI coding tools regularly. But while many developers are using AI to assist them in writing code, they seem to be doing so warily. A 2024 developer survey found that fewer than half of developers (43%) felt good about the accuracy of AI tools, while 31% were skeptical of them.
Researchers from the Massachusetts Institute of Technology (MIT), among other institutions, are looking to tame large language model (LLM)-based AI coding with a new control that works across multiple languages. According to the MIT News article "Making AI-Generated Code More Accurate in Any Language":
"A new approach developed by researchers at MIT and elsewhere ... allows an LLM to allocate efforts toward outputs that are most likely to be valid and accurate, while discarding unpromising outputs early in the process. This probabilistic approach boosts computational efficiency."
With the risk from AI-generated code on the rise, this advancement is sure to be welcomed by application security (AppSec) teams. Here's what you need to know.
Reining in the LLMs
Kurt Seifried, chief innovation officer for the Cloud Security Alliance, said the MIT research tackles a major problem with AI coding assistants, which is "making them reliable enough for real-world use."
"The breakthrough lies in a mathematical approach that ensures code follows programming rules while still accomplishing what the user wants. Instead of building bigger AI models, the researchers created a system that guides existing models to produce better code."
—Kurt Seifried
Seifried noted that AI models can write code quickly, but they struggle to follow precise programming rules, or syntax. Current methods either waste resources by requiring the complete rewriting of incorrect code or make fixes that change what the code was supposed to do in the first place.
"It's like having an expert programmer looking over the AI's shoulder at each step, dramatically improving accuracy without requiring larger, more expensive AI systems."
—Kurt Seifried
Jason Soroko, senior vice president of product at Sectigo, said the MIT researchers have shown that LLMs can be steered to produce code that is syntactically valid and semantically faithful without repeated retries. Soroko said past fixes relied on post hoc checking or token-by-token correction. "Those fixes often broke intended meaning or consumed extra computation," he said.
"The researchers attach constraint checks to every generation step and drop any partial program that violates the rules, so the surviving code usually compiles and runs. This deals directly with the main hurdles of LLM coding — respecting grammar, keeping the logic intact, and limiting cost."
—Jason Soroko
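To make that concrete, here is a minimal step-and-prune sketch in the spirit of what Soroko describes. It is purely illustrative, not the researchers' implementation: the function names are hypothetical, and the bracket-balance rule stands in for the real grammar constraints. The key idea is that a candidate is dropped as soon as its prefix breaks a rule that no later tokens can repair.

```python
# Illustrative sketch only: the rule and helper names are hypothetical
# stand-ins for the per-step constraint checks described above.

def breaks_rule(partial_code: str) -> bool:
    """A prefix that closes more parentheses than it opens can never
    become valid code, no matter which tokens follow."""
    depth = 0
    for ch in partial_code:
        depth += (ch == "(") - (ch == ")")
        if depth < 0:
            return True
    return False

def generation_step(candidates, propose_next_token):
    """Extend each candidate program by one token, then drop any
    partial program that has already violated the rule."""
    extended = [code + propose_next_token(code) for code in candidates]
    return [code for code in extended if not breaks_rule(code)]

# Toy usage with a fake "model" that always proposes a closing parenthesis:
print(generation_step(["print(", "print)"], lambda code: ")"))
# -> ['print()']  ("print))" is pruned as unrepairable)
```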
Enter the sequential Monte Carlo algorithm
The rudder used to steer AI-generated code is the sequential Monte Carlo method, which allows the model to dynamically allocate resources to different threads of parallel computation based on how promising their output appears. "The model keeps many candidate sequences, gives each a weight that reflects how well it matches the stated structure and goal, then resamples so compute follows the highest weights and discards the rest," Soroko explained.
"Resources concentrate on lines of thought that will finish cleanly, which lets a small open model cover the useful search space better than a much larger unguided one. Smaller models need less memory, run on local hardware, and expose fewer proprietary risks yet still achieve higher pass rates on tasks like text-to-SQL and molecule design."
—Jason Soroko
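Under stated assumptions, a rough sketch of that resampling loop might look like the following. The scoring function is a toy placeholder; in the actual method, weights come from the model's own probabilities combined with the structural constraints.

```python
import random

# Illustrative sketch of weighted resampling over candidate programs;
# "looks_like_sql" below is a placeholder score, not a real weight.

def resample(candidates, weight, k):
    """Keep k candidates, drawn in proportion to their weights, so that
    compute concentrates on the most promising partial programs."""
    weights = [weight(c) for c in candidates]
    if sum(weights) == 0:   # nothing promising survived; keep what we have
        return candidates[:k]
    return random.choices(candidates, weights=weights, k=k)

# Toy usage: prefer prefixes that resemble valid SQL.
pool = ["SELECT name FROM users", "SELEKT nmae FORM users", "SELECT * FROM"]
looks_like_sql = lambda c: 1.0 if c.startswith("SELECT") else 0.01
print(resample(pool, looks_like_sql, k=2))
```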
Greg Martin, founder and CEO of Ghost Security, said that integrating the sequential Monte Carlo method into smaller LLMs to produce higher-quality AI code is potentially a big win for local open-source AI models.
"This shows how faster, smaller, local models are advancing versus the large frontier model providers like Open AI or Anthropic in areas like code generation."
—Greg Martin
Stephen Kowski, field CTO for SlashNext, said the MIT research shows that with smarter algorithms, even smaller AI models can write code that’s accurate and follows programming rules, sometimes even better than much bigger models.
"It means you don’t always need a giant, expensive AI to get great results. This could make powerful coding tools more affordable and easier for everyone to use."
—Stephen Kowski
Vikash Kumar Mansinghka, a principal research scientist and leader of the Probabilistic Computing Project in the MIT Department of Brain and Cognitive Sciences, laid out a technical example of how the sequential Monte Carlo method works.
"Let's say I prompt an LLM to write Python code that takes two tensors and multiplies them and then factors them. Of all the things the LLM could generate given that prompt, only a tiny, tiny, tiny fraction will actually even be in Python, let alone be Python code [that] actually does the job. So what our system does is constrain the output distribution to concentrate only on answers that are valid. And it does that in a clever incremental way so that as we go along generating the code, it's incorporating constraints as soon as they become locally checkable."
—Vikash Kumar Mansinghka
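The "locally checkable" part can be illustrated with nothing more than the Python standard library. This is a sketch, not the researchers' system: as a candidate program grows, its prefix can be classified as already valid, still incomplete but promising, or broken beyond repair, and only the last category needs to be pruned.

```python
import codeop

# Sketch only: classify a growing Python prefix so that unrepairable
# candidates can be discarded as soon as the violation is detectable.

def classify_prefix(partial_code: str) -> str:
    """Return 'valid' (compiles as-is), 'incomplete' (could still become
    valid), or 'invalid' (no continuation can fix it)."""
    try:
        compiled = codeop.compile_command(partial_code)
    except (SyntaxError, ValueError, OverflowError):
        return "invalid"
    return "valid" if compiled is not None else "incomplete"

for prefix in ["import numpy as np",     # complete, valid statement
               "def multiply(a, b):",    # incomplete, keep generating
               "def = multiply(a, b)"]:  # broken, prune this candidate
    print(f"{prefix!r:28} -> {classify_prefix(prefix)}")
```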
Mansinghka said the new method puts engineering back in the driver's seat — and in an efficient way. "It achieves that in a very unusual way, not by tuning the models or doing reinforcement learning or gathering a lot of data, but by engineering in symbolic representations of knowledge that wrap the LLM in a layer of probabilistic inference," he said.
Why taming LLMs is core to AI's usefulness
While the MIT research has the clear potential to improve AI coding assistants, it's also potentially useful to AI-powered data analysis and scientific discovery tools, said Melody (MJ) Kaufmann, an author and instructor at O'Reilly Media.
"This research enables programming assistants to deliver more accurate, context-relevant suggestions by focusing effort on likely correct completions. It also improves AI-powered data analysis by helping models zero in on the most meaningful patterns or hypotheses, improving insights without unnecessary computation."
—Melody (MJ) Kaufmann
For scientific discovery tools, Kaufmann noted, the research enhances multistep reasoning, allowing AI tools to more efficiently explore complex solution spaces and generate useful, novel results, which is especially valuable in large datasets used in research.
The Cloud Security Alliance's Seifried said the fundamental nature of the research also means AI coding assistants will find it easier to handle more complex programming tasks.
"Data analysts will be able to leverage more accurate database queries and analysis scripts that follow proper syntax. Over time, each of these applications will become more reliable while requiring less computing power."
—Kurt Seifried
SlashNext's Kowski said the research could make AI coding much better at suggesting code that actually works and that matches what's wanted, which will save time on debugging. "For data analysis, it means more accurate and reliable scripts or queries, even for people who aren’t coding experts," he said.
What the MIT research means for AI agents and coding
Combined with agentic AI, the MIT researchers' method could lead to code-generation capabilities being embedded natively in future software, allowing that software to enhance itself on its own over time, Ghost Security's Martin said. "This dynamic self-improvement is one of the most exciting things that the agentic model brings to AI," he said.
That does not mean that humans can be removed from the loop, said Iftach Ian Amit, founder of Gomboc.AI. Even LLMs powered by the research will still make mistakes, "so developers will still need to be highly proficient in the application or language being developed in order to find the areas where the model-generated code is not 100% correct," he said.
Lowering the bar to AI coding
The MIT research could lead to non-experts having greater control over AI-generated content, O'Reilly Media's Kaufmann said. "By filtering out low-quality outputs early, the model delivers cleaner, more accurate results that require less technical judgment to assess," she said. "Doing this lowers the barrier to using advanced AI tools, empowering more people to produce reliable content without deep expertise or constant oversight."
Kowski said that with these improvements, anyone could tell an AI what they want in plain language and get back code or data queries that work, without needing to know all the technical details.
"This opens up coding and data analysis to people who never learned to program, making technology more accessible. It also means fewer mistakes and more confidence that the results are correct, which is good for everyone."
—Stephen Kowski
Seifried said that the method developed by the MIT researchers could let non-programmers generate complex technical outputs like database queries using just natural language. This is especially relevant given the rise of vibe coding.
"Because the system ensures that the result follows all technical rules while accomplishing what the user actually wanted, you don't need any specialized knowledge."
—Kurt Seifried