The complicated tangle of dependencies in modern software development makes it tricky to identify dangerous flaws hidden in open-source software (OSS) projects. But the bigger bugaboo has been how to issue fixes to vulnerable projects at a scale that can reduce the attack surface across the entire software supply chain.
How do you scale bug fixes for a single flaw that has been duplicated across thousands of OSS projects, each with its own maintainers, programming language, syntax idiosyncrasies, and project culture? That's especially challenging because those maintainers vary widely in their willingness, security savvy, and resource capacity to act when notified of a security issue.
Now researchers have developed a new technique, bulk pull requests, to overcome this challenge. They're rooting out certain easy-to-fix vulnerabilities that have been replicated across thousands of OSS projects, automating the code refactoring needed to fix them, and offering the resulting fixes to maintainers.
The new process has already been used successfully on tens of thousands of OSS projects. Here's why scale has become such an obstacle to fixing OSS vulnerabilities, and how the bulk pull request approach works.
With OSS, even simple fixes become a big challenge
Unpatched vulnerabilities in a single module, such as Python's tarfile, or in another library component of an OSS project can affect millions of other pieces of software that depend on that component. Why? Because vulnerable components tend to propagate across the OSS ecosystem. A single critical flaw in one library may proliferate across thousands of dependent OSS projects, and every one of those projects has its own blast radius in the broader software world.
So enterprise software may be exposed not just through its dependency on the original component, but through orders of magnitude more vulnerable OSS components hidden within its architecture, said Matt Rose, field CISO for ReversingLabs.
"Security issues in open-source packages are propagated because they are hidden in the very complex application architecture."
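The propagation Rose describes is essentially a transitive-closure problem over a dependency graph. As a rough illustration, this minimal Python sketch walks a reverse-dependency graph breadth-first to list every project that depends, directly or indirectly, on a flawed component. The package names and the `REVERSE_DEPS` mapping are hypothetical; in practice that data would have to come from a package registry or a dependency-graph service.

```python
from collections import deque


def transitive_dependents(reverse_deps: dict[str, list[str]], vulnerable: str) -> set[str]:
    """Return every package that depends, directly or indirectly, on `vulnerable`.

    `reverse_deps` maps a package to the packages that declare it as a dependency.
    """
    affected: set[str] = set()
    queue = deque([vulnerable])
    while queue:
        package = queue.popleft()
        for dependent in reverse_deps.get(package, []):
            if dependent not in affected:  # visit each package only once
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Hypothetical ecosystem in which "libflaw" contains the original bug.
REVERSE_DEPS = {
    "libflaw": ["parser-a", "archiver-b"],
    "parser-a": ["web-app-c", "cli-d"],
    "archiver-b": ["cli-d", "backup-e"],
    "backup-e": ["enterprise-suite-f"],
}

if __name__ == "__main__":
    print(sorted(transitive_dependents(REVERSE_DEPS, "libflaw")))
    # ['archiver-b', 'backup-e', 'cli-d', 'enterprise-suite-f', 'parser-a', 'web-app-c']
```

Only two projects use `libflaw` directly, yet six end up exposed, including `enterprise-suite-f`, which is three hops away and may never list the flawed library in its own manifest.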
A flaw duplicated in downstream OSS projects is often more dangerous to enterprises because it is less apparent and can linger unpatched, even after the maintainer of the known vulnerable component releases a fix. Downstream maintainers may lack the visibility or security savvy to recognize a flaw's relevance or its impact on their users, they may not have the resources to make a fix, or they might worry that patching the vulnerable component will break their build and those of other dependent projects. Or they may simply not know the flaw exists in the first place.
When the maintainers of an affected component can't or won't fix the root-cause flaw, they compound the research and effort that the rest of the OSS ecosystem's maintainers must invest to fix all of those downstream vulnerabilities.
What's ironic about this phenomenon is that those vulnerabilities don't always require complicated fixes. Some are simple flaws with well-known, simple fixes that have been around for years. They're effectively some of the lowest-hanging fruit in the march toward improved software supply chain security.
Scaling the effort to eliminate OSS vulnerabilities
Pitching projects one by one to convince their maintainers to take action requires a huge effort, and until recently, security researchers hadn't found a way to do it at scale.
Manually notifying maintainers is a heavy lift with limited effectiveness. Security researchers, developers, and app sec experts need to bring creativity and automated collaboration to the process of convincing maintainers to fix their software, and this is the route that Jonathan Leitschuh, lead researcher for the Alpha-Omega Project, an OpenSSF initiative, is taking. Rather than going deep into any one vulnerability's inner workings, he explored how broad its impact is and how to use automation to eliminate it everywhere.
While working at Gradle a few years ago, Leitschuh helped orchestrate a fix at the Maven Central Repository to shut off HTTP access, closing a flaw that allowed build files to resolve their dependencies over HTTP instead of HTTPS. The fix reduced many dependency risks, but the flaw persisted in OSS projects that resolved dependencies from servers other than the three big Maven repositories. Using GitHub's CodeQL, Leitschuh was able to compile a list of projects vulnerable to the flaw, but he wasn't sure how to go about getting them fixed.
"So I had a list and I'm like, 'Why don't I just fix it?' But it was thousands of projects, I couldn't do this by hand. But I could see this was a cookie-cutter enough vulnerability that you could just fix it with a bot. I figured, 'Why not?'"
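The CodeQL query itself isn't reproduced here, but the core check is easy to approximate. The following Python sketch is a simplified, hypothetical stand-in, not Leitschuh's query: it parses a Maven `pom.xml` and reports any repository URL that still uses plain `http://`. The function names and the sample POM are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Maven elements whose <url> child tells the build where to download artifacts.
REPOSITORY_TAGS = {"repository", "pluginRepository", "snapshotRepository"}


def local_name(tag: str) -> str:
    """Strip an XML namespace such as {http://maven.apache.org/POM/4.0.0}."""
    return tag.rsplit("}", 1)[-1]


def insecure_repository_urls(pom_xml: str) -> list[str]:
    """Return repository URLs in a POM that would be fetched over plain HTTP."""
    root = ET.fromstring(pom_xml)
    found = []
    for element in root.iter():
        if local_name(element.tag) not in REPOSITORY_TAGS:
            continue
        for child in element:
            url = (child.text or "").strip()
            if local_name(child.tag) == "url" and url.lower().startswith("http://"):
                found.append(url)
    return found


SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <url>http://example.org/homepage-is-not-a-repository</url>
  <repositories>
    <repository>
      <id>legacy</id>
      <url>http://repo.example.com/maven2</url>
    </repository>
    <repository>
      <id>central</id>
      <url>https://repo.maven.apache.org/maven2</url>
    </repository>
  </repositories>
</project>"""

if __name__ == "__main__":
    print(insecure_repository_urls(SAMPLE_POM))  # ['http://repo.example.com/maven2']
```

Only `<url>` children of repository elements are checked, so a project homepage URL that happens to use HTTP is not reported as a dependency-resolution risk.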
His solution: Use bulk pull requests to deliver to maintainers both a vulnerability disclosure and ready-made code that makes fixing it simple. Leitschuh needed to do this at scale, so he wrote a bulk pull request generator that took the list of vulnerable projects he'd found with CodeQL and delivered the pull requests all in one go.
"We know these vulnerabilities are out there. You put the scanners in the hands of a maintainer, they see a lot of noise. They have to filter out what’s good and what’s bad. With a pull request, even if they don’t fix it, it still, hopefully, hardens the software."
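Because the HTTP-to-HTTPS change is so cookie-cutter, the fix-and-package step can be sketched in a few lines. The Python below is a hypothetical illustration, not Leitschuh's generator: it rewrites `http://` to `https://` only inside Maven repository blocks, renders a unified diff with `difflib`, and collects one pull request payload per affected project. Actually opening the pull requests through a code-hosting API is deliberately left out, and the helper names, title, and body text are assumptions.

```python
import difflib
import re
from typing import Optional

# Matches one Maven repository block; the backreference requires the same closing tag.
REPO_BLOCK = re.compile(
    r"<(repository|pluginRepository|snapshotRepository)>.*?</\1>", re.DOTALL
)


def upgrade_repository_urls(pom_text: str) -> str:
    """Switch http:// to https:// for <url> elements inside repository blocks only."""
    def fix_block(match: re.Match) -> str:
        return re.sub(r"(<url>\s*)http://", r"\1https://", match.group(0))

    return REPO_BLOCK.sub(fix_block, pom_text)


def build_pull_request(repo: str, path: str, pom_text: str) -> Optional[dict]:
    """Return a hypothetical pull request payload, or None if nothing needs fixing."""
    fixed = upgrade_repository_urls(pom_text)
    if fixed == pom_text:
        return None
    diff = "".join(
        difflib.unified_diff(
            pom_text.splitlines(keepends=True),
            fixed.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )
    return {
        "repo": repo,
        "title": "Use HTTPS to resolve Maven dependencies",  # illustrative wording
        "body": (
            "This build downloads dependencies over plain HTTP, which lets a network "
            "attacker tamper with them. This automated change switches the affected "
            "repository URLs to HTTPS."
        ),
        "diff": diff,
    }


def generate_bulk_pull_requests(findings: list[tuple[str, str, str]]) -> list[dict]:
    """Turn (repo, path, file contents) findings into pull request payloads in one pass."""
    payloads = []
    for repo, path, pom_text in findings:
        payload = build_pull_request(repo, path, pom_text)
        if payload is not None:
            payloads.append(payload)
    return payloads


if __name__ == "__main__":
    pom = "<repository>\n  <url>http://repo.example.com/maven2</url>\n</repository>\n"
    for pr in generate_bulk_pull_requests([("example-org/demo", "pom.xml", pom)]):
        print(pr["title"])
        print(pr["diff"])
```

Shipping a ready-made diff, rather than a scanner finding, is what lowers the bar for maintainers: reviewing a one-line change is far cheaper than triaging a report.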
This early work helped him earn the Dan Kaminsky Fellowship, which let him keep up his momentum, keep iterating on his toolset, and continue to tackle more widespread vulnerabilities.
Leitschuh has also tackled the complicated task of turning a detection into a fix that works for different developers. "Software developers write code in different ways, and trying to write fixes for all of the different ways developers can write code is difficult," he said.
So he turned to a former coworker who had founded Moderne. The company runs the OpenRewrite project, which does code refactoring at scale. OpenRewrite wasn't exactly a security tool, but it was perfect for solving this security problem, Leitschuh said. It has a templating engine that lets developers write fixes in the programming language of the code being changed and substitute statements across many different codebases, while the engine handles the formatting and identifies where new code needs to be inserted.
Patrick Way, a principal software engineer at Moderne, explained how Leitschuh leveraged OpenRewrite:
"With this Jonathan is able to detect a vulnerability, define the fix, and perfectly insert into the space where it belongs."
Leitschuh's former coworker introduced him to Way, who taught him OpenRewrite from the ground up and helped him generate fixes for a wide range of OSS projects. With the help of Way and intern Shyam Mehta, Leitschuh implemented capabilities such as data flow and control flow analysis, which were missing or only partially implemented in OpenRewrite, so that it could fix the flaws for which he wanted to generate bulk pull requests.
Leitschuh has now fixed more than 5,200 different OSS supply chain vulnerabilities and has presented at many conferences, walking attendees through notable clusters of vulnerabilities he tackled during his fellowship. In August he presented with Way at Black Hat USA 2022; their session, available on YouTube, offers further technical details and a walk-through.
Leitschuh's approach is gaining momentum
The techniques Leitschuh pioneered, and his results, are lighting a fire under the security research and app sec community, said Rose.
"This technique is a huge step in the right direction. Ensuring that the open source code repos are secured more effectively helps eliminate the risk that a vulnerable package is used in production, even if it is missed by security scanners like SCA [software composition analysis]."
Just last month, the security team at Trellix announced a breakthrough using a new tool inspired by Leitschuh's work. Kasimir Schulz, a vulnerability researcher with the Trellix Advanced Research Center, explained at the ShmooCon hacker convention and in a blog post how he and a coworker patched tens of thousands of OSS projects that had been vulnerable to a 15-year-old path traversal flaw originating in Python's tarfile module.
"Our Advanced Research Center vulnerability team was able to automate most of the processes, except for quality control. We broke the process into two steps, the patching phase and the pull request phase, both of which were automated and simply needed to be executed."
To help with the patching phase, the team wrote Creosote, a tool that recursively traverses a directory searching for Python files, checks whether they use the tarfile module, and parses the code, much as an application security testing tool would, to look for the flawed pattern. This helped the team search hundreds of thousands of unique GitHub repos that import the tarfile module to find the ones that were vulnerable.
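A Creosote-style scan can be approximated with Python's own `ast` module. This is a hedged sketch under stated assumptions, not Creosote's source: it walks a directory for `.py` files, skips modules that never import `tarfile`, and reports the line numbers of `.extract()` or `.extractall()` calls. The helper names are hypothetical, and the heuristic is deliberately coarse, so it can also flag calls that are safe or that belong to an unrelated object with an `extractall` method.

```python
import ast
from pathlib import Path

# TarFile methods that write archive members to disk.
RISKY_METHODS = {"extract", "extractall"}


def imports_tarfile(tree: ast.AST) -> bool:
    """True if the module has `import tarfile` or `from tarfile import ...`."""
    for node in ast.walk(tree):
        if isinstance(node, ast.Import) and any(alias.name == "tarfile" for alias in node.names):
            return True
        if isinstance(node, ast.ImportFrom) and node.module == "tarfile":
            return True
    return False


def risky_calls(source: str) -> list[int]:
    """Line numbers of .extract()/.extractall() calls in a module that imports tarfile."""
    try:
        tree = ast.parse(source)
    except (SyntaxError, ValueError):  # unparsable file, or source containing null bytes
        return []
    if not imports_tarfile(tree):
        return []
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr in RISKY_METHODS
    )


def scan_directory(root: str) -> dict[str, list[int]]:
    """Recursively scan every .py file under `root` and map paths to suspicious lines."""
    findings = {}
    for path in Path(root).rglob("*.py"):
        if not path.is_file():
            continue
        lines = risky_calls(path.read_text(encoding="utf-8", errors="replace"))
        if lines:
            findings[str(path)] = lines
    return findings


if __name__ == "__main__":
    sample = "import tarfile\n\nwith tarfile.open('a.tar') as tar:\n    tar.extractall('out')\n"
    print(risky_calls(sample))  # [4]
```

Parsing the syntax tree instead of grepping for the string `extractall` avoids false hits in comments and string literals, though a human still has to confirm each finding, which matches the quality-control step Trellix kept manual.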
"In the end, we were able to send out 61,895 pull requests," Schulz said in his ShmooCon talk. "This took about three months to do. We did have some pushback. However, we had a lot of notable projects accept the pull request and actually thank us."
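The flaw behind those pull requests, CVE-2007-4559, is that `TarFile.extractall()` has historically trusted member names, so an archive entry named something like `../../target` can be written outside the destination directory. A common remedy is to validate every member path before extracting. The Python below is a minimal sketch of that idea; `is_within_directory`, `safe_extract`, and `make_tar` are hypothetical helper names, and this is not a copy of the patch Trellix submitted.

```python
import io
import os
import tarfile
import tempfile


def is_within_directory(directory: str, target: str) -> bool:
    """True if `target` resolves to a location inside `directory`."""
    abs_directory = os.path.realpath(directory)
    abs_target = os.path.realpath(target)
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory


def safe_extract(tar: tarfile.TarFile, path: str = ".") -> None:
    """Refuse to extract if any member name would escape `path`.

    Checks member names only; symlink and hard-link targets are not vetted here.
    """
    for member in tar.getmembers():
        if not is_within_directory(path, os.path.join(path, member.name)):
            raise ValueError(f"path traversal attempt in tar member: {member.name!r}")
    if hasattr(tarfile, "data_filter"):  # PEP 706 extraction filters are available
        tar.extractall(path, filter="data")
    else:
        tar.extractall(path)


def make_tar(members: dict[str, bytes]) -> tarfile.TarFile:
    """Build an in-memory tar archive for demonstration purposes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as archive:
        for name, data in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return tarfile.open(fileobj=buf, mode="r")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as dest:
        safe_extract(make_tar({"docs/readme.txt": b"hello"}), dest)
        print(os.listdir(dest))  # ['docs']
        try:
            safe_extract(make_tar({"../escaped.txt": b"pwned"}), dest)
        except ValueError as err:
            print(err)
```

The name check alone does not cover malicious links. Where the Python runtime ships the PEP 706 extraction filters (Python 3.12 and some earlier security releases), the sketch also passes `filter="data"`, which additionally rejects dangerous link targets and absolute paths.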
Leitschuh was ecstatic about the news and publicized the Trellix team's results.
"There’s a wealth of open source vulnerabilities that are just waiting to be fixed with more advanced techniques."
He hopes his new position at Alpha-Omega will give him an opportunity to magnify these efforts. Now in only its second year, the Alpha-Omega Project has two sides:
- The Alpha side goes deep into some of the OSS projects most critical to modern software architecture. The Alpha team audits projects such as npm and tries to help those maintainers fix a wide variety of flaws in their code.
- The Omega side goes broad. It seeks to root out critical vulnerabilities in 10,000 widely deployed OSS projects.
This focus dovetails with Leitschuh's research mindset, he said. He has plenty of targeted work planned, but one of the most interesting efforts will go beyond automating bulk pull requests: following the supply chain back to root causes upstream and collaborating with maintainers to fix them at the source.
"There are multiple parsers and multiple scanners that find these vulnerabilities across the industry. But nobody has tried to spend the same amount of time that we've spent writing those scanners on trying to go to the upstream maintainer and finally convince them to fix it once and for all."