Package names repurposed to push malware on PyPI

What’s in a name? Here's how bad actors are pushing malware on the Python Package Index under the guise of legitimate yet abandoned open source modules.

Lucija Valentić, Software Threat Researcher, ReversingLabs

At the beginning of March 2023, ReversingLabs researchers encountered a malicious package on the Python Package Index (PyPI) named termcolour, a three-stage downloader published in multiple versions. Finding the malicious payload wasn’t difficult; what piqued our interest was the package’s name. The termcolour package wasn’t new. It had been published to PyPI two years earlier and then removed. It reappeared on PyPI at the beginning of March, this time as a malicious downloader.

How can the name of a pre-existing, legitimate package be co-opted by a malicious one? The PyPI Common questions page answers that question: taking over an abandoned project, or just a package name, is permissible on PyPI when certain criteria are met. It also happens in practice. Abandoned projects have been transferred to new owners, and entirely new packages have taken on the names of earlier, discontinued modules. The vast majority of these are legitimate, non-malicious modules. However, in an age of growing software supply chain attacks, allowing the reuse of names of previously removed packages creates risk.

As the termcolour incident indicates, malicious actors are already repurposing names to spread malware and start supply chain attacks. Depending on the popularity of the co-opted package, these attacks could have a major impact on downstream developers who count the repurposed package as a dependency.

How can organizations protect themselves from such attacks? A close analysis of what happened with the termcolour package can help us answer that question. Here's my team's analysis of the hijacking of the termcolour Python package, a look at the larger issue of package name re-use on PyPI — and some of the telltale signs that development organizations can use to detect such attacks on their own software supply chain.

Termcolour: A suspiciously boring package

The PyPI package termcolour looked unremarkable when we first examined it. The module was described as a “simple terminal color changing app based off the ASCII colors” and, as far as we could tell, it wasn’t widely used.

But a few discrepancies raised red flags. For example, the name of the package and the name on the GitHub page did not match. Also, the GitHub page associated with the termcolour package no longer existed.

Figure 1: Names do not match.

When we peered inside the contents of termcolour, we found more reason for concern. Every file in the package was simple — far too simple. These files contained hardly any useful code and their purpose was unclear, apart from one file: colored.py. That file contained very suspicious code with a lot of the tell-tale characteristics we find with malicious code hidden in open source repositories. 

For example, only one function had variables with strange names, which obscured the function’s real purpose from incautious developers who might install and use the package. Additionally, the same function contained obfuscated (encrypted) code. That code was called and executed from a function disguised as one that sets a background color, and its real purpose was difficult to discern from reading the code.

Figure 2: Malicious payload hidden inside colored.py.

As Figure 2 shows, the malicious code is encrypted in a very simple manner and is decrypted in the next few lines of code. It is then executed when function “b” is called. On closer examination, the mysterious function b is actually the exec command masked behind a benign name.

Figure 3 shows the actual malicious payload after decryption: a simple three-stage downloader, in which the second stage is a script that is executed afterwards (although only if the platform is Linux).

Figure 3: Malicious payload once decrypted — a simple downloader.
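
The aliasing trick described above, where a benign-looking name is bound to exec, is one of the simpler patterns to look for statically. The following is a minimal sketch of such a check using Python's standard ast module. The SAMPLE snippet is a harmless, hypothetical stand-in written for this illustration; it is not the code from termcolour, and the function name find_exec_aliases is our own. The check only catches direct assignments such as `b = exec`, so indirect lookups (for example through getattr) would slip past it.

```python
import ast

# Builtins that run dynamically constructed code.
DYNAMIC_EXEC = {"exec", "eval", "compile"}


def find_exec_aliases(source: str) -> list[tuple[int, str, str]]:
    """Return (line, alias, builtin) for each `alias = exec`-style assignment.

    The source is only parsed, never executed.
    """
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        if not isinstance(node.value, ast.Name):
            continue
        if node.value.id not in DYNAMIC_EXEC:
            continue
        for target in node.targets:
            if isinstance(target, ast.Name):
                findings.append((node.lineno, target.id, node.value.id))
    return findings


# Hypothetical, harmless sample that mimics the shape of the trick.
SAMPLE = '''
b = exec
def background(color):
    payload = "print('hello')"
    b(payload)
'''

if __name__ == "__main__":
    for line, alias, builtin in find_exec_aliases(SAMPLE):
        print(f"line {line}: '{alias}' is an alias for {builtin}")
```

A hit from a check like this is not proof of malice, since some legitimate tools alias these builtins, but in a package that claims to change terminal colors it is a reason to look closer.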

Return of the termcolour

Bad actors sneaking obfuscated, malicious code into open source packages is common these days. So why is this package interesting? Because the termcolour package isn’t new. Two years ago, there was another package on PyPI with the exact same name. Version 0.0.1 of termcolour was published on January 15, 2021 under the author victorkolis. It had no functionality whatsoever, just a few files containing very little code. As noted above, the stated purpose of the module was to be a “simple terminal color changing app based off the ASCII colors,” but the project didn’t get very far. No updates to termcolour were ever published after the initial release. The package was completely removed from PyPI almost 22 months later, on November 11, 2022.

This kind of thing is common on open source repositories, where there is a (very) long tail of small, low-traffic projects and aspirational open source initiatives that are started but never completed. What makes termcolour interesting is its reappearance on March 9, 2023. On that date, the package was uploaded again with the same name, but under a different author, v2e4lisp, and a different version: 3.3.3. Later that day, version 3.3.1 was also added: an update with a lower version number. These new iterations of the package were malicious, and ReversingLabs researchers found them soon afterwards based on a combination of package behaviors that the ReversingLabs Software Supply Chain Security platform surfaces.

Figure 4: Behavior policies in ReversingLabs Software Supply Chain Security.
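
The timeline above contains three signals that are easy to check mechanically: a change of uploader, a long dormant period, and a version number that goes backwards. The sketch below illustrates such a check; it is not how the ReversingLabs platform works. It assumes an organization keeps its own record of the releases it has seen, since a deleted release is no longer listed on PyPI. The Release type, the history_flags function, the one-year threshold and the upload hours are all assumptions made for this illustration; the names, versions and dates come from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Release:
    version: str
    uploaded: datetime
    uploader: str


def version_key(version: str) -> tuple[int, ...]:
    """Parse a purely numeric dotted version such as '3.3.1'."""
    return tuple(int(part) for part in version.split("."))


def history_flags(
    releases: list[Release],
    max_gap: timedelta = timedelta(days=365),  # assumed threshold
) -> list[str]:
    """Compare consecutive releases (by upload time) and list oddities."""
    flags = []
    ordered = sorted(releases, key=lambda r: r.uploaded)
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.uploader != prev.uploader:
            flags.append(
                f"uploader changed: {prev.uploader} -> {cur.uploader}"
            )
        gap = cur.uploaded - prev.uploaded
        if gap > max_gap:
            flags.append(f"dormant for {gap.days} days before {cur.version}")
        if version_key(cur.version) < version_key(prev.version):
            flags.append(
                f"version went backwards: {prev.version} -> {cur.version}"
            )
    return flags


# Dates and names from the text; the hours of day are hypothetical.
TERMCOLOUR = [
    Release("0.0.1", datetime(2021, 1, 15), "victorkolis"),
    Release("3.3.3", datetime(2023, 3, 9, 10), "v2e4lisp"),
    Release("3.3.1", datetime(2023, 3, 9, 18), "v2e4lisp"),
]

if __name__ == "__main__":
    for flag in history_flags(TERMCOLOUR):
        print(flag)
```

None of these signals is malicious on its own, since legitimate projects change hands and publish backports. Together, on a package that was deleted in between, they justify a manual review before the new release is installed.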

Malicious termcolour packages

package_name  version  SHA1
termcolour    3.3.1    67bdf8aeb709760e94d8ec741417d98dfb79c4c7
termcolour    3.3.1    e691b3cb1abde57b42814d9b1aa9eb30803b64c6
termcolour    3.3.3    2cca8e088b4fae776b8347369b0f1506627a4aee
termcolour    3.3.3    5a3e10b31c5d870463c5de85ba241eb83903175e

Name games 

Why is it possible to reuse the name of a previously deleted package? This has to do with the rules and procedures used by the platform on which the package is hosted. In this case, that’s PyPI. And a closer look at the process for removing packages from PyPI gives a good idea of how the bad actors were able to step into termcolour’s shoes. 

On PyPI, the form that a maintainer (or administrator) uses to delete a package displays a warning message, shown in Figure 5.

Figure 5: Form for deleting package from PyPI.

In short, the message informs maintainers that deleting a package will make the project name “available to any other PyPI user.” New packages can be uploaded under the name of a previously removed package so long as the distribution filenames (a combination of the project name, version number and distribution type) differ from those used in the previous (deleted) distribution. The Common questions section on PyPI includes rules on allowed names. According to PyPI policy, a name can be reused so long as the version is changed, the name is not currently in use, the name was not previously used by a malicious package, and the name is not too similar to that of an existing package.

Figure 6: Rules for using names of packages on PyPI.

Package name reuse: Ripe for abuse

The ability to reuse the names of deleted packages is a boon for malicious actors, who can easily benefit from the good will created by legitimate developers and discontinued packages. 

There is plenty of evidence that bad actors have taken notice, too. On April 12, for example, there was a post on Codecov’s community page about Codecov’s releases being yanked from PyPI. In a statement clarifying the situation, Codecov noted that it had removed a “rarely used package” from PyPI. Subsequently, an unrelated third party uploaded a new, non-malicious package as a stop-gap measure to prevent malicious actors from hijacking the Codecov name. PyPI admins followed up by removing the newly uploaded release and locking the name of the removed Codecov package so that it cannot be hijacked and used to spread malware.

However, that sequence of events and the termcolour incident highlight a looming supply chain risk on platforms like PyPI. What would happen if malicious actors were quick enough to snatch the name of a popular but abandoned package before it could be locked? What if they uploaded malware akin to the termcolour update, but instead of replacing an obscure, abandoned package from an unknown developer, it replaced a package from a reputable publisher like Codecov, or one that had a large number of dependents?

The Codecov incident gives us an idea of what that might look like. Codecov is used as an “all-in-one code coverage reporting solution for any test suite” by many companies. The PyPI package in question was not widely used. However, it was one element of a redistribution that was widely used. In the wake of the incident, Codecov customers on community pages even mentioned that they “were still installing the old package even though we (they) weren’t using it.” Had a malicious file been swapped in without being noticed, some of those customers could have been hit by a supply chain attack.

Reusing names: An experiment

While we don’t know the exact reason that termcolour was initially removed from PyPI, it was most likely removed by the owner. In accordance with PyPI policies, that left the name open to be reused later. As it turned out, it was used for a malicious package. How easy is it for bad actors to pull off something like that? We did a little experiment. 

We created two accounts: one that published a package and then removed it, and one that tried to publish the package again after the first account had removed it. In our experiment, the PyPI package’s name was lucija-package and the version was 0.1.

Soon after it was created, lucija-package was removed by the user of the first account. Next, without changing the version, we attempted to re-upload lucija-package 0.1 from both accounts, something that shouldn’t be possible. That action failed, as it was supposed to. The results can be seen in Figure 7.

Next, we changed the version of lucija-package to 0.2 and attempted to upload the package using the second account. That attempt was successful.

Figure 7: Failed uploading using version that was used before.
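
The outcome of the experiment follows from the rules described earlier: deleting a project frees its name, but the distribution filenames it used stay unavailable. The toy model below reproduces that behavior. It is a simplified sketch written for this illustration and does not reflect PyPI's actual implementation. The ToyIndex class, the account names and the sdist-style filename format are assumptions, and the "too similar" name check is left out.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """Toy model of the name reuse rules described in this post."""

    owners: dict[str, str] = field(default_factory=dict)   # live project -> owner
    used_filenames: set[str] = field(default_factory=set)  # kept after deletion
    blocked_names: set[str] = field(default_factory=set)   # locked by admins

    def upload(self, user: str, name: str, version: str) -> str:
        # Assumed sdist-style filename: name, version, distribution type.
        filename = f"{name}-{version}.tar.gz"
        if name in self.blocked_names:
            return "rejected: name is blocked"
        if name in self.owners and self.owners[name] != user:
            return "rejected: project belongs to another user"
        if filename in self.used_filenames:
            return "rejected: filename was used before"
        self.owners[name] = user
        self.used_filenames.add(filename)
        return "accepted"

    def delete_project(self, user: str, name: str) -> None:
        # The name becomes available again; its filenames do not.
        if self.owners.get(name) == user:
            del self.owners[name]


if __name__ == "__main__":
    index = ToyIndex()
    print(index.upload("first-account", "lucija-package", "0.1"))
    index.delete_project("first-account", "lucija-package")
    print(index.upload("first-account", "lucija-package", "0.1"))
    print(index.upload("second-account", "lucija-package", "0.1"))
    print(index.upload("second-account", "lucija-package", "0.2"))
```

In this model, the only thing standing between a new account and an abandoned name is a version number that was never used, which matches what we saw in the experiment.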

Mimicking a malicious actor

The success of our experiment shows how easy it is for malicious actors to exploit PyPI rules on allowed project names to publish malicious packages using the names of pre-existing, but abandoned PyPI packages. Given the volume of activity on platforms like PyPI, this creates a lot of “green fields” for would-be supply chain attackers. 

To demonstrate this, we extracted and saved the names of all the packages removed from PyPI between April 6 and April 8, 2023. Next, we attempted to publish multiple packages, each using the name of one of the removed packages and the version 123.6.6. These actions were automated and the responses from the PyPI platform were recorded for observation.

The result? Of the 111 packages we observed being removed between April 6 and April 8, 19 package names were successfully reused: a 17% success rate. In those cases, substitute packages using the names of abandoned modules, but different code, were uploaded under the testing account luce777.

A 17% success rate sounds low, but it is high enough to make this attack worthwhile for malicious actors. And a larger experiment might yield better results. For example, some of the 111 packages we observed being removed from PyPI were known malicious packages with names like pylibscrape, pycrypting, colorizepip and promptcolors. New packages using those names weren’t allowed because the names had been used for malicious packages before being removed, presumably by PyPI admins.

Some other packages couldn’t be published because their names were too similar to those of existing packages. For example, publication of list-str-ops failed because, according to the error message, the name was “too similar to an existing project”.

Finally, names that had already been reused for newly published packages were also blocked. We observed that with a package named ‘scrying.’ The error message we received said that “the user 'luce777' isn't allowed to upload to project 'scrying'.” The name scrying was already being used by a recently uploaded package, and it appears that in this case the same author was behind both the removed package and the newly uploaded one.

In the end, our experiments pushed us past the limit for publishing new projects, and our access was terminated with the message “too many new projects created.” At the time of writing, all of those packages have been removed once again by the user.

Looking for the 'who' behind removed packages

One other problem our research has raised is the lack of transparency on platforms like PyPI around removed packages. For example, when we identified the 111 removed packages, one question we tried to answer was whether we could determine which names could be reused for a potentially malicious package without actually having to test them. One piece of information that would help is who removed the packages.

In our experience, packages (or “projects,” as they are called on PyPI) can be removed either by the owner or by PyPI admins. If a project was removed by a PyPI admin, its name is most likely not reusable, and that could be established without re-uploading the package and waiting for an error message. If it was removed by the owner, there is a much higher probability that the name can be reused.

Unfortunately, there is no way to tell, currently, who removed a project from PyPI. We can infer that malicious packages that were reported to PyPI by our own researchers were removed by the PyPI admins. The timestamps associated with these removals point to the late hours in Croatia (where many of our researchers are located), or the early morning hours in the United States. 

What can we infer from that? Not much. There are other packages removed at that time, as well, so timestamps aren’t a great way to resolve this question. Additionally, once the project is removed from PyPI, the event “remove project” is created. The problem is that the same event name is used to designate removal of a package by PyPI admins and removal by the package owner. 

Undeterred, and just to be sure we weren’t missing anything, we reached out and asked PyPI admins whether there was a way to tell who removed a project. They confirmed what our research told us: there is currently no way to distinguish those two events, but there is an open issue, not yet implemented, that would provide such functionality.

True, not being able to tell who removed a package makes it harder for would-be attackers to know which abandoned PyPI packages are good candidates for malicious repurposing. The 17% success rate we had repurposing the names of removed packages isn’t definitive, but it does suggest that most efforts to reuse names fail for one reason or another, raising the bar for would-be attackers. 

The problem is that it also raises the bar for would-be defenders. Transparency about who removed and reposted packages would provide information that development organizations and security teams could use to spot patterns of suspicious behavior that may indicate a malicious module is using the name of a former, legitimate package. Being able to discern the packages that PyPI administrators removed from the platform would help development teams create an exclusion list that might prevent inadvertent use of a malicious package with a repurposed name.    
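
As an illustration of the exclusion list idea, the sketch below checks the project names in a requirements file against a list of names an organization has decided not to trust. It is a simplified sketch under stated assumptions: the function names are our own, the parser only handles plain `name` or `name<specifier>` lines (not URLs or editable installs), and the example list simply reuses names mentioned in this post. The name normalization follows the rule given in PEP 503.

```python
import re

# Example entries only: names mentioned in this post.
EXCLUDED = ["termcolour", "pylibscrape", "pycrypting", "colorizepip", "promptcolors"]


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_names(lines: list[str]) -> list[str]:
    """Extract normalized project names from simple requirement lines."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(normalize(match.group(0)))
    return names


def excluded_requirements(lines: list[str], exclusion_list: list[str]) -> list[str]:
    """Return the requirement names that appear on the exclusion list."""
    blocked = {normalize(name) for name in exclusion_list}
    return [name for name in requirement_names(lines) if name in blocked]


if __name__ == "__main__":
    requirements = ["requests>=2", "TermColour>=3.3", "# pinned tools", "termcolor"]
    print(excluded_requirements(requirements, EXCLUDED))
```

A check like this could run in continuous integration before dependencies are installed. Its usefulness depends on how the list is maintained, which is where better removal data from the platform would help.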

Conclusion

When compared with common software supply chain attack vectors like dependency confusion and typosquatting attacks, repurposing abandoned packages is a relatively uncommon phenomenon. That’s due, in part, to the obstacles to claiming the name of a package that has already been used and removed, as well as the difficulty of pushing the abandoned package. How much easier is it to simply create a new project with a typo-squatted name that is similar to a widely used module, then sit back and wait for harried developers to slip up?   

That said, repurposing abandoned packages is a strategy with clear advantages for malicious actors, as our research into termcolour indicates. Furthermore, the lack of transparency on platforms like PyPI works to the advantage of bad actors by making it difficult for developers and development organizations to understand the full history and provenance of the packages they are downloading and using.

The solution here, as elsewhere, is for development organizations to have a firm grasp on what open source and third-party software dependencies they have, and to be prepared to interrogate internally developed, third-party and open source software alike for signs of tampering or other red flags.
