While monitoring malicious packages in public software repositories, ReversingLabs researchers have noticed an increase in malicious HTTP libraries on the Python Package Index (PyPI) repository. Actually, we should air-quote “HTTP libraries.” In reality, most of these are simple, malicious packages bearing names that are Frankenstein-like amalgamations of the acronym "HTTP".
The descriptions for these packages, for the most part, don't hint at their malicious intent. Some are disguised as real libraries and make flattering comparisons between their capabilities and those of known, legitimate HTTP libraries.
Specifically, ReversingLabs detected 41 malicious PyPI packages posing as HTTP libraries, with some mimicking popular and widely used libraries. It is just the latest attempt by malicious actors to use open source repositories like PyPI, npm and GitHub to distribute malware.
This report includes the full list of packages, and also describes the discovery and provides the developer community with telltale signs of malicious HTTP libraries so that they can detect this emerging threat.
It is not unusual for bad actors to invoke the acronym “HTTP” when naming malicious packages. HTTP libraries are widely used by developers for networking functionality and to communicate with the appropriate APIs when functionality from third-party modules needs to be included in their application.
This background makes HTTP libraries very interesting to malicious actors, and to researchers tracking malicious campaigns online. In our research, ReversingLabs discovered a trove of malicious packages on the PyPI repository and identified two distinct types of malicious modules hiding in these supposed HTTP libraries:
- Downloaders used to deliver second stage malware to compromised systems.
- Info stealers with embedded functionality for exfiltrating data.
Looked at more closely, these malicious packages share similarities. The packages contain only a few files, most with very little identifying information compared with legitimate software modules. At best, some of these malicious files have code comments or short descriptions of the functionality. It goes without saying: The functionality and purpose described in these packages are fictitious. The real purpose of these packages is malicious, and is not described.
To understand how these malicious HTTP packages work, here's a look at a few packages ReversingLabs uncovered.
Among other files, this package contains one that is particularly interesting to us: setup.py. Content similar to this file’s is seen fairly frequently, so a first look at it raised suspicion of its malicious nature. It contains more than 500 lines of code written to steal information from victims, ranging from various Discord data to passwords and tokens. Once everything is gathered, it is sent to the malicious actor: the author of the package.
Figure 1: Infostealer’s malicious code
The suspicious payload of the httpsus package is slyly hidden: it is encoded with base64 and pushed to the far right of the setup.py file, so that it does not fit within the default screen width. That is a clever way to fool developers. Not every package made that effort, however. The malicious package htps1, for example, has malicious code barely concealed inside an __init__.py file. This file is executed implicitly whenever the package is imported.
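To illustrate why code placed in an __init__.py file is so convenient for an attacker, here is a minimal sketch that builds a throwaway package in a temporary directory and imports it. The package name `demo_pkg_sketch` and the `EXECUTED` marker are hypothetical and harmless; they only stand in for a payload to show that top-level code runs as a side effect of the import.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_and_observe(package_name: str = "demo_pkg_sketch") -> list:
    """Create a throwaway package, import it, and report what ran at import time."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / package_name
        pkg_dir.mkdir()
        # Harmless stand-in for a payload: the module records that it was executed.
        (pkg_dir / "__init__.py").write_text(
            "EXECUTED = ['__init__.py ran at import time']\n"
        )
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            # No function from the package is called; importing is enough.
            module = importlib.import_module(package_name)
            return list(module.EXECUTED)
        finally:
            # Clean up so the demo leaves no trace in the interpreter state.
            sys.path.remove(tmp)
            sys.modules.pop(package_name, None)


if __name__ == "__main__":
    print(import_and_observe())
```

The caller never invokes anything inside the package, yet the module-level statement has already run by the time the import returns.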
Figure 2: Downloader’s malicious payload pushed to the side
Figure 3: Downloader’s malicious payload decoded using CyberChef
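The hiding technique described above can be checked for mechanically. The following is a rough sketch, not the method ReversingLabs used: it flags overly long lines in a source file, looks for base64-looking runs on them, and decodes those runs without executing anything. The 200-character threshold, the 40-character minimum run length, and the `SAMPLE` source are all assumptions made for illustration.

```python
import base64
import binascii
import re

MAX_LINE_LENGTH = 200  # assumed threshold for a "suspiciously long" line
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{40,}={0,2}")  # assumed minimum run length


def try_decode(candidate: str):
    """Return the decoded text if candidate is valid base64 holding UTF-8, else None."""
    try:
        raw = base64.b64decode(candidate, validate=True)
    except (binascii.Error, ValueError):
        return None
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


def find_hidden_payloads(source: str, max_len: int = MAX_LINE_LENGTH) -> list:
    """Report decodable base64 runs that sit on overly long lines."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if len(line) <= max_len:
            continue
        for match in BASE64_RUN.finditer(line):
            decoded = try_decode(match.group(0))
            if decoded is not None:
                findings.append(
                    {"line": lineno, "column": match.start() + 1, "decoded": decoded}
                )
    return findings


# Hypothetical setup.py content: the encoded text sits beyond column 300.
_PAYLOAD = base64.b64encode(b"print('second stage would be fetched here')").decode("ascii")
SAMPLE = (
    "from setuptools import setup\n"
    "setup(name='demo')" + " " * 300
    + "; exec(__import__('base64').b64decode('" + _PAYLOAD + "'))\n"
)

if __name__ == "__main__":
    for finding in find_hidden_payloads(SAMPLE):
        print(finding)
```

The sample is only ever treated as text. Decoding the string for inspection, as with CyberChef, is safe; passing it to `exec` is not.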
What’s in a name?
As we observed, almost all the malicious packages we discovered had names that invoke the “HTTP” acronym, an obvious effort to fool developers into believing the package is an HTTP library. Other packages associated with HTTP are also targeted, even though their names do not include the “HTTP” acronym. For example, ReversingLabs discovered that packages like aiohttp and urllib3 were also the target of typosquatting attacks via malicious packages with names like aio6, aio5, ulrlib3, and urllb.
However, when it comes to package names, it appears there isn't always a correlation between a real HTTP library and a malicious one. For example, the malicious package httpxv2 is likely trying to mimic the legitimate package httpx, "a fully featured HTTP client library for Python 3". Yet the description for httpxv2 makes no effort to further that illusion, describing httpxv2 as “Ctypes but better," a reference to ctypes, a foreign function library for Python that provides C compatible data types. That mistake was corrected with the malicious package httpxv3, a successor to httpxv2, which is described as "httpx but better."
Figure 4: Fabricated description of malicious package
As with other supply chain attacks, malicious actors are counting on typosquatting to create confusion, and on incautious developers to install malicious packages with similar-sounding names by accident. In a few cases, the attackers attempted to convince developers to install a package outright without trying to confuse them. For example, httpssus doesn’t imitate another, legitimate package, but is described as a “simple CLI note taker and free-wheeling wiki.” Likewise, httpsos is described as “a simple caching utility in Python 3.”
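A simple name check can catch some of the look-alikes mentioned above before installation. The sketch below uses the standard library's `difflib` to compare a candidate name with a short allowlist. The allowlist and the 0.8 similarity cutoff are assumptions chosen for this example, not values from the research.

```python
import difflib

# Assumed allowlist: the legitimate libraries named in this article.
KNOWN_LIBRARIES = ("requests", "urllib3", "aiohttp", "httpx")


def closest_known(name: str, known=KNOWN_LIBRARIES, cutoff: float = 0.8):
    """Return the known library that name resembles, or None.

    An exact match returns None because it is not a look-alike.
    """
    normalized = name.lower().replace("_", "-")
    if normalized in known:
        return None
    matches = difflib.get_close_matches(normalized, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for candidate in ("ulrlib3", "urllb", "httpxv2", "aio6", "requests"):
        print(candidate, "->", closest_known(candidate))
```

With these settings, ulrlib3, urllb and httpxv2 are flagged as resembling urllib3, urllib3 and httpx, while aio6 is not: it shares too little with aiohttp to pass the cutoff. String similarity is therefore only a first filter, and names such as httpssus that imitate nothing in particular will pass it entirely.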
Who can you trust?
Fortunately for developers, there are a number of legitimate, non-malicious HTTP libraries to choose from on PyPI as well as other repositories. Below is a list of current HTTP libraries that we recommend for use. Which library you choose for your project will depend on what you want to achieve and which functionality you wish to implement.
It doesn't contain the acronym "HTTP" in its name, but this library is at the top of the Google search results when you search for "how to make HTTP requests in python." The requests library is widely used and easy to operate (requests can be made in a single line). It is also being actively maintained with frequent updates.
Part of the Python standard library, urllib contains several modules for managing HTTP communications: urllib.request, urllib.error, urllib.parse and urllib.robotparser. To make HTTP requests, the urllib.request module is needed. Judging by its official documentation and compared with the requests library, urllib appears more complicated and more difficult to use, but it is still powerful and useful.
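As a brief illustration of the standard library route, this sketch builds a GET request with urllib.parse and urllib.request. The URL `https://example.com/api` and the query parameter are placeholders, and the helper names are hypothetical. The `fetch` function is defined but not called here, so the example runs without network access.

```python
import urllib.parse
import urllib.request


def build_get_request(base_url: str, params: dict) -> urllib.request.Request:
    """Encode query parameters and wrap the URL in a Request object."""
    query = urllib.parse.urlencode(params)
    url = f"{base_url}?{query}"
    return urllib.request.Request(
        url, headers={"Accept": "application/json"}, method="GET"
    )


def fetch(request: urllib.request.Request, timeout: float = 10.0) -> bytes:
    """Send the request and return the raw response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Placeholder endpoint; swap in a real API before calling fetch().
    req = build_get_request("https://example.com/api", {"q": "http libs"})
    print(req.get_method(), req.full_url)
```

Building the request and sending it are kept separate so the encoded URL and headers can be inspected before anything goes over the wire.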
The urllib3 library is another popular choice for making HTTP requests. It improves on the standard urllib library with features like thread safety, connection pooling and client-side SSL/TLS verification.
This library is not only used to make requests, but to build HTTP servers and clients. As its name suggests, aiohttp is used for building asynchronous HTTP clients/servers.
Below is a list of the malicious PyPI packages identified by ReversingLabs researchers. Additional malicious packages were found and reported by Fortinet. They are discussed in more detail in their latest blog post.
The lesson that this discovery has for developers is one that is becoming familiar. Typosquatting attacks on platforms like PyPI, npm, RubyGems and GitHub are common. Something as simple as overlooking a missing, repeated or swapped letter in a package name (ulrlib3 instead of urllib3, for example) can mean that a malicious library will be installed on your system instead of the intended, legitimate one. Typosquatting and other attacks on developers often hinge on this kind of simple confusion and social engineering. Malicious actors are counting on the fact that developers have a lot of work to do and not much time to do it, and that small mistakes or discrepancies may escape notice.
Developers and development teams also need to keep track of popular libraries and frameworks used by the developer community. Using an old, deprecated or no longer actively maintained library can mean introducing exploitable and unpatched vulnerabilities into your project, which can give malicious actors the ability to compromise downstream systems that are running your code.
Finally, developers should frequently conduct security assessments of third-party libraries and other dependencies in their code. Fortunately, there is a growing population of tools that can help track modules and dependencies and detect malicious content lurking in open source and third-party packages. ReversingLabs’ A1000 is one. It provides thorough static analysis and threat classification of different libraries. Also, the ReversingLabs Software Supply Chain Security platform offers binary analysis of software release packages to ensure they don’t carry unwanted risks or behaviors.
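A first step toward that kind of assessment is simply knowing what is installed. The sketch below uses the standard library's `importlib.metadata` to inventory the distributions in the current environment and mark those with sparse metadata. The "sparse" rule (no summary, or neither a home page nor a project URL) is an assumed, crude heuristic for illustration only. It is not how the tools named above work and is no substitute for them.

```python
from importlib import metadata

FIELDS = ("Name", "Version", "Summary", "Home-page", "Project-URL")


def is_sparse(fields: dict) -> bool:
    """Assumed heuristic: missing summary, or no home page and no project URL."""
    summary = (fields.get("Summary") or "").strip()
    has_summary = bool(summary) and summary.upper() != "UNKNOWN"
    has_link = bool(fields.get("Home-page") or fields.get("Project-URL"))
    return not has_summary or not has_link


def installed_inventory() -> list:
    """List installed distributions with a flag for sparse metadata."""
    rows = []
    for dist in metadata.distributions():
        meta = dist.metadata
        if meta is None:
            # Broken installs may have no readable metadata at all.
            continue
        fields = {key: meta.get(key) for key in FIELDS}
        rows.append(
            {
                "name": fields["Name"] or "",
                "version": fields["Version"] or "",
                "sparse": is_sparse(fields),
            }
        )
    return sorted(rows, key=lambda row: row["name"].lower())


if __name__ == "__main__":
    for row in installed_inventory():
        marker = "  <-- sparse metadata" if row["sparse"] else ""
        print(f"{row['name']} {row['version']}{marker}")
```

A flagged entry is not evidence of malice, and well-filled metadata is not evidence of safety, since the packages described in this report often carried fabricated descriptions. The output is only a starting list for manual review.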