Analysis of the Biggest Python Supply Chain Attack Ever


Last week, the official Python Package Index (PyPI) faced a large-scale automated attack in which more than 3,500 malicious packages were uploaded in the hope that unsuspecting developers would download them. This article explains the approach and goal of this campaign, known as a supply chain attack, and why such attacks are so powerful. But first, a little context.

The SolarWinds hack: how 2020 changed the model of trust

In December 2020, the Washington Post reported that SolarWinds, an American software company with 3,000 employees, had been hacked. On its own, this was nothing special: much larger companies are breached every day. SolarWinds made the headlines because of the software it makes: its core business is IT network and infrastructure monitoring, with software running on millions of company devices around the world. The attackers, attributed to the Russian hacking group APT29, aka “Cozy Bear”, inserted malicious code into an update of the Orion software, which was used by around 18,000 companies at the time. Among the victims were many government bodies (US Treasury, EU Parliament, GCHQ, …) and, notably, several entities associated with health and the COVID crisis (UK’s National Health Service, AstraZeneca, …). Because of its impact, the SolarWinds hack has been called the biggest cyberattack in history by some.

Attacks like this one, where hackers target a supplier of their victim instead of trying to breach the victim’s highly secured systems directly, are called “supply chain attacks”. They are extremely potent, since compromising a common supplier can yield more than one high-value target and open backdoors into many organizations at once, as was the case with SolarWinds.

A new Python supply chain attack?

On March 1st, 2021, a newly created account on the Python Package Index (PyPI) uploaded 3,591 new packages. Each package had a name closely resembling that of a popular package, such as beauitfulsoup4 instead of beautifulsoup4 (notice how the i and t are swapped). When a developer mistypes the package name in their terminal or in a build script, the maliciously named package is downloaded and installed instead of the legitimate one, and since installing a source package can run arbitrary code from its setup script, this can lead to malware execution that may be hard to notice. This type of attack is called typosquatting, and it dates back to the early days of the Web, when hackers bought misspelled variants of popular domain names in order to lure unsuspecting victims to phishing-like websites.
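The character swap in beauitfulsoup4 is easy to reproduce in code. The sketch below is illustrative only: it assumes that typos are single swaps of adjacent characters (real typosquatting campaigns also use omitted, doubled or substituted letters), and the helper names transpositions and typosquat_target are hypothetical, not part of any PyPI tool. Names are normalized as described in PEP 503 before being compared. Running the same generator over a list of popular package names is one plausible way to produce thousands of look-alike names automatically, and the same check can flag a suspicious name in a requirements file.

```python
import re
from typing import Iterable, Optional, Set


def normalize(name: str) -> str:
    """Normalize a project name as PEP 503 describes (case and -_. runs)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def transpositions(name: str) -> Set[str]:
    """Every variant obtained by swapping two adjacent, different characters."""
    variants = set()
    for i in range(len(name) - 1):
        if name[i] != name[i + 1]:
            variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    return variants


def typosquat_target(candidate: str, popular: Iterable[str]) -> Optional[str]:
    """Return the popular package that `candidate` is a one-swap typo of, if any."""
    cand = normalize(candidate)
    for legit in popular:
        if cand in transpositions(normalize(legit)):
            return legit
    return None


if __name__ == "__main__":
    print("beauitfulsoup4" in transpositions("beautifulsoup4"))  # True
    print(typosquat_target("beauitfulsoup4", ["requests", "beautifulsoup4"]))  # beautifulsoup4
    print(typosquat_target("requests", ["requests", "beautifulsoup4"]))  # None
```

Restricting the check to one adjacent swap keeps false positives low; a broader defensive check would use an edit distance such as Damerau-Levenshtein instead.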

What’s inside?

The PyPI administrators reacted quickly and deleted all the typosquatting packages within a few hours, but I still managed to retrieve a cached copy of one of them. Eager to find out what kind of devious malware could be hiding in the install scripts, I opened the code in my editor and was, at first, quite surprised:

The screenshot above shows nearly the entire package. An obfuscated URL pointing to a raw IP address rather than a domain name is typical of malware, but the reality is rather anticlimactic: as you can see, the code only sends a request to that URL and never stores or executes the response, which is empty anyway. In other words, the script merely signals to someone that the package was successfully downloaded and installed, and does nothing beyond that. So why do it?

The answer is not yet certain, but this is most likely the work of a security researcher who wanted to raise awareness of typosquatting supply chain attacks by publishing many fake packages and counting how many times each one was downloaded. The IP address of the fake command-and-control (C&C) server is located in Japan, so I would bet that in the coming weeks a Japanese researcher will publish the results gathered during the few hours the packages were online.
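How the researcher actually collected statistics is unknown. As a minimal sketch, assume each installed package sends a request whose target carries its own name in a hypothetical pkg query parameter, and that the server logs the request targets; counting downloads per package then reduces to tallying that parameter. The sample log below is made up for illustration.

```python
from collections import Counter
from typing import Iterable
from urllib.parse import parse_qs, urlsplit


def tally_installs(request_targets: Iterable[str]) -> Counter:
    """Count hits per package from request targets like '/ping?pkg=NAME'."""
    counts = Counter()
    for target in request_targets:
        params = parse_qs(urlsplit(target).query)
        for name in params.get("pkg", []):  # requests without ?pkg= are ignored
            counts[name] += 1
    return counts


if __name__ == "__main__":
    # Made-up log entries, not data from the real campaign.
    log = ["/ping?pkg=beauitfulsoup4", "/ping?pkg=reqeusts",
           "/ping?pkg=beauitfulsoup4", "/favicon.ico"]
    print(tally_installs(log).most_common())  # [('beauitfulsoup4', 2), ('reqeusts', 1)]
```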

In conclusion, supply chain attacks are frightening because a mass attack can compromise you even if you are not a high-interest target, which is why they should definitely be part of your threat model.

Mathis Hammel


Mathis is a technical expert in various areas such as cybersecurity, machine learning and algorithms. He has always been passionate about competitive cybersecurity (Capture The Flag, or CTF, competitions) and coding contests, with several national and international achievements.
