Up to 350,000 open source projects vulnerable to 15-year-old Python bug

Researchers at Trellix and GitHub have patched nearly 62,000 affected projects so far

A 15-year-old Python vulnerability has been found to have affected hundreds of thousands of open source projects over its lifespan.

The vulnerability, tracked as CVE-2007-4559, is a path traversal flaw in the extract and extractall functions of Python's tarfile module. Researchers at Trellix warned that, if exploited, it could enable an attacker to overwrite arbitrary files on a target system when a maliciously crafted TAR archive is extracted.
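To illustrate the class of bug, the sketch below shows one common way to guard against it: checking every archive member's destination before extracting anything. It is a minimal example under stated assumptions, not the fix Trellix submitted; the helper names is_within_directory and safe_extractall are hypothetical, and rejecting all links is a deliberately conservative choice.

```python
import os
import tarfile


def is_within_directory(base_dir: str, target: str) -> bool:
    """Return True if target, resolved relative to base_dir, stays inside base_dir."""
    base = os.path.realpath(base_dir)
    dest = os.path.realpath(os.path.join(base, target))
    return os.path.commonpath([base, dest]) == base


def safe_extractall(tar: tarfile.TarFile, dest_dir: str) -> None:
    """Hypothetical helper: validate every member, then extract.

    Raises ValueError if a member would be written outside dest_dir
    (for example a name such as "../../etc/passwd") or if it is a link.
    """
    for member in tar.getmembers():
        if not is_within_directory(dest_dir, member.name):
            raise ValueError(f"blocked path traversal attempt: {member.name!r}")
        # Assumption: links are rejected outright, since they can also be
        # used to redirect writes outside the destination directory.
        if member.issym() or member.islnk():
            raise ValueError(f"blocked link member: {member.name!r}")
    tar.extractall(dest_dir)
```

A caller would open the archive with tarfile.open() and pass the resulting object to safe_extractall in place of calling extractall directly. More recent Python releases also accept a filter argument on extractall that performs similar checks.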

Trellix said that researchers initially thought they had discovered a new zero-day vulnerability upon encountering the flaw. However, a subsequent investigation last year found it dated back to 2007.

The bug was deemed to be of low importance at the time. However, Trellix warned that it was found to be present in more than 350,000 open source projects and in an undisclosed number of closed-source projects.

“Late last year, the Trellix Advanced Research Center team uncovered a vulnerability in Python’s tarfile module. As we dug in, we realised this was CVE-2007-4559 – a 15-year-old path traversal vulnerability with potential to allow an attacker to overwrite arbitrary files,” said Douglas McKee, director of vulnerability research at Trellix.

“CVE-2007-4559 was reported to the Python project in 2007, and left unchecked, had been unintentionally added to an estimated 350,000 open source projects and prevalent in closed source projects.”

McKee added that the vulnerability is “firmly embedded in the supply chain of many projects” and remains widespread.

The Python bug is believed to have been present in frameworks created by Google, Intel, and Amazon Web Services (AWS), highlighting both its longevity and the potentially critical risk it poses.

GitHub collaboration

Since the discovery of the bug, Trellix said it has worked extensively with GitHub to issue a fix.

Nearly 62,000 susceptible open source projects have been patched to date, McKee revealed.

“To effectively minimise the vulnerability surface area, Trellix Advanced Research Center executed a months-long effort to patch open source projects known to use the vulnerable code,” he said.

“Through GitHub, developers and community members are able to push code to projects or repositories on the platform via a process called pull request. Once a request is opened, the project maintainers review the suggested code, request collaboration or clarification if needed, and accept the new code.”

McKee said that, after receiving a list of repositories and files containing the keyword ‘import tarfile’, researchers compiled a list of repositories to scan using the Creosote vulnerability scanning tool.

“If a repository was determined to contain the vulnerability, we patched the file and created a local patch diff containing the patched file so users can easily compare the two files, the original file, and some metadata about the repository,” he added.
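The workflow described here, narrowing down files that import tarfile and then checking them for risky extraction calls, can be approximated with Python's standard ast module. The sketch below is an illustrative assumption, not a description of how Creosote works; find_tar_extractions is a hypothetical name, and its matches are only candidates for manual review because it cannot tell whether the call is already guarded or even made on a tarfile object.

```python
import ast


def find_tar_extractions(source: str) -> list[int]:
    """Return line numbers of .extract()/.extractall() calls in files that import tarfile.

    Hypothetical heuristic: any match is a candidate for review,
    not proof that the code is vulnerable.
    """
    tree = ast.parse(source)

    # Step 1: mirror the 'import tarfile' keyword search at the syntax level.
    imports_tarfile = any(
        (isinstance(node, ast.Import) and any(a.name == "tarfile" for a in node.names))
        or (isinstance(node, ast.ImportFrom) and node.module == "tarfile")
        for node in ast.walk(tree)
    )
    if not imports_tarfile:
        return []

    # Step 2: collect every method call named extract or extractall.
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr in {"extract", "extractall"}
    )
```

Running the function over the text of each Python file in a repository gives a shortlist of lines to inspect by hand before any patch is proposed.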

Open source vulnerabilities

Open source vulnerabilities have been a recurring issue for businesses globally in recent years. Research from Anaconda last year found that organisations scaled back their use of open source software across 2021 and 2022 amidst security concerns.

Nearly one-third (31%) of respondents to Anaconda’s survey said that security vulnerabilities were the number one challenge in the open source community.

In May 2022, security experts uncovered vulnerabilities in two popular open source packages, Python CTX and PHP’s phpass.

If exploited, the vulnerabilities could have enabled attackers to launch software supply chain hacks which harvested AWS cloud credentials.

McKee warned that greater collaboration is required across the open source community to eliminate critical vulnerabilities.

“As an industry, we cannot afford to ignore the need to seek out and eradicate foundational vulnerabilities,” he said. “Mass patching of open source projects can be done, even if it takes a lot of time, and it can deliver benefits to organisations of all sizes, across sectors and regions.”

McKee added that to “properly prevent the reintroduction of past attack surfaces”, organisations using code libraries and frameworks in their applications should conduct regular checks and implement robust evaluation measures to improve supply chain transparency.