Friday, February 28, 2025

Copilot exposes private GitHub pages, some removed by Microsoft


Microsoft’s Copilot AI assistant is exposing the contents of more than 20,000 private GitHub repositories from companies including Google, Intel, Huawei, PayPal, IBM, Tencent and, ironically, Microsoft.

These repositories, belonging to more than 16,000 organizations, were originally posted to GitHub as public, but were later set to private, often after the developers responsible realized they contained authentication credentials allowing unauthorized access or other types of confidential data. Even months later, however, the private pages remain available in their entirety through Copilot.

AI security firm Lasso discovered the behavior in the second half of 2024. After finding in January that Copilot continued to store private repositories and make them available, Lasso set out to measure how big the problem really was.

Zombie repositories

“After realizing that any data on GitHub, even if public for just a moment, can be indexed and potentially exposed by tools like Copilot, we were struck by how easily this information could be accessed,” Lasso researchers Ophir Dror and Bar Lanyado wrote in a post on Thursday. “Determined to understand the full extent of the issue, we set out to automate the process of identifying zombie repositories (repositories that were once public and are now private) and validate our findings.”
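Lasso didn’t publish its full tooling, but the core check it describes is straightforward to sketch. A minimal Python example, assuming a candidate list of repositories already harvested from a search engine’s index (the repository names here are hypothetical):

```python
# Sketch: flag "zombie" repositories -- names still present in a search
# index but no longer publicly reachable on GitHub. The candidate list
# is hypothetical; in practice it would come from search-engine results.
import requests

CANDIDATES = [
    "example-org/leaked-config",      # hypothetical entries
    "example-org/old-internal-tool",
]

def is_zombie(full_name: str) -> bool:
    """True if GitHub now returns 404 for a repo that a search index still lists."""
    resp = requests.get(f"https://api.github.com/repos/{full_name}")
    # For unauthenticated callers, 404 covers both deleted and private repos.
    return resp.status_code == 404

for repo in CANDIDATES:
    if is_zombie(repo):
        print(f"possible zombie repository: {repo}")
```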

After discovering Microsoft was exposing one of Lasso’s own private repositories, the Lasso researchers traced the problem to Bing’s cache mechanism. The Microsoft search engine indexed the pages when they were published publicly and never removed the entries once the pages were changed to private on GitHub. Since Copilot used Bing as its primary search engine, the private data was available through the AI chatbot as well.

After Lasso reported the problem in November, Microsoft introduced changes designed to fix it. Lasso confirmed that the private data was no longer available through the Bing cache, but the researchers went on to make an interesting discovery: Copilot could still surface a GitHub repository that had been made private following a lawsuit Microsoft had filed. The suit alleged the repository hosted tools specifically designed to bypass the safety and security guardrails built into the company’s generative AI services. The repository was subsequently removed from GitHub, but as it turned out, Copilot continued to make the tools available anyway.

Screenshot showing Copilot continues to serve tools Microsoft took action to have removed from GitHub. Credit: Lasso

Lasso ultimately determined that Microsoft’s fix involved cutting off public access to a special Bing user interface once available at cc.bingj.com. The fix, however, didn’t appear to clear the private pages from the cache itself. As a result, the private information remained accessible to Copilot, which in turn would surface it for any user who asked.
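Lasso’s description suggests a simple way to confirm that a cached copy has outlived its source: compare the live page with the cached one. A rough sketch of that check, using a placeholder cache URL since the cc.bingj.com interface is no longer publicly reachable:

```python
# Sketch: detect a stale cache entry by comparing the origin page with a
# cached copy. Both URLs are placeholders; the real Bing cache endpoint
# at cc.bingj.com is no longer open to the public.
import requests

ORIGIN_URL = "https://github.com/example-org/leaked-config"    # hypothetical
CACHE_URL = "https://cache.example.com/copy-of-leaked-config"  # placeholder

origin_gone = requests.get(ORIGIN_URL).status_code == 404
cache_alive = requests.get(CACHE_URL).status_code == 200

if origin_gone and cache_alive:
    print("stale cache entry: origin is private/removed but the copy persists")
```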

The Lasso researchers explained:

Although Bing’s cached link feature was disabled, cached pages continued to appear in search results. This indicated that the fix was a temporary patch and while public access was blocked, the underlying data had not been fully removed.

When we revisited our investigation of Microsoft Copilot, our suspicions were confirmed: Copilot still had access to the cached data that was no longer available to human users. In short, the fix was only partial, human users were prevented from retrieving the cached data, but Copilot could still access it.

The post laid out simple steps anyone can take to find and view the same massive trove of private repositories Lasso identified.

There’s no putting toothpaste back in the tube

Developers frequently embed security tokens, private encryption keys, and other sensitive information directly into their code, despite best practices that have long called for such data to be provided through more secure means. The potential damage worsens when that code is published in public repositories, another common security failing. The phenomenon has recurred over and over for more than a decade.
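Those “more secure means” are mundane: keep the secret out of the source tree entirely and inject it at runtime. A minimal illustration of the pattern (the variable name is arbitrary):

```python
# Anti-pattern: a credential committed to source control.
# API_TOKEN = "ghp_..."  # leaks the moment the repo is public, and stays in git history

# Preferred: read the secret from the environment at runtime, so the
# value never appears in the repository at all.
import os

API_TOKEN = os.environ["API_TOKEN"]  # raises KeyError if the secret isn't set
```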

When these sorts of mistakes happen, developers often make the repositories private quickly, hoping to contain the fallout. Lasso’s findings show that simply making the code private isn’t enough. Once exposed, credentials are irreparably compromised. The only recourse is to rotate all credentials.
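Rotation starts with knowing what leaked. A crude scanner along these lines can surface candidates for rotation; the two token shapes shown (GitHub personal access tokens and AWS access key IDs) are illustrative, not exhaustive:

```python
# Sketch: grep a working tree for strings shaped like common credentials
# so they can be rotated. Patterns are illustrative, not exhaustive.
import pathlib
import re

PATTERNS = {
    "github_token": re.compile(r"ghp_[A-Za-z0-9]{36}"),
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
}

for path in pathlib.Path(".").rglob("*.py"):
    text = path.read_text(errors="ignore")
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            # Print only a prefix so the scanner itself doesn't re-leak the secret.
            print(f"{path}: possible {name}: {match.group()[:8]}...")
```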

This advice still doesn’t address the problems that arise when other sensitive data is included in repositories that are switched from public to private. Microsoft incurred legal expenses to have tools removed from GitHub after alleging they violated a raft of laws, including the Computer Fraud and Abuse Act, the Digital Millennium Copyright Act, the Lanham Act, and the Racketeer Influenced and Corrupt Organizations Act. Company lawyers prevailed in getting the tools removed. To date, Copilot continues to undermine that work by making the tools available anyway.

Microsoft representatives didn’t immediately respond to an email asking if the company plans to provide further fixes.
