How to Remove Sensitive Data You Accidentally Uploaded to GitHub

June 16, 2022

It happens to me all the time.

I create repositories, and I know that I need to store my secret keys in an environment file. But I always commit my changes without first adding that file to .gitignore.

As a result, my secrets end up on the internet!

Now, even if you add the environment file to .gitignore afterward, it won’t disappear from the history.

The severity could vary depending on what information you’ve uploaded and how difficult it is to reset it. Yet, it's preventable! So, we should always stop such sensitive data from entering the cloud.
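Prevention can be as small as one line in .gitignore, added before the first commit. Here is a minimal sketch, assuming a throwaway directory and the same sample key used later in this walkthrough; the variable names are only for illustration.

```shell
set -eu

# Work in a throwaway directory so nothing real is touched.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q

# Ignore the environment file BEFORE anything is committed.
echo ".env" > .gitignore
echo "SECRET_KEY=039jrad8932" > .env

# "git add ." silently skips ignored files.
git add .

# Files now in the index: only .gitignore should be listed.
staged_files="$(git ls-files)"
echo "staged: $staged_files"

# git check-ignore exits with 0 when the path is ignored.
if git check-ignore -q .env; then
  ignored="yes"
else
  ignored="no"
fi
echo ".env ignored: $ignored"
```

With the ignore rule in place first, the secret file never reaches the index, so it can't be committed by accident.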

But if you accidentally pushed something, here’s how to clear it properly.

Properly remove secret files from GitHub.

When you commit a secret file (such as .env), it becomes part of the git history. Removing the file and recommitting it won’t clear it. Adding it to the gitignore at a later point won’t help either.

Previously, people used git filter-branch to rewrite a repository's history. Trust me; you don’t want to get in there.

If you disagree, here’s the warning message on git-filter-branch documentation.

git filter-branch has a plethora of pitfalls that can produce non-obvious manglings of the intended history rewrite — Git filter branch documentation

Fortunately, we have a simpler alternative. It’s called BFG repo-cleaner.

BFG is a community-maintained utility. It’s written in Scala and runs on the Java Virtual Machine, so make sure you have Java ready on your computer.

How NOT to remove a secret file from Git.

Let’s create a sample repo for this walkthrough.

I created a repo with only one file. It’s a .env file with a secret key in it. With it, I made my initial commit. Now the `.env` file should be in the history.

mkdir testrepo
cd testrepo/
echo "SECRET_KEY=039jrad8932" > .env
git init
git add .
git commit -m 'initial commit'

# Let’s create a cloud repository on GitHub (or Bitbucket, etc.) and upload our code there.
git remote add origin git@github.com:<Your-Repo>.git
git branch -M main
git push -u origin main

Oops, our secret is on the internet! If you visit your remote repo, you can see the .env file.

Let’s remove it, recommit and push it. And refresh the cloud repo to see the effect.

rm .env
git add .env
git commit -m 'remove secret'
git push origin main

Phew! It’s gone now.

No, not yet. It’s still there in the commit history. Anyone who has access to the repo could still see it.

It didn’t work!
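You can confirm this locally without visiting the remote. The sketch below rebuilds the sample repo in a throwaway directory (the name and email are placeholder values, only there so the commits work on any machine) and reads the secret back out of the history after it has been "removed".

```shell
set -eu

demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
# Placeholder identity for the demo commits (hypothetical values).
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "SECRET_KEY=039jrad8932" > .env
git add .env
git commit -q -m 'initial commit'

# Same effect as "rm .env" followed by "git add .env" in the walkthrough.
git rm -q .env
git commit -q -m 'remove secret'

# The history still lists every commit that touched the file.
git log --all --oneline -- .env

# Read the file back from the first commit that touched it.
first_commit="$(git log --all --reverse --format=%H -- .env | head -n 1)"
leaked="$(git show "$first_commit:.env")"
echo "recovered from history: $leaked"
```

The working tree no longer has the file, yet `git show` still prints the key. Anyone with a clone can do the same.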

Removing secret files on Git with BFG.

Here’s the correct way to remove them. As I mentioned earlier, you need BFG for it. And BFG requires Java.

You can confirm if Java is installed on your computer with the following command. If not, please follow this documentation to install it.
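The command itself did not survive in this copy of the article. A common check, offered here as an assumption rather than the author's exact command, is `java -version`. The sketch wraps it so it also reports when no `java` launcher is on the PATH.

```shell
# Check whether a "java" launcher is on the PATH and print its version.
if command -v java >/dev/null 2>&1; then
  java_status="found"
  # java prints its version to stderr, so redirect it to stdout.
  java -version 2>&1 || true
else
  java_status="missing"
  echo "Java not found; install a Java runtime before running BFG."
fi
echo "java: $java_status"
```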

You can download the BFG jar file from their official page.

Step I: To remove the file from the history across all branches, first, you need to create a mirrored clone of our remote repository.

$ git clone --mirror git@github.com:<Your-Repo>.git

What is a mirror repo?

If you go inside your newly cloned repository, you won’t see your ordinary files. Instead, you’ll see the files and folders Git uses to manage your repository. We call such repositories bare repos. A mirror repo is a bare repo that also copies every ref from the remote (all branches, tags, and so on) exactly as it is, so the full history comes along and a later push updates all of them.
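To see the difference for yourself, here is a small sketch that mirror-clones a local repository instead of a remote one. The names source-repo and source-repo.git, plus the demo identity, are hypothetical; the local path simply plays the role of the remote URL.

```shell
set -eu

work_dir="$(mktemp -d)"
cd "$work_dir"

# A small source repo standing in for the remote.
git init -q source-repo
cd source-repo
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "hello" > README.txt
git add README.txt
git commit -q -m 'initial commit'
git tag v1
cd ..

# Mirror-clone it.
git clone -q --mirror source-repo source-repo.git
cd source-repo.git

# A mirror clone is bare: no working tree, only Git's own files.
is_bare="$(git rev-parse --is-bare-repository)"
echo "bare: $is_bare"
ls

# --mirror also records that pushes should mirror every ref.
is_mirror="$(git config --get remote.origin.mirror)"
echo "mirror: $is_mirror"

# Refs come across as-is, tags included.
refs="$(git for-each-ref --format='%(refname)')"
echo "$refs"
```

Because every ref is mapped one to one, a plain `git push` from the mirror later updates all branches and tags on the remote.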

Step II: Here’s the code that removes the file from history. Make sure you run this code from outside the cloned repository.

$ java -jar bfg.jar --delete-files .env <Your-Repo>.git

The above command will print some helpful information on the terminal. It tells you whether any files matched your criteria and which branches and commits will be affected.

At this point, BFG has rewritten the commits so they no longer reference the file, but the old data is still physically present in the repository. So review the information on the screen before you delete it permanently.

Step III: You can run the following code to permanently delete it from all the branches and commits when you're done reviewing. This command needs to run inside the repository.

$ cd <Your-Repo>.git # if you're still outside the repo directory.
$ git reflog expire --expire=now --all && git gc --prune=now --aggressive
$ git push

If you refresh the remote repository page now, you won’t see the secret file. Not in the files section. Not even in the commits.

We just removed our secrets from the internet.
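If you'd rather not rely on the web page alone, you can count the commits that still touch a path. The helper below is a sketch: `commits_touching` is a made-up name, and the demo repo only illustrates what a "dirty" count and a "clean" count look like. In practice you would run the same `git rev-list` command inside the mirror clone.

```shell
set -eu

demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
# Placeholder identity for the demo commit (hypothetical values).
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "SECRET_KEY=039jrad8932" > .env
git add .env
git commit -q -m 'initial commit'

# Number of commits, across all refs, that touch the given path.
commits_touching() {
  git rev-list --all --count -- "$1"
}

# A path that was committed shows up with a non-zero count...
dirty_count="$(commits_touching .env)"
# ...while a path that is absent from history reports zero.
clean_count="$(commits_touching never-committed.txt)"

echo ".env commits: $dirty_count"
echo "never-committed.txt commits: $clean_count"
```

After a successful cleanup, the count for .env in the mirror clone should be zero.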

Final Thoughts

Pushing sensitive files such as .env with secret codes is a frequent error I make.

BFG isn’t a license to make such errors; the consequences can be catastrophic. But it’s there to rescue you.

As a final note, I want to say that though BFG removes files from all the branches and the commit history, it doesn’t touch your last commit. This commit is also known as the HEAD.

BFG assumes the last commit is in production. Changing anything there could bring your deployment down.

BFG documentation says, “you need to fix that, the BFG can’t do it for you.”
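One way to fix that yourself, before running BFG, is to stop tracking the file in a new commit. This is a sketch under the assumption that you want to keep your local .env for development; the repo and identity are throwaway demo values.

```shell
set -eu

demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
# Placeholder identity for the demo commits (hypothetical values).
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "SECRET_KEY=039jrad8932" > .env
git add .env
git commit -q -m 'initial commit'

# Stop tracking .env but keep the local copy, and ignore it from now on.
git rm -q --cached .env
echo ".env" >> .gitignore
git add .gitignore
git commit -q -m 'stop tracking .env'

# Files in the latest commit: .env should no longer be listed.
head_files="$(git ls-tree -r --name-only HEAD)"
echo "files in HEAD: $head_files"
```

Once this commit is pushed, the HEAD is clean, and BFG can rewrite everything before it. In the walkthrough above, the 'remove secret' commit already played this role.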

Additionally, you are still at risk if others have a local copy of your repository.
