Removing traces from our repository history

Filed Under: Drupal Planet

In our daily work we all make mistakes in our Git commits. Sometimes these errors can easily be repaired just by reverting the offending commits. But if we are working in a public repository and have accidentally pushed sensitive information, we have a problem that a simple revert cannot fix.

That sensitive information is still in our repository history, and anybody with enough time to dig through it can find it. Our clients, or even we ourselves, now have a privacy issue.

We could always rewrite that commit in our local environment and push our code again with the --force option. But, as we all know, every time you do that a kitten dies. And if your teammates have already pushed work on top of it, the shared history ends up in a mess.

So the best option is to fix this in a more controlled way: one that erases every trace of our mistake while keeping the rest of the repository intact.

Git provides the filter-branch command for this, but this powerful tool can become complicated and slow. While looking for an easier way to do it, we finally came across the BFG Repo-Cleaner.

BFG is an alternative to git filter-branch that offers a faster and simpler way to clean Git repositories. It is written in Scala and runs on the Java Virtual Machine, so you need to make sure you have a Java Runtime Environment (JRE 6.0 or above) installed. To clean your repository, just follow the steps below:

Before you start, repair your mistake manually: remove the sensitive files from your latest commit and push that fix. Then clone a fresh copy of your repository using the --mirror option.
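A minimal sketch of the clone step as a small shell function; the name mirror_clone and the example URL below are hypothetical placeholders for your own repository:

```shell
# Hypothetical helper: make a bare mirror clone (every branch and tag)
# of the repository that BFG will clean.
mirror_clone() {
  # $1 is your repository URL, e.g. git@example.com:acme/my-site.git (placeholder)
  git clone --mirror "$1"
}
```

For example, mirror_clone git@example.com:acme/my-site.git creates a bare my-site.git directory, which is the copy the following steps work on.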

Next, download BFG (a single executable jar file) and run it against your mirror clone.

This step removes every blob bigger than 1 MB from your repository's history.
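A sketch of that invocation using BFG's --strip-blobs-bigger-than option. It assumes the downloaded jar has been saved as bfg.jar in the current directory (the real download's file name includes a version number) and that the mirror clone is called my-site.git. The helper name strip_big_blobs is hypothetical:

```shell
# Path to the downloaded BFG jar (assumption: set BFG_JAR if yours differs).
: "${BFG_JAR:=bfg.jar}"

# Hypothetical helper: rewrite every commit, branch and tag of the
# mirror clone so that blobs larger than 1 MB are removed from history.
strip_big_blobs() {
  java -jar "$BFG_JAR" --strip-blobs-bigger-than 1M "$1"
}
```

Run it as strip_big_blobs my-site.git.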

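BFG updates your commits, branches and tags so they are clean, but it does not physically delete the unwanted data. Examine your repository's history to make sure it has been rewritten, then use the standard git gc command to strip out the dirty data, which Git will now recognise as surplus to requirements.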
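A sketch of this clean-up, following the commands suggested in the BFG documentation; the helper name purge_dirty_data is hypothetical and my-site.git stands for your mirror clone. The reflog is expired first because its entries still reference the old commits, and gc does not prune objects that are reachable from the reflog:

```shell
# Hypothetical helper: physically delete the objects BFG left orphaned.
purge_dirty_data() {
  repo="$1"
  # Drop reflog entries that still point at the old, dirty commits...
  git -C "$repo" reflog expire --expire=now --all &&
  # ...so that gc can prune every object that is no longer reachable.
  git -C "$repo" gc --prune=now --aggressive
}
```

Run it as purge_dirty_data my-site.git.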

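Finally, once you're happy with the updated state of your repository, push it back up. Because the history has been rewritten, everybody on your team should then make a fresh clone rather than keep working in their old copies, which still contain the dirty history.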
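A sketch of the final push (the helper name push_cleaned is hypothetical). Because the clone was made with --mirror, Git configured it to push all refs, so a plain git push replaces every branch and tag on the server with its rewritten version:

```shell
# Hypothetical helper: push the cleaned mirror clone back to the server.
# In a --mirror clone, "git push" updates all branches and tags on the remote.
push_cleaned() {
  git -C "$1" push
}
```

Run it as push_cleaned my-site.git.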

If everything went well, your repository history will no longer contain any of the accidentally committed files.

Here are some common examples for Drupal projects:
Delete all files named 'id_rsa' or 'id_dsa':
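A sketch using BFG's --delete-files option, assuming the jar is saved as bfg.jar and the mirror clone is my-site.git; the helper name delete_ssh_keys is hypothetical. The pattern is quoted so the shell passes the braces to BFG instead of expanding them:

```shell
: "${BFG_JAR:=bfg.jar}"   # assumption: location of the BFG jar

# Hypothetical helper: remove every file named id_rsa or id_dsa (private
# SSH keys) from the whole history. BFG matches the file name, not the path.
delete_ssh_keys() {
  java -jar "$BFG_JAR" --delete-files 'id_{dsa,rsa}' "$1"
}
```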

Delete database dumps:
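A sketch for database dumps, assuming they were committed as *.sql or *.sql.gz files, that the jar is saved as bfg.jar and that the mirror clone is my-site.git; the helper name delete_db_dumps is hypothetical, and the glob should be adapted to your own naming:

```shell
: "${BFG_JAR:=bfg.jar}"   # assumption: location of the BFG jar

# Hypothetical helper: remove database dumps from the whole history.
# Assumption: dumps are named *.sql or *.sql.gz; adjust the glob as needed.
delete_db_dumps() {
  java -jar "$BFG_JAR" --delete-files '*.{sql,sql.gz}' "$1"
}
```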

Delete the files folder:
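A sketch using BFG's --delete-folders option, assuming the jar is saved as bfg.jar and the mirror clone is my-site.git; the helper name delete_files_folders is hypothetical. Be careful: BFG matches folder names, not paths, so this removes every folder called files in the history, not only sites/default/files:

```shell
: "${BFG_JAR:=bfg.jar}"   # assumption: location of the BFG jar

# Hypothetical helper: remove Drupal's files directory from history.
# Matching is by folder *name*: every folder called "files" is removed.
delete_files_folders() {
  java -jar "$BFG_JAR" --delete-folders files "$1"
}
```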

Note that BFG assumes you have already repaired your repository before running it: by default it does not touch the contents of your latest commit (HEAD), so you need to make sure that commit is clean. This protects your current work and gives you peace of mind, knowing that BFG only changes your repository's history and does not meddle with the current files of your project.
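Cleaning the latest commit is ordinary Git work. A minimal sketch, assuming you pass the path of the sensitive file and want to keep your local copy of it; the helper name clean_head is hypothetical, and you would normally also add the file to .gitignore:

```shell
# Hypothetical helper: remove a sensitive file from the latest commit
# before running BFG, so that HEAD (which BFG protects) is already clean.
clean_head() {
  file="$1"
  # --cached stops tracking the file but leaves your local copy on disk.
  git rm --cached -- "$file" &&
  git commit -m "Remove $file from the repository" &&
  # Push the fix so a fresh --mirror clone starts from a clean HEAD.
  git push
}
```

For example, clean_head id_rsa run inside your working copy.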
