I'm interested in securely removing a git repository in a reasonable amount of time.
But doing so takes quite a while. Here I have a small test repo whose .git folder is under 5 MiB:
$ du -ac ~/tmp/.git | tail -1
4772 total
$ find ~/tmp/.git -type f | wc -l
991
With shred's default options, this takes a long time. In the next command I use --force to change permissions where necessary and --zero to overwrite with zeros after shredding. The default shredding method is to overwrite with random data three times (-n3).
I also want to remove the files afterwards. According to man shred, --remove=wipesync (the default when --remove is used) only operates on directories, but it seems to slow me down even when I operate only on files. Compare (I reinitialized the git repo each time):
$ time find ~/tmp/.git -type f | xargs shred --force --zero --remove=wipesync
real 8m18.626s
user 0m0.097s
sys 0m1.113s
$ time find ~/tmp/.git -type f | xargs shred --force --zero --remove=wipe
real 0m45.224s
user 0m0.057s
sys 0m0.473s
$ time find ~/tmp/.git -type f | xargs shred --force --zero -n1 --remove=wipe
real 0m33.605s
user 0m0.030s
sys 0m0.110s
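The pipelines above split file names on whitespace, which is usually harmless inside a .git directory but breaks on names that contain spaces or newlines. A minimal sketch of a NUL-delimited variant, run against a throwaway directory rather than a real repository (the `scratch` variable and the file names are made up for illustration):

```shell
# Build a throwaway tree; "scratch" and its contents are illustrative only.
scratch=$(mktemp -d)
mkdir -p "$scratch/objects/ab"
printf 'secret\n' > "$scratch/objects/ab/plain"
printf 'secret\n' > "$scratch/objects/ab/name with spaces"

# -print0 and -0 pass file names NUL-terminated, so whitespace is safe.
find "$scratch" -type f -print0 |
  xargs -0 shred --force --zero -n1 --remove=wipe

# shred removes files only; the now-empty directories are still there.
remaining=$(find "$scratch" -type f | wc -l)
echo "files left: $remaining"
```

The directory skeleton survives, so a final `rm -r` on the tree is still needed after the files have been shredded.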
Is there a better way to do it?
EDIT: Yes, encryption is the key. I'm now just adding two more benchmarks using -n0.
$ time find ~/tmp/.git -type f | xargs shred --force --zero -n0 --remove=wipe
real 0m32.907s
user 0m0.020s
sys 0m0.333s
Using 64 parallel shreds:
$ time find ~/tmp/.git -type f | parallel -j64 shred --force --zero -n0 --remove=wipe
real 0m3.257s
user 0m1.067s
sys 0m1.043s
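If GNU parallel is not installed, `xargs -P` can probably stand in for it. The sketch below swaps in that technique and is not the command that was benchmarked above. It assumes an xargs that supports `-P` (GNU and BSD versions do, although it is not in POSIX), and it uses a throwaway directory with a hypothetical `scratch` variable and made-up file names:

```shell
# Throwaway tree for illustration; nothing here touches a real repository.
scratch=$(mktemp -d)
mkdir -p "$scratch/refs/heads"
for i in 1 2 3 4 5 6 7 8; do
  printf 'data %s\n' "$i" > "$scratch/refs/heads/file$i"
done

# xargs: -n 1 hands one file to each shred process, -P 8 runs up to eight at once.
# shred: -n0 skips the random passes, --zero still does one pass of zeros.
find "$scratch" -type f -print0 |
  xargs -0 -n 1 -P 8 shred --force --zero -n0 --remove=wipe

# shred leaves the directories behind, so remove the empty tree last.
rm -rf "$scratch"
```

The process count is a tuning knob: 8 here is arbitrary, and the 64 used in the benchmark may or may not be the best value on other hardware.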
git repack (without any options) in this test repo and the number of files in the .git folder increased from 684 to 688. – Sebastian Jul 09 '14 at 09:42
git repack -a -d -g and looking at the documentation reveals that -a repacks everything into a new big pack, -d deletes old unneeded packs, and -f recomputes deltas instead of reusing deltas from old packs. – ptman Jul 09 '14 at 09:59
-g instead of -f). This reduces the file count to 42. A coincidence? ;-) man git-repack says: "Packs are used to reduce the load on mirror systems, backup engines, disk storage, etc." Why do you use it? – Sebastian Jul 09 '14 at 10:04