I'm interested in securely removing a git repository in a reasonable time, but doing so takes quite a while. Here I have a small test repo where the .git folder is < 5 MiB.
$ du -ac ~/tmp/.git | tail -1
4772 total
$ find ~/tmp/.git -type f | wc -l
991
Using shred's default options, this takes quite long. In the next command I use --force to change permissions and --zero to overwrite with zeros after shredding. The default shredding method is to overwrite with random data three times (-n3).
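As a minimal sketch of what these flags do, the following applies them to a single scratch file. It assumes GNU coreutils shred; the temporary path and the file name demo.txt are made up for illustration.

```shell
# Create a throwaway file to experiment on (hypothetical scratch path).
scratch=$(mktemp -d)
printf 'secret\n' > "$scratch/demo.txt"
chmod a-w "$scratch/demo.txt"   # read-only, so --force has something to do

# -n3: three passes of random data (the default count);
# --zero: one final pass of zeros;
# --force: change permissions if needed to allow writing.
shred --force --zero -n3 "$scratch/demo.txt"

# Without --remove the file is kept; count any bytes that are not zero.
nonzero=$(tr -d '\0' < "$scratch/demo.txt" | wc -c)
echo "non-zero bytes left: $nonzero"
```

Because --remove is not given here, the file stays in place and can be inspected after the overwrite.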
I also want to remove the files afterwards. According to man shred, --remove=wipesync (the default when --remove is used) only operates on directories, but this seems to slow me down even when I operate only on files. Compare (each time I reinitialized the git repo):
$ time find ~/tmp/.git -type f | xargs shred --force --zero --remove=wipesync
real 8m18.626s
user 0m0.097s
sys 0m1.113s
$ time find ~/tmp/.git -type f | xargs shred --force --zero --remove=wipe
real 0m45.224s
user 0m0.057s
sys 0m0.473s
$ time find ~/tmp/.git -type f | xargs shred --force --zero -n1 --remove=wipe
real 0m33.605s
user 0m0.030s
sys 0m0.110s
Is there a better way to do it?
EDIT: Yes, encryption is the key. I'm now just adding two more benchmarks using -n0.
$ time find ~/tmp/.git -type f | xargs shred --force --zero -n0 --remove=wipe
real 0m32.907s
user 0m0.020s
sys 0m0.333s
Using 64 parallel shreds:
$ time find ~/tmp/.git -type f | parallel -j64 shred --force --zero -n0 --remove=wipe
real 0m3.257s
user 0m1.067s
sys 0m1.043s
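A possible variant of the parallel run, sketched under assumptions and not benchmarked: find -print0 together with xargs -0 keeps file names containing spaces or newlines intact, and -P (a GNU and BSD xargs extension, not POSIX) runs several shred processes at once. The scratch directory below stands in for ~/tmp/.git, and the values -P8 and -n16 are illustrative choices, not tuned ones.

```shell
# Build a throwaway directory tree standing in for ~/tmp/.git (hypothetical).
repo=$(mktemp -d)
mkdir -p "$repo/objects/ab"
printf 'data\n' > "$repo/objects/ab/blob one"   # name with a space on purpose
printf 'data\n' > "$repo/HEAD"

# NUL-delimited hand-off; up to 8 shred processes, at most 16 files each.
find "$repo" -type f -print0 |
  xargs -0 -P8 -n16 shred --force --zero -n0 --remove=wipe

# shred removes files only; count what is left, then clear the empty directories.
remaining=$(find "$repo" -type f | wc -l)
echo "files remaining: $remaining"
find "$repo" -depth -type d -exec rmdir {} +
```

The final rmdir pass is there because shred deletes only the files it is given, so the now-empty directory tree would otherwise remain.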
[...] git repack (without any options) in this test repo and the number of files in the .git folder increased from 684 to 688. – Sebastian Jul 09 '14 at 09:42
[...] git repack -a -d -g and looking at the documentation reveals that -a repacks everything into a new big pack, -d deletes old unneeded packs, and -f recomputes deltas instead of reusing deltas from old packs. – ptman Jul 09 '14 at 09:59
[...] -g instead of -f). This reduces the file count to 42. A coincidence? ;-) man git-repack says: Packs are used to reduce the load on mirror systems, backup engines, disk storage, etc. Why do you use it? – Sebastian Jul 09 '14 at 10:04