Here is an obscure but practical use for rev.

I once had a large tree of files I needed to archive. I planned to use tar and then gzip, in the usual way, but the resulting tarball was still too big: it wouldn't fit on the disk I needed it to, or something.
But there was a lot of redundancy in the data — there were several directories containing large and nearly-identical files, although these directories were rather widely separated in the tree hierarchy. For example, there might have been things like
a/b/c/d/efg
a/b/c/d/hij
a/b/c/d/klm
n/o/p/q/r/s/t/u/efg
n/o/p/q/r/s/t/u/hij
n/o/p/q/r/s/t/u/klm
v/w/efg
v/w/hij
v/w/klm
So I took the full list of files to be archived and ran it through

rev | sort | rev

This brought all the same-named files together, like this:
a/b/c/d/efg
n/o/p/q/r/s/t/u/efg
v/w/efg
a/b/c/d/hij
n/o/p/q/r/s/t/u/hij
v/w/hij
a/b/c/d/klm
n/o/p/q/r/s/t/u/klm
v/w/klm
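The regrouping step can be sketched as a small shell pipeline. The paths are the ones from the example above; list.txt and regrouped.txt are hypothetical file names:

```shell
# Build the example file list, one path per line
printf '%s\n' \
  a/b/c/d/efg a/b/c/d/hij a/b/c/d/klm \
  n/o/p/q/r/s/t/u/efg n/o/p/q/r/s/t/u/hij n/o/p/q/r/s/t/u/klm \
  v/w/efg v/w/hij v/w/klm > list.txt

# Reverse each line, sort, reverse back: this sorts the paths
# right-to-left, so files sharing a basename end up adjacent
# no matter how far apart their directories are
rev list.txt | sort | rev > regrouped.txt
```

In effect this sorts by basename (more precisely, by the reversed string), which is exactly what groups the near-identical copies together.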
Then I used tar -T to archive the files in the regrouped order, and gzipped the result. My hope was that gzip would compress the multiple copies of the big, similar files much better when they were adjacent in the stream, with its rolling dictionary primed for reuse across the several copies.
To my pleasant surprise, this made a huge difference. I don't remember the exact numbers, but I think I got about 3× better compression, and the final result fit onto the target disk easily.