6

I'm cleaning up a CVS repo before migration to git. As part of the preparation I need to find (and possibly remove) any folders which ONLY contain an Attic folder.

My unix-fu is not strong, but here's what I attempted, which doesn't work, but hopefully conveys the intent.

shopt -s globstar
for file in **/*
do
  if [ -d "$file" ];then
    if [`ls | wc -l` == 1 && `ls Attic | wc -l` == 1]; then
      ((echo Attic-only folder))
    fi
  fi
done

The second part to this is to then find any folders (or folder chains, rather) that are empty.

For example, if /foo/bar/Attic is removed and /foo/bar and /foo are now both empty, let's kill that part of the tree too.

Background: I'm trying to clean up a CVS repository for migration to git. CVS creates an Attic folder for deleted files. Over the last 10 years some Bad Things have happened. I am fully aware of the risks and implications. I have backed up my data and I'm working on a copy.

don_crissti
  • Wouldn't removing the Attic folders break the history of the repository? – Kusalananda Aug 02 '16 at 10:21
  • In these cases, no. It's mostly when people have accidentally added a new folder and files under the wrong name, then simply deleted the entire folder and re-added under a different name. – Denham Coote Aug 02 '16 at 10:23
  • They wouldn't be part of the checkout in any case (if using cvs up -P), so I think you may be doing more harm than good with removing them. Try importing into Git without removing them and see what happens first. This may be a no-problem. – Kusalananda Aug 02 '16 at 10:26
  • What about the CVS directories? – Satō Katsura Aug 02 '16 at 10:27
  • @Rahul That simply finds ALL folders containing an attic folder. I'm only interested in folders if they contain ONLY an Attic folder. – Denham Coote Aug 02 '16 at 10:28
  • @SatoKatsura He's doing this on the repository, not the checkout. (I hope) – Kusalananda Aug 02 '16 at 10:31
  • @Kusalananda true, they may not be a part of the conversion, but as it currently stands, the conversion is taking 30+ hours. By removing stuff I know I don't want or need in the migrated version, I've already trimmed it down to a 7 hour process. Besides, this question is more about how to do it, rather than the merits of doing so :) – Denham Coote Aug 02 '16 at 10:31
  • @Rahul if a folder contains ,v files and an Attic folder, it can stay. If the only contents of a folder is the attic folder, it can go. – Denham Coote Aug 02 '16 at 10:33

7 Answers

4

With bash, GNU find, and comm:

comm -12 \
    <( find /path/to/CVS/repo -printf '%h\n' | \
        sort | uniq -u ) \
    <( find /path/to/CVS/repo -name Attic -type d -printf '%h\n' | \
        sort )

The first find prints the parent directory (-printf '%h\n') of everything, files and directories, in the repository. sort | uniq -u then keeps only the directories that appear exactly once, i.e. those with exactly one direct child, file or directory.

Then the second find prints the parent directories of the Attic directories. The intersection of this set with the set above (i.e. comm -12) is exactly the set of directories whose only child is Attic.

This of course happily ignores things like symlinks and other fun, and filenames with embedded newlines. You shouldn't have those in a CVS repo anyway.
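
If GNU find isn't available, the same set logic can be sketched in Python. This is only an illustration of the idea above, not a drop-in replacement for the pipeline; the function name attic_only_dirs is made up for this example, and it relies on os.walk not following symlinks by default.

```python
import os
from collections import Counter


def attic_only_dirs(top):
    """Return directories under `top` whose only entry is a directory named Attic."""
    child_count = Counter()  # parent -> number of direct children (the first find)
    attic_parents = set()    # parents of directories named Attic (the second find)

    for root, dirs, files in os.walk(top):
        child_count[root] = len(dirs) + len(files)
        if "Attic" in dirs:
            attic_parents.add(root)

    # Intersection, as comm -12 does: exactly one child, and that child is Attic.
    return sorted(d for d in attic_parents if child_count[d] == 1)
```

Calling attic_only_dirs("/path/to/CVS/repo") returns a sorted list of paths rather than printing them, so embedded newlines in names are not a problem here.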

Satō Katsura
  • Can this be piped into something like rm -rvf ? – Denham Coote Aug 02 '16 at 13:44
  • @DenhamCoote Sure, you can pipe the output to xargs -d '\n' rm -rvf. Or to tr '\n' '\0' | xargs -0 rm -rvf, if your xargs doesn't support -d. Both assume you don't have filenames with embedded newlines. – Satō Katsura Aug 02 '16 at 15:02
  • while I haven't seen any with newlines, I know there are filenames (and folders) with spaces in them, which breaks if I try to pipe to something like du -ch. Anyway, I shall try your suggestions. Thank you for your helpful answer. – Denham Coote Aug 02 '16 at 15:10
  • @DenhamCoote No problem with spaces. Fatal problems with newlines. – Satō Katsura Aug 02 '16 at 15:16
4

Find all Attic folders in . without any siblings, in bash:

find . -type d -name Attic -print0 | while IFS= read -r -d '' DIR ;\
    do [[ $(ls -A "$DIR/.." | wc -l) -eq 1 ]] && echo "$DIR" ; done

Replace echo with your favorite file handling command ;-).

Robin479
3

The first part seems to be easiest to do with a bit of Python:

#!/usr/bin/env python

import os, sys

for topdir in sys.argv[1:]:  # skip argv[0], the script name
    for root, dirs, files in os.walk(topdir):
        if not files and len(dirs) == 1 and dirs[0] == 'Attic':
            print(root)

Run it like this:

./script.py /path/to/CVS/repo

To delete the directories, assuming your files don't have newlines embedded in names, and assuming a cooperating xargs (i.e. one with the -d option):

./script.py /path/to/CVS/repo | xargs -d '\n' rm -rf

With a non-cooperating xargs you could modify the script to print NUL-terminated strings:

#!/usr/bin/env python

from __future__ import print_function
import os, sys

for topdir in sys.argv[1:]:  # skip argv[0], the script name
    for root, dirs, files in os.walk(topdir):
        if not files and len(dirs) == 1 and dirs[0] == 'Attic':
            print(root, end="\0")

Then you'd use xargs -0 to kill the directories:

./script.py /path/to/CVS/repo | xargs -0 rm -rf

To kill empty directories after that:

find /path/to/CVS/repo -depth -type d -empty -delete
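
If your find lacks -empty or -delete, a rough Python equivalent might look like the following. It is a sketch under assumptions: the name remove_empty_dirs is hypothetical, and unlike the find command it deliberately leaves the top directory itself in place.

```python
import os


def remove_empty_dirs(top):
    """Delete empty directories below `top`, deepest first; return what was removed."""
    removed = []
    # topdown=False visits children before their parents, like find -depth.
    for root, dirs, files in os.walk(top, topdown=False):
        if root == top:
            continue  # assumption: keep the repository root itself
        # Re-check on disk: subdirectories may have been removed a moment ago.
        if not os.listdir(root):
            os.rmdir(root)  # rmdir refuses to delete a non-empty directory
            removed.append(root)
    return removed
```

Because parents are visited after their children, a whole chain such as foo/bar and then foo goes away in a single pass.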
Satō Katsura
3

With zsh:

twoormore () {
  set -- $REPLY/*(D[2])
  (($#))
}

The function evaluates true if there's more than one item in $REPLY (D[2] selects the second item from whatever that glob expands to). It can then be used via glob qualifiers:

print -rl -- **/*(D/e_'[[ -d $REPLY/Attic ]]'_^+twoormore)

This searches recursively (**/*) for all directories (/) - including hidden ones (D) - and lists only those for which both the estring and the negated (^) function evaluate true, i.e. there's a child directory called Attic and it's the only item in $REPLY.


Similarly, with find, you could run:

find . -type d -exec sh -c '
  if [ -d "$0"/Attic ]; then
    set -- "$0"/*
    if [ $# -eq 1 ]; then
      printf %s\\n "$0"
    fi
  fi
' {} \;
don_crissti
2

Using ksh/bash:

find /cvs/myrepository_copy -type d -name "Attic" -print |
while IFS= read -r attic; do
  parent=$( dirname "$attic" )
  things=( "$parent"/* )   # quoted so that names with spaces stay intact
  if (( ${#things[@]} == 1 )); then
    echo rm -rf "$parent"
  fi
done

Make a copy of the whole repository, then run this (on the copy, preferably). Inspect the output with your eyes and brain and remove the echo if you think it does the right thing.

You may have to run it several times to remove higher level directories that became empty (only containing an Attic directory) upon earlier runs of the loop.
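
The "run it several times" step could also be automated. Below is a minimal Python sketch of that idea, not a tested migration tool: the name prune_until_stable and the dry_run parameter are invented for this example, and the top directory itself is never removed.

```python
import os
import shutil


def prune_until_stable(top, dry_run=True):
    """Remove Attic-only directories below `top`, repeating until a pass finds none."""
    removed = []
    while True:
        # Collect every directory whose sole entry is a directory named Attic.
        hits = [root for root, dirs, files in os.walk(top)
                if root != top and not files and dirs == ["Attic"]]
        if dry_run:
            return hits  # only the first pass can be previewed without deleting
        if not hits:
            return removed
        for path in hits:
            if os.path.isdir(path):  # may already be gone along with an enclosing hit
                shutil.rmtree(path)
                removed.append(path)
```

With dry_run=True (the default) nothing is deleted and only the first round of candidates is returned, which plays the same role as leaving the echo in place.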

I'm uncertain how this handles exotic filenames, but as * is only used to check if there is anything other than Attic in the folder, that may not be an issue.

I take no responsibility for the loss of data.

Kusalananda
2

Try this command

find $(find . -type d -exec bash -c "echo -ne '{} '; ls '{}' | wc -l" \; |  awk '$NF==1{print $1}') -name Attic -exec rm -r {} \;
1

Using a bash function f for filtering the list of Attic folders with find:

f(){ [ "$(ls "$(dirname "$1")" | wc -l | xargs echo)" = 1 ] && dirname "$1"; }
export -f f
find . -wholename "*/Attic" -type d -exec bash -c 'f "$0"' {} \;

xargs echo is used for trimming the string returned by wc -l (may not be needed on some systems).

This can also be written as a one-liner by joining the lines above with semicolons.

Alexander