
I need a way to search directories for child directories with the same name and then move all files in the child directory to the parent, i.e. from /recup-dir1/recup-dir1/files to /recup-dir1/files. The child directories can be left empty, because I can use something like find . -type d -empty -delete to delete all empty dirs.

The problem is that I have no idea which directories contain a child directory with the same name and which do not.

In pseudocode, I need something like this:

While more directories are unchecked
get name-x of  next dir
   enter dir  
   If name-x/name-x exist
   move all files in name-x/name-x to name-x
   mark dir as done
next 
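
For what it's worth, the pseudocode above could be sketched in plain shell roughly as follows. This is only an illustration under assumptions: flatten_same_name is a made-up helper name, only the top level of the given directory is checked (matching the tree shown further down), and mv -n (GNU/BSD) is used so that nothing already present in the parent gets overwritten.

```bash
# Sketch only: for every top-level directory NAME under "$1",
# move the contents of NAME/NAME up into NAME.
# flatten_same_name is a hypothetical helper, not part of any tool.
flatten_same_name() (
  root=$1
  for parent in "$root"/*/; do
    parent=${parent%/}                    # strip the trailing slash
    [ -d "$parent" ] || continue          # unmatched glob stays literal: skip
    child=$parent/${parent##*/}           # NAME/NAME
    [ -d "$child" ] || continue           # no same-named child: nothing to do
    # Move every entry (hidden ones included) one level up.
    # -n refuses to overwrite an existing file in the parent.
    find "$child" -mindepth 1 -maxdepth 1 -exec mv -n -- {} "$parent"/ \;
  done
)
```

Calling flatten_same_name /mnt/external-disk/tst-backup would leave the emptied child directories behind, so the find . -type d -empty -delete step is still needed afterwards.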

My best guess is to write a little Python script that makes a list of all directories which have a child with the same name, and then loop over this list with a command like find something something -exec mv.

Maybe this could be done with bash scripting, or another solution exists, like some rsync command. However, since I probably created this mess with rsync, I don't think that will be the solution.

Edit: here is an actual part of the tree output. The top-level dirs are inside /mnt/external-disk/tst-backup. There are no sub-dirs on lower levels.

│   └── recup_dir.1
├── recup_dir.10
│   └── recup_dir.10
├── recup_dir.100
│   └── recup_dir.100
├── recup_dir.102
│   └── recup_dir.102
└── recup_dir.1020
    └── recup_dir.1020
  • I'd think about using the tree -d command - with extra parameters; for example is an XML file useful? – Jeremy Boden Jun 03 '21 at 17:39
  • Thanks @JeremyBoden. I looked into the tree manpage and constructed these commands: $ tree -d -P recup.dir* --prune -o /home/tom/tmp-backup2.json -J and $ tree -d -P recup.dir* --prune -o /home/tom/tmp-backup2.xml -X. The next step is editing those files to remove all unaffected directories and somehow looping through one of them to start moving the files. – TomDerks Jun 04 '21 at 08:05
  • ... BTW, 2 questions: 1) once you have moved files from /path/to/dirname/**/dirname/ to /path/to/dirname, what do you want to do with the emptied subdirectory /path/to/dirname/**/dirname/? 2) what if some of the moved files have names identical to files in the destination dir? What to do? – Cbhihe Jun 04 '21 at 13:44
  • Hi @Cbhihe, thanks for the welcome. My answer to your questions: Yes, I will post the answer when I have a solution
    1. All empty dirs may be deleted or moved to /dev/null.
    – TomDerks Jun 05 '21 at 09:26
  • Too slow to edit, so here is the full comment: Hi @Cbhihe, thanks for the welcome. My answer to your questions: Yes, I will post the answer when I have a solution (I do not have one at the moment).
    1. All empty dirs may be deleted or moved to /dev/null.
    2. If a file with the same name exists, then the moved file should be renamed with an extra number at the end: 'file_a.txt exists so new name is file_a-1.txt'
    – TomDerks Jun 05 '21 at 09:33

2 Answers

3

With zsh, you could do:

for dir in **/*(NDodoN/e['[[ $REPLY:t = $REPLY:h:t ]]']); do
  contents=($dir/*(NDoN))
  (( $#contents == 0 )) ||
    mv -- $contents $dir:h/ &&
    rmdir -- $dir
done

Where:

  • **/*(qualifiers) recursive globbing with glob qualifiers
  • N: nullglob: don't complain if there's no match
  • D: dotglob: include hidden files
  • od: order depth first (leaves before the branches they're on).
  • oN: don't bother with ordering the list of files otherwise.
  • /: restrict to files of type directory.
  • e['expression']: restrict to files for which the expression code returns true (inside which the current file path is stored in $REPLY).
  • $REPLY:t: tail (basename) of the file
  • $REPLY:h:t: tail of the head (dirname) of the file
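
For readers who don't use zsh, the test inside e['...'] can be approximated with ordinary parameter expansions. The sketch below is only an illustration; is_same_name_child is a name made up here, and it treats a path without any slash as having no parent to compare against.

```bash
# Rough sh equivalent of the zsh test [[ $REPLY:t = $REPLY:h:t ]].
# is_same_name_child is a hypothetical helper name.
is_same_name_child() (
  path=${1%/}                 # drop one trailing slash, if any
  name=${path##*/}            # like $REPLY:t   (basename)
  parent=${path%/*}           # like $REPLY:h   (dirname)
  # A path without "/" has no parent component to compare with.
  [ "$parent" != "$path" ] && [ "$name" = "${parent##*/}" ]
)

is_same_name_child recup_dir.1/recup_dir.1 && echo "same name as parent"
is_same_name_child recup_dir.1/recup_dir.10 || echo "different name"
```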

With bash 4.4+ and GNU find or the find of most BSDs, you could do something similar with:

shopt -s nullglob dotglob
readarray -td '' dirs < <(
  LC_ALL=C find . -depth -regex '.*\(/[^/]*\)\1' -type d -print0
)
for d in "${dirs[@]}"; do
  contents=("$d"/*)
  (( ${#contents[@]} == 0 )) ||
    mv -- "${contents[@]}" "${d%/*}/" &&
    rmdir -- "$d"
done

This time using a regular expression to match the ./path/to/dir/dir files using basic regular expression back-references.
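
To get a feel for what that pattern accepts before running it on real data, one could feed a few sample paths to grep -x, which understands the same \(...\) and \1 syntax and, like find -regex, only succeeds when the whole string matches. The sample paths below are made up for illustration.

```bash
# Which of these sample paths end in .../NAME/NAME ?
# grep -x anchors the pattern to the whole line, as find -regex does
# with the whole path.
matches=$(
  printf '%s\n' \
    ./recup_dir.1/recup_dir.1 \
    ./recup_dir.1/recup_dir.10 \
    ./a/b/b \
    ./a/b \
    ./b/a/b |
  LC_ALL=C grep -x '.*\(/[^/]*\)\1'
)
printf '%s\n' "$matches"
# prints:
# ./recup_dir.1/recup_dir.1
# ./a/b/b
```

Note that recup_dir.1/recup_dir.10 is rejected: the back-reference has to repeat the whole last-but-one component, not just a prefix of the last one.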

  • Thank you @Stéphane. Those are both very compact solutions. The zsh one is a bit hard for me to read, but your explanation is very helpful. I will test these solutions later today. – TomDerks Jun 06 '21 at 19:22
  • Both the bash and the zsh versions work. However, because I do not fully understand either shell language, I will be very careful when using them. I saved both solutions as zsh and bash scripts and added some echo statements to track progress. – TomDerks Jun 06 '21 at 23:58
2

Try this, based on GNU find v4.8.0 and Bash v5.1.8

Part 1: Parse directory tree + detect sub-dir name dupes

Assume that a certain directory in your tree has the following structure:

./
|__test1/
     |__dirname with space
     |           |__test2
     |                |__ test2
     |__dirname **
     |       |__test1
     |
     |__reboot
     |     |__test1
     | 
     |__test2/
          |__test3/
               |__test2/
                    |__test1/
                         |__test1/

(Strange directory names are there to demonstrate code safety.)

You see that some sub-directories (subdirs) are repeated in different ways. Some are repeated multiple times, not just once (e.g. test1), one is not repeated (test3), and they can be repeated either as parent and child or separated by an arbitrary number of intermediate subdirs.

The code below reveals subdir name dupes in a directory structure in a detailed way.

  • it parses the file tree for the subdir structure starting from $PWD
  • it finds dupes for each component of any subdir path of 2 or more levels, not counting the root level, which is $PWD. In my experiment, the longest subdir path is: ./test1/test2/test1/test3/test2/test1/test1
  • it prints the first subdir dupe found at each subdir level, starting from the leaf, i.e. reading the subdir path right to left.
  • printing is redirected to a file, in reverse order so the longest subdir path is shown first. Two consecutive semicolons separate the path components (to the left of the ";;") from the first dupe (to the right of the ";;") found according to the previous bullet.

[Code]

$ find ./* -type d -exec bash -c 'set -o noglob; IFS="/" subdir=($(printf "%s " "$1")); dirlevels=$((${#subdir[@]}-1)); dupe="$(awk '\''!($1 in sd) {sd[$1];next} {print $1}'\'' < <(printf "%s\n" ${subdir[@]:1}))";[ $dirlevels -ge 2 ] && [ ! -z "$dupe"  ] && (printf "%s/" "${subdir[@]:1}";printf " ;; %s\n" "$(tail -n 1 < <(printf "%s\n" "$dupe"))";)' shellexec {} \; | tac >| tmp.data

$ cat -n tmp.data

1 test1/reboot/test1/ ;; test1
2 test1/dirname with space/test2/test2/ ;; test2
3 test1/test2/test1/test3/test2/test1/test1/ ;; test1
4 test1/test2/test1/test3/test2/test1/ ;; test1
5 test1/test2/test1/test3/test2/ ;; test2
6 test1/test2/test1/test3/ ;; test1
7 test1/test2/test1/ ;; test1
8 test1/dirname **/test1/ ;; test1
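
The dupe detection in the one-liner above rests on a small awk idiom, which may be easier to follow in isolation. The sketch below feeds it the components of one of the listed paths, one per line; the variable names dupes and last are only for illustration.

```bash
# First sight of a name: remember it in the array sd and skip the line.
# Any later sight of the same name: print it as a dupe.
dupes=$(
  printf '%s\n' test1 test2 test1 test3 test2 |
  awk '!($1 in sd) {sd[$1]; next} {print $1}'
)
# The one-liner then keeps only the last dupe reported for the path.
last=$(printf '%s\n' "$dupes" | tail -n 1)
printf 'dupes: %s\n' $dupes
printf 'last:  %s\n' "$last"
```

One caveat, as far as I can tell: $1 is only the first whitespace-delimited field, so a component such as "dirname with space" is compared by its first word alone; using $0 instead would compare the whole component.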

Part 2: Processing of subdir name dupes; moving contents

Processing takes place in the order displayed in tmp.data.

  • on tmp.data's first line, the first name dupe on the path ./test1/test2/test1/test3/test2/test1/test1 is test1. We can transfer its contents to the left most subdir level with the same name: ./test1/
  • once the contents have been moved with no clobbering of existing files at destination, the right most subdir level test1 is deleted.
  • we go on to line 2 of tmp.data and repeat the above steps.
  • etc until all lines in tmp.data have been consumed.
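
Whatever the final shape of Part 2, the "no clobbering" step could look something like the sketch below, which follows the renaming rule given in the question's comments (file_a.txt exists, so the new name is file_a-1.txt). It is an assumption-laden illustration, not the finished Part 2: move_keep_both is a made-up helper name, and the existence check is not safe against other processes writing to the destination at the same time.

```bash
# Move "$1" into directory "$2"; if the name is taken, append -1, -2, ...
# before the extension. move_keep_both is a hypothetical helper.
move_keep_both() (
  src=$1
  destdir=$2
  base=${src##*/}
  stem=${base%.*}
  ext=
  case $base in
    ?*.*) ext=.${base##*.} ;;   # name.ext: keep the extension at the end
    *)    stem=$base ;;         # no extension, or a dotfile like .profile
  esac
  target=$destdir/$base
  n=1
  # Pick the first free name (also treat dangling symlinks as taken).
  while [ -e "$target" ] || [ -L "$target" ]; do
    target=$destdir/$stem-$n$ext
    n=$((n + 1))
  done
  mv -- "$src" "$target"
)
```

For example, move_keep_both ./test1/reboot/test1/file_a.txt ./test1 would leave an existing ./test1/file_a.txt alone and create ./test1/file_a-1.txt instead.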

At this stage, the question (to the question's author, @TomDerks) is what to do with the right-most test1/* on line 6. Should all its contents be moved to the left-most directory with the same name, which in this case is the first subdir level on the path? Does "all" include files in ./test1/test2/test1/ as well as the subdirectory test3 and its contents?
The complete solution (Part 2) hinges on that.

Cbhihe