3

Is there a Unix utility to restrict a file or directory listing to given minimum and maximum number of levels deep?

I have a git command that displays a unique listing of the directories where files have changed, git diff --name-only HEAD~3 HEAD~0 | sed 's|/[^/]*$||' | uniq, and I want to limit the output further to directories a certain number of levels deep. Is there a way to accomplish that?

Here is an example from a Drupal installation. After I install or update modules, the git command shows which modules were added or changed. The module directory is the 4th level, so any deeper directories should be cut off and duplicates should be removed from the 4th-level directories.

sites/all/modules/table_trash
sites/all/modules/table_trash/css
sites/all/modules/table_trash/drush
sites/all/modules/table_trash/js
sites/all/modules/table_trash/libraries
sites/all/modules/table_trash/libraries/variants/js
sites/all/modules/table_trash
sites/all/modules/video_filter
sites/all/modules/views_aggregator
sites/all/modules/views_aggregator/views
sites/all/modules/views_aggregator
sites/all/modules/views_aggregator/views_aggregator_more_functions
sites/all/modules/views_php
sites/all/modules/views_php/plugins/views
sites/all/modules/views_php
sites/all/modules/views_watchdog
sites/all/modules/views_watchdog/views/handlers
sites/all/modules/views_watchdog/views/plugins
sites/all/modules/views_watchdog/views/theme
sites/all/modules/views_watchdog/views
sites/all/modules/views_watchdog

It appears to be a general case of applying a grep-like command that strips everything on a line after the Nth appearance of a separator character ('/' in this instance), but I am looking for something that also removes duplicates from the output, which is why I currently pass it through the uniq command. In this case both the maximum and minimum number of levels should be 4.

vfclists

2 Answers

2

You could use awk to do this and, in the process, eliminate the need for sed and uniq. Set limit to the directory depth you want; the module directories in the question are at depth 4.

git diff --name-only HEAD~3 HEAD~0 | 
awk -vlimit=3 -F'/' -vOFS='/' -- '--NF == limit && !x[$0]++'
iruvar
1
git diff --name-only HEAD~3 HEAD~0 | sed -ne 's|/||5;t' -e 's||/|4p' | sort -u

Your command line practically does it already. You can target the [num]th occurrence of a pattern with a sed s///ubstitution command by adding [num] as a flag to the command. When you test for a successful substitution with t and don't specify a target :label, the test branches to the end of the script, and with -n nothing is printed for that line. This means all you have to do is test whether s///5 succeeds, that is, whether the line has 5 or more slashes, and skip the line if it does; then print what remains.

Or, at least, that handles the lines which exceed your maximum. Apparently you also have a minimum requirement. Luckily, that is just as simple:

sed -ne 's|/||5;t' -e 's||/|4p'

...just replace the 4th occurrence of / on a line with itself and tack your print on to the s///ubstitution flags. Because any lines matching / 5 or more times have already been pruned, the lines containing 4 / matches contain only 4.
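
As a small check of the two-step script, the sketch below runs it on three made-up paths with two, four and five slashes. The function name exactly_four_slashes and the sample paths are hypothetical; the empty pattern in the second command reuses the last regular expression, as in the command above.

```shell
# Print only lines that contain exactly four '/' characters.
exactly_four_slashes() {
  # 1st command: if a 5th '/' exists, delete it and branch to the end
  #              of the script (-n means the line is not printed).
  # 2nd command: replace the 4th '/' with itself and print on success.
  sed -n -e 's|/||5;t' -e 's||/|4p'
}

printf '%s\n' 'a/b/c' 'a/b/c/d/e' 'a/b/c/d/e/f' | exactly_four_slashes
```

Only a/b/c/d/e is printed: the first path never reaches a 4th slash, and the last one is pruned by the s///5 test.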

For uniques, uniq won't work unless its input is sorted, because it only removes adjacent duplicate lines, so you might as well use sort -u anyway.
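
The difference is easy to see on a list whose duplicates are not adjacent. This is a minimal sketch with a hypothetical helper, dup_list, and made-up names:

```shell
# A short list in which the duplicate lines are separated by another line.
dup_list() {
  printf '%s\n' 'table_trash' 'views_php' 'table_trash'
}

dup_list | uniq      # prints all three lines: the duplicates are not adjacent
dup_list | sort -u   # prints table_trash and views_php once each
```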

Run on your example data, the sed ... | sort -u pipeline prints:

sites/all/modules/table_trash/css
sites/all/modules/table_trash/drush
sites/all/modules/table_trash/js
sites/all/modules/table_trash/libraries
sites/all/modules/views_aggregator/views
sites/all/modules/views_aggregator/views_aggregator_more_functions
sites/all/modules/views_watchdog/views
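
The lines above each contain exactly four slashes, which means five path components. If the goal is instead to cut deeper paths back to the 4th-level directory and list each one once, which is one reading of the question, a sketch along these lines may be closer. It is an assumption about the intent, not part of the sed approach above; the function name truncate_to_depth and the sample paths are hypothetical.

```shell
# Keep paths with at least `depth` components, cut them to exactly
# `depth` components, and remove duplicates.
truncate_to_depth() {
  awk -F'/' -v min="$1" 'NF >= min' |   # minimum: drop shallower paths
    cut -d'/' -f"1-$1" |                # maximum: cut off deeper levels
    sort -u                             # remove duplicates
}

printf '%s\n' \
  'sites/all/modules/table_trash' \
  'sites/all/modules/table_trash/css' \
  'sites/all/modules/views_php/plugins/views' \
  'sites/all' |
  truncate_to_depth 4
```

Here the input is a directory listing like the one in the question, and the result is one line per module directory.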
mikeserv