Grep command that finds/excludes all lines where a separator character appears a certain number of times?

Question

Is there a Unix utility to restrict a file or directory listing to a given minimum and maximum depth?

I have a git command that displays a unique listing of the directories where files have changed:

git diff --name-only HEAD~3 HEAD~0 | sed 's|/[^/]*$||' | uniq

I want to limit the output further to directories a certain number of levels deep. Is there a way to accomplish that?

Here is an example from a Drupal installation. After I install or update modules, the git command shows which modules were added or changed. The module directory is the 4th level, so any deeper directories should be cut off and duplicates should be removed from the 4th-level directories.

sites/all/modules/table_trash
sites/all/modules/table_trash/css
sites/all/modules/table_trash/drush
sites/all/modules/table_trash/js
sites/all/modules/table_trash/libraries
sites/all/modules/table_trash/libraries/variants/js
sites/all/modules/table_trash
sites/all/modules/video_filter
sites/all/modules/views_aggregator
sites/all/modules/views_aggregator/views
sites/all/modules/views_aggregator
sites/all/modules/views_aggregator/views_aggregator_more_functions
sites/all/modules/views_php
sites/all/modules/views_php/plugins/views
sites/all/modules/views_php
sites/all/modules/views_watchdog
sites/all/modules/views_watchdog/views/handlers
sites/all/modules/views_watchdog/views/plugins
sites/all/modules/views_watchdog/views/theme
sites/all/modules/views_watchdog/views
sites/all/modules/views_watchdog

This looks like a general case of a grep-style command that cuts off each line after the Nth appearance of a separator character ('/' in this instance), but I also need duplicates removed from the output, which is why I pass it through the uniq command. In this case both the maximum and minimum number of levels should be 4.

score 2 · Answer 1 · answered Feb 03 '15 at 18:00

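You could use awk to do this, and in the process eliminate the need for sed and uniq. Set limit to the number of directory levels you want to keep (the module directories in the question are at level 4). With GNU awk, --NF drops the last field (the file name) and rebuilds the line using OFS, the comparison keeps only directories exactly limit levels deep, and !x[$0]++ is true only the first time a given line is seen.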

git diff --name-only HEAD~3 HEAD~0 | 
awk -vlimit=3 -F'/' -vOFS='/' -- '--NF == limit && !x[$0]++'
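
For comparison, here is a rough sketch of the same idea that does not assign to NF, in case that is what differs between awk implementations (an unconfirmed guess). The function name dirs_at_depth and the file names in the sample input are made up for illustration; they are not from the question.

```shell
# Hypothetical helper: read file paths on stdin and print each distinct
# parent directory that is exactly "$1" components deep.
dirs_at_depth() {
    awk -F/ -v limit="$1" '
        NF - 1 == limit {              # NF - 1 = components left without the file name
            dir = $0
            sub("/[^/]*$", "", dir)    # strip the trailing "/filename"
            if (!seen[dir]++) print dir  # print each directory only once
        }'
}

# Made-up sample paths in the style of git diff --name-only output
printf '%s\n' \
    sites/all/modules/views_php/views_php.module \
    sites/all/modules/views_php/views_php.info \
    sites/all/modules/views_php/plugins/views/views_php.inc |
    dirs_at_depth 4
# prints: sites/all/modules/views_php
```

In real use the printf would be replaced by the git diff --name-only command from the question.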

iruvar

Passing the output to a grep command would be much easier for me. I tried your command and got no output. Can you explain it better? – vfclists Feb 03 '15 at 18:13
@vfclists, what flavour of UNIX are you on? – iruvar Feb 04 '15 at 17:59
Ubuntu trusty and precise – vfclists Feb 04 '15 at 19:17
@vfclists, please paste in some output from the git command into your question – iruvar Feb 07 '15 at 02:57
I have updated the question with some sample input – vfclists Feb 09 '15 at 00:01
@vfclists, it might have to do with mawk being the default awk on Ubuntu. If you have the ability to install GNU awk (aka gawk), you might want to do that and give it a twirl – iruvar Feb 09 '15 at 14:22

mikeserv · Answer 2 · 2015-02-09T00:48:49.027

git diff --name-only HEAD~3 HEAD~0 | sed -ne 's|/||5;t' -e 's||/|4p' | sort -u

Your command line practically does it already. You can target the [num]th occurrence of a pattern with a sed s///ubstitution command by adding [num] as a flag to the command. When you test for a successful substitution and don't specify a target :label, the test branches out of the script. This means all you have to do is use s///5 to test for 5 or more slashes, drop the lines where it succeeds, and print what remains.
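
To see the pruning half on its own, here is a small sketch. The function name at_most_4_slashes and the sample paths are invented for the demonstration, and the final p stands in for "print what remains".

```shell
# Hypothetical helper: keep only lines that contain fewer than 5 slashes.
at_most_4_slashes() {
    # s|/||5 succeeds only when a 5th "/" exists; t (no label) then jumps to
    # the end of the script, and because of -n that line is never printed.
    # Lines that survive reach p unmodified.
    sed -n -e 's|/||5;t' -e 'p'
}

printf '%s\n' a/b/c/d a/b/c/d/e a/b/c/d/e/f | at_most_4_slashes
# prints:
# a/b/c/d
# a/b/c/d/e
```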

Or, at least, that handles the lines which exceed your maximum. Apparently you also have a minimum requirement. Luckily, that is just as simple:

sed -ne 's|/||5;t' -e 's||/|4p'

...just replace the 4th occurrence of / on a line with itself and tack your print on to the s///ubstitution flags. Because any lines matching / 5 or more times have already been pruned, the lines containing 4 / matches contain only 4.
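
If the slash count needs to vary, the same two substitutions can be parameterised. This is only a sketch: the function name exactly_n_slashes is hypothetical, and it assumes the argument is a positive integer.

```shell
# Hypothetical helper: print only lines containing exactly "$1" slashes.
exactly_n_slashes() {
    n=$1
    # 1st expression: prune any line that has an (n+1)th slash.
    # 2nd expression: the empty pattern reuses the last regex ("/"), so the
    # nth slash is replaced with itself and the line is printed on success.
    sed -n -e "s|/||$((n + 1));t" -e "s||/|${n}p"
}

printf '%s\n' a/b/c a/b/c/d a/b/c/d/e | exactly_n_slashes 3
# prints: a/b/c/d
```

Note that a count of 3 selects paths with four components, such as sites/all/modules/views_php, while a count of 4 selects five-component paths such as sites/all/modules/views_watchdog/views.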

For uniques, uniq only removes adjacent duplicate lines, so it won't work unless its input is sorted. You might as well use sort -u anyway.
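
A quick illustration of the difference, using throwaway input. The awk line is an extra option not used in the pipeline above: it drops repeats while keeping the original order.

```shell
# uniq compares only neighbouring lines, so the repeated "b" survives
printf '%s\n' b a b | uniq
# prints: b, a, b (one per line)

# sort -u sorts first, then drops duplicates wherever they appeared
printf '%s\n' b a b | sort -u
# prints: a, b (one per line)

# awk prints a line only the first time it is seen, preserving input order
printf '%s\n' b a b | awk '!seen[$0]++'
# prints: b, a (one per line)
```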

Run on your example data, the sed ... | sort pipeline prints:

sites/all/modules/table_trash/css
sites/all/modules/table_trash/drush
sites/all/modules/table_trash/js
sites/all/modules/table_trash/libraries
sites/all/modules/views_aggregator/views
sites/all/modules/views_aggregator/views_aggregator_more_functions
sites/all/modules/views_watchdog/views
