List (or move) only files with a certain number of lines?

Question

Dealing with a whole bunch of two-line config files, I'd like a way to exclude any files that have a different number of lines.

So, something like:

mv * destdir only if file contains exactly two lines

Or:

wc -l * | grep '^ *2' | xargs mv {} destdir

Except that neither of those is actual working code.

While writing this I realized I do have a way to do this, which is ugly as heck, and I've included it below as an answer.

Is there an easy/clean way to do this?

don_crissti · Answer 1 · 2015-11-26T20:24:52.477

4

You could use awk: exit on line 3 (the END rule is still executed), then exit 1 in the END block if the number of lines is not 2, e.g. with zsh:

print -rl -- *(.e_'awk "NR==3{exit}END{if(NR!=2){exit 1}}" $REPLY'_)

will list two-line files in the current directory; replace print -rl with mv and add the destination if you want to move them.

With other shells that support [[ ]] (e.g. bash or ksh):

for file in ./*; do [[ -f $file ]] && \
awk 'NR==3{exit};END{if(NR!=2){exit 1}}' "$file" && mv "$file" "$dest"
done
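
The [[ ... ]] test in the loop above is a bash/ksh/zsh extension. Where only a plain POSIX sh is available, the same check could be wrapped in a small function, roughly like the sketch below. The name has_n_lines, its argument order and the generalisation from 2 to N lines are my own assumptions, not part of the answer.

```shell
# has_n_lines N FILE: succeed if FILE is a regular file with exactly N lines.
# (Hypothetical helper; the name and argument order are assumptions.)
has_n_lines() {
    [ -f "$2" ] || return 1
    # Stop reading one line past N; the END rule still runs after exit.
    # The file is passed on stdin so awk never interprets its name.
    awk -v n="$1" 'NR > n { exit } END { exit (NR != n) }' < "$2"
}

# Usage sketch: move every two-line file in the current directory to "$dest".
# for file in ./*; do
#     has_n_lines 2 "$file" && mv -- "$file" "$dest"
# done
```

Feeding the file on stdin rather than as an argument also sidesteps names that awk would otherwise read as var=value assignments.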

Other ways, e.g. with zsh and GNU awk:

awk 'ENDFILE{if(FNR==2){print FILENAME}}' ./*(.)

or GNU sed (v. 4.2.2 or later):

sed -ns '2{$F}' ./*(.)

to list the two-line files¹ and e.g.:

for f (./*(.))
sed -n '2{$Q 1};3q' $f || mv $f $dest

to move them.

¹ Those would both read through the whole input, so they are not really suited to huge files; in that case, you may want to run sed -n '2{$F};3q' on each file or use the first awk solution.

edited Nov 26 '15 at 20:24

answered Nov 26 '15 at 02:44


What does F do in sed? I can't find documentation on it anywhere. Or is that a typo? – Wildcard Nov 26 '15 at 19:44
Using GNU sed 4.2.1, sed F filename returns unknown command: 'F'. And looking through the entirety of info sed I don't see it mentioned anywhere. Did you test it? What version of sed are you using? EDIT: Ah, I see the link in your comment now. I suspected that's what it was supposed to do...but it doesn't work on my CentOS 6 vagrant box. – Wildcard Nov 26 '15 at 20:19
@Wildcard - yes, my bad for not specifying that F was added in sed 4.2.2 – don_crissti Nov 26 '15 at 20:22

score 4 · Accepted Answer · edited Apr 13 '17 at 12:36

Your kludgy solution isn't too bad for starters... you are just missing the fact that not only can awk give you the number of lines, you can also instruct it to exit with the right status code so that you can then chain it with the cp command:

for file in * ; do awk 'NR==3{exit}END{exit NR!=2}' "$file" && cp "$file" /tmp; done

NR is the number of records, and as suggested in @don_crissti's answer, we can use the NR==3 check to stop further processing once we encounter a third line.

NR!=2 looks funny because awk's true/false values are 1/0, whereas the shell needs 0 to represent success for && to work correctly: with exactly two lines, NR!=2 is false, so awk exits with 0. The inverse works too (depending on how strongly you react to seeing !=):

for file in * ; do awk 'NR==3{exit}END{exit NR==2}' "$file" || cp "$file" /tmp; done
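
To see the status convention in action, the sketch below feeds the first awk test (the NR!=2 variant) one, two and three lines and prints what the shell gets back. status_for_lines is a made-up helper name, and the input is generated with a second awk only to avoid depending on seq.

```shell
# Report the exit status of the awk test for a given number of input lines.
# (status_for_lines is a hypothetical helper used only for this demo.)
status_for_lines() {
    status=0
    # Generate $1 lines of input and pipe them into the two-line test.
    awk -v n="$1" 'BEGIN { for (i = 1; i <= n; i++) print "line " i }' |
        awk 'NR==3{exit}END{exit NR!=2}' || status=$?
    echo "$status"
}

status_for_lines 1   # prints 1 (failure: not two lines)
status_for_lines 2   # prints 0 (success: exactly two lines)
status_for_lines 3   # prints 1 (failure: not two lines)
```

Only the two-line input yields 0, which is why it can be chained with && directly.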

I like this solution because (a) it doesn't assume the filenames are sane (b) it is POSIX compatible. (Also it doesn't depend on zsh, which I know nothing about.) — Wildcard, Dec 04 '15 at 06:33

mikeserv · Answer 3 · 2015-11-27T00:15:39.277

If your filenames are fairly sane, and you can delimit on both : and a newline, then:

grep -m3 '' ./* ./*/* |
cut -d: -f1 | uniq -c |
grep -v '^ *[13] '

That command will list all non-dot files in the current directory and in all immediate child directories which contain exactly two lines.

You don't really need to worry about sorting for uniq, because globs are sorted. I use the GNU -m (--max-count) option because it is much faster if grep quits at the third input line than if it continues through to the end, but it will work without it as well. The idea is to get grep to print the filename once for each line a file contains, then to count the occurrences of each filename in its output, and then to filter out any count other than 2.

I ran it against some random source code dirs, and, of all of them, I had two files which contained only the two lines:

  2 ./dex/coll.sh
  2 ./jimtcl/jim-config.h.in

it would be neater to replace the last line with:

... |
sed -ne's/^ *2  *//p'

...though.
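
For anyone who wants to try the pipeline on a scratch directory first, here is a minimal sketch that wraps it in a function. The name list_two_line_files is hypothetical, it only looks at one directory level, and the extra /dev/null argument is my addition: it makes grep print the filename prefix even when the glob matches a single file. It still assumes GNU grep for -m and filenames without ':' or newlines, as stated above.

```shell
# list_two_line_files DIR: print the names of files in DIR with exactly
# two lines. (Hypothetical wrapper; runs in a subshell so the caller's
# working directory is left alone.)
list_two_line_files() (
    cd "$1" || exit 1
    # grep emits "name:line" for at most 3 lines per file, cut keeps the
    # name, uniq -c counts repeats, and sed keeps names counted twice.
    grep -m3 '' ./* /dev/null |
        cut -d: -f1 | uniq -c |
        sed -n 's/^ *2  *//p'
)
```

Empty files never show up in grep's output at all, so they drop out without needing a filter of their own.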

Assuming sane filenames in any context is something I dislike. But yes, if you're comfortable assuming that (and you're doing it interactively, not scripting it), then this is good. — Wildcard, Dec 04 '15 at 06:29
@Wildcard - if you're not comfortable with it, you can do find . ! -type d ! -path "*[:$IFS]*" -exec ... {} + and then invert the selection with a more conservative approach for a second run. Just make sure $IFS is set to a default value first, or drop the space and tab if you like. — mikeserv, Dec 04 '15 at 06:51

score 0 · Answer 4 · answered Nov 26 '15 at 01:13

I worked out the following kludgy solution:

for file in * ; do if [ "$(wc -l "$file" | awk '{print $1}')" = "2" ] ; then cp "$file" /tmp/ ; fi; done

There must be a better way which doesn't start two processes for every single file in the current directory.

Wildcard


If process count is a concern, you can save a process by doing if [ -r "$file" ] && [ $(wc -l < "$file") -eq 2 ] – Mark Plotnick Nov 26 '15 at 07:08
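
Spelled out as a complete loop, that suggestion might look like the sketch below. The function name copy_two_line_files and its SRC/DEST arguments are assumptions made for the example; the original used the current directory and /tmp.

```shell
# copy_two_line_files SRC DEST: hypothetical helper; copies regular, readable
# files in SRC that wc -l reports as having exactly two lines into DEST.
copy_two_line_files() {
    for file in "$1"/*; do
        [ -f "$file" ] && [ -r "$file" ] || continue
        # Reading from stdin makes wc print only the count, so no awk is
        # needed. The substitution is left unquoted on purpose so that any
        # padding a wc implementation emits around the number is dropped.
        if [ $(wc -l < "$file") -eq 2 ]; then
            cp -- "$file" "$2"
        fi
    done
}
```

Note that wc -l counts newline characters, so a file whose second line lacks a trailing newline is reported as having one line, whereas the awk-based tests would count two.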

RobertL · Answer 5 · 2015-11-26T02:05:21.737

0

Using the -t target directory option on mv with xargs:

wc -l * | sed -n 's/^[[:space:]]*2[[:space:]]\+//p'  | xargs mv -t "$DESTDIR"

edited Nov 26 '15 at 02:05

answered Nov 26 '15 at 01:57


fwiw, this won't work if the file names contain spaces or other funky chars. – don_crissti Nov 26 '15 at 02:06
