4

Let's assume that I am in a directory with a lot of files. How would you search the contents of all the files in a directory and display the longest line that contains the string “ER” but not “Cheese”?

I'm trying to do this with a single one-line command.

I think I need grep -r to search recursively through all the files in the directory, but my end goal is to display only the longest line. So far I have:

grep -r -e "ER" 

When I tack -v "Cheese" onto it in the faint hope that it would work, it of course doesn't.

Is this not possible with a one-line command? If it isn't, what would I need to do in multiple lines?

Jeff Schaller
jkl0619

5 Answers

13

Here's an awk solution:

 awk '/ER/ && !/Cheese/ {if (length($0) > maxlen) { maxline=$0; maxlen=length($0);}} END {print maxlen, maxline;}' *

(It also prints the length of the longest line; if you don't want that, just use ... END {print maxline;}.)

The advantage over the grep solution of Jeremy Dover is that it does one pass over the input. The disadvantage is that if there are multiple lines with the same max length, it only prints the first one (or the last one if you use >= to compare the lengths); the grep solution prints all of them.

NickD
  • You might want to incorporate the FILENAME awk variable to identify the file containing the line – glenn jackman Sep 24 '17 at 14:04
  • You could write an awk program that prints all lines of the max length by adding lines of the max length to an awk array. Clear the array every time the max length seen increases. (Warning: memory requirement scales with number of lines of current-max length, rather than just with max line length. In most use-cases it should be fine on a typical modern computer with plenty of RAM, though. And even if it does swap, it doesn't touch a large working-set repeatedly, so it might still be better than re-reading input files, and might not cause thrashing.) – Peter Cordes Sep 24 '17 at 20:23
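The array idea from the comment above could look roughly like the sketch below. It is an illustration, not the answer author's code: the function name longest_all and the file:line:text output format are assumptions, and FILENAME/FNR are used as suggested in the other comment. Rather than clearing the array, it resets a counter whenever a longer line appears, so stale entries are simply overwritten.

```shell
# Hypothetical helper: print every longest line that contains "ER"
# but not "Cheese", prefixed with the file name and line number.
longest_all() {
  awk '
    /ER/ && !/Cheese/ {
      len = length($0)
      # A new maximum was found: forget the ties collected so far.
      if (len > maxlen) { maxlen = len; n = 0 }
      # Remember every line that has the current maximum length.
      if (len == maxlen) { lines[++n] = FILENAME ":" FNR ":" $0 }
    }
    END { for (i = 1; i <= n; i++) print lines[i] }
  ' "$@"
}
```

Called as longest_all * in the directory, it still makes a single pass over the input, and memory use grows only with the number of lines tied for the current maximum.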
6

This one line will do what you ask for (for files in one directory):

awk '{l=length($0)}/ER/&&!/Cheese/&&(l>lm){lm=l;li=$0}END{print(li)}' *

If several lines tie for the maximum length, this prints only the first of them, because a later line replaces the selected one only when it is strictly longer.

Also, this only scans files in the current directory (*). If you need recursion, select the files with find. The command below is restricted to names matching *.sh; drop the -iname test to search every file:

find . -type f -iname '*.sh' -exec sh -c 'awk '\''{l=length($0)}/ER/&&!/Cheese/&&(l>lm){lm=l;li=$0}END{print(li)}'\'' "$@"' awksh {} +

Or in several lines (for readability):

find . -type f -iname '*.sh' -exec sh -c '\
awk '\''{l=length($0)}/ER/&&!/Cheese/&&(l>lm){lm=l;li=$0}END{print(li)}'\'\
' "$@"' awksh {} +
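One caveat with -exec ... {} +: when find has more file names than fit on one command line, it runs awk several times, and each run prints its own longest line. A possible way around that, sketched here under assumptions, is to let find concatenate the files and feed a single awk. The function name longest_recursive and its optional directory argument are made up for this example, and a file that lacks a final newline will have its last line glued to the next file's first line.

```shell
# Hypothetical helper: search all regular files below a directory
# (default: the current one) with a single awk process.
longest_recursive() {
  find "${1:-.}" -type f -exec cat {} + |
    awk '/ER/ && !/Cheese/ && length($0) > max { max = length($0); line = $0 }
         END { if (max) print line }'
}
```

Because awk reads from a pipe here, FILENAME is no longer available, so this variant fits best when only the line itself is needed.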
6
awk '/ER/ && !/Cheese/ && length > m {
       m=length; d=$0; f=substr(FILENAME, 3); n=FNR
     }
     END { print m, f ":" n, d }' ./*

Assuming there are only regular files in the current directory, this will print the length of the longest line fulfilling the criteria in the question (m), along with the filename in which it was found (f), the line number (n) and the line itself (d).

The output may look something like

8 file:3 Hello ER

The longest line was 8 characters long and was found on line 3 in a file called file.

Kusalananda
3

I believe the following one-liner should work:

L=`grep -h "ER" * | grep -v Cheese | wc -L`; grep -h "ER" * | grep -v Cheese | grep -P ".{$L}"

The first command finds all lines in files in the directory containing "ER" (you only need the -R option if you have subdirectories, otherwise the glob * is all you need), removes the lines with Cheese, and then finds the longest of those lines with the wc -L command.

The second command (alas) performs the search for conforming lines again, but then looks for lines of the maximum length. You may not need the -P option to grep, depending on your grep version.

  • You probably want grep -h "ER" * else you end up with the file name as part of the line (at least if there is more than one file in the directory). – NickD Sep 24 '17 at 03:09
  • Yep, will edit. In my test I was using cat *, which avoided the problem. – Jeremy Dover Sep 24 '17 at 03:15
  • Perhaps a bit liberal with the definition of 'one-liner'? I mean with ; many a large shell program can be put on one line. ;) On a more serious note, highly recommend you avoid backticks for command substitution and use $(command) style instead. L=$(grep -h ER * | grep -v Cheese | wc -L) – B Layer Sep 25 '17 at 02:13
  • Thanks for your notes. Understood re: one-liner...was thinking overall length. Re: backticks, I work daily with a lot of older systems, and understand the advantages of $() syntax, but backticks are more portable. – Jeremy Dover Sep 25 '17 at 11:04
  • Only you know what works best in your own environments but the general statement that backticks are more portable is not accurate I'm afraid. The dollar syntax is available in every modern shell and it's part of the POSIX standard while backticks have largely fallen out of favor. More: https://unix.stackexchange.com/a/48393/213782 ... I sympathize with your "legacy situation" at work but it's good to avoid exposing the many beginners here to some of the not-quite-best practices that we are forced into sometimes. :) Cheers. – B Layer Sep 29 '17 at 09:05
  • Fair enough. I will retract my statement regarding portability (though it is certainly true in my environment). However, I disagree with the statement "it's good to avoid exposing the many beginners here to some of the not-quite-best practices". "Legacy situations" are a lot more common than anybody might wish, and I would argue that exposure to deprecated, but still deployed, practices remains valuable; that said, the comments discussing it add value as well. – Jeremy Dover Sep 29 '17 at 11:07
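Putting the comments together, a variant that uses $(...) and runs the grep filter only once might look like the sketch below. It is an illustration under assumptions rather than the answer author's command: the function name longest_two_step is invented, the candidate lines are held in a shell variable (fine for modest input, not for huge files), and awk computes the length because wc -L is not specified by POSIX.

```shell
# Hypothetical helper: print all longest lines containing "ER" but not "Cheese".
longest_two_step() {
  # Filter once and keep the candidate lines in a variable.
  matches=$(grep -h -- 'ER' "$@" | grep -v -- 'Cheese')
  # Longest length among the candidates (0 if there are none).
  max=$(printf '%s\n' "$matches" |
        awk 'length($0) > m { m = length($0) } END { print m + 0 }')
  # Print every candidate that has exactly that length.
  printf '%s\n' "$matches" | awk -v max="$max" 'max > 0 && length($0) == max'
}
```

Like the two-grep command, longest_two_step * prints every line tied for the maximum, but the files themselves are read only once.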
2

One which prepends the length of the string, sorts numerically, and strips the length from the first result to get the original string back.

 grep -h ER * | grep -v Cheese | awk '{ print length($0) " " $0}' | sort -nr | head -n 1 | cut -d' ' -f2-

This approach allows you to do more sophisticated queries than "MAX" or "MIN" if you need to. Note the use of AWK. This is exactly what it is really good for.

  • You could do the first 3 steps with a single awk, using awk '/ER/ && !/Cheese/{ print length($0) " " $0}'. (See the other awk answers). – Peter Cordes Sep 24 '17 at 20:16
  • @PeterCordes I know. The idea here is to tie small commands together in the Unix spirit instead of writing program logic. If I really wanted to program, this could easily be done in a single Perl snippet. – Thorbjørn Ravn Andersen Sep 24 '17 at 20:21
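As an example of a query beyond "MAX", the same chain of small commands could be adapted to show the N longest matching lines. This is only a sketch: the function name top_n_longest and its argument order are assumptions made for illustration, and lines of equal length come out in whatever order sort's tie-breaking gives them.

```shell
# Hypothetical helper: show the N longest lines containing "ER" but not "Cheese".
# Usage: top_n_longest N file...
top_n_longest() {
  n=$1; shift
  grep -h -- 'ER' "$@" | grep -v -- 'Cheese' |
    awk '{ print length($0) " " $0 }' |   # decorate: prepend the length
    sort -k1,1nr |                        # longest first
    head -n "$n" |                        # keep the top N
    cut -d' ' -f2-                        # undecorate: drop the length again
}
```

For instance, top_n_longest 3 * would list the three longest matches in the current directory; swapping head for tail would give the shortest ones instead.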