1

How can I show the contents of a text file, with every occurrence of a word highlighted?

2 Answers

3
grep -C $(wc -l foo.txt | cut -d" " -f 1) --color word foo.txt

Where foo.txt is the file path and word is the search term. If you are not familiar with grep, be aware that the search term is actually a regular expression. This doesn't matter if word is purely alphanumeric, but certain characters such as [ and ] have a special meaning in a regular expression; to use them literally, prefix them with \. Also, the command passes through the shell, so the shell may expand $ unless it is also escaped with \.

You can include spaces in word if you enclose it in quotes.
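A small sketch of the escaping and quoting rules above, assuming GNU grep and a throwaway sample file (the file contents and variable names are made up for illustration). The -o option prints only the text that --color would highlight, which makes the difference easy to see:

```shell
# Hypothetical sample file, created only for this demonstration.
sample=$(mktemp)
printf '%s\n' 'version 3.5 released' 'build 365 passed' 'cost: $5 each' > "$sample"

# An unescaped . matches any character, so both "3.5" and "365" are matched.
loose=$(grep -o '3.5' "$sample")
# An escaped \. matches only a literal dot.
strict=$(grep -o '3\.5' "$sample")
# Single quotes stop the shell from expanding $5; the backslash makes $ literal for grep.
dollar=$(grep -o '\$5' "$sample")
# Quoting also lets the search term contain spaces.
spaced=$(grep -o '5 each' "$sample")

printf '%s\n' "$loose" "$strict" "$dollar" "$spaced"
rm -f "$sample"
```

Single quotes are used here so that the shell passes each pattern to grep unchanged; only grep's own escaping then needs attention.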

The other commands used here are wc and cut, which will almost always be stock on a POSIX system such as GNU/Linux. They have man pages that explain their use.
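To see what the command substitution contributes, here is a minimal sketch. It assumes GNU wc, which prints the count without leading padding; the sample file and the variable names are hypothetical:

```shell
# Hypothetical three-line sample file.
sample=$(mktemp)
printf '%s\n' one two three > "$sample"

# wc -l prints the line count followed by the file name;
# cut keeps only the first space-separated field, which is the count.
count=$(wc -l "$sample" | cut -d" " -f 1)

# The count is then handed to grep -C as the number of context lines,
# which is enough context to cover the whole file.
echo "$count"
rm -f "$sample"
```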

goldilocks
  • 1
    One advantage of this general method is that the -F option may be added; then word is interpreted as literal text to match rather than a regular expression. (The primary method given in my answer--and jacksonh's answer to the related question--don't readily support that enhancement.) An alternative to parsing the output of wc is to prevent it from ever knowing the filename by using input redirection instead of a path argument: $(wc -l < foo.txt) – Eliah Kagan Jan 26 '17 at 15:53
  • 1
    @EliahKagan - with all due respect to goldilocks, you don't need to read the file twice, count the lines and use a non-standard option like -C just to be able to grep for fixed strings and print the non-matching lines too. It's much simpler than that. – don_crissti Jan 26 '17 at 23:58
3

You can do this with just grep. Where file.txt is your file and word is the text you want to highlight, you can use:

grep --color -E 'word|$' file.txt

This matches and highlights occurrences of word. Through the alternation (|), it also matches the empty string at the end of each line ($). Since that string is empty, no extra text is actually highlighted.

Matching $ in addition to word serves the purpose of ensuring every line contains a match. Then grep prints every line, even without the -A, -B, or -C options or any calls to other utilities.
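The effect can be checked with a short sketch like the following, assuming GNU grep; the sample file and variable names are made up for the demonstration:

```shell
# Hypothetical sample file: two lines contain "word", one does not.
sample=$(mktemp)
printf '%s\n' 'the word appears here' 'no match on this line' 'word twice: word' > "$sample"

# Plain pattern: only the two lines containing "word" are printed.
only_matches=$(grep -E 'word' "$sample")
# With |$ every line matches (at least at its end), so all three lines are printed.
all_lines=$(grep -E 'word|$' "$sample")

# --color=always forces the highlighting escape sequences even when
# the output is not a terminal (for example, when piping to less -R).
grep --color=always -E 'word|$' "$sample"
rm -f "$sample"
```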

The -E flag makes grep interpret its pattern argument as a POSIX extended regular expression (ERE). Depending on what characters are in word, you may want to make it use a basic regular expression (BRE) instead. Although alternation with | is not officially part of POSIX basic regular expressions, grep implementations often support it as an extension with \| [1]:

grep --color 'word\|$' file.txt

In particular, on GNU/Linux systems you have GNU Grep, which supports this. If you are relying on --color, you can likely rely on this behavior too.
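As a quick check that the two spellings behave the same, here is a sketch that assumes GNU grep (other implementations may treat \| differently); the sample file and variable names are hypothetical:

```shell
# Hypothetical two-line sample file.
sample=$(mktemp)
printf '%s\n' 'the word appears here' 'no match on this line' > "$sample"

# ERE: alternation is written as a bare |.
ere=$(grep -E 'word|$' "$sample")
# BRE with the GNU extension: alternation is written as \|.
bre=$(grep 'word\|$' "$sample")

if [ "$ere" = "$bre" ]; then
    echo "same output"
else
    echo "different output"
fi
rm -f "$sample"
```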

These commands are simpler than the approach in goldilocks's answer. But the technique in that answer does have a distinct advantage in some circumstances. Since the methods here use | and $, their patterns really have to be regular expressions, rather than fixed strings. However, with goldilocks's method, you can add the -F flag. Then word can contain whatever text you like, even \, provided you quote word properly to ensure the shell passes it to grep unmodified.

For example, you can use:

grep --color -FC "$(wc -l < file.txt)" 'word' file.txt
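A brief sketch of what -F buys you, assuming GNU grep and GNU wc; the sample file, its contents, and the variable names are invented for illustration:

```shell
# Hypothetical sample file containing backslashes and a dot.
sample=$(mktemp)
printf '%s\n' 'path is C:\temp\new' 'a.b matches literally' 'axb does not' > "$sample"
n=$(wc -l < "$sample")

# With -F the backslash needs no escaping for grep; single quotes keep
# the shell from touching it. -C "$n" still prints every line of the file.
all=$(grep -FC "$n" 'C:\temp' "$sample")
# With -F the dot is literal, so "axb" is not matched.
hits=$(grep -Fo 'a.b' "$sample")

printf '%s\n' "$all" "$hits"
rm -f "$sample"
```

Note that -C only prints the whole file when there is at least one match; if word does not occur at all, grep prints nothing.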

For further reading, see Convince grep to output all lines, not just those with matches (as steeldriver suggested).


[1] As far as the standard (IEEE Std 1003.1-2008) specifies, alternation is a feature of ERE (via |), but it is not a feature of BRE. But implementations are permitted to interpret \| as they wish, and many interpret it as alternation:

Some implementations have extended the BRE syntax to add alternation. For example, the subexpression "\(foo$\|bar\)" would match either "foo" at the end of the string or "bar" anywhere. The extension is triggered by the use of the undefined "\|" sequence.

Eliah Kagan
  • @Fox That was unclear--thanks for noticing and drawing my attention to this. I've reworded and added an explanatory footnote. In addition to the syntax for alternation varying between dialects, my understanding is BRE--so far as it's formally defined--really doesn't have alternation. The \| syntax is allowed as a non-standard extension to BRE and widely implemented. Please let me know if you believe I'm mistaken on this or if the post remains unclear. (Of course, as you say, alternation is not a vendor-provided extension of ERE--every conforming ERE implementation must implement it.) – Eliah Kagan Jan 26 '17 at 18:42
  • 1
    BRE (as defined by the standard) are a subset of true regular expressions. The update and footnote are great in explaining that – Fox Jan 26 '17 at 18:46