27

I need to search for a keyword using awk, but I want to perform a case-insensitive (non case sensitive) search.

I think the best approach is to capitalize both the search term ("key word") and the line awk is currently reading. From this question I learned how to use toupper to print in all uppercase, but I don't know how to use it in a match, because that answer only shows printing and doesn't leave the uppercase text in a variable.

Here is an example, given this input:

blablabla    
&&&Key Word&&&
I want all 
these text and numbers 123
and chars !"£$%&
as output
&&&KEY WORD&&&
blablabla

I'd like this output:

I want all 
these text and numbers 123
and chars !"£$%&
as output

This is what I have, but I don't know how to add in toupper:

awk "BEGIN {p=0}; /&&&key word&&&/ { p = ! p ; next } ; p { print }" text.txt
Sadegh
  • 599

2 Answers

30

Replace the bare pattern (i.e. /&&&key word&&&/) with an expression that explicitly matches against $0, the current line, after converting its case:

tolower($0) ~ /&&&key word&&&/

or

toupper($0) ~ /&&&KEY WORD&&&/

so you have

awk 'tolower($0) ~ /&&&key word&&&/ { p = ! p ; next }; p' text.txt

You need single quotes because of the $0 (inside double quotes the shell would expand it before awk sees it). The BEGIN block can be removed because variables are initialised to "" or 0 on first use, and {print} is the default action, as mentioned in the comments below.
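As a rough sketch of how this plays out on the sample input from the question, the script below recreates that input in a temporary file and runs the command above. It then runs a variant that keeps the lower-cased line in a variable, which is what the question asked about. The variant, the awk variables key and line, and the shell names input, out and out_var are my own assumptions for illustration, not part of the original command. The variant uses index() for a literal substring test instead of a regex match.

```shell
#!/bin/sh
# Recreate the sample input from the question in a temporary file
# (trailing spaces from the question are omitted here).
input=$(mktemp)
cat > "$input" <<'EOF'
blablabla
&&&Key Word&&&
I want all
these text and numbers 123
and chars !"£$%&
as output
&&&KEY WORD&&&
blablabla
EOF

# 1) The command from the answer: the line is lower-cased only for the
#    comparison, and the original line is what gets printed.
out=$(awk 'tolower($0) ~ /&&&key word&&&/ { p = !p; next } p' "$input")

# 2) Hypothetical variant: pass the keyword in with -v, normalise both the
#    keyword and the current line, and keep the lower-cased copy in "line".
out_var=$(awk -v key='&&&Key Word&&&' '
    BEGIN { key = tolower(key) }        # normalise the search term once
    { line = tolower($0) }              # lower-cased copy of the current line
    index(line, key) { p = !p; next }   # literal substring test, no regex
    p                                   # print original lines while p is set
' "$input")

rm -f "$input"

printf '%s\n' "$out"
printf '%s\n' "$out_var"
```

Both commands should print the four lines between the two marker lines. The -v form may be handy when the keyword comes from a shell variable, since it avoids pasting it into the awk program text.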

meuh
  • 51,383
  • 4
    Note that you could simplify that to awk 'toupper($0)~/&&&KEY WORD&&&/ { p = ! p ; next } ; p;' text.txt. There's no need for the BEGIN block and since the default action is to print, p; is enough. – terdon Apr 01 '16 at 12:35
  • 1
    "There's no need for the BEGIN block" since an uninitialized variable evaluates as false. – glenn jackman Apr 01 '16 at 13:44
  • Thanks for the optimisations. I usually try to limit my reply to minimal changes to the original, but it's true the new result is much tighter and quite readable. – meuh Apr 01 '16 at 14:06
  • 2
    Just a note: tolower is present on ancient (or not so ancient) awk versions (e.g. on AIX systems), but toupper is not always available ^^. – Olivier Dulac Apr 01 '16 at 20:20
22

gawk has an IGNORECASE builtin variable, which, if set to nonzero, causes all string and regular expression comparisons to be case-insensitive. You could use that:

BEGIN{IGNORECASE=1}
/&&&key word&&&/ { foo bar baz }

etc. This is specific to gawk, but I find it more readable than the (more portable) alternative by meuh. Whether the loss of portability is a problem is, of course, entirely up to you.
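For completeness, here is a minimal sketch of how the IGNORECASE approach might be combined with the toggle logic from the question. The short sample lines and the out variable are made up for the demo, and the script simply skips itself when gawk is not installed.

```shell
#!/bin/sh
# IGNORECASE is a gawk extension, so only run the demo if gawk is available.
if command -v gawk >/dev/null 2>&1; then
    # Hypothetical sample lines: two markers in different case around two
    # lines that should be kept.
    out=$(printf '%s\n' 'before' '&&&Key Word&&&' 'kept 1' 'kept 2' \
                        '&&&KEY WORD&&&' 'after' |
        gawk 'BEGIN { IGNORECASE = 1 }            # make regex matches ignore case
              /&&&key word&&&/ { p = !p; next }   # toggle on each marker line
              p')                                 # print while between markers
    printf '%s\n' "$out"
else
    echo 'gawk not found; skipping the IGNORECASE demo' >&2
fi
```

With gawk present this should print only the two "kept" lines. Other awk implementations treat IGNORECASE as an ordinary variable, so there the markers would only match in the exact case written in the pattern.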

  • 1
    I've wanted to support awk for years in one of my biggest gawk projects, but the lack of case insensitive search triggers that gawk has made it a non starter due to the number of case insensitive searches the stuff runs. gensub is the other gawk only feature that was too hard to replace in awk. But gawk isn't always installed by default on some machines and distributions, though it's almost always available, but it's unfortunate that by 2016 they couldn't change awk and posix to expand the functionality of such standard tools a bit. – Lizardx Apr 01 '16 at 19:30
  • 4
    @Lizardx: that's the whole point of not expanding: keep it standard. Otherwise you just create another standard, and then you have some incompatibilities between them (they do that, but try to keep the standard changes to a minimum ... even then, multiple standards are one of the main plagues of computing) – Olivier Dulac Apr 01 '16 at 20:22
  • 2
    I don't agree. With careful execution, you can introduce expansions while supporting all legacy methods, what happens if you fail to do that is the stuff just starts to fade away into irrelevance over time. Everything in computing evolves, the trick is to maintain a very stable reliable evolution. Bash is a good example of doing that, super reliable and simply add new features, it's not 'two standards' so much as, use what is supported, and once the changes have rolled out globally you can start using the new features because only the oldest legacy systems won't have support. – Lizardx Apr 01 '16 at 20:46
  • @Lizardx How is that different from what gawk is doing? They expand and add new features while maintaining backwards compatibility (i.e. old programs run unchanged). It's just other awks ignore this development and don't adopt new features from gawk. Yes, POSIX could update the standard, but then the new POSIX standard would break old programs. So the only solution is for everyone to move to gawk, or else we will be stuck in the past forever. Note that mawk adopted a few things from gawk. – jena Apr 26 '22 at 22:32
  • jena, the problem is that the more POSIX refuses incremental development, the more irrelevant it becomes. The notion that you must maintain functionality from 20-40 year old operating systems is absurd, nothing in computing works that way, you evolve, or you die. The BSDs (with the exception of OpenBSD) are failing to advance, and losing market share as a result. It's like saying C can't change, which would simply mean that C would vanish as a viable language. POSIX was relevant in the days of fragmented unix systems. Linux was the solution to that fragmentation, which is why it took over. – Lizardx Apr 27 '22 at 01:23
  • And I speak as someone who maintains active support for 15+ year old operating systems, aggressively active support, far more than 99+% of people you might run across and old legacy hardware, but I NEVER rely on posix for anything related to that, I just use a subset of features that always work for all tools I use, and I use the right languages, so they work then, and they work today. That's unrelated to posix, that's just using the right tools for the job. Virtually nobody does this type of thing anymore, and if you want to, use subsets of the language or tools, it's not rocket science. – Lizardx Apr 27 '22 at 01:27
  • As an aside, anyone who has tried doing real-world support for BSDs and Linux, all generations of Linux going back to 2.4 kernel at least, would know that posix is a total joke, you can't rely on any output from any tools, particularly not those made by the BSDs today, everything has to be tested and verified, only the most pointlessly trivial task could be done in a true cross platform/os/generation manner. Most cross platform stuff I see sacrifices cross generation/time support for cross os support, which means it's totally non posix relevant. The real world is very different from fantasy world. – Lizardx Apr 27 '22 at 01:33