I have input.txt

abcd
abcg

To select lines beginning with 'a' and ending with 'g' I write:

cat input.txt | awk '/^a/' | awk '/g$/{print $0}'

How can I combine the regular expressions ^a and g$ to be able to use only one instance of awk?

Viesturs

3 Answers

Just use a single regex that matches both start and finish:

awk '/^a.*g$/' input.txt

Or, if you really want to use two, you can combine them with &&:

awk '/^a/ && /g$/' input.txt
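
As a quick check, here is a minimal sketch that recreates the sample file from the question in a temporary directory and runs both forms. The names tmpdir, single and combined are only illustrative and are not part of the question.

```shell
# Recreate the two-line sample file from the question.
tmpdir=$(mktemp -d)
printf '%s\n' abcd abcg > "$tmpdir/input.txt"

# One regex: starts with a, anything (or nothing) in between, ends with g.
single=$(awk '/^a.*g$/' "$tmpdir/input.txt")

# Two regexes joined with &&: both must match the same line.
combined=$(awk '/^a/ && /g$/' "$tmpdir/input.txt")

printf 'single:   %s\n' "$single"
printf 'combined: %s\n' "$combined"

rm -r "$tmpdir"
```

On this input both commands should select only the abcg line.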
terdon
  • Is .* a combination of . and * or is it a separate operator? – Viesturs Sep 22 '22 at 15:32
  • It is a combination of . (any character) and the modifier * which means "0 or more times", so it means "match absolutely anything, including nothing at all". – terdon Sep 22 '22 at 15:38
  • Note that anything is not the same thing as any sequence of characters. For instance, in a UTF-8 locale on a GNU system, the two approaches don't give the same outcome on the output of printf 'appliqu\351ing\n' (appliquéing encoded in ISO8859-1). – Stéphane Chazelas Sep 22 '22 at 15:54
  • Sigh. I will never understand this sort of thing. You're absolutely right, @StéphaneChazelas, printf 'appliqu\351ing\n' | awk '/^a.*g$/' returns nothing while printf 'appliqu\351ing\n' | awk '/^a/ && /g$/' works on my Arch system, although at least printf 'appliqu\351ing\n' | perl -ne 'print if /^a.*g$/' and printf 'appliqu\351ing\n' | perl -ne 'print if /^a/ && /g$/' both work. I take it that . doesn't match \351 for some reason? – terdon Sep 22 '22 at 16:00
  • \351 cannot be decoded into a character in UTF-8, so it's not matched by .. With perl, you need -C or -Mopen=locale to decode input as text. – Stéphane Chazelas Sep 22 '22 at 16:05
  • The real question should be: Why are lines with invalid character allowed in your text file ? @StéphaneChazelas – QuartzCristal Sep 22 '22 at 21:58
  • Any of -C0 or -C or -C127 or -Mopen=locale will match appliquéing encoded in ISO8859-1 using .*. So, it will get printed. @StéphaneChazelas – QuartzCristal Sep 22 '22 at 22:11
  • On my system, with perl 5.36.0, printf 'appliqu\351ing\n' | perl -ne 'print if /^a.*g$/' is printed, with no extra options. – terdon Sep 23 '22 at 08:49
  • But that's not decoded as text, so it's only valid in single-byte locales and only because your pattern has no non-ASCII characters. For instance perl -ne 'print if /^..$/' would print a line containing ê in a locale using UTF-8. @QuartzCristal, with -C/-Mopen=locale and -w, you'd get errors on that \351 byte, and without -w, with -Mopen=locale it would be decoded into something like the 4-character string \xE9. In any case, POSIXly, the behaviour of awk/grep on non-text is unspecified. I was just pointing out that the two approaches were not strictly equivalent. – Stéphane Chazelas Sep 23 '22 at 12:17
  • Also beware there are some locales where the charset has characters whose multibyte encoding ends in the same encoding as that of g, so printf 'appliqu\351ing\n' | perl -ne 'print if /^a.*g$/' could match on lines that end in those characters. – Stéphane Chazelas Sep 23 '22 at 12:19
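
For anyone who wants to sidestep the decoding question raised in these comments, one possible approach is to force byte-wise matching with LC_ALL=C. This is only a sketch, and it assumes the local awk treats every byte as a character in the C locale (gawk and mawk are expected to). The variable names are illustrative.

```shell
# \351 is é in ISO8859-1; on its own it is not a valid UTF-8 sequence.
# LC_ALL=C makes awk work on single bytes, so .* can span the \351 byte
# (assumption: the local awk honours the C locale this way).
single=$(printf 'appliqu\351ing\n' | LC_ALL=C awk '/^a.*g$/' | wc -l)
combined=$(printf 'appliqu\351ing\n' | LC_ALL=C awk '/^a/ && /g$/' | wc -l)

# Arithmetic expansion strips any padding that wc -l may add.
echo "single regex: $((single)) line(s), two regexes: $((combined)) line(s)"
```

Under that assumption the two forms agree again, at the cost of no longer treating multibyte characters as single characters.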

No need for awk, just grep:

grep "^a.*g$" input.txt

To make the answer as generic as possible using awk, here is an alternative way to perform the desired action, where the start and end strings are passed as arguments from the command line.

Demonstration test data is embedded in this example.

Using the script

#!/bin/sh

sSTRT="${1}"
sEND="${2}"

printf '%s\n' "John Wells" "John Wayne" "Robert Wayne" |
  awk -v sTrt="^${sSTRT}" -v sEnd="${sEND}$" '$0 ~ sTrt && $0 ~ sEnd'

and executing the command

script "John" "Wayne"

the output is

John Wayne

with other lines ignored.

Special note: the "^" and "$" anchors must be passed literally as part of the awk variables.
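
Because the variables are used as regular expressions, any regex metacharacters in the arguments (such as "." or "*") keep their special meaning. If literal matching is wanted instead, a possible variant is to compare substrings rather than match patterns. The sketch below swaps in that different technique; the function name starts_ends and the variables pre and suf are hypothetical, and it assumes the arguments contain no backslashes, since awk -v interprets escape sequences.

```shell
# Hypothetical helper: print the lines of stdin that start with $1 and
# end with $2, comparing literally rather than as regular expressions.
starts_ends() {
    awk -v pre="$1" -v suf="$2" '
        # First length(pre) characters must equal pre, and
        # last length(suf) characters must equal suf.
        substr($0, 1, length(pre)) == pre &&
        substr($0, length($0) - length(suf) + 1) == suf
    '
}

# Same demonstration data as the script above.
result=$(printf '%s\n' "John Wells" "John Wayne" "Robert Wayne" |
    starts_ends "John" "Wayne")
echo "$result"

# Here "." is an ordinary character, so only the first line is selected.
literal=$(printf '%s\n' "a.c" "abc" | starts_ends "a." "c")
echo "$literal"
```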