5

I want to be able to pass an argument on the command line to gawk that is evaluated for escape sequences.

The issue:

$ gawk 'BEGIN { print ARGV[1]; }' '\t'
\t

Instead, I would like to get an actual tab character.

From the gawk docs:

The escape sequences in the preceding list are always processed first, for both string constants and regexp constants. This happens very early, as soon as awk reads your program.

How can I interpret character escapes in the command line args?

The end goal is myscript.awk --sep '\t', where separator is a format string, so passing a literal tab isn't an option. I'm also familiar with the easy way I could perform this in bash, but I'm interested in a way to do this in [g]awk.

Anthon
  • 79,293
cdosborn
  • 540

3 Answers

2

How can I print the unescaped version of command line args?

print ARGV[1]

The problem is that you don't want the unescaped command line argument. You want to interpret it. You're passing \t (the two-character string backslash, lowercase t), and you want that to be translated to a tab character. You'll need to do this manually. Translating just \t to a tab is easy: gsub(/\\t/, "\t"). But if you want to support octal escapes as well, and remove the backslash before a non-recognized character, that's cumbersome in awk.

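# Place inside a BEGIN block; the interpreted result ends up in s.
n = split(ARGV[1], a, "\\");
s = a[1];
for (i = 2; i <= n; i++) {
    x = a[i];   # the text that followed one backslash
    if (skip_next) {
        # Piece after an escaped backslash: keep it verbatim.
        skip_next = 0;
    } else if (x == "") {
        # Two backslashes in a row: emit one literal backslash.
        s = s "\\";
        skip_next = 1;
    } else if (x ~ /^[0-7][0-7][0-7]/) {
        s = s sprintf("%c", 64*substr(x,1,1) + 8*substr(x,2,1) + substr(x,3,1));
        sub(/^.../, "", x);
    } else if (x ~ /^[0-7][0-7]/) {
        s = s sprintf("%c", 0 + 8*substr(x,1,1) + substr(x,2,1));
        sub(/^../, "", x);
    } else if (x ~ /^[0-7]/) {
        s = s sprintf("%c", 0 + substr(x,1,1));
        sub(/^./, "", x);
    } else {
        # Named escapes; a backslash before any other character is dropped.
        sub(/^a/, "\a", x) ||
        sub(/^b/, "\b", x) ||
        sub(/^n/, "\n", x) ||
        sub(/^r/, "\r", x) ||
        sub(/^t/, "\t", x) ||
        sub(/^v/, "\v", x);
    }
    s = s x;
}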

(Warning: untested code!) Instead of this complex code, you could invoke printf in a subshell. Even that isn't so easy to do when the string could be multiline.

s = ARGV[1]
gsub(/'/, "'\\''", s)
cmd = "printf %b '" s "'."
s = ""
while ((cmd | getline line) > 0) s = s line "\n"
sub(/..$/, "", s)

Note that when you write "\t" in an awk script, that's a string containing the tab character. It's the way the awk syntax is: backslash has a special meaning in a string literal. Note: in a string literal, not in a string. If a string contains a backslash, that's just another character. The source code snippet "\t", consisting of four characters, is an expression whose value is the one-character string containing a tab, in the same way that the source code snippet 2+2, consisting of three characters, is an expression whose value is the number 4.
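To make the literal-versus-string distinction concrete, here is a small sketch. The function name show_lengths is hypothetical, and plain awk is assumed to behave like gawk for this purpose. It compares the length of the string literal "\t" in the program text with the length of the argument '\t' as it arrives in ARGV[1]:

```shell
# Hypothetical helper: prints the length of the awk literal "\t" followed
# by the length of the first command-line argument.
show_lengths() {
    awk 'BEGIN { lit = "\t"; print length(lit), length(ARGV[1]) }' "$1"
}

show_lengths '\t'
```

This prints 1 2: the literal was turned into a single tab when awk parsed the program, while the argument is still a backslash followed by a t.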

It would be better for your awk script to take the separator argument as a literal string. That would make it easier to use: your interface requires the caller to escape backslashes in the argument. If you want the separator to be a tab, pass an actual tab character.

  • @StéphaneChazelas, i'm only interested in one line (I should have mentioned). I think %b is unnecessary, if you replace it with just s then the escapes will be interpreted anyways. – cdosborn Jun 08 '15 at 15:34
  • @cdosborn, well no, %s doesn't do any expansion. It's %b that does echo-like expansion. Also note that %b or echo expansion is different from the one done by awk (or the format argument of printf) for octal sequences: you need \011, \11 won't do. – Stéphane Chazelas Jun 08 '15 at 16:04
  • I meant the string s in the example. – cdosborn Jun 08 '15 at 16:07
1

First of all, you're not actually passing a tab to your awk. Remember that the shell evaluates the arguments before passing them to awk, and '\t' in single quotes is passed through as a literal \ followed by a t:

$ set -x
$ gawk 'BEGIN { print ARGV[1]; }' '\t'
+ gawk 'BEGIN { print ARGV[1]; }' '\t'
\t

As you can see above, you are not passing a tab to gawk so you can hardly expect it to print one. Compare with the version below which does pass a tab:

$ gawk 'BEGIN { print ARGV[1]; }' "$(printf '\t')"
++ printf '\t'
+ gawk 'BEGIN { print ARGV[1]; }' ' '  ## note the tab
                         ## This line contains a printed tab

Alternatively, you could pass the tab as a variable:

gawk -v t='\t' 'BEGIN {print t}'

Here, the '\t' is being expanded by awk, not the shell, so the tab is interpreted correctly.
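As a rough sketch of how this could serve as a separator option, the wrapper below hands its first argument to awk with -v and uses it to join two fixed fields. The name join_fields and the sample fields a and b are made up for illustration, and the sketch assumes the separator can be supplied through -v rather than read from ARGV:

```shell
# Hypothetical wrapper: the separator is assigned with -v, so awk itself
# expands escape sequences such as \t in the assigned value.
join_fields() {
    awk -v sep="$1" 'BEGIN { printf "%s%s%s\n", "a", sep, "b" }'
}

join_fields '\t'
```

The call prints a and b separated by a real tab. One consequence of this design is that a caller who wants a literal backslash in the separator has to double it.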

terdon
  • 242,166
0

The solution is to use getline.

Inside a file:

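BEGIN {
    sep = ARGV[1]
    # Escape single quotes for the shell and % for printf's format string.
    gsub(/'/, "'\\''", sep);
    gsub(/%/, "%%", sep);
    cmd = "printf -- '" sep "'"
    cmd | getline sep;
    close(cmd)
    # Print the result as data, not as a format string.
    printf "%s", sep;
}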
cdosborn
  • 540
  • Please read my and terdon's answer carefully. In your case, \t inside awk was interpreted as normal, it's not \\t like when the shell pass to awk anymore. – cuonglm Jun 07 '15 at 17:51
  • 1
    Please read my question carefully. My intent was never to make sure \\t was passed in, but to evaluate a format string, with the condition that the arg be passed as --sep '\t'. Both of your answers ignore the first line of my question. – cdosborn Jun 07 '15 at 18:36
  • \\t because you pass \t from the command line, the shell treat it as literal, escaped to \\t before passing to awk. You need to use "$(printf '\t')" or use awk variable instead awk -v t='\t' 'BEGIN {print t}'. I don't know why myscript.awk --sep "$(printf '\t')" does do what you want? – cuonglm Jun 07 '15 at 18:46
  • Your snippet is broken. Try it with the argument "a (i.e. gawk '…' '"a') – Gilles 'SO- stop being evil' Jun 08 '15 at 01:41
  • You're right. I updated the code. The file version which evades the shell escaping works for the cases I tested. I cannot get the other version to work with "'". – cdosborn Jun 08 '15 at 02:15
  • Note that that awk argument ends up being interpreted as shell code (for instance, try with a "'\$(reboot)'" argument). Also getline only reads one line from the echo output, so you can't use that for 'foo\nbar' for instance. Note that awk will expand the escape sequences if you use awk -v sep='\t' ... or awk '...' sep='\t'. Not all echo implementations expand escape sequences, for instance bash's needs a -e to perform the expansions. – Stéphane Chazelas Jun 08 '15 at 09:21
  • @StéphaneChazelas, I needed to update my example. On a side note, my previous call to echo was to /bin/echo, which doesn't actually support -e. I forgot the common echo is a builtin. – cdosborn Jun 08 '15 at 14:53
  • @StéphaneChazelas, See if you can execute commands on the updated version. It should enforce that its input is within ''. Credit to @Gilles. – cdosborn Jun 08 '15 at 15:00
  • Try with % or %999999999s. – Stéphane Chazelas Jun 08 '15 at 15:06
  • When you do "cmd" | getline in awk, awk runs sh -c cmd. So it is the builtin echo of sh you'll get, not /bin/echo. – Stéphane Chazelas Jun 08 '15 at 15:08
  • I was wrong again, thanks. It seems like if % is handled then printf will only interpret escaped chars. – cdosborn Jun 08 '15 at 15:17
  • @StéphaneChazelas: Can you make any comment about my answer? – cuonglm Jun 08 '15 at 17:01