Formatting output with Awk and regex

Question

I have ~20 files all around 300 lines long, populated by data formatted like this:

62640 usec, 3500 usec
1640 usec, 480 usec
360 usec, 520 usec
1200 usec, 500 usec
340 usec, 520 usec

and I want to turn this into Arduino code, in the format

delayMicroseconds(62640);
pulseIR(3500);
delayMicroseconds(1640);
pulseIR(480);
delayMicroseconds(360);
pulseIR(520);
delayMicroseconds(1200);
pulseIR(500);
delayMicroseconds(340);
pulseIR(520);

and so on, where the number in the delayMicroseconds() function is the first number on each line, and the number in pulseIR() is the second number on each line.

Any ideas? I feel like this should be possible in awk.

score 10 · Answer 1 · answered Oct 07 '18 at 22:18

There's no need for any regex here - just printf to format the fields into a string:

$ awk '{printf("delayMicroseconds(%d);\npulseIR(%d);\n", $1, $3)}' file
delayMicroseconds(62640);
pulseIR(3500);
delayMicroseconds(1640);
pulseIR(480);
delayMicroseconds(360);
pulseIR(520);
delayMicroseconds(1200);
pulseIR(500);
delayMicroseconds(340);
pulseIR(520);
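
Since the question mentions around 20 files, the one-liner can be wrapped in a loop. This is only a sketch: the function names convert_file and convert_all, the .txt input suffix and the .inc output suffix are all assumptions for illustration, not something given in the question.

```shell
# convert_file FILE: hypothetical helper wrapping the awk one-liner above.
convert_file() {
    awk '{ printf("delayMicroseconds(%d);\npulseIR(%d);\n", $1, $3) }' "$1"
}

# convert_all DIR: assumes the inputs are named DIR/*.txt and writes each
# result to a matching DIR/*.inc file (both suffixes are illustrative).
convert_all() {
    for f in "$1"/*.txt; do
        [ -e "$f" ] || continue              # the glob matched nothing
        convert_file "$f" > "${f%.txt}.inc"
    done
}
```

Calling convert_all . would then process every .txt file in the current directory and leave the original files untouched.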

score 2 · Answer 2 · answered Oct 07 '18 at 22:17

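You can try this:

    #!/bin/bash
    # Read the file line by line; the data has no header line to skip.
    while IFS= read -r line; do

        # The first and third whitespace-separated fields hold the numbers.
        n1=$(echo "$line" | awk '{print $1}')
        n2=$(echo "$line" | awk '{print $3}')

        printf 'delayMicroseconds(%s);\npulseIR(%s);\n' "$n1" "$n2"
    done < file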

delayMicroseconds(62640);
pulseIR(3500);
delayMicroseconds(1640);
pulseIR(480);
delayMicroseconds(360);
pulseIR(520);
delayMicroseconds(1200);
pulseIR(500);
delayMicroseconds(340);
pulseIR(520);

Alternatively, you may try perl as follows:

perl -pe 's/(\d+)[\D]+(\d+)[\D]+/delayMicroseconds($1);\npulseIR($2);\n/' file
delayMicroseconds(62640);
pulseIR(3500);
delayMicroseconds(1640);
pulseIR(480);
delayMicroseconds(360);
pulseIR(520);
delayMicroseconds(1200);
pulseIR(500);
delayMicroseconds(340);
pulseIR(520);

ilkkachu · Answer 3 · 2018-10-08T05:50:04.767

With GNU sed:

$ sed -Ee 's/([0-9]+) usec, ([0-9]+) usec.*/delayMicroseconds(\1);\npulseIR(\2);/' < data 
delayMicroseconds(62640);
pulseIR(3500);
...

-E tells sed to use extended regular expressions^(*). The s/pattern/replacement/ command runs a search-and-replace operation over the current line, and sed repeats the given instructions for each input line.

The pattern is ([0-9]+) usec, ([0-9]+) usec.* where [0-9] means any one digit, + means one or more of the previous "atom", and (...) saves ("captures") whatever was matched. So ([0-9]+) matches a run of one or more digits and saves the result. The letters are matched as-is, and the final .* matches any number of any characters, just to eat any possible garbage at the end of the line.

In the replacement, \1 and \2 are replaced with the saved contents of the (...) groups (but the parentheses themselves are literal here), and \n means a newline character (that may not work on all seds).

^(*)See Why does my regular expression work in X but not in Y? for the difference between the different regex types. Also, there are a number of tutorials for sed online. I suggest finding one or two and playing around with it, at least to acquaint yourself with the s/// command.
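
For seds that do not understand \n in the replacement, a backslash followed by a literal newline is the form POSIX describes for inserting a newline there. A minimal sketch, assuming the local sed accepts -E (GNU, BSD and busybox seds do); the function name to_arduino is made up for illustration:

```shell
# to_arduino: hypothetical wrapper; reads the timing data on stdin.
# The backslash at the end of the first line of the sed script puts a
# literal newline into the replacement instead of relying on \n.
to_arduino() {
    sed -E 's/([0-9]+) usec, ([0-9]+) usec.*/delayMicroseconds(\1);\
pulseIR(\2);/'
}
```

It would be used as to_arduino < data, just like the command above.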

score 1 · Answer 4 · answered Oct 08 '18 at 11:26

Using GNU awk:

awk -v RS=' usec[,\n] ?' '{print (NR%2?"delayMicroseconds":"pulseIR")"("$0");"}' file

This uses the string usec, together with the comma or newline that follows it, as the record separator. Each number then becomes a separate record, so the print command alternates between the two function names, each wrapped around the current number.
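
To see how the record separator splits the input, it can help to print each record next to its number. This is a sketch only: show_records is a made-up name, and a regular expression in RS is an extension (supported by GNU awk and mawk, for example) rather than something every awk guarantees.

```shell
# show_records: hypothetical helper that prints "NR: record" for each
# record produced by the regex record separator used above.
show_records() {
    awk -v RS=' usec[,\n] ?' '{ print NR ": " $0 }'
}

printf '62640 usec, 3500 usec\n1640 usec, 480 usec\n' | show_records
```

Odd-numbered records are the delay values and even-numbered records are the pulse values, which is what the NR%2 test relies on.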
