
I have a file with the following content:

$ cat File1
ABC Cool Lol POP {MNB}
ABC Cool Lol POP {MNB}
ABC Cool Lol POP {MNB}
ABC Cool Lol POP {TBMKF}
ABC Cool Lol POP {YUKER}
ABC Cool Lol POP {EFEFVD}

The output I want:

-Cool MNB + POP ;
-Cool MNB + POP ;
-Cool MNB + POP ;
-Cool TBMKF + POP ;
-Cool YUKER + POP ;
-Cool EFEFVD + POP ;

First, I tried to get the last column of File1 without the braces, using sed 's/[{}]//g' File1 > File3

Then I copied the whole content of File1 to a new file, File4:

cp File1 File4

Next, I replaced the last column of File4 with the data from File3 (that is, File1's last column without the braces):

awk 'FNR==NR{a[NR]=$1;next}{$5=a[FNR]}1' File3 File4 >>File5 

File5 should then look like this:

ABC Cool Lol POP MNB
ABC Cool Lol POP MNB
ABC Cool Lol POP MNB
ABC Cool Lol POP TBMKF
ABC Cool Lol POP YUKER
ABC Cool Lol POP EFEFVD
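For reference, a minimal sketch of how the FNR==NR step is meant to work. It assumes File3 holds only the brace-stripped last column, one value per line; a sed that only deletes the braces keeps whole lines, in which case $1 in File3 is ABC rather than MNB. The awk | tr command that builds File3 and the scratch directory are illustrative choices, and File1 is used directly in place of its copy File4.

```shell
#!/bin/sh
# Work in a scratch directory so the example does not touch real files.
cd "$(mktemp -d)" || exit 1

# Recreate the sample input from the question.
printf '%s\n' \
  'ABC Cool Lol POP {MNB}' \
  'ABC Cool Lol POP {MNB}' \
  'ABC Cool Lol POP {MNB}' \
  'ABC Cool Lol POP {TBMKF}' \
  'ABC Cool Lol POP {YUKER}' \
  'ABC Cool Lol POP {EFEFVD}' > File1

# Assumption: File3 should contain only field 5, with the braces removed.
awk '{print $5}' File1 | tr -d '{}' > File3

# FNR==NR is true only while awk reads the first file (File3):
# store each value by line number, then skip to the next record.
# For the second file, overwrite field 5 with the stored value;
# the trailing 1 prints the rebuilt record.
awk 'FNR==NR{a[NR]=$1; next} {$5=a[FNR]} 1' File3 File1 > File5

cat File5
```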

Finally, I tried:

awk -F" " '{print - $2,$5 +,$4 ";"}' File5

But the output is not what I want: only the repeated MNB value is listed, and the other values from File1's last column do not show up.

Jeff Schaller
  • Are you using GNU awk? – 123 Sep 15 '16 at 13:29
  • I'm not sure what you mean. I'm just a beginner with awk. This is a task I need to get done, and I'm trying my best to work through it one step at a time based on my understanding of awk. – heng960407 Sep 15 '16 at 13:31
  • Type awk --version; what's the result? – 123 Sep 15 '16 at 13:33
  • Please change your title to something more specific to your problem. This will make it easier for others who have similar questions in future to find it. At the moment "A question about awk" is very general. – Tom Fenech Sep 16 '16 at 10:38

4 Answers


I don't know why you are copying things left and right. The simple way is:

awk '{print "-" $2, substr($5,2,length($5)-2), "+", $4, ";"}' File1

I put the - at the beginning and the ; at the end.

In between we print

  • $2 because we want it as it is.
  • a substring of $5, which is the string without the first and the last character. We skip the first character by starting at position 2 (awk string positions start at 1, not 0) and leave out the last character by selecting a substring two characters shorter than the original $5
  • the + because we want it
  • and then $4

substr() and length() are both specified by POSIX, so this does not depend on GNU awk.
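To see the 1-based positions at work, here is a small sketch that applies the same substr() call to a few of the sample lines. The lines are copied from the question and piped in with printf instead of being read from File1; the variable name result is just for illustration.

```shell
#!/bin/sh
# Positions in awk strings start at 1, so substr(s, 2, length(s) - 2)
# drops exactly the first and the last character of s.
printf '%s\n' '{MNB}' | awk '{print substr($0, 2, length($0) - 2)}'
# -> MNB

# The full command from the answer, fed sample lines on stdin.
result=$(printf '%s\n' \
  'ABC Cool Lol POP {MNB}' \
  'ABC Cool Lol POP {TBMKF}' \
  'ABC Cool Lol POP {EFEFVD}' |
  awk '{print "-" $2, substr($5, 2, length($5) - 2), "+", $4, ";"}')
printf '%s\n' "$result"
```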

Bananguin

With sed

sed '
    s/\S\+\s/-/
    s/\(\S\+\s\)\{2\}{\(\S\+\)}/\2 + \1;/
    ' File1
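The sed script above relies on \S, \s and \+, which are GNU sed extensions to basic regular expressions. A sketch of a portable alternative using only POSIX BRE, assuming the fields are always separated by single spaces and the last field is always wrapped in braces as in the sample (the variable name out is illustrative):

```shell
#!/bin/sh
# POSIX BRE only: [^ ]* stands in for \S\+ and a literal space for \s.
# \1 = 2nd field, \2 = 4th field, \3 = the text between { and }.
out=$(printf '%s\n' \
  'ABC Cool Lol POP {MNB}' \
  'ABC Cool Lol POP {YUKER}' |
  sed 's/^[^ ]* \([^ ]*\) [^ ]* \([^ ]*\) {\([^}]*\)}$/-\1 \3 + \2 ;/')
printf '%s\n' "$out"
```

Anchoring with ^ and $ means a line that does not have exactly this shape is passed through unchanged rather than half-rewritten.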

And an awk variation:

awk -F"[[:blank:]{}]+" '{print "-" $2, $5, "+", $4}' ORS=" ;\n" File1
Costas

Easy TXR job:

$ txr -c '@(repeat)
@a @b @c @d {@e}
@(do (put-line `-@b @e + @d ;`))
@(end)' -
ABC Cool Lol POP {MNB}
ABC Cool Lol POP {MNB}
ABC Cool Lol POP {MNB}
ABC Cool Lol POP {TBMKF}
ABC Cool Lol POP {YUKER}
ABC Cool Lol POP {EFEFVD}
[Ctrl-D][Enter]
-Cool MNB + POP ;
-Cool MNB + POP ;
-Cool MNB + POP ;
-Cool TBMKF + POP ;
-Cool YUKER + POP ;
-Cool EFEFVD + POP ;

Using the TXR Lisp awk macro to transliterate the Awk solution:

 txr -e '(awk (t (prn `-@[f 1] @{[f 4] [1..-1]} + @[f 3] ;`)))'

Fields are in the f list, and indexing is zero based.

Kaz
  • +1 for the Lisp and the cryptic look! That language MUST compete in PCG (programming code golf). – Archemar Sep 15 '16 at 18:21
  • @Archemar TXR doesn't compete well in golfing because there are specialized languages designed for that, which do things like assign functions to individual characters that can then be strung together to achieve composition. – Kaz Sep 15 '16 at 18:30
  • @Archemar Put an entry in: http://codegolf.stackexchange.com/questions/68712/output-the-next-kana – Kaz Sep 15 '16 at 20:53
  • @Kaz Is there a TXR tutorial somewhere? The man page seems rather huge. How does it perform compared to awk? – bli Sep 21 '16 at 08:21
  • @bli GNU Awk is at least 30 times faster at basic field splitting through a large file than the TXR awk macro, which is some 220+ lines of interpreted code, including the overall loop for processing input sources into records and fields. – Kaz Sep 21 '16 at 16:11

Using awk is easiest when the $1,$2,... fields already contain the exact strings you want to work with. The field separator, if it contains more than one character, is interpreted as a regular expression. We don't need to do any search and replace or substring operations to get rid of the {curly braces}. We just count them as part of the delimiter.

awk -F'[ {}]+' '{printf("-%s %s + %s ;\n", $2, $5, $4)}' File1

Using printf instead of print also makes it a bit easier to see how the string will be formatted. If you prefer print, print "-"$2, $5" + "$4" ;" produces the same output as printf("-%s %s + %s ;\n", $2, $5, $4).
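A sketch that checks this on the sample data: it first prints the fields produced by -F'[ {}]+' for one line, then runs the printf form and the equivalent print form and compares them. The variable names input, with_printf and with_print are illustrative.

```shell
#!/bin/sh
# A few sample lines from the question, one record per line.
input='ABC Cool Lol POP {MNB}
ABC Cool Lol POP {TBMKF}
ABC Cool Lol POP {EFEFVD}'

# Show how the regex separator splits a line. The closing } at the end of
# the line is itself a separator, so awk also reports an empty 6th field.
printf '%s\n' 'ABC Cool Lol POP {MNB}' |
  awk -F'[ {}]+' '{for (i = 1; i <= NF; i++) printf "%d:[%s]\n", i, $i}'

# Both forms should produce identical output.
with_printf=$(printf '%s\n' "$input" |
  awk -F'[ {}]+' '{printf("-%s %s + %s ;\n", $2, $5, $4)}')
with_print=$(printf '%s\n' "$input" |
  awk -F'[ {}]+' '{print "-"$2, $5" + "$4" ;"}')

printf '%s\n' "$with_printf"
[ "$with_printf" = "$with_print" ] && echo "identical"
```

With print, the comma inserts OFS (a single space by default) between "-"$2 and the rest, which is why the space before ";" has to be written into the last string explicitly.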

Ray