
I want to enclose each line of a file in single or double quotes, followed by a comma. But I also need the last line not to include the comma, since it is the last record.

I've found several posts with the same requirement, but none that excludes the last line.

Below is the sample data, saved in /tmp/x:

ANONYMOUS
APEX_040200
APEX_PUBLIC_USER
APPQOSSYS
AUDSYS

Desired output, enclosed in single quotes (I may need double quotes later on, too):

'ANONYMOUS',
'APEX_040200',
'APEX_PUBLIC_USER',
'APPQOSSYS',
'AUDSYS'

I can use the following, but I don't know how to make the last line come out in single/double quotes without the comma:

awk '{ printf "'\''%s'\'',\n", $0 }' /tmp/x ## single-quote
awk '{ printf "\"%s\",\n", $0 }' /tmp/x ## double-quote

So the output I'm getting is below; note the comma on the last record:

"ANONYMOUS",
"APEX_040200",
"APEX_PUBLIC_USER",
"APPQOSSYS",
"AUDSYS",

'ANONYMOUS',
'APEX_040200',
'APEX_PUBLIC_USER',
'APPQOSSYS',
'AUDSYS',

How can I suppress the trailing , for the last line?

AdminBee
benbart

8 Answers

4

Using any awk:

$ awk -v q=\' -v ORS= '{print s q $0 q; s=","RS} END{print RS}' file
'ANONYMOUS',
'APEX_040200',
'APEX_PUBLIC_USER',
'APPQOSSYS',
'AUDSYS'

or to use double instead of single quotes just change -v q=\' to -v q=\":

$ awk -v q=\" -v ORS= '{print s q $0 q; s=","RS} END{print RS}' file
"ANONYMOUS",
"APEX_040200",
"APEX_PUBLIC_USER",
"APPQOSSYS",
"AUDSYS"
Ed Morton
  • Thanks Ed, this works fine. Can I pass in a variable saying whether I want single or double quotes? That way I only need to 'remember' one of them. – benbart Oct 18 '23 at 22:04
  • My answer shows you exactly how to pass on a single or double quote in a variable so I don't understand why you're asking. – Ed Morton Oct 18 '23 at 23:22
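Following up on the comment thread: one way to end up with a single command to remember is to wrap the awk program above in a small shell function that takes the quote character as its first argument. This is a sketch assuming a POSIX sh and any awk; the function name quote_lines is hypothetical.

```shell
# quote_lines QUOTE [FILE...]
# Hypothetical helper around the awk one-liner above: wraps each input
# line in QUOTE and puts a comma after every line except the last one.
# Reads the named files, or standard input if none are given.
quote_lines() {
    q=$1
    shift
    # ORS is empty, so "print" adds nothing by itself. The separator s is
    # printed *before* each record and is still empty for the first one,
    # so a comma can never end up after the last record.
    awk -v q="$q" -v ORS= '{ print s q $0 q; s = "," RS } END { print RS }' "$@"
}
```

Usage would then be quote_lines \' /tmp/x for single quotes or quote_lines \" /tmp/x for double quotes.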
3

The following awk program should work:

awk 'FNR>1{printf ",\n"} {printf "\047%s\047",$0} END{printf "\n"}' input.txt

This will

  • print a comma followed by a newline before anything else, except when processing the first input line, to terminate the "previous line"
  • print the content of the current line enclosed in single quotes (written as the octal escape \047), but without a comma or newline, since it doesn't yet know which of the two will be needed
  • at end-of-file, print only a newline, to terminate the output corresponding to the last input line
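A sketch of the same three steps as a reusable shell function, with the quote character passed as an argument instead of hard-coded (the name join_quoted is hypothetical). It swaps FNR for NR: with a single input file the two are identical, but FNR restarts at 1 for each file, so with several files the program above would glue the last record of one file to the first record of the next without a separator. It also skips the final newline on empty input.

```shell
# join_quoted QUOTE [FILE...]
# Hypothetical variant of the awk program above, with the quote character
# passed in as a variable instead of being hard-coded as \047.
join_quoted() {
    q=$1
    shift
    awk -v q="$q" '
        # Every record except the very first one first terminates the
        # previous record with a comma and a newline. NR keeps counting
        # across input files (FNR restarts at 1 for each file), so with
        # several files the records still form a single list.
        NR > 1 { printf ",\n" }

        # The current record, quoted, with no terminator yet: at this
        # point awk cannot know whether more input follows.
        { printf "%s%s%s", q, $0, q }

        # Terminate the last record with a bare newline; print nothing
        # at all for empty input.
        END { if (NR > 0) printf "\n" }
    ' "$@"
}
```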
AdminBee
3

Does it have to be with AWK? sed allows checking for the last line directly ($), so you can remove a final comma with $s/,$//.

Or, in full, to both add the quotes and commas and remove the comma on the last line:

% sed -e "s/.*/'&',/" -e '$s/,$//' file.txt
'ANONYMOUS',
'APEX_040200',
'APEX_PUBLIC_USER',
'APPQOSSYS',
'AUDSYS'

or, using a shell variable to hold the quote character (so it can be swapped for a single quote with q=\' instead):

% q=\"
% sed -e "s/.*/$q&$q,/" -e '$s/,$//' file.txt
"ANONYMOUS",
"APEX_040200",
"APEX_PUBLIC_USER",
"APPQOSSYS",
"AUDSYS"
ilkkachu
  • Why not sed -e "s/.*/'&',/; \$s/,\$//" file instead? That way, you only need to escape the two $. Or even sed -e "s/.*/'&',/" -e '$s/,$//' file which makes it work with any sed (the ability to join commands with ; inside the sed script is a GNU thing, or at least, not a global sed thing) and removes the need for escaping. – terdon Oct 18 '23 at 17:29
  • @terdon, yes, the one with two -e's makes sense. And so would something like -e "s/.*/'&',/;"'$s/,$//' (The semicolon works in Busybox and on my Mac too, and I'm just too used to it from other programming languages to ever remember it might not be supported everywhere. Though now that I look, the POSIX text seems to say "Editing commands other than {...}, a, b, c, i, r, t, w, :, and # can be followed by a <semicolon>, optional <blank> characters, and another editing command.".) – ilkkachu Oct 18 '23 at 18:48
  • Doesn't have to be awk. It just seems to be the 'easiest' one to use; I'm always lost with sed. Can I pass in a variable saying whether I want single or double quotes? – benbart Oct 18 '23 at 22:03
  • @benbart, not really in sed, but you could use a shell variable (edited). though it's a bit icky in that the variable gets embedded in the middle of the sed code, so putting e.g. a / there will break the sed script – ilkkachu Oct 19 '23 at 19:19
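Building on the last comment: if the shell-variable version is wrapped in a function, it can at least refuse quote characters that would break or alter the s command. A sketch assuming POSIX sh and any sed; the function name sed_quote_lines is hypothetical, and quotes containing a newline are not checked.

```shell
# sed_quote_lines QUOTE [FILE...]
# Hypothetical wrapper around the sed commands above. QUOTE is pasted
# into the s/// command, so characters that are special there (the "/"
# delimiter, and "&" or "\" in the replacement) are rejected instead of
# silently breaking or changing the sed script.
sed_quote_lines() {
    q=$1
    shift
    case $q in
        */* | *'&'* | *'\'*)
            printf 'sed_quote_lines: unsupported quote: %s\n' "$q" >&2
            return 1
            ;;
    esac
    # Wrap every line in QUOTE and append a comma, then drop the comma
    # again on the last line only.
    sed -e "s/.*/$q&$q,/" -e '$s/,$//' "$@"
}
```

For example, sed_quote_lines \" /tmp/x gives the double-quoted list, while sed_quote_lines / /tmp/x fails with an error message instead of running a broken script.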
2

I would use perl for this. And, since this is perl, there are many ways.

$ perl -lne '$k="\"$_\""; print eof ? $k : "$k,"' file
"ANONYMOUS",
"APEX_040200",
"APEX_PUBLIC_USER",
"APPQOSSYS",
"AUDSYS"

Which is just a condensed version of:

$ perl -lne 'if(eof){ print "\"$_\""} else { print "\"$_\","}' file
"ANONYMOUS",
"APEX_040200",
"APEX_PUBLIC_USER",
"APPQOSSYS",
"AUDSYS"

or, with the quote character in a variable (use $q=chr(34) instead for a double-quote):

$ perl -lne '$q=chr(39); print eof ? "$q$_$q" : "$q$_$q,"' file
'ANONYMOUS',
'APEX_040200',
'APEX_PUBLIC_USER',
'APPQOSSYS',
'AUDSYS'

or

$ perl -lne 'if($last){print "\"$last\","} $last=$_; END{print "\"$last\""}' file
"ANONYMOUS",
"APEX_040200",
"APEX_PUBLIC_USER",
"APPQOSSYS",
"AUDSYS"

Or, if the file is small enough to fit in memory:

$ perl -007 -pe 's/^/"/;s/\n/",\n"/g; s/,\n"$/\n/; ' file
"ANONYMOUS",
"APEX_040200",
"APEX_PUBLIC_USER",
"APPQOSSYS",
"AUDSYS"

or

$ perl -007 -F'\n' -ne 'print "\""; print join("\",\n\"",@F); print "\"\n"' file
"ANONYMOUS",
"APEX_040200",
"APEX_PUBLIC_USER",
"APPQOSSYS",
"AUDSYS"

To switch between different quoting characters, you can use this, and pass the desired quoting character as the first argument:

## single quotes
$ perl -lne 'BEGIN{$q=$ARGV[0]; shift}if(eof){ print "$q$_$q"} else { print "$q$_$q,"}' \' file
'ANONYMOUS',
'APEX_040200',
'APEX_PUBLIC_USER',
'APPQOSSYS',
'AUDSYS'

## double quotes
$ perl -lne 'BEGIN{$q=$ARGV[0]; shift}if(eof){ print "$q$_$q"} else { print "$q$_$q,"}' \" file
"ANONYMOUS",
"APEX_040200",
"APEX_PUBLIC_USER",
"APPQOSSYS",
"AUDSYS"

So you could indeed have that as a variable:

$ quote1='"'
$ quote2="'"

And then use those:

perl -lne 'BEGIN{$q=$ARGV[0]; shift}if(eof){ print "$q$_$q"} else { print "$q$_$q,"}' "$quote1" file
terdon
1

I would just pipe the output to sed:

awk '{ printf "\"%s\",\n", $0 }' /tmp/x | sed '$ s/.$//'

source

ramius
1

Using awk:

$ awk -v c=','  -v q=\' 'NR>1{print a c} {a = q $0 q} END{print a}' file
Or

$ awk -v c=',' -v q=\' 'a{print a c} {a = q $0 q} END{print a}' file

As suggested by @ilkkachu

The above command would produce output with single quotes. To get output in double quotes use -v q=\" instead of -v q=\'.
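The same hold-one-line-back idea can be sketched as a function that takes both the quote and the separator as arguments (the name hold_back_join is hypothetical; assumes POSIX sh and any awk). Guarding the END block with NR > 0 avoids printing a lone empty line for empty input, which the one-liners above would do.

```shell
# hold_back_join QUOTE SEP [FILE...]
# Hypothetical wrapper around the awk answer above: each line is only
# printed once the next one has been read, so the END block knows which
# line is the last one and can print it without SEP.
hold_back_join() {
    q=$1
    c=$2
    shift 2
    awk -v q="$q" -v c="$c" '
        NR > 1 { print a c }          # a holds the previous line, not the last
        { a = q $0 q }                # remember the current line, quoted
        END { if (NR > 0) print a }   # the last line gets no separator
    ' "$@"
}
```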

1

Using Raku (formerly known as Perl 6)

~$ raku -e 'put join ",\n", lines.map: *.subst(:global, /^ | $/, "\"");'  file 
"ANONYMOUS",
"APEX_040200",
"APEX_PUBLIC_USER",
"APPQOSSYS",
"AUDSYS"

OR:

~$ raku -e 'put join Q:b[,\n], lines.map: *.subst(:global, /^ | $/, Q:b["]);'  file
"ANONYMOUS",
"APEX_040200",
"APEX_PUBLIC_USER",
"APPQOSSYS",
"AUDSYS"

Above, both the zero-width beginning-of-string ^ token and the zero-width end-of-string $ token are subst-ituted with a double-quote character. Adding the :global named argument ensures that both substitutions per line are performed. The lines are then join-ed on ",\n" (comma-newline) to give the final output. (FYI, just join on "," if you want all your lines concatenated into one and separated by commas.)

For quoting problems Raku offers an entire sub-language called the Q-lang to help massage your text into its final format. Hence the Q:b["] "quote-respecting-backslash-interpolation" code above. The Q:b[...] format works great for inserting characters like Q:b[\t] tabs and Q:b[\n] newlines without having to add extraneous single/double- quotes.

Specifically for single-quote and double-quote problems... Raku offers a related solution. Just use the Unicode name (either \c[APOSTROPHE] or \c[QUOTATION MARK]) as below. The fewer single/double- quotes in your one-liners the more portable your code is (e.g. to Windows):

~$ raku -e 'put join Q:b[,\n], lines.map: *.subst(:global, /^ | $/, "\c[APOSTROPHE]");'  file
'ANONYMOUS',
'APEX_040200',
'APEX_PUBLIC_USER',
'APPQOSSYS',
'AUDSYS'

#OR:

~$ raku -e 'put join Q:b[,\n], lines.map: *.subst(:global, /^ | $/, Q:b[\c[APOSTROPHE]]);' file
'ANONYMOUS',
'APEX_040200',
'APEX_PUBLIC_USER',
'APPQOSSYS',
'AUDSYS'


[Below I try to translate @terdon's excellent Perl solutions into Raku. But hopefully the above suffices].

Raku only has eof as a method call on filehandles. So an approximate (but verbose) translation of the Perl solution by @terdon is as follows:

~$ raku -e 'my $fh = $*ARGFILES.IO.open; for $fh.lines() { 
               put $fh.eof ?? (Q:b["] ~ $_ ~ Q:b["]) !! (Q:b["] ~ $_ ~ Q:b[",]) 
            };'  file

#OR:

~$ raku -e 'my $fh = $*ARGFILES.IO.open; put $fh.eof
?? (Q:b["] ~ $_ ~ Q:b["])
!! (Q:b["] ~ $_ ~ Q:b[",])
for $fh.lines();' file

#OR:

~$ raku -e 'given $*ARGFILES.IO.open -> $fh { $fh.eof
?? put("\"" ~ $_ ~ "\"")
!! put("\"" ~ $_ ~ "\",")
for $fh.lines() } ;' file

In Raku, the ternary operator is Test ?? True !! False, so the code reads much like the ternary Perl answer given by @terdon. Also, string concatenation in Raku is done with the ~ tilde (commas work just as well). No regexes are used in this solution, and indeed the Raku code could be written more simply with backslash escapes, as put("\"$_\"") or similar.

@terdon provides a -007 Perl solution which reads the entire file into memory. Raku's equivalent is slurping the file. Below zero-width ^^ beginning-of-line and $$ end-of-line regex tokens are utilized:

~$ raku -e 'given slurp() {S:g/ ^^ /"/ andthen S:g/ $$ /",/ andthen S/ \,\n $ // andthen .put};'  file

Probably obvious but notable anyway: save code inside the singlequotes to something like doublequote.p6 if you want a zero-boilerplate script you can run (+/- making it executable).


Sample Input:

ANONYMOUS
APEX_040200
APEX_PUBLIC_USER
APPQOSSYS
AUDSYS

https://docs.raku.org/syntax/eof%20-%20perlfunc
https://raku.org

jubilatious1
  • Nice! If you do go for any of mine, please leave me a comment, I'd love to see how and if that sort of classic "line noise" (:P) Perl translates into Raku. – terdon Oct 18 '23 at 17:38
  • @terdon well I've tried! It's verbose but many Perl tricks can be used to make it shorter. FYI Raku's lines routine autochomps by default, which is why it's my initial solution (using join to spare the last line). I'm actually learning quite a lot of Perl here from you and others (e.g. @ilkkachu)! – jubilatious1 Oct 18 '23 at 18:54
0

Well this is the most beautiful way I could possibly think of:

awk '{printf "%s\047%s\047", end, $0, end=",\n"}' /tmp/x

Basically, the ,\n separator is kept in a variable that only gets set after the first line has been printed, and the octal escape \047 stands for the ' character :)


Note that the above doesn't print a final newline, so it would not create a valid text file as defined by POSIX. If that's a problem, use this instead:

awk '{printf "%s\047%s\047", end, $0, end=",\n"}END{print ""}' /tmp/x
terdon
Bog
  • Please note that this is basically the approach taken by @EdMorton in this answer, only that your program doesn't print a terminating newline for the last line of input (which would be considered a malformed text file in many environments). Also, should the , before the end= assignment not be a ;? – AdminBee Oct 19 '23 at 08:28
  • Well yes same approach, but better looking. And it doesn't matter if it is , or ; :) – Bog Oct 19 '23 at 08:41
  • This is indeed nice, but the lack of a final newline is a problem. POSIX defines a text file as "A file that contains characters organized into zero or more lines" and defines lines as "A sequence of zero or more non- <newline> characters plus a terminating <newline> character.". This means that without the final newline, this isn't a text file, and some tools will not be able to deal with it. You could just add END{print ""} to your script to get around that. – terdon Oct 19 '23 at 11:53
  • @terdon Man, I like you :)) And yeah, I know, but in most cases it doesn't matter. In this case, it would improve the quality but decrease the beautifulness ^^ But yeah, you are right. Go ahead, you can edit my answer (so you get credit for that) :) – Bog Oct 19 '23 at 12:31
  • , end=",\n" (leading comma) should be ; end=",\n" (leading semicolon) in both scripts. As written ",\n" will be passed to printf as an argument after it's been evaluated and YMMV with what any awk version does when printf gets too many arguments for the format string you provided. – Ed Morton Oct 19 '23 at 17:31
  • Oh gosh, this is terribly obfuscated. For one, you have a variable called end, but used in the front. Also the thing Ed said about putting the assignment in the argument list. About that, I also wonder if the order of evaluation for function arguments and their side-effects is defined in AWK (IIRC, it's not in C, and e.g. GCC warns about a construct like that. I think newer languages tend to define a left-to-right order, though.) – ilkkachu Oct 19 '23 at 19:03
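Putting the comments together, a sketch of the one-liner with both fixes applied (the assignment made a separate statement with ";", so printf receives exactly as many arguments as its format uses, plus an END block for the final newline), and a small check for the missing-final-newline problem terdon describes. Both function names are hypothetical; this assumes POSIX sh, any awk, and a tail that supports -c.

```shell
# printf_join [FILE...]
# The one-liner above with ";" instead of "," before the assignment, so
# that "end" is set as its own statement after printf has run, and with
# an END block that adds the final newline (skipped for empty input).
printf_join() {
    awk '{ printf "%s\047%s\047", end, $0; end = ",\n" } END { if (NR) print "" }' "$@"
}

# ends_with_newline FILE
# Succeeds if FILE is empty or its last byte is a newline. Command
# substitution strips trailing newlines, so the captured last byte is
# empty exactly in those two cases.
ends_with_newline() {
    [ -z "$(tail -c 1 "$1")" ]
}
```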