A unix command to truncate each line of a file

Question

I have a CSV file and I want to truncate each line at the third semicolon. For example, if I have this file:

1;foo;bar;baz;x;y;z
2;foo;bar;baz;x;y;z
3;foo;bar;baz;x;y;z

I want to get the following output:

1;foo;bar
2;foo;bar
3;foo;bar

I don't know what kind of Unix command I can use for that. What do you suggest?

Note that this manipulation will be done in a ksh script.

score 19 · Accepted Answer · answered Nov 13 '12 at 16:49

For the sake of variety, here's another way with cut:

cut -d \; -f -3

Chris Down

I've never really learned to use cut. :-) – Omnifarious Nov 13 '12 at 17:56
wouldn't it be cut -d \; -f 3? – gt6989b Nov 13 '12 at 19:34
@gt6989b No, that would print the third field. -3 says to print all fields up to and including the third field. – Chris Down Nov 13 '12 at 19:41
@ChrisDown, thanks, was thinking about -f 1-3, this is a useful shortcut. – gt6989b Nov 13 '12 at 20:54
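
To make the difference discussed in the comments concrete, here is a small sketch that runs the three field lists against one sample line from the question. The variable names are only for illustration.

```shell
# Sample line taken from the question.
line='1;foo;bar;baz;x;y;z'

# -f 3 selects only the third field.
third=$(printf '%s\n' "$line" | cut -d ';' -f 3)

# -f -3 selects every field up to and including the third.
upto=$(printf '%s\n' "$line" | cut -d ';' -f -3)

# -f 1-3 is the explicit spelling of the same range.
range=$(printf '%s\n' "$line" | cut -d ';' -f 1-3)

printf '%s\n' "$third" "$upto" "$range"
# prints:
# bar
# 1;foo;bar
# 1;foo;bar
```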

Omnifarious · Answer 2 · 2012-11-13T16:53:17.300

6

This will accomplish what you ask:

awk -F';' '{print $1 ";" $2 ";" $3;}' <input >output

The awk utility is well designed for this task. It can easily cut up individual lines into fields, then manipulate them based on that. The -F';' argument tells awk to use ; as the field separator. The quotes are necessary because the shell would interpret ; as a command separator without them.

The command given to awk to execute for each line (the '{print $1 ";" $2 ";" $3;}' bit) is similarly quoted to keep all the funny characters ({, }, $, " and ; in this case) from being treated specially by the shell and make sure the whole thing is passed to awk as one unit.

And, of course, <input and >output are the redirection directives being given to the shell to redirect the command's input and output from and to a file.
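
One thing worth checking before relying on the literal ";" approach is what happens to lines with fewer than three fields. The sketch below uses a made-up two-field line (not from the question) and compares it with cut for reference.

```shell
# A line with only two fields (hypothetical input, not from the question).
short='1;foo'

# awk prints both separators unconditionally, so the empty $3 leaves a trailing ';'.
awk_out=$(printf '%s\n' "$short" | awk -F';' '{print $1 ";" $2 ";" $3}')

# cut only prints the fields that actually exist.
cut_out=$(printf '%s\n' "$short" | cut -d ';' -f -3)

printf 'awk: %s\ncut: %s\n' "$awk_out" "$cut_out"
# prints:
# awk: 1;foo;
# cut: 1;foo
```

If every line in the file is known to have at least three fields, the two approaches give the same result.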

edited Nov 13 '12 at 16:53

answered Nov 13 '12 at 16:40

Omnifarious

You need to set OFS, otherwise the ';' will be converted to spaces. – jordanm Nov 13 '12 at 16:42
@jordanm: *nod* That's one way. The other way is just to put the literal ';' characters in there. :-) I wasn't sure if , would do what I wanted, which is why I had to test it. – Omnifarious Nov 13 '12 at 16:43
Most of the current awks allow this shorter way: awk -F';' -vOFS=';' 'NF=3'. (With extra precaution: awk -F';' -vOFS=';' 'NF>3{NF=3}1'.) – manatwork Nov 13 '12 at 16:56
@manatwork I like your answer the most. – jordanm Nov 13 '12 at 17:04
@manatwork: Wow. I suppose that makes sense. But it's starting to get perlish in its terse obscurity. – Omnifarious Nov 13 '12 at 17:57
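
A rough sketch of what the comments above are getting at, assuming an awk that rebuilds the record when NF is assigned (gawk and mawk document this behaviour; others may not). The variable names are illustrative only.

```shell
line='1;foo;bar;baz;x;y;z'

# With commas in print, awk joins the arguments with OFS, which defaults to a space.
default_ofs=$(printf '%s\n' "$line" | awk -F';' '{print $1, $2, $3}')

# Setting OFS to ';' keeps the original delimiter in the output.
set_ofs=$(printf '%s\n' "$line" | awk -F';' -v OFS=';' '{print $1, $2, $3}')

printf '%s\n' "$default_ofs" "$set_ofs"
# prints:
# 1 foo bar
# 1;foo;bar

# The guarded NF variant from the comment: only lines with more than three
# fields are truncated; the trailing 1 is an always-true pattern that prints
# every line. The second input line here is a made-up short line.
printf '%s\n' "$line" '1;foo' | awk -F';' -v OFS=';' 'NF>3{NF=3}1'
```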

score 3 · Answer 3 · answered Nov 13 '12 at 16:41

You can do this using awk, which is not dependent on the shell. You will need to write the output to a temporary file, and then move it on top of the existing one.

awk -F';' 'BEGIN { OFS=";" } { print $1,$2,$3 }' file.txt > newfile.txt
mv newfile.txt file.txt
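
If this goes into a script, it may be safer not to hard-code the temporary file name. Below is a minimal sketch of the same idea wrapped in a function. The name truncate_fields is made up for illustration, and mktemp is assumed to be available (it is not in POSIX, though most systems ship it).

```shell
# Hypothetical helper: truncate every line of the file "$1" to its
# first three ';'-separated fields, replacing the file.
truncate_fields() {
    file=$1
    # Create the temporary file next to the target so that mv stays on
    # the same filesystem. mktemp is an assumption, not a POSIX utility.
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    if awk -F';' 'BEGIN { OFS=";" } { print $1, $2, $3 }' "$file" > "$tmp"; then
        # Only replace the original if awk succeeded.
        mv "$tmp" "$file"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

It would be called as truncate_fields file.txt. Note that the replaced file ends up with the permissions of the temporary file, which mktemp usually creates as readable and writable by the owner only.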

score 1 · Answer 4 · answered Nov 13 '12 at 17:22

Not the greatest alternative, but in case you need in-place editing and want to solve it with sed:

sed -i ':b;s/;[^;]*//3;tb' file.txt

manatwork

Systems that have ksh usually don't have GNU sed, and sed -i is a GNU extension. – Gilles 'SO- stop being evil' Nov 13 '12 at 21:58
Doh! I checked every used sed command against the POSIX specification before posting, but I forgot the command line option. (By the way, personally I always have a ksh implementation on my Linuxes, either Public Domain Korn Shell or MirBSD™ Korn Shell.) – manatwork Nov 14 '12 at 06:47
And allowing things after the : command is also a GNU extension. POSIX clearly says you can't have semicolon followed by other commands after :. – Stéphane Chazelas Mar 26 '24 at 06:12
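
Following up on the portability comments, here is a sketch of two ways to write this that should avoid both GNU extensions: the label gets its own -e so nothing follows it, and the result is read from standard output rather than edited in place (redirect to a temporary file and mv it over the original if needed). The variable names are illustrative only.

```shell
line='1;foo;bar;baz;x;y;z'

# Same loop as above, but each command is passed in its own -e, so the
# ':b' label is not followed by ';' within one script fragment.
portable=$(printf '%s\n' "$line" | sed -e ':b' -e 's/;[^;]*//3' -e 'tb')

# Loop-free alternative: capture the first three fields and drop
# everything from the third semicolon onwards.
captured=$(printf '%s\n' "$line" | sed 's/^\([^;]*;[^;]*;[^;]*\);.*/\1/')

printf '%s\n' "$portable" "$captured"
# prints:
# 1;foo;bar
# 1;foo;bar
```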

jubilatious1 · Answer 5 · 2024-03-30T21:53:55.293

Using Perl

~$ perl -lane '@F = split(";"); print join ";", @F[0..2];' file

Using Raku (formerly known as Perl_6)

~$ raku -ne 'my @F = .split(";"); put join ";", @F[0..2];' file

Here are two answers, using Perl and Raku respectively. Data is read in line by line using non-autoprinting flags: -lane for Perl, -ne for Raku. (The difference is that Raku chomps line endings by default, which Perl needs -l for, and that Raku has no -a flag.)

After this, the code is virtually identical. In Raku you have to declare the @F array (or @G, @H, etc.) with a scope declarator such as my or our. Raku also requires you to say what object you're calling split on: .split is short for $_.split, where $_ is the topic variable holding each line of data as it is read in.

Finally you either print (Perl) or put (Raku). Raku has print as well, but put adds a newline terminator for you.

Sample Input:

1;foo;bar;baz;x;y;z
2;foo;bar;baz;x;y;z
3;foo;bar;baz;x;y;z

Sample Output:

1;foo;bar
2;foo;bar
3;foo;bar

Perl References:
https://perldoc.perl.org
https://www.perl.org

Raku References:
https://docs.raku.org
https://raku.org
