cut column 2 from text file

Question

My text file has no explicit delimiter, just runs of spaces between the columns. How do I cut out column 2 and write it to an output file?

39    207  City and County of San Francisc   REJECTED          MAT = 0
78    412  Cases and materials on corporat   REJECTED          MAT = 0
82    431  The preparation of contracts an   REJECTED          MAT = 0

So output I need is

207
412
431

See https://unix.stackexchange.com/questions/109835/how-do-i-use-cut-to-separate-by-multiple-whitespace — rogerdpack, Dec 07 '21 at 18:24

score 32 · Answer 1 · edited Sep 04 '22 at 11:52

This is easiest with awk, which by default treats any run of consecutive spaces (or tabs) as a single separator, so

awk '{print $2}' file

prints

207
412
431
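
Since the question asks for the result in an output file, here is a minimal sketch of the same awk command with redirection. The names file and column2.txt are placeholders rather than names from the question, and the sample data is recreated in a scratch directory so the snippet can be run as-is.

```shell
# Recreate the sample data from the question in a scratch directory.
# (The names "file" and "column2.txt" are placeholders.)
workdir=$(mktemp -d)
cat > "$workdir/file" <<'EOF'
39    207  City and County of San Francisc   REJECTED          MAT = 0
78    412  Cases and materials on corporat   REJECTED          MAT = 0
82    431  The preparation of contracts an   REJECTED          MAT = 0
EOF

# awk splits each line on runs of blanks, so $2 is the second column;
# ">" writes the result to a new file instead of the terminal.
awk '{print $2}' "$workdir/file" > "$workdir/column2.txt"

cat "$workdir/column2.txt"
```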

But obviously there are many, many other tools which will do the job, even some which were not designed for such tasks, like (GNU) grep:

grep -Po '^[^ ]+[ ]+\K[^ ]+' file
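
For readers unfamiliar with the pattern, the sketch below annotates each piece of it. It assumes GNU grep built with PCRE support (the -P option). The leading ` *` is an addition that is not in the answer; it is there so that a right-aligned first column with leading spaces still works.

```shell
# Sample lines; the second one has leading spaces (right-aligned first column).
sample=$(printf '%s\n' '39    207  City and County' '   7    412  Cases and materials')

# ^ *      optional leading spaces (added for this sketch)
# [^ ]+    the first column
#  +       the spaces that follow it
# \K       forget everything matched so far
# [^ ]+    the second column, which is all that -o prints
second=$(printf '%s\n' "$sample" | grep -Po '^ *[^ ]+ +\K[^ ]+')
printf '%s\n' "$second"
```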

edited Sep 04 '22 at 11:52

terdon

answered May 26 '16 at 21:58

jimmij

score 18 · Answer 2 · edited Sep 03 '21 at 08:18

Use a pipe to squeeze the repeated spaces and feed your data (e.g. in columns.txt) into cut:

tr -s ' ' < columns.txt | cut -d" " -f2

In the example data you provided, using a single space as the delimiter puts the data you want in field 5, because every extra space creates an empty field. If the first column were right-aligned with leading spaces, or varied in width, the field number would change from line to line. Squeezing the spaces with tr -s ' ' first avoids most of this, although a line that starts with spaces still keeps one leading space, which cut counts as an empty first field.

To send that output into another file use redirection:

tr -s ' ' < columns.txt | cut -d" " -f2 > field2.txt
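
The field numbering described above can be checked with a small sketch. The first sample line is taken from the question; the padded variant and the sed step that strips leading spaces are additions for illustration and are not part of the answer.

```shell
line='39    207  City and County of San Francisc   REJECTED          MAT = 0'

# Every single space is a delimiter for cut, so the four spaces after "39"
# create three empty fields and "207" lands in field 5.
raw=$(printf '%s\n' "$line" | cut -d' ' -f5)

# After squeezing runs of spaces, "207" is field 2.
squeezed=$(printf '%s\n' "$line" | tr -s ' ' | cut -d' ' -f2)

# With leading spaces, tr -s still leaves one space at the start of the line,
# which cut counts as an empty first field, so -f2 now returns the first column.
padded="   $line"
shifted=$(printf '%s\n' "$padded" | tr -s ' ' | cut -d' ' -f2)

# Stripping the leading spaces first keeps -f2 pointing at the second column.
stripped=$(printf '%s\n' "$padded" | sed 's/^ *//' | tr -s ' ' | cut -d' ' -f2)

printf '%s %s %s %s\n' "$raw" "$squeezed" "$shifted" "$stripped"
```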

With awk you could do something like the following. By default awk splits each line on runs of whitespace, so $2 is the second column no matter how many spaces precede it; -F' ' simply spells out that default and can be omitted.

awk -F' ' '{print $2}' columns.txt
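
To see why -F' ' changes nothing, here is a short sketch comparing it with the default and with a bracket expression. The -F'[ ]' variant is an addition for contrast only: it makes awk treat every single space as a separator, as cut -d' ' does.

```shell
line='39    207  City and County'

# Default splitting and -F' ' behave the same: runs of blanks are one separator.
default=$(printf '%s\n' "$line" | awk '{print $2}')
single=$(printf '%s\n' "$line" | awk -F' ' '{print $2}')

# A bracket expression makes the space an ordinary regex, so every single
# space separates fields and "207" moves to field 5.
literal=$(printf '%s\n' "$line" | awk -F'[ ]' '{print $5}')

printf '%s %s %s\n' "$default" "$single" "$literal"
```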

score 2 · Answer 3 · edited Sep 04 '22 at 09:23

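Per man cut (this option comes from BSD implementations of cut, such as FreeBSD's; other versions may lack it):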

 -w      Use whitespace (spaces and tabs) as the delimiter.  Consecutive
         spaces and tabs count as one single field separator.

shell:

% cat $$
39    207  City and County of San Francisc   REJECTED          MAT = 0
78    412  Cases and materials on corporat   REJECTED          MAT = 0
82    431  The preparation of contracts an   REJECTED          MAT = 0
% cut -w -f2 $$
207
412
431
%
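
Because not every cut has -w, a script that should run on more than one system can probe for the option first. This is only a sketch: the file name sample.txt is a placeholder, and the awk fallback is borrowed from the first answer.

```shell
workdir=$(mktemp -d)
printf '%s\n' '39    207  City' '78    412  Cases' '82    431  The' > "$workdir/sample.txt"

# Probe once: cut implementations without -w exit non-zero on the unknown option.
if printf 'a  b\n' | cut -w -f2 >/dev/null 2>&1; then
    second=$(cut -w -f2 "$workdir/sample.txt")
else
    # Fallback for cut versions that lack -w.
    second=$(awk '{print $2}' "$workdir/sample.txt")
fi
printf '%s\n' "$second"
```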

edited Sep 04 '22 at 09:23

answered Sep 03 '22 at 21:09

alexus

right, i just updated my answer and removed cat ;-) thank you! – alexus Sep 04 '22 at 09:24

what version of cut has this option? definitely not mine... – Michał F Apr 25 '23 at 10:05

score 1 · Answer 4 · answered May 27 '16 at 09:54

As @jimmij said, awk '{print $2}' file is the simplest answer.

If, for some reason, you don't want to use awk and insist on using cut, you can use sed to convert every instance of two or more spaces into a single tab (cut's default delimiter) before piping into cut:

$ sed -e 's/  \+/\t/g' riley.txt | cut -f2
207
412
431
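
Note that \+ in a basic regular expression and \t in the replacement are GNU sed extensions. A sketch of a more portable spelling of the same idea follows; the file name sample.txt and the tab variable are placeholders introduced here.

```shell
workdir=$(mktemp -d)
printf '%s\n' '39    207  City and County' '78    412  Cases and materials' > "$workdir/sample.txt"

# A literal tab character, produced portably instead of relying on \t in sed.
tab=$(printf '\t')

# "   *" is two spaces followed by zero or more spaces, i.e. two or more spaces.
second=$(sed "s/   */${tab}/g" "$workdir/sample.txt" | cut -f2)
printf '%s\n' "$second"
```

Single spaces inside the title column are left alone, so the title stays one field.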

score 1 · Answer 5 · answered May 08 '22 at 04:43

Using Perl

perl -lane 'print $F[1];'

Using Raku (formerly known as Perl_6)

raku -ne 'put .words[1];'

See:
https://unix.stackexchange.com/a/109894/227738
https://unix.stackexchange.com/a/555394/227738
https://unix.stackexchange.com/a/701811/227738

answered May 08 '22 at 04:43

jubilatious1

score 0 · Answer 6 · answered May 26 '16 at 21:26

You can still use a single space as your delimiter; you'll just have more columns, because every extra space adds an empty field. Increase the value you give to cut -d' ' -f from 2 to 5, or maybe 6, incrementing the number until you get the desired result.
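
Rather than incrementing by hand, a small loop can show what each candidate field contains. This is a sketch using the first line of the question's data; the brackets are only there to make empty fields visible.

```shell
line='39    207  City and County of San Francisc   REJECTED          MAT = 0'

# Print each candidate field in brackets so empty fields are visible.
report=$(for f in 2 3 4 5 6; do
    printf '%s=[%s]\n' "$f" "$(printf '%s\n' "$line" | cut -d' ' -f"$f")"
done)
printf '%s\n' "$report"
```

The field number found this way only holds while the first column has the same width on every line; a three-digit first column followed by three spaces would move the value to field 4.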

answered May 26 '16 at 21:26

Ryder

score 0 · Answer 7 · answered Dec 25 '19 at 07:23

grep -Po '^[^ ]+[ ]+\K[^ ]+' file

The command above is useful when tools like awk appear not to work for data extraction, for example when you switch to the root user inside a shell script as in the one below:

sudo -i <<EOF
ps aux | grep -E -i "[l]js"  |grep -v "javaagent" | awk '{print $2}' # So awk won't work here
ps aux | grep -E -i "[l]js"  | grep -v "javaagent" | grep -Po '^[^ ]+[ ]+\K[^ ]+'
EOF

answered Dec 25 '19 at 07:23

Alok Tiwari

The only reason "awk won't work" in your example is that the here-document is unquoted, meaning the $2 would be expanded by the shell to the second positional parameter. The solution is probably not to switch to a grep command but to either escape the $ in $2 as \$2 or to simply quote the whole document by using <<'EOF' in place of <<EOF. Also note that the task in your example is more simply carried out by pgrep. – Kusalananda Dec 25 '19 at 07:52
Thanks a lot @Kusalananda. It worked like a charm. – Alok Tiwari Dec 25 '19 at 09:04
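
The quoting behaviour Kusalananda describes can be reproduced without sudo. In this sketch a plain sh stands in for sudo -i, and the sample process line is made up for illustration.

```shell
# No positional parameters, as in a script started without arguments.
set --

# Unquoted delimiter: the outer shell expands $2 (to nothing) before the inner
# shell ever sees the text, so awk runs '{print }' and prints the whole line.
unquoted=$(sh <<EOF
printf '%s\n' 'root 4242 0.0 ljs' | awk '{print $2}'
EOF
)

# Quoted delimiter: the text is passed through untouched and awk sees $2.
quoted=$(sh <<'EOF'
printf '%s\n' 'root 4242 0.0 ljs' | awk '{print $2}'
EOF
)

# Escaping the dollar sign in an unquoted here-document has the same effect.
escaped=$(sh <<EOF
printf '%s\n' 'root 4242 0.0 ljs' | awk '{print \$2}'
EOF
)

printf '%s\n' "$unquoted" "$quoted" "$escaped"
```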
