32

My text file has no explicit delimiter, just runs of spaces. How do I cut out column 2 and write it to an output file?

39    207  City and County of San Francisc   REJECTED          MAT = 0
78    412  Cases and materials on corporat   REJECTED          MAT = 0
82    431  The preparation of contracts an   REJECTED          MAT = 0

So output I need is

207
412
431
Jeff Schaller
riley

7 Answers

32

It is easiest with awk, which treats a run of consecutive spaces as a single separator, so

awk '{print $2}' file

prints

207
412
431

But obviously there are many, many other tools which will do the job, even some which were not designed for such tasks, like (GNU) grep:

grep -Po '^[^ ]+[ ]+\K[^ ]+' file
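
Since the question also asks for the result in an output file, here is a minimal sketch of the awk approach with redirection. The helper name second_column and the scratch file names are assumptions made for illustration; only the awk command itself comes from the answer above.

```shell
# second_column is a hypothetical helper name, not part of any tool.
# awk's default splitting ignores leading blanks and treats a run of
# spaces or tabs as one separator, so $2 is the second column.
second_column() {
    awk '{print $2}' "$@"
}

# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)

# Recreate the question's sample rows.
printf '%s\n' \
    '39    207  City and County of San Francisc   REJECTED          MAT = 0' \
    '78    412  Cases and materials on corporat   REJECTED          MAT = 0' \
    '82    431  The preparation of contracts an   REJECTED          MAT = 0' \
    > "$workdir/sample.txt"

# Redirect the second column into the output file, then display it.
second_column "$workdir/sample.txt" > "$workdir/column2.txt"
cat "$workdir/column2.txt"
```

With the sample rows this should write 207, 412 and 431, one per line, to the output file.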
terdon
jimmij
18

Use a pipeline: squeeze the repeated spaces with tr, then send the data (here in columns.txt) to cut:

tr -s ' ' < columns.txt | cut -d" " -f2

In the example data you provided, a single space delimiter puts the data you want in field 5. However, if the first column was numerical and had leading spaces in order to align it to the right, you will need to adjust the field number. Squashing whitespace with tr -s ' ' first avoids having to deal with this.

To send that output into another file use redirection:

tr -s ' ' < columns.txt | cut -d" " -f2 > field2.txt

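With awk you could do something like the following. awk's default field separator is a single space, which it treats specially: leading blanks are ignored and any run of spaces or tabs counts as one separator. So -F' ' here just restates the default, and $2 is the second column.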

awk -F' ' '{print $2}' columns.txt
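
The caveat about leading spaces can be seen in a small sketch. The two helper names and the right-aligned sample rows are assumptions made up for illustration; stripping leading spaces with sed is one possible workaround, not something the answer above prescribes.

```shell
# Hypothetical helper: squeeze spaces, then take field 2 (as in the answer).
squeeze_cut() {
    tr -s ' ' | cut -d' ' -f2
}

# Hypothetical helper: strip leading spaces first so field numbers stay stable.
trim_squeeze_cut() {
    sed 's/^ *//' | tr -s ' ' | cut -d' ' -f2
}

# Made-up rows with a right-aligned first column: the first row starts with a space.
rows=$(printf '%s\n' ' 9    207  Title one' '10    412  Title two')

# After squeezing, the first row still begins with one space, so its field 1
# is empty and field 2 is the first column's value.
printf '%s\n' "$rows" | squeeze_cut

# With the leading spaces removed, field 2 is the second column on every row.
printf '%s\n' "$rows" | trim_squeeze_cut
```

The first pipeline yields 9 for the indented row, while the second yields 207. awk's default splitting ignores leading blanks, so it does not need the extra sed step.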
Stephen Kitt
aeiounix
2

Per man cut (on systems whose cut provides -w, such as FreeBSD and macOS; GNU cut may not offer it):

 -w      Use whitespace (spaces and tabs) as the delimiter.  Consecutive
         spaces and tabs count as one single field separator.

shell:

% cat $$
39    207  City and County of San Francisc   REJECTED          MAT = 0
78    412  Cases and materials on corporat   REJECTED          MAT = 0
82    431  The preparation of contracts an   REJECTED          MAT = 0
% cut -w -f2 $$
207
412
431
%
alexus
1

As @jimmij said, awk '{print $2}' file is the simplest answer.

If, for some reason, you don't want to use awk and insist on using cut, you can use sed to convert every instance of two or more spaces into a single tab (cut's default delimiter) before piping into cut:

$ sed -e 's/  \+/\t/g' riley.txt | cut -f2 
207
412
431
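
One side effect of splitting only on two or more spaces is that a column containing single spaces, such as the title, stays in one piece. The sketch below is a variation on the command above, under some assumptions: the helper name columns_by_gap is hypothetical, sed -E is used for the {2,} interval, and a literal tab is generated with printf because not every sed understands \t in the replacement.

```shell
# Hypothetical helper: turn every run of two or more spaces into a tab,
# then let cut (whose default delimiter is a tab) pick field number $1.
columns_by_gap() {
    tab=$(printf '\t')
    sed -E "s/ {2,}/${tab}/g" | cut -f"$1"
}

# Field 3 of a sample row from the question is the whole title.
printf '%s\n' \
    '39    207  City and County of San Francisc   REJECTED          MAT = 0' |
    columns_by_gap 3
```

This should print the full title, City and County of San Francisc, whereas awk '{print $3}' would print only City.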
cas
1

Using Perl

perl -lane 'print $F[1];' file

Using Raku (formerly known as Perl_6)

raku -ne 'put .words[1];' file

See:
https://unix.stackexchange.com/a/109894/227738
https://unix.stackexchange.com/a/555394/227738
https://unix.stackexchange.com/a/701811/227738

jubilatious1
0

You can still use a single space as the delimiter; you'll just have more columns, because every extra space starts a new, empty field. Increase the field number you give to cut -d' ' -f from 2 to 5, or maybe 6, incrementing it until you get the desired result.
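
A short sketch of how the field counting plays out, and why the number found by trial only holds while every row has the same spacing. The helper name fifth_field and the second sample row (with a three-digit first column) are assumptions made up for illustration.

```shell
# Hypothetical helper: single-space delimiter, fixed field number 5.
fifth_field() {
    cut -d' ' -f5
}

# '39' followed by four spaces: fields are 39, empty, empty, empty, 207.
printf '%s\n' '39    207  Title' | fifth_field

# '100' followed by three spaces: 500 is now field 4 and field 5 is empty.
printf '%s\n' '100   500  Title' | fifth_field
```

The first command prints 207 and the second prints an empty line, so a fixed field number is only reliable when the column widths never vary.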

Ryder
0

grep -Po '^[^ ]+[ ]+\K[^ ]+' file

The above is useful when extracting data with awk appears not to work, for example after switching to the "root" user inside a shell script with a here-document like the one below:

sudo -i <<EOF
ps aux | grep -E -i "[l]js"  |grep -v "javaagent" | awk '{print $2}' # So awk won't work here
ps aux | grep -E -i "[l]js"  | grep -v "javaagent" | grep -Po '^[^ ]+[ ]+\K[^ ]+'
EOF
  • 1
    The only reason "awk won't work" in your example is that the here-document is unquoted, meaning the $2 would be expanded by the shell to the second positional parameter. The solution is probably not to switch to a grep command but to either escape the $ in $2 as \$2 or to simply quote the whole document by using <<'EOF' in place of <<EOF. Also note that the task in your example is more simply carried out by pgrep. – Kusalananda Dec 25 '19 at 07:52
  • Thanks a lot @Kusalananda. It worked like a charm. – Alok Tiwari Dec 25 '19 at 09:04
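
The quoting behaviour described in the comment can be reproduced without root privileges. In this sketch plain sh stands in for sudo -i, and the function name heredoc_demo is hypothetical; both are assumptions for illustration only. The here-document bodies are left unindented so that each EOF terminator sits at the start of its line.

```shell
# Hypothetical demo: plain `sh` stands in for `sudo -i`, so no privileges are needed.
# Inside a function called without arguments, $2 is empty in the calling shell.
heredoc_demo() {
# Unquoted delimiter: the calling shell expands $2 (empty here) before sh
# sees the text, so awk runs '{print }' and prints the whole line.
sh <<EOF
echo "a b c" | awk '{print $2}'
EOF

# Quoted delimiter: the text is passed through verbatim, so awk receives $2.
sh <<'EOF'
echo "a b c" | awk '{print $2}'
EOF

# Unquoted delimiter with the dollar sign escaped: same result as quoting.
sh <<EOF
echo "a b c" | awk '{print \$2}'
EOF
}

heredoc_demo
```

The first here-document prints a b c, while the other two print b, which matches the explanation that the shell, not awk, was consuming $2.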