How do I use cut to separate by multiple whitespace?

Question

I have this input, which is displayed in columns. I would like to get the second-to-last column, the one with the numbers, from this sample:

[  3]  1.0- 2.0 sec  1.00 MBytes  8.39 Mbits/sec
[  3]  2.0- 3.0 sec   768 KBytes  6.29 Mbits/sec
[  3]  3.0- 4.0 sec   512 KBytes  4.19 Mbits/sec
[  3]  4.0- 5.0 sec   256 KBytes  2.10 Mbits/sec
...

If I use

cut -d\  -f 13

I get

Mbits/sec
6.29
4.19
2.10

because sometimes there are additional spaces in between.

The last column is Mbits/sec, is that what you want or the 2 last columns? — terdon, Jan 18 '14 at 14:43
I was looking for the same thing for wg show all latest-handshakes which looks like multiple whitespace for separator (I don't really know) and it turns out that the default separator (whatever that is) worked fine! So <cmd> | cut -f 3 worked nicely. — Karl Pokus, Mar 24 '21 at 12:24
See https://stackoverflow.com/questions/7142735/how-to-specify-more-spaces-for-the-delimiter-using-cut# — rogerdpack, Oct 29 '21 at 18:43

score 277 · Accepted Answer · edited Jun 22 '22 at 16:42

Using tr with the squeeze option (-s) collapses every run of consecutive spaces into a single space. cut, with a space as the delimiter, can then select the column that holds the numbers:

< file tr -s ' ' | cut -d ' ' -f 8
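
As a quick check, the same pipeline can be run against the four sample lines from the question. This is a minimal sketch, assuming a POSIX shell with tr and cut available; the variable names `sample` and `rates` are only illustrative.

```shell
# Sample lines from the question, kept in a variable for illustration.
sample='[  3]  1.0- 2.0 sec  1.00 MBytes  8.39 Mbits/sec
[  3]  2.0- 3.0 sec   768 KBytes  6.29 Mbits/sec
[  3]  3.0- 4.0 sec   512 KBytes  4.19 Mbits/sec
[  3]  4.0- 5.0 sec   256 KBytes  2.10 Mbits/sec'

# Squeeze runs of spaces to one space, then take the 8th space-separated field.
rates=$(printf '%s\n' "$sample" | tr -s ' ' | cut -d ' ' -f 8)
printf '%s\n' "$rates"
```

The field number 8 relies on every line having the same number of tokens before the rate, so it is worth re-checking if the input format changes.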

ilkkachu

answered Jan 25 '17 at 14:17

Wald Schilfrohr

BSD cut offers -w for such cases! – Michael-O Mar 24 '20 at 10:05
@Michael-O Nice! I'm using cut from GNU coreutils on Arch Linux and it lacks such an option. – mehdix May 26 '20 at 12:40
this can be done without cat: tr -s ' ' file | cut -d ' ' -f 8 – Josh Aug 26 '20 at 14:16
My version of tr (8.30) does not accept a file, so I need to redirect the stream. The command becomes: tr -s ' ' < file | cut -d ' ' -f 8 (mind the <). – Anthony Labarre Feb 15 '22 at 20:23
@AnthonyLabarre Indeed. I tried to edit the post to make that correction, but SO said "Edits must be at least 6 characters; is there something else to improve in this post?", and I refuse to play games to get around that, so the wrong answer stands, at least for now. – Steve Summit Apr 12 '22 at 15:31
tr -s ' ' is an answer I was NOT looking for NOW but I was looking a few moments ago and finding only sed options to replace multiple white spaces into one! Thank you! – Grzegorz Aug 24 '22 at 13:22
It's funny how sometimes the little people do more work more quickly with a fewer words than all the "important" people put together. Here's tr fixing a common use-case with 9 characters. – NeilG May 26 '23 at 04:16
Ftr: the cut on MacOS is BSD compatible – dtk Jan 12 '24 at 16:57

score 36 · Answer 2 · edited Aug 11 '15 at 22:15

To answer your question literally:

sed 's/   */:/g' | cut -d : -f 5

or

awk -F '  +' '{print $5}'

But that won't do if the number in brackets reaches 10, etc. If you're only interested in the numbers, you could remove everything else.

sed 's/[^.0-9][^.0-9]*/:/g' | cut -d : -f 6
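
The caveat about the bracketed number can be illustrated with a made-up line whose ID is 10 instead of 3. This is a sketch under that assumption (the `[ 10]` line is hypothetical, not taken from the question), and the variable names are only for illustration.

```shell
# Line from the question (ID 3) and a hypothetical line with ID 10.
one_digit='[  3]  1.0- 2.0 sec  1.00 MBytes  8.39 Mbits/sec'
two_digit='[ 10]  1.0- 2.0 sec  1.00 MBytes  8.39 Mbits/sec'

# Splitting on runs of two or more spaces: "[ 10]" no longer contains
# such a run, so the wanted value moves from field 5 to field 4.
by_gap_one=$(printf '%s\n' "$one_digit" | awk -F '  +' '{print $5}')
by_gap_two=$(printf '%s\n' "$two_digit" | awk -F '  +' '{print $5}')

# Splitting on runs of non-numeric characters: the value stays in field 6.
by_num_one=$(printf '%s\n' "$one_digit" | sed 's/[^.0-9][^.0-9]*/:/g' | cut -d : -f 6)
by_num_two=$(printf '%s\n' "$two_digit" | sed 's/[^.0-9][^.0-9]*/:/g' | cut -d : -f 6)

printf '[%s] [%s] [%s] [%s]\n' "$by_gap_one" "$by_gap_two" "$by_num_one" "$by_num_two"
```

With the two-digit ID, the gap-based split returns an empty fifth field, while the numbers-only variant still returns the rate.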

Stéphane Chazelas

answered Jan 18 '14 at 00:02

Gilles 'SO- stop being evil'

yes, sure only the numbers, but only your 3rd example works correctly – rubo77 Jan 18 '14 at 00:08
@rubo77 Works for me. The first two examples do exactly what you ask in your title. Or did you want to strip off the unit as well? In that case, add | sed 's/ .*//' at the end of the first two examples. Of course there are many other ways to do it. – Gilles 'SO- stop being evil' Jan 18 '14 at 00:13
a bit shorter with + instead of *: cat test | sed 's/[^.0-9]+/:/g' | cut -d : -f 6 – rubo77 Sep 01 '16 at 08:29
@rubo77 If your sed supports it, that is. It's supported by GNU and BusyBox but not by e.g. BSD or Solaris. POSIX specifies + and ? in ERE but leaves \+ and \? in BRE undefined. – Gilles 'SO- stop being evil' Sep 01 '16 at 08:57
So the right option for columned input into cut would be -c35-38 – rubo77 Jun 20 '20 at 02:33
Thanks, I now have alias split = "sed 's/\s\s*/ /g' |cut -d ' ' -f " in my bashrc. – Thomas Ahle Apr 26 '21 at 18:35

score 17 · Answer 3 · edited Jun 11 '20 at 12:04

These commands all print the last column of a space-separated file:

awk '{print $NF}' file

In awk, NF is the number of fields and $NF is the last field.

perl -lane 'print $F[$#F]' file

-a splits each input line on whitespace into the array @F; $#F is the index of the last element of that array, so $F[$#F] is the last element. -n means read the file given on the command line and apply the script passed with -e to each line. -l strips the newline from each input line and adds one (\n) to each print statement.

sed 's/.* //g' file

A simple regular expression that matches everything up to the last space and deletes it, leaving only the last column.

rev file | cut -d' ' -f 1 | rev

rev reverses each line of its input so the last field becomes the first, cut with a space delimiter prints it, and rev reverses the text back to normal. This won't work if you have consecutive whitespace.

Based on your input, I am guessing you don't actually want the last column but the penultimate one or the two last ones. In that case use these to print the last 2 (8.39 Mbits/sec):

awk '{print $(NF-1),$NF}' file 
perl -lane 'print "$F[$#F-1] $F[$#F]"' file 
sed 's/.* \(.* .*\)/\1/' file 
rev file | cut -d' ' -f 1,2 | rev

and these to print the penultimate (8.39):

awk '{print $(NF-1)}' file 
perl -lane 'print $F[$#F-1]' file 
sed 's/.* \(.*\) .*/\1/' file 
rev file | cut -d' ' -f 2 | rev
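
To illustrate why the awk variant is indifferent to spacing, here is a small sketch. The second input line is a made-up variant with extra padding, and the variable names are hypothetical.

```shell
# First line is from the question; the second is an invented variant
# with extra spaces before the last two columns.
lines='[  3]  1.0- 2.0 sec  1.00 MBytes  8.39 Mbits/sec
[  3]  2.0- 3.0 sec   768 KBytes     6.29    Mbits/sec'

# awk's default field splitting treats any run of blanks as one separator,
# so $(NF-1) is the second-to-last word regardless of padding.
penultimate=$(printf '%s\n' "$lines" | awk '{print $(NF-1)}')
printf '%s\n' "$penultimate"
```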

kenorb · Answer 4 · 2015-08-11T23:02:27.423

You can't use cut to split on runs of multiple whitespace characters, as per the manual:

Output fields are separated by a single occurrence of the field delimiter character.

The exceptions are when the fields are always separated by the same number of spaces, or when you use tr to squeeze the excess.

Otherwise, use alternative tools such as awk, sed or ex.

For example:

ex -s +'%norm $2Bd0' +%p +q! foo.txt

Replace +q! with -cwq to save the changes in-place.
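
The single-delimiter behaviour of cut can be seen with a two-space example. This is a minimal sketch; the input string and variable names are invented for illustration.

```shell
# Two spaces between 'a' and 'b': cut sees an empty field between them,
# so field 2 is empty.
with_cut=$(printf 'a  b\n' | cut -d ' ' -f 2)

# Squeezing the spaces first makes 'b' the second field.
with_tr=$(printf 'a  b\n' | tr -s ' ' | cut -d ' ' -f 2)

printf 'cut alone: [%s]\ncut after tr -s: [%s]\n' "$with_cut" "$with_tr"
```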

score 4 · Answer 5 · answered Dec 03 '19 at 17:02

Use a perl one-liner like so:

perl -lane 'print $F[-2]' input_file

Explanation:

Option -e causes the perl interpreter to look for the script inline, rather than in a file.

Option -n causes the input (file or STDIN from a pipe) to be read line by line.

Option -l strips the input record separator (OS-dependent, newline on UNIX by default) after reading each line, and adds it back at the end of every print.

Option -a causes each input line to be split on whitespace into array @F, and $F[-2] is the second element counting from the end, which is the field you want. You can also use $F[$#F-1], where $#F is the last index of the array @F, which is slightly less readable.

score 3 · Answer 6 · answered Jun 20 '20 at 02:36

For column-aligned input, the right cut option would be

cut -c35-38

rubo77

This is pretty hardcoded to the above example which is probably one in a million. – karatedog Mar 23 '23 at 09:57
yes, but it shows, how you would always get the right columns. this would work even if the column you look for contains sometimes extra spaces – rubo77 Jan 17 '24 at 09:37
This cuts out the characters from the 35th position up to the 38th position, regardless of the content. To cut this way properly, you have to be sure your columns 1. are always the same width, 2. are always start at the same position. As soon as one of the rows contain 123.5 Mbits/sec and/or download time goes up to 234.0- 235.0 sec level, the above solution is bust. Your solution is rigid, the above task needs a flexible one. – karatedog Jan 18 '24 at 12:43
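
The trade-off discussed in these comments can be sketched as follows. The wider line is a made-up example along the lines karatedog describes, and the variable names are hypothetical.

```shell
# Line from the question, and an invented line with wider time and rate values.
narrow='[  3]  1.0- 2.0 sec  1.00 MBytes  8.39 Mbits/sec'
wide='[  3] 234.0-235.0 sec  1.00 MBytes  123.5 Mbits/sec'

# Character positions 35-38 hold the rate only while the layout is unchanged.
fixed_narrow=$(printf '%s\n' "$narrow" | cut -c35-38)
fixed_wide=$(printf '%s\n' "$wide" | cut -c35-38)

# A field-based approach still finds the rate on the wider line.
flex_wide=$(printf '%s\n' "$wide" | awk '{print $(NF-1)}')

printf '[%s] [%s] [%s]\n' "$fixed_narrow" "$fixed_wide" "$flex_wide"
```

On the wider line the fixed character range no longer lines up with the rate, whereas the field-based command does not depend on column positions.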

score 3 · Answer 7 · answered Jun 22 '22 at 16:31

Use the -w option if your version of cut is BSD-compatible:

$ echo "firstColumn        secondColumn   thirdColumn" | cut -w -f3
thirdColumn
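
For scripts that have to run on both BSD and GNU systems, one possible approach is to probe for -w and fall back to tr plus cut. This is only a sketch: `nth_field` is a hypothetical helper name, and the fallback assumes fields are separated by spaces or tabs.

```shell
# nth_field N: print the Nth whitespace-separated field of each input line.
# Hypothetical helper; tries cut -w first and falls back to tr + cut.
nth_field() {
    # Probe on a throwaway string so the real input is not consumed.
    if printf 'a  b\n' | cut -w -f 2 >/dev/null 2>&1; then
        cut -w -f "$1"
    else
        # No -w available: turn tabs into spaces, squeeze, then split.
        tr '\t' ' ' | tr -s ' ' | cut -d ' ' -f "$1"
    fi
}

third=$(echo "firstColumn        secondColumn   thirdColumn" | nth_field 3)
printf '%s\n' "$third"
```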

Cory Klein

Which it is on MacOS (which I didn't know ) – dtk Jan 12 '24 at 16:55

score 1 · Answer 8 · answered Feb 01 '23 at 08:04

Answering the question in the body instead of the one in the heading:

echo '[  3]  2.0- 3.0 sec   768 KBytes  6.29 Mbits/sec' \
| rev | cut -d' ' -f1 | rev

Obviously, use -f2 to get the second-to-last column, but the question asks for the last column, so I comply.

Vorac

you are right, that sentence wasn't quite exact, I edited my question – rubo77 Feb 02 '23 at 18:22

jubilatious1 · Answer 9 · 2022-06-23T03:15:27.690

Using Raku (formerly known as Perl_6)

raku -ne '.words[*-2..*].put;'

Sample Input:

[  3]  1.0- 2.0 sec  1.00 MBytes  8.39 Mbits/sec
[  3]  2.0- 3.0 sec   768 KBytes  6.29 Mbits/sec
[  3]  3.0- 4.0 sec   512 KBytes  4.19 Mbits/sec
[  3]  4.0- 5.0 sec   256 KBytes  2.10 Mbits/sec

Sample Output:

8.39 Mbits/sec
6.29 Mbits/sec
4.19 Mbits/sec
2.10 Mbits/sec

You might want to try Raku, a member of the Perl-family of programming languages. Above, the words routine breaks on whitespace. Columns can be selected with square brackets: since Raku (and Perl) are zero-indexed, the second-to-last column is *-2 and the last column is *-1. Here, either words[*-2..*-1] or words[*-2..*] works, the latter indicating 'give me the second-to-last column up to * whatever'.

Oh, the OP only wants the second-to-last column? Titled by the last column?

~$ raku -ne '.words[*-1].put if ++$ == 1; .words[*-2].put;' file
Mbits/sec
8.39
6.29
4.19
2.10

https://docs.raku.org/routine/words
https://raku.org

dsimic · Answer 10 · 2023-08-03T04:03:14.153

I've created a patch that adds a new -m command-line option to cut, which works in field mode and treats multiple consecutive delimiters as a single delimiter. This basically solves the OP's question in a rather efficient way. I also submitted this patch upstream a couple of days ago, and let's hope that it will be merged into the coreutils project.

There are some further thoughts about adding even more whitespace-related features to cut, and having some feedback about all that would be great. I'm willing to implement more patches for cut and submit them upstream, which would make this utility more versatile and more usable in various real-world scenarios.
