5

I have no issue with this cut command:

wolf@linux:~$ echo ab cd ef
ab cd ef

wolf@linux:~$ echo ab cd ef | cut -d ' ' -f 1
ab

wolf@linux:~$ echo ab cd ef | cut -d ' ' -f 2
cd

However, when I try the same approach on different input, I don't get the output I expect.

wolf@linux:~$ ip address show eth0 | grep 'inet '
    inet 10.10.10.10/24 brd 10.10.10.255 scope global dynamic eth0

wolf@linux:~$ ip address show eth0 | grep 'inet ' | cut -d ' ' -f 1

wolf@linux:~$ ip address show eth0 | grep 'inet ' | cut -d ' ' -f 2

What's wrong in the second example?

awk doesn't seem to have a problem with the same input, strangely.

wolf@linux:~$ ip address show eth0 | awk '/inet / {print $1}'
inet

wolf@linux:~$ ip address show eth0 | awk '/inet / {print $2}'
10.10.10.10/24

Wolf
  • 5
    ... there are 4 spaces before inet ...? – steeldriver Aug 27 '20 at 14:25
  • Thanks @steeldriver. But why doesn't awk have this issue? It works with the first argument. – Wolf Aug 27 '20 at 14:30
  • 2
    @Wolf awk counts the first non-blank character as part of the first field, by default. – Kusalananda Aug 27 '20 at 14:36
  • 4
    About the fact that it's manipulating IP addresses: ip's JSON output is meant to be parsed. Along with the jq command you can do this in shell. Eg: ip -4 -json address show vboxnet0 | jq -j '.[] | .addr_info[] | .local, "/", .prefixlen, "\n"' – A.B Aug 27 '20 at 17:35
  • Alternatively, ip -4 -br a s eth0 | awk '{print $3}' – annahri Aug 28 '20 at 08:57
  • 1
    @A.B It seems that points are awarded for terseness, so ip -4 -j a ls vboxnet0 | jq -r '.[] | .addr_info[] | "\(.local)/\(.prefixlen)"' – u1686_grawity Aug 28 '20 at 12:11
  • @user1686 my jq foo is low, but i'm still learning (because of ip's JSON output). – A.B Aug 28 '20 at 12:33

3 Answers

11

cut treats each and every delimiter as significant, even when delimiters appear consecutively. This is unlike awk, which by default splits on whitespace, treats a run of whitespace as a single separator, and ignores leading whitespace.
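For example, two consecutive spaces produce an empty field for cut (a small sketch with made-up input):

```shell
# The input 'a  b' has two spaces, so cut sees three fields:
# "a", "" (empty), and "b".
printf 'a  b\n' | cut -d ' ' -f 2    # prints an empty line
printf 'a  b\n' | cut -d ' ' -f 3    # prints: b
```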

You could get awk to behave similarly by setting the field separator FS to [ ]:

$ ip add show lo | awk -F'[ ]' '$5 == "inet" {print $6}'
127.0.0.1/8

It has to be set like that, as a regexp, because a single space by itself in FS selects the default, special splitting behaviour.
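The contrast is easy to see on a line with leading spaces (a sketch using a made-up inet line, not the full ip output):

```shell
# Default FS: leading blanks are skipped, runs of blanks collapse,
# so "inet" is the first field.
printf '    inet 10.0.0.1/8\n' | awk '{print $1}'           # prints: inet

# FS set to the regexp [ ]: every single space is a separator,
# so four leading spaces create four empty fields first.
printf '    inet 10.0.0.1/8\n' | awk -F'[ ]' '{print $5}'   # prints: inet
```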

See, e.g., what the GNU awk manual has to say about field separators.

ilkkachu
6

The line you grep from the output starts with four spaces, so the first word lands in the fifth field:

admin:~ # ip address show eth0 | grep 'inet ' | cut -d " " -f 5
inet
eblock
  • Yeah, thanks. But why doesn't awk have this issue? It works with the first argument. – Wolf Aug 27 '20 at 14:29
  • 6
    This is not an issue, it is different behaviour by design. cut uses a single character as a delimiter – exactly one space in your example. For the $x accessors, awk performs a shell-like expansion on the input, giving you access to the "words" separated by at least one white-space character. – Hermann Aug 27 '20 at 14:38
  • 1
    By far the most straightforward way to work around this issue is to pass the input through tr -s ' ' before cut. Counting spaces by hand is annoying, and may be fragile if the utility changes its output. Also, you can use the same tr invocation to collapse tabs and spaces if that turns out to be necessary. – Kevin Aug 28 '20 at 00:45
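The tr -s workaround from the comment above can be sketched like this (sample line copied from the question; note the leading run of spaces is squeezed to a single space, so the address ends up in field 3 rather than field 6):

```shell
line='    inet 10.10.10.10/24 brd 10.10.10.255 scope global dynamic eth0'

# tr -s ' ' squeezes every run of spaces to one space; the line then
# starts with a single space, giving an empty field 1, "inet" in
# field 2, and the address in field 3.
printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f 3    # prints: 10.10.10.10/24
```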
0

I've created a patch that adds a new -m command-line option to cut, which works in field mode and treats multiple consecutive delimiters as a single delimiter. This solves the OP's problem in a rather efficient way. I also submitted the patch upstream a couple of days ago; let's hope it gets merged into the coreutils project.

There are some further thoughts about adding even more whitespace-related features to cut, and having some feedback about all that would be great. I'm willing to implement more patches for cut and submit them upstream, which would make this utility more versatile and more usable in various real-world scenarios.

dsimic