-2

I have a bunch of lines in a file that look like this

word_word_word 0 word_word
word_word_word 1 wordwordword
word word word word 0 word word word word
word 2 word_word_word word word
word word_word 3 word

I want to cat the file and get an output that looks like this:

word_word_word 0
word_word_word 1
etc...

How do I use cut, awk, or whatever else to trim each line and display everything from the first byte through the first single-digit number?

Thanks!

Jeff Schaller

3 Answers

0

Using grep:

grep -oP '^.*?\s[0-9](\s|$)' file

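The -o option tells grep to print only the matched part of each line. The -P option selects Perl-compatible regular expressions, which the non-greedy .*? relies on; it requires GNU grep.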

The pattern is looking for:

  • The beginning of the line ^
  • Followed by literally anything repeating (non-greedy) .*?
  • Followed by whitespace \s
  • Followed by a single digit between 0 and 9 [0-9]
  • Finally that single digit must be followed by whitespace or the end of the line (\s|$)
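
As a rough sketch of how the pattern behaves on the sample input from the question, assuming GNU grep built with PCRE support for -P: this variant swaps the final (\s|$) for the lookahead (?=\s|$), so the whitespace after the digit is checked but not printed. The function name trim_to_first_digit is made up for illustration.

```shell
# Hypothetical helper: print each line up to the first standalone single digit.
# Assumes GNU grep with -P (PCRE) support.
trim_to_first_digit() {
  # ^.*?      shortest possible prefix from the start of the line
  # \s[0-9]   whitespace followed by one digit
  # (?=\s|$)  lookahead: next comes whitespace or end of line (not consumed)
  grep -oP '^.*?\s[0-9](?=\s|$)' "$@"
}

# Sample input copied from the question.
printf '%s\n' \
  'word_word_word 0 word_word' \
  'word_word_word 1 wordwordword' \
  'word word word word 0 word word word word' \
  'word 2 word_word_word word word' \
  'word word_word 3 word' |
  trim_to_first_digit
```

Lines with no standalone single digit (for example one containing only 30) produce no output at all, because -o prints matches only.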
jesse_b
  • You should mention that requires GNU grep for its experimental -P option. – Ed Morton Nov 08 '19 at 23:42
  • Shouldn't it be -oP instead of -oE so that the 'non-greedy' works ? –  Nov 09 '19 at 00:58
  • Per POSIX ? after * or any other repetition metachar is undefined behavior in an ERE so I wouldn't count on it even sort of working in any given grep. – Ed Morton Nov 09 '19 at 13:54
  • It's just choosing to ignore the ? and it'd do the same without the -E. Other greps could behave differently. I was just commenting on your statement that the non-greedy match sort of works with extended regex in grep - YMMV. – Ed Morton Nov 09 '19 at 14:07
0

For loop with awk:

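awk '{
  for (i = 1; i <= NF; i++) {
    # Stop at the first field that is exactly one digit.
    if ($i ~ /^[0-9]$/) {
      printf "%s\n", $i
      break
    }
    printf "%s ", $i
  }
}' file

This prints each field, separated by single spaces, up to and including the first field that is a single digit.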

徐新晨
-1

If this (using any sed in any shell on every UNIX box) isn't all you need then edit your question to provide a better example including lines that this doesn't work for:

$ sed 's/\( [0-9]\) .*/\1/' file
word_word_word 0
word_word_word 1
word word word word 0
word 2
word word_word 3
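
The comments below raise cases the sample input does not cover, such as the digit sitting at the very start or end of a line. A hedged sketch of one way to cover those, assuming a sed that supports -E for extended regular expressions (GNU and BSD sed do); the function name trim_after_digit is made up for illustration and is not part of the answer above.

```shell
# Hypothetical helper: cut each line after the first standalone single digit.
# Assumes sed supports -E (extended regular expressions).
trim_after_digit() {
  # (^|[[:space:]])[0-9]  a digit at line start or preceded by whitespace
  # ([[:space:]].*)?$     optionally followed by whitespace and the rest
  # Lines without a standalone single digit pass through unchanged.
  sed -E 's/((^|[[:space:]])[0-9])([[:space:]].*)?$/\1/' "$@"
}

printf '%s\n' \
  'word 2 word_word_word word word' \
  'word word_word 3' \
  '4 word' \
  'word word_word 30 word' |
  trim_after_digit
```

Unlike grep -o, this leaves a line such as word word_word 30 word untouched instead of dropping it; which behaviour is right depends on input the question does not show.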
Ed Morton
  • This will match the entire line: word word_word 30 word – jesse_b Nov 09 '19 at 13:56
  • The OP doesn't have any lines like that, every line contains a single-digit number. See the posted sample input which is all we have to go on and we don't know what the output SHOULD be if they had a line that didn't contain a single digit since there's no lines like that nor lines that contain no digits nor lines where the digit is at the start/end of the line nor other potential rainy day cases in the posted sample. – Ed Morton Nov 09 '19 at 13:58
  • Right but if there's lines that don't look like that then all bets are off as there's several possibilities we don't know how to handle. Your answer and mine assume a space before and after the digit - what if there isn't and the digit appears as the start or end of the line? You get my point - it's up to the OP to provide truly representative sample input/output as that drives how we write a solution to handle the non sunny day cases and there's no point coding for a bunch of cases that the OP simply doesn't have. – Ed Morton Nov 09 '19 at 14:00
  • I don't have enough rep to vote to close yet apparently as I don't see that button but anyway I don't think it's unclear. The OP has lines that always have a single digit in the middle of them, per the sample input/output provided. Your solution won't display a line where that single digit occurs at the start or end of the line - that's also fine as the OP doesn't have to handle those cases either per the sample input provided. – Ed Morton Nov 09 '19 at 14:04
  • Don't forget to handle start of the line too if you're going to be covering cases the OP hasn't shown us how to handle. Also we don't know if it's always white space around the digit. Maybe it could be a punctuation character like hi there 1. uh oh. What if the single digit is mid-"word"? I'm sure there's lots of other cases the OP hasn't shown us and probably doesn't have that we could write code to cover. – Ed Morton Nov 09 '19 at 14:09
  • I really don't care enough to discuss as we're just guessing at whether or not the OP has different input than shown and, if so, how that should be handled. If the OP has different input than shown then they should update their question to show it, that's all. Meanwhile I'm happy to throw up this simple answer for the case the OP has shown us. – Ed Morton Nov 09 '19 at 14:19