9

I have a string, for example

"Icecream123 AirplaneBCD CompanyTL1 ComputerYU1"

Let's say I know for sure that my string will contain the substring Icecream, but I don't know what follows it.

It might be 123 as in my example or it might be something different.

While I can use grep to detect if "Icecream" substring exists in my string with the following command

echo $string | grep -oF 'Icecream';

Which will print

Icecream

I want a command that prints the whole substring, which in my example is

Icecream123

Of course what follows Icecream is random and not known beforehand so I can't just do

SUBSTRING=$(echo "$string" | grep -oF 'Icecream')
SUBSTRINGTRAIL=123
echo "$SUBSTRING$SUBSTRINGTRAIL"
Rui F Ribeiro
Sonamor
  • is the substring fixed / static -- always "Icecream", or is it variable? – Jeff Schaller Jun 08 '18 at 21:55
  • will a space indicate the end of the desired suffix? – Jeff Schaller Jun 08 '18 at 22:00
  • @JeffSchaller Sadly, I don't know that. I am actually getting multiline output from another command, which I store in a variable; this variable is my $string. When it gets echoed, it displays the multiline output as a single line with a space between the lines. I don't actually know if that's a space or a special character such as LF. I thought it was a space. – Sonamor Jun 08 '18 at 22:05
  • I mean, for example, Icecream123 AirplaneBCD you want stopped at 123. Is that because there's a space after the 3, or something else? – Jeff Schaller Jun 08 '18 at 22:07
  • $string is populated by multiline output; when it's echoed, it displays this multiline output as a single line with a space between the lines. Now I don't know if that space is actually a space or the newline character, which the console shows as a space. I hope this makes more sense. – Sonamor Jun 08 '18 at 22:14
  • 1
    If you're not sure what your data is, it's hard to write an appropriate solution. All the answers so far are assuming your data is on one line, like you've shown it. I was trying to figure out what your delimiter was -- where the "trailing" part should stop. – Jeff Schaller Jun 08 '18 at 22:17
  • @JeffSchaller I understand that, and I actually remembered it when I read your comment. If you refresh the page you will see that I have already picked a solution, as echo "$string" | grep -oP 'Icecream.*?\b' and 'Icecream\S+' are both working :) Thanks for your time – Sonamor Jun 08 '18 at 22:18

5 Answers

16

If your grep supports Perl-compatible regular expressions (the -P option), you could match non-greedily up to the next word boundary:

echo "$string" | grep -oP 'Icecream.*?\b'

Otherwise, match the longest sequence of non-blank characters:

echo "$string" | grep -o 'Icecream[^[:blank:]]*'

Or keep everything in the shell and remove the longest trailing sequence of characters starting with a space (this keeps only the first word, so it assumes that Icecream starts the string):

echo "${string%% *}"
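To illustrate how that third option behaves, here is a minimal sketch using the sample string from the question. The variable names first_word and all_but_last are only for illustration. It shows that %% strips the longest suffix matching the pattern " *", while a single % strips the shortest one.

```shell
string="Icecream123 AirplaneBCD CompanyTL1 ComputerYU1"

# %% removes the longest suffix matching " *" (a space followed by anything),
# so only the first space-separated word is left.
first_word=${string%% *}
printf '%s\n' "$first_word"      # Icecream123

# A single % removes the shortest such suffix, i.e. only the last word.
all_but_last=${string% *}
printf '%s\n' "$all_but_last"    # Icecream123 AirplaneBCD CompanyTL1
```

If the wanted word is not the first one, this expansion alone is not enough and one of the grep variants is the safer choice.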
steeldriver
  • 2
    For the PCRE, I'd use 'Icecream\S+' for some non-blank characters. – glenn jackman Jun 08 '18 at 20:19
  • Thanks for your comments. Sadly, it seems that my version of grep does not support Perl regex.

    Could you add some more details about your third option? I am not quite sure how to implement it.

    – Sonamor Jun 08 '18 at 21:54
  • After some more testing, it seems that either echo "$string" | grep -oP 'Icecream.*?\b' or 'Icecream\S+' does the job. Thanks – Sonamor Jun 08 '18 at 22:09
  • it's really confusing that although your $string variable is a string you still have to put it between double quotes! – Sonamor Jun 08 '18 at 22:10
  • @Sonamor in this case the quoting is not strictly necessary; however there are so many cases where it is that it's a good habit to get into. See for example When is double-quoting necessary? – steeldriver Jun 08 '18 at 23:09
  • @Sonamor You absolutely need to learn about quoting! It's the most important thing to know, I think. The point is that many problems with quoting are like that: confusing. It works at first, but not with other input. – Volker Siegel Jun 09 '18 at 13:21
8

Since you tagged bash:

[[ $string =~ (Icecream[^ ]*) ]] && result=${BASH_REMATCH[1]}

More generally, for a search term in $search:

[[ $string =~ ($search[^ ]*) ]] && result=${BASH_REMATCH[1]}

... or with parameter expansion:

# remove any leading text up to -and through- the search text:
x=${string##*"$search"}

# remove any trailing space onwards
result=$search${x%% *}
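The parameter-expansion variant can be wrapped in a small function. This is only a sketch: the function name word_after and its internal variables are hypothetical, it treats a single space as the only delimiter, and it uses # rather than ## so that the first occurrence of the search text is the one reported.

```shell
# Hypothetical helper: print the search text plus whatever follows it,
# up to the next space. Returns 1 if the search text is not present.
word_after() {
    _needle=$1
    _hay=$2
    case $_hay in
        *"$_needle"*) ;;        # search text present, carry on
        *) return 1 ;;          # not present: print nothing
    esac
    _rest=${_hay#*"$_needle"}   # drop text up to and through the first match
    printf '%s\n' "$_needle${_rest%% *}"
}

string="Icecream123 AirplaneBCD CompanyTL1 ComputerYU1"
word_after Icecream "$string"   # Icecream123
word_after Company "$string"    # CompanyTL1
```

The explicit case check matters because, without it, a string that lacks the search text would come back as the search text glued onto the first word.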
Jeff Schaller
7

Using a grep that knows about -o:

$ printf '%s\n' "$string" | grep -o '\<Icecream[^[:blank:]]*'
Icecream123

The pattern \<Icecream[^[:blank:]]* matches the string Icecream (where the I is preceded by a non-word character, or the start of the line) followed by zero or more non-blanks (not spaces or tabs).


Using awk:

$ printf '%s\n' "$string" | awk -v RS=' ' '/^Icecream/'       
Icecream123

The awk program divides the string into space-separated records, and tests each one. It will print the ones that start with the string Icecream.

Using mawk or GNU awk, you may also use

printf '%s\n' "$string" | awk -v RS='[[:blank:]]' '/^Icecream/'

since they interpret RS as a regular expression if it contains more than one character.
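If the data might be separated by newlines or tabs rather than single spaces (the question leaves this open), an alternative sketch is to keep awk's default record and field splitting and loop over the fields. The sample input and the awk variable name prefix below are assumptions for illustration only.

```shell
# Sample input with a newline and a tab as separators (assumed for illustration).
string=$(printf 'AirplaneBCD\nIcecream123\tCompanyTL1 ComputerYU1')

# Default field splitting treats any run of blanks as a separator and each
# line as a record; index() == 1 means the field starts with the prefix.
result=$(printf '%s\n' "$string" |
    awk -v prefix=Icecream '{
        for (i = 1; i <= NF; i++)
            if (index($i, prefix) == 1) print $i
    }')
printf '%s\n' "$result"   # Icecream123
```

Using index() instead of a regular expression also means the search text is compared literally, so characters such as dots in it need no escaping.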


With sed, in a similar fashion as with grep:

$ printf '%s\n' "$string" | sed 's/.*\(\<Icecream[^[:blank:]]*\).*/\1/'
Icecream123

Using /bin/sh:

set -- Icecream123 AirplaneBCD CompanyTL1 ComputerYU1
for string; do
    case $string in
        Icecream*)
            printf '%s\n' "$string"
            break
    esac
done

Perl (with a little help from tr):

$ printf '%s\n' "$string" | tr ' ' '\n' | perl -ne '/Icecream\S*/ && print'
Icecream123

or just

$ printf '%s\n' "$string" | perl -ne '/(Icecream\S*)/ && print $1, "\n"'
Icecream123
Kusalananda
  • Or, split into lines and match the key: echo "$string" | grep -o '\S\+' | grep "Icecream" –  Jun 08 '18 at 22:34
2

For example, if you use GNU grep:

$ echo "Icecream123 AirplaneBCD CompanyTL1 ComputerYU1" | grep -oP '\bIcecream.*?(\s|$)' --color

It uses PCRE.

1

A little bit simpler perhaps, especially since you say that your version of grep does not support perl regex:

$ echo $string | tr ' ' '\n' | grep 'Icecream'
Icecream123

The tr splits the string into lines by replacing all the spaces with newlines. Then you can use grep easily.
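A variation on the same idea, sketched under the assumption that the separators may be newlines or tabs as well as spaces: quote the variable so the shell does not alter the whitespace, and let tr squeeze every run of whitespace into a single newline. The sample input is made up for illustration, and the tr invocation is known to work with the GNU and BSD implementations.

```shell
# Sample input with a newline and a tab as separators (assumed for illustration).
string=$(printf 'AirplaneBCD\nIcecream123\tCompanyTL1')

# Turn every run of whitespace into one newline, then keep the words
# that start with Icecream.
result=$(printf '%s\n' "$string" | tr -s '[:space:]' '\n' | grep '^Icecream')
printf '%s\n' "$result"   # Icecream123
```

Anchoring the grep pattern with ^ avoids matching words that merely contain Icecream somewhere in the middle.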

You can also write the following to obtain only what follows the word you are looking for:

$ echo $string | tr ' ' '\n' | sed -n 's/Icecream//p'
123

Law29