14

So I have a line:

ID: 54376

Can you help me make a regex that would only return numbers without "ID:"?

NOTE: This string is in a file.

slm
  • 369,824

7 Answers

20

Try this:

grep -oP '(?<=ID: )[0-9]+' file

or:

perl -nle 'print $1 if /ID:.*?(\d+)/' file
terdon
  • 242,166
cuonglm
  • 153,898
7

There are many ways of doing this. For example:

  1. Use GNU grep with recent PCREs and match the numbers after ID: :

    grep -oP 'ID:\s*\K\d+' file
    
  2. Use awk and simply print the last field of all lines that start with ID:

    awk '/^ID:/{print $NF}' file
    

    That will also print last fields that are not numbers, though. To print numbers only, and only when they are in the second field, use:

    awk '($1=="ID:" && $2~/^[0-9]+$/){print $2}' file
    
  3. Use GNU grep with Extended Regular Expressions and parse it twice:

    grep -Eo '^ID: *[0-9]+' file | grep -o '[0-9]*'
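
To see how the approaches above differ, here is a rough comparison run against a made-up sample file. It assumes GNU grep with PCRE support and any POSIX awk; the sample lines are hypothetical and chosen to include a non-numeric ID and an "ID:" that is not at the start of a line.

```shell
# Hypothetical sample input, written to a temporary file.
sample=$(mktemp)
printf 'ID: 54376\nID: abc\nnote ID: 7\n' > "$sample"

# 1. PCRE with \K: not anchored, so it also matches "ID:" in the middle of a line.
pcre=$(grep -oP 'ID:\s*\K\d+' "$sample")

# 2a. awk, last field of lines starting with "ID:" (prints non-numbers too).
last_field=$(awk '/^ID:/{print $NF}' "$sample")

# 2b. awk, second field only, and only when it is all digits.
strict=$(awk '($1=="ID:" && $2~/^[0-9]+$/){print $2}' "$sample")

# 3. Two greps: the first keeps anchored "ID: <digits>", the second keeps the digits.
twice=$(grep -Eo '^ID: *[0-9]+' "$sample" | grep -o '[0-9]*')

printf '%s\n' "pcre:" "$pcre" "last_field:" "$last_field" "strict:" "$strict" "twice:" "$twice"

rm -f "$sample"
```

On this input only the strict awk test and the double grep should print just 54376; the \K version also picks up the 7, and the last-field version also prints abc.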
    
terdon
  • 242,166
  • Thanks! What \K is doing in first example? – rnd_d May 14 '15 at 14:26
  • 2
    @rnd_d It's a Perl Compatible Regular Expressions (PCRE) construct which means "discard anything matched up to this point". It is used like a lookbehind: it lets me use -o to print only the matched portion while also dropping the parts I'm not interested in. Compare echo "foobar" | grep -oP "foobar" and echo "foobar" | grep -oP 'foo\Kbar' – terdon May 14 '15 at 15:27
4

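Use egrep with -o, or grep with the -Eo options, to print only the matched segment. Use [0-9]+ as the regex to match runs of digits. Note that this prints every number in the file, not only the ones after ID:, and that the pattern should be quoted so the shell does not treat it as a glob:

grep -Eo '[0-9]+' filename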
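
Quoting the pattern matters because the shell treats an unquoted [0-9]+ as a glob. The sketch below is a contrived illustration under that assumption: it creates a scratch directory containing a file literally named 5+ (a hypothetical name picked to trigger the expansion) and runs the command with and without quotes.

```shell
# Scratch directory with a data file and a file whose name matches the glob [0-9]+
demo_dir=$(mktemp -d)
printf 'ID: 54376\n' > "$demo_dir/data.txt"
: > "$demo_dir/5+"

# Unquoted: the shell expands [0-9]+ to the file name 5+, so grep searches for the regex 5+
unquoted=$(cd "$demo_dir" && grep -Eo [0-9]+ data.txt)

# Quoted: grep receives the intended regex
quoted=$(cd "$demo_dir" && grep -Eo '[0-9]+' data.txt)

echo "unquoted: $unquoted"
echo "quoted: $quoted"

rm -rf "$demo_dir"
```

With the stray file present, the unquoted form should match only the single 5, while the quoted form returns 54376.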
4
sed -n '/ID: 54376/,${s/[^ 0-9]*//g;/./p}'

That will print only the digits and spaces found on each line from the first line matching ID: 54376 through the end of the input.

I've just updated the above to make it a little faster with * and to skip lines left blank after removing the non-{numeric,space} characters.

It addresses the lines from the first match of the regex /ID: 54376/ through the $ last line. On those lines, s/// removes every run of characters that are not spaces or digits ([^ 0-9]*), and /./p then prints any line that still has at least one character remaining.

DEMO:

{
echo line 
printf 'ID: 54376\nno_nums_or_spaces\n'
printf '%s @nd 0th3r char@cter$ %s\n' $(seq 10)
echo 'ID: 54376'
} | sed -n '/ID: 54376/,${s/[^ 0-9]*//g;/./p}'

OUTPUT:

 54376
1  03  2
3  03  4
5  03  6
7  03  8
9  03  10
 54376
mikeserv
  • 58,310
1

Using sed:

{
    echo "ID: 1"
    echo "Line doesn't start with ID: "
    echo "ID: Non-numbers"
    echo "ID: 4"
} | sed -n '/^ID: [0-9][0-9]*$/s/ID: //p'

The -n is "don't print anything by default", the /^ID: [0-9][0-9]*$/ is "for lines matching this regex" (starts with "ID: ", then 1 or more digits, then end of line), and the s/ID: //p is of the form s/pattern/repl/flags - s means we're doing a substitute, to replace the pattern "ID: " with replacement text "" (empty string) using the p flag, which means "print this line after doing the substitution".

Output:

1
4
godlygeek
  • 8,053
0

Another GNU sed command,

sed -nr '/ID: [0-9]+/ s/.*ID: +([0-9]+).*/\1/p' file

It prints the number that follows ID: on each matching line (the last one, if a line contains several).

Avinash Raj
  • 3,703
  • You really don't need the +. If the price of saving a couple of characters is that your script may not work in every sed, you should probably do: sed -n '/ID: \([0-9][0-9]*\).*/{s//\1/;s/.*[^0-9]//;/./p}'. Your answer also misses the first ID: [0-9] on a line containing two occurrences of ID: [0-9]. – mikeserv May 25 '14 at 04:02
0

Use grep + awk:

  grep "^ID" your_file | awk '{print $2}'

Bonus : easy to read :)
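
For comparison, the same filter can be written as a single awk call, since awk can select lines by pattern on its own. This is a minimal sketch with a made-up sample file, not a claim that either form is better; both share the caveat that the second field is printed whether or not it is a number.

```shell
# Hypothetical sample input, written to a temporary file.
sample=$(mktemp)
printf 'ID: 54376\nname: foo\nID: 12\n' > "$sample"

# Two processes: grep selects the lines, awk prints the second field.
two_step=$(grep "^ID" "$sample" | awk '{print $2}')

# One process: awk does both the selection and the printing.
one_step=$(awk '/^ID/{print $2}' "$sample")

printf 'two_step:\n%s\n' "$two_step"
printf 'one_step:\n%s\n' "$one_step"

rm -f "$sample"
```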

lily
  • 1