So I have a line:
ID: 54376
Can you help me make a regex that would only return numbers without "ID:"?
NOTE: This string is in a file.
There are many ways of doing this. For example:
Use GNU grep with recent PCREs and match the numbers after "ID:":
grep -oP 'ID:\s*\K\d+' file
Use awk and simply print the last field of all lines that start with "ID:":
awk '/^ID:/{print $NF}' file
That will also print fields that are not numbers, though. To get numbers only, and only from the second field, use:
awk '($1=="ID:" && $2~/^[0-9]+$/){print $2}' file
Use GNU grep with Extended Regular Expressions and parse it twice:
grep -Eo '^ID: *[0-9]+' file | grep -o '[0-9]*'
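To see how the two awk variants above differ, here is a small sketch run against a throwaway sample file. The file contents and the variable names (sample, loose, strict) are made up for illustration; only the two awk commands come from the answer.

```sh
# Build a throwaway sample file (its contents are an assumption for this demo).
sample=$(mktemp)
printf '%s\n' 'ID: 54376' 'ID: abc' 'Name: 123' 'ID: 777 extra' > "$sample"

# Last field of every line starting with "ID:", numeric or not.
loose=$(awk '/^ID:/{print $NF}' "$sample")

# Second field only, and only when it consists entirely of digits.
strict=$(awk '($1=="ID:" && $2~/^[0-9]+$/){print $2}' "$sample")

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
rm -f "$sample"
```

The loose variant reports 54376, abc and extra, while the strict one reports only 54376 and 777.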
-o to print only the matched portion but also discard things I'm not interested in. Compare echo "foobar" | grep -oP "foobar" and echo "foobar" | grep -oP 'foo\Kbar'
– terdon May 14 '15 at 15:27
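The comparison in that comment can be run as below. This is only a sketch and assumes a grep built with PCRE support (-P), which is not guaranteed on every system; the variable names whole and kept are made up for the demo.

```sh
# Assumes grep supports -P (PCRE); that depends on how grep was built.
whole=$(echo "foobar" | grep -oP "foobar")    # -o prints the entire match
kept=$(echo "foobar" | grep -oP 'foo\Kbar')   # \K drops "foo" from the reported match

printf '%s\n%s\n' "$whole" "$kept"
```

The first command prints foobar and the second prints bar: "foo" still has to match, but \K keeps it out of what -o prints.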
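Use egrep with -o, or grep with the -Eo option, to get only the matched segment. Use [0-9]+ as the regex to get just the numbers (quoted, so the shell cannot expand it as a glob). Note that this prints every run of digits in the file, not only those after "ID:":
grep -Eo '[0-9]+' filename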
sed -n '/ID: 54376/,${s/[^ 0-9]*//g;/./p}'
That will print only the numbers and spaces occurring after ID: 54376 in any file input.
I've just updated the above a little to make it a little faster with * and not to print blank lines after removing the non-{numeric,space} characters.
It addresses lines from the first one matching the regex /ID: 54376/ through the $ last line. On those lines, s/// removes every run (*) of characters that are not (^) a space or a digit, [^ 0-9]*, and then p prints any line that still has at least one character (/./) remaining.
{
echo line
printf 'ID: 54376\nno_nums_or_spaces\n'
printf '%s @nd 0th3r char@cter$ %s\n' $(seq 10)
echo 'ID: 54376'
} | sed -n '/ID: 54376/,${s/[^ 0-9]*//g;/./p}'
54376
1 03 2
3 03 4
5 03 6
7 03 8
9 03 10
54376
Using sed:
{
echo "ID: 1"
echo "Line doesn't start with ID: "
echo "ID: Non-numbers"
echo "ID: 4"
} | sed -n '/^ID: [0-9][0-9]*$/s/ID: //p'
The -n is "don't print anything by default", the /^ID: [0-9][0-9]*$/ is "for lines matching this regex" (starts with "ID: ", then 1 or more digits, then end of line), and the s/ID: //p is of the form s/pattern/repl/flags: s means we're doing a substitution, replacing the pattern "ID: " with the replacement text "" (empty string), and the p flag means "print this line after doing the substitution".
Output:
1
4
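Since the string in the question lives in a file, the same sed command can be pointed at a file instead of a pipe. A minimal sketch follows; extract_ids is a hypothetical helper name and the temporary file stands in for the real one.

```sh
# extract_ids is a hypothetical helper; it only wraps the sed command from this answer.
extract_ids() {
    sed -n '/^ID: [0-9][0-9]*$/s/ID: //p' "$1"
}

idfile=$(mktemp)    # stand-in for the real file
printf '%s\n' 'ID: 54376' 'Line with ID: 12 in the middle' 'ID: none' > "$idfile"

ids=$(extract_ids "$idfile")
printf '%s\n' "$ids"
rm -f "$idfile"
```

Only 54376 is printed, because the other two lines do not consist of "ID: " followed by digits alone.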
Another GNU sed command,
sed -nr '/ID: [0-9]+/ s/.*ID: +([0-9]+).*/\1/p' file
It prints any number after ID:
+. If the difference between one character and three characters is that your script may not work in all seds, you should probably do: sed -n '/ID: \([0-9][0-9]*\).*/{s//\1/;s/.*[^0-9]//;/./p}'. Your answer also misses the first ID: [0-9] on a line containing two occurrences of ID: [0-9].
– mikeserv May 25 '14 at 04:02
Use grep + awk:
grep "^ID" your_file | awk '{print $2}'
Bonus: easy to read :)
grep if you're using awk. awk '/^ID/ { print $2 }' does the same thing, and avoids grep line-buffering issues. It's also pretty much the same as one of the solutions in @terdon's answer.
– cas May 12 '16 at 13:02
-o and -P are GNU extensions to grep. -o works on the BSDs as well. PCRE support with -P is not always compiled in either. – Matt May 25 '14 at 10:06