How to find a position of a character?

Question

I need to identify the position of a character in a string.

For example, the string is RAMSITALSKHMAN|1223333.

grep -n '[^a-zA-Z0-9\$\~\%\#\^]'

How do I find the position of | in the given string?

@Braiam I'm sure other solutions would be appreciated, like python, perl, etc. — Nathan majicvr.com, May 14 '20 at 04:18

runejuhl · Answer 1 · 2014-09-02T19:47:39.930

35

You can use -b to get the zero-based byte offset, which is the same as the character position for single-byte text (but not for multibyte encodings such as UTF-8).

$ echo "RAMSITALSKHMAN|1223333" | grep -aob '|'
14:|

In the above, I use the -a switch to tell grep to treat the input as text (necessary when operating on binary files), and the -o switch to output only the matching character(s).
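To illustrate why a byte offset and a character position can differ, here is a minimal Python sketch. The sample string is a hypothetical variant of the one in the question, with one non-ASCII letter added, and the helper names char_offset and byte_offset are made up for this illustration.

```python
# Hypothetical variant of the example string: "\u00c4" (A with diaeresis)
# occupies two bytes in UTF-8 but counts as one character.
text = "R\u00c4MSITALSKHMAN|1223333"


def char_offset(s: str, ch: str) -> int:
    """Zero-based offset counted in characters (-1 if ch is absent)."""
    return s.find(ch)


def byte_offset(s: str, ch: str, encoding: str = "utf-8") -> int:
    """Zero-based offset counted in bytes, comparable to a byte offset from grep -b."""
    return s.encode(encoding).find(ch.encode(encoding))


print(char_offset(text, "|"), byte_offset(text, "|"))
```

For this sample the character offset is 14 and the byte offset is 15; for the pure ASCII string in the question both would be 14.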

If you only want the position, you can use grep to extract only the position:

$ echo "RAMSITALSKHMAN|1223333" | grep -aob '|' | grep -oE '[0-9]+'
14

If you get weird output, check to see if grep has colors enabled. You can disable colors by passing --color=never to grep, or by prefixing the grep command with a \ (which will disable any aliases), e.g.:

$ echo "RAMSITALSKHMAN|1223333" | grep -aob '|' --color=never | \grep -oE '^[0-9]+'
14

For a string that returns multiple matches, pipe through head -n1 to get the first match.

Note that I use both in the command above. The backslash prefix only bypasses shell aliases; it will not help if grep is "aliased" through an executable (script or otherwise).

edited Sep 02 '14 at 19:47

answered Sep 02 '14 at 18:18

runejuhl

583

3

Now search for 2 ;) – Izkata Sep 02 '14 at 19:38
Thanks @Izkata, you're right. I've updated my post a tiny bit and added the missing hat ^ :) – runejuhl Sep 02 '14 at 19:48
2

Which version of grep did you use? I get 0:| as output-- because 0 is the byte position of the beginning of the line where | is found. – Alex May 25 '17 at 14:20
@Alex GNU grep from Debian stretch: grep (GNU grep) 2.27. Are you perhaps using OS X? – runejuhl May 26 '17 at 12:03
1

Can confirm - macos reports 0:| – Brian May 03 '20 at 15:55

score 13 · Answer 2 · answered Sep 02 '14 at 17:51

If you're using the bash shell, you can use purely built-in operations without the need for spawning external processes such as grep or awk:

$ str="RAMSITALSKHMAN|1223333"
$ tmp="${str%%|*}"
$ if [ "$tmp" != "$str" ]; then
> echo ${#tmp}
> fi
14
$

This uses a parameter expansion to remove the longest suffix matching |*, that is, the first | and everything after it, and saves the result in a temporary variable. The length of the temporary variable is then the index of |.

Note that the if checks whether | exists at all in the original string. If it doesn't, the temporary variable will be the same as the original.

Note also that this provides the zero-based index of |, which is generally useful when indexing bash strings. However, if you require the one-based index, you can do this:

$ echo $((${#tmp}+1))
15
$
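For comparison, the same prefix-length idea can be sketched in Python. This is only an illustration under my own assumptions: the function name position and its one_based flag are hypothetical, not part of any answer here.

```python
from typing import Optional


def position(text: str, ch: str, one_based: bool = False) -> Optional[int]:
    """Return the index of the first ch in text, or None if ch is absent.

    Mirrors the shell approach: keep the prefix before the first separator
    and measure its length.
    """
    prefix, sep, _rest = text.partition(ch)
    if not sep:
        # Separator not found: partition returns the whole string as prefix.
        return None
    return len(prefix) + (1 if one_based else 0)


print(position("RAMSITALSKHMAN|1223333", "|"))
print(position("RAMSITALSKHMAN|1223333", "|", one_based=True))
```

Returning None plays the same role as the if test above: it keeps "not found" distinct from a valid index.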

probably the best answer, this syntax is beautiful and so fast and easy to use when you understand its meaning, long live to the core — vdegenne, Dec 30 '16 at 22:49
It's also broken in the presence of special unescaped characters, which is usually why people start asking for 'positions of literal text' and not 'substrings'. — i30817, Dec 12 '21 at 20:20

cuonglm · Answer 3 · 2014-09-02T20:15:15.947

11

Try:

printf '%s\n' 'RAMSITALSKHMAN|1223333.' | grep -o . | grep -n '|'

output:

15:|

This will give you the 1-based position.

edited Sep 02 '14 at 20:15

answered Sep 02 '14 at 14:54

cuonglm

153,898

1

Its not working :( – user82782 Sep 02 '14 at 14:59
1

@user82782: What command did you run? How you know it didn't work? – cuonglm Sep 02 '14 at 15:00
printf '%s\n' '|' | grep -o . | grep -n '|' prints 1, not 0 as expected. – l0b0 Sep 02 '14 at 19:18
1

@l0b0: The OP does not tell he wanted index base 0 or 1. – cuonglm Sep 02 '14 at 19:36
I just mean what a software developer would expect. – l0b0 Sep 02 '14 at 19:48
@l0b0 You can always pass it to the $(( 'command' - 1)) to change it to index 0. – Alex May 25 '17 at 14:22

score 5 · Answer 4 · edited Sep 02 '14 at 16:12


You can use awk's index function to return the position in characters where the match occurs:

echo "RAMSITALSKHMAN|1223333"|awk 'END{print index($0,"|")}'
15

If you don't mind using Perl's index function, this handles reporting zero, one or more occurrences of a character:

echo "|abc|xyz|123456|zzz|" | \
perl -nle '$pos=-1;while (($off=index($_,"|",$pos))>=0) {print $off;$pos=$off+1}'

For readability only, the pipeline has been split across two lines.

As long as the target character is found, index returns a non-negative, zero-based offset. Hence, the string "|abc|xyz|123456|zzz|" returns positions 0, 4, 8, 15 and 19.
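The same search-and-advance loop can be sketched in Python with str.find, which also takes a start offset. The function name all_positions is hypothetical and only used for this illustration.

```python
def all_positions(text: str, ch: str) -> list:
    """Collect the zero-based offset of every occurrence of ch in text."""
    positions = []
    start = 0
    while True:
        off = text.find(ch, start)  # returns -1 once no further match exists
        if off < 0:
            break
        positions.append(off)
        start = off + 1  # resume the search just past the last match
    return positions


print(all_positions("|abc|xyz|123456|zzz|", "|"))
```

An empty list corresponds to the "zero occurrences" case, where the Perl loop prints nothing.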

edited Sep 02 '14 at 16:12

cuonglm

153,898

answered Sep 02 '14 at 14:57

JRFerguson

14,740

for this use, awk is more usefull/easy than grep. – Archemar Sep 02 '14 at 15:02
This only print the first position, won't work with string like RAMSITALSKHMAN|1|223333 – cuonglm Sep 02 '14 at 15:05

bluefoggy · Answer 5 · 2014-09-02T15:10:45.033

3

We can also do it using "expr match" or "expr index".

expr match $string $substring, where $substring is a regular expression:

echo `expr match "RAMSITALSKHMAN|1223333" '[A-Z]*.|'`

The above gives you the position because it returns the length of the matched substring.

To search for the index more directly:

mystring="RAMSITALSKHMAN|122333"
echo `expr index "$mystring" '|'`
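The "length of the matched substring" idea can be illustrated with a rough Python analogue. This is a sketch under assumptions: match_length is a hypothetical helper, and Python's re syntax differs from the basic regular expressions that expr uses (here | must be escaped, whereas it is a literal character in a basic regular expression).

```python
import re


def match_length(text: str, pattern: str) -> int:
    """Length of the match anchored at the start of text, or 0 if none."""
    m = re.match(pattern, text)  # re.match only matches at the beginning
    return m.end() if m else 0


# "[^|]*\|" consumes everything up to and including the first "|",
# so the match length is the one-based position of that "|".
print(match_length("RAMSITALSKHMAN|1223333", r"[^|]*\|"))
```

Because the match is anchored at the start and ends on the separator, its length doubles as a one-based position.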

edited Sep 02 '14 at 15:10

answered Sep 02 '14 at 14:58

bluefoggy

662

I don't have enough reputation for commenting anywhere else. I personally liked answer given by @Gnouc . However why to use awk and make it complex when we can do simple things using 'expr' – bluefoggy Sep 02 '14 at 15:29
@kingsdeb it's just a suggestion. – Avinash Raj Sep 02 '14 at 16:08
@kingsdeb: Because (1) the awk solutions can trivially be modified for report this information on every line of a file (all you have to do is remove the END, which was never really necessary, from JRFerguson’s answer, and Avinash Raj’s does it already); whereas, to do that with the expr solution, you would need to add an explicit loop (and Gnouc’s answer is not easily adaptable to do that at all, that I can see), and (2) the awk solutions can be adapted to report all the matches in each line somewhat more easily than the expr solution (in fact, Avinash Raj’s does that already, too). – G-Man Says 'Reinstate Monica' Sep 02 '14 at 17:37
Why would you use echo `...` here? – Stéphane Chazelas Sep 03 '14 at 11:29
This is to just show the output here – bluefoggy Sep 03 '14 at 12:13

score 3 · Answer 6 · answered Sep 02 '14 at 15:38


Another awk command,

$ echo 'RAMSITALSKHMAN|1223333'| awk 'BEGIN{ FS = "" }{for(i=1;i<=NF;i++){if($i=="|"){print i;}}}'
15

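By setting the field separator to the null string, awk turns each individual character in the record into a separate field. (This works in gawk and several other awk implementations, but POSIX leaves the behavior of an empty FS unspecified.)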

answered Sep 02 '14 at 15:38

Avinash Raj

3,703

score 1 · Answer 7 · answered Sep 03 '14 at 11:06

Some alternatives include:

Similar to Gnouc's answer, but with the shell:

echo 'RAMSITALSKHMAN|1223333' |
tr -c \| \\n | 
sh

sh: line 15: syntax error near unexpected token `|'
sh: line 15: `|'

With sed and dc, possibly spanning multiple lines:

echo 'RAMSITALSKHMAN|1223333' |
sed 's/[^|]/1+/g;s/|/p/;1i0 1+' |dc

15

With $IFS:

IFS=\|; set -f; set -- ${0+RAMSITALSKHMAN|1223333}; echo $((${#1}+1))

That will also tell you how many there are:

echo $(($#-1))
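The split-and-count idea can be sketched in Python as well. The function name split_stats is hypothetical, and str.split is not a drop-in equivalent of shell field splitting, so treat this as an illustration rather than a translation.

```python
def split_stats(text: str, sep: str = "|"):
    """Return (one-based position of the first sep, number of seps).

    Like the $IFS approach, the position is only meaningful when
    at least one separator is present.
    """
    fields = text.split(sep)
    first_pos = len(fields[0]) + 1  # length of the first field, plus one
    count = len(fields) - 1         # n separators produce n + 1 fields
    return first_pos, count


print(split_stats("RAMSITALSKHMAN|1223333"))
```

When the count is 0, the first field is the whole string, so the position value should be ignored.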

score 0 · Answer 8 · answered May 14 '20 at 05:37


Python answer

text='skfwlefk|3oeio|ajda'
print([idx for idx, char in enumerate(text) if char == '|'])
# prints [8, 14]

answered May 14 '20 at 05:37

tinnick
