I am using following command to grep character set range for hexadecimal code 0900 (instead of अ) to 097F (instead of व). How I can use hexadecimal code in place of अ and व?
bzcat archive.bz2 | grep -v '<[अ-व]*\s' | tr '[:punct:][:blank:][:digit:]' '\n' | uniq | grep -o '^[अ-व]*$' | sort -f | uniq -c | sort -nr | head -50000 | awk '{print "<w f=\""$1"\">"$2"</w>"}' > hindi.xml
I get the following output:
<w f="399651">और</w>
<w f="264423">एक</w>
<w f="213707">पर</w>
<w f="74728">कर</w>
<w f="44281">तक</w>
<w f="35125">कई</w>
<w f="26628">द</w>
<w f="23981">इन</w>
<w f="22861">जब</w>
...
I just want to use hexadecimal code instead of अ and व in the above command.
If using hexadecimal code is not at all possible , can I use unicode instead of hexadecimal code for character set ('अ-व') ?
I am using Ubuntu 10.04
-v
inverts the match, from your question text it seems that is not what you want. – Christian.K Aug 26 '11 at 06:14