4

Another regex that I can't seem to crack :(

I tried with egrep '([qwrtzpsdfghjklxcvbnmy]{1})|([qwrtzpsdfghjklxcvbnmy]{3})|([qwrtzpsdfghjklxcvbnmy]{5})|([qwrtzpsdfghjklxcvbnmy]{7})' greek.txt

However, this also returns words with 4 consonants, and I do not understand why.

So this is my greek.txt :

alpha
beta
gamma
delta
epsilon
zeta
eta
theta
iota
kappa
lambda
mu
nu
xi
omicron
pi
rho
sigma
tau
upsilon
phi
chi
psi
omega

So alpha is ok ( l p h = 3 ), beta isn't ( b t = 2) , gamma is ok (g m m = 3), delta is ok (d l t =3 ), etc.

Jeff Schaller

1 Answer

6

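[abcde]{3} matches three consecutive characters in the set abcde. So you're looking for lines containing one, three, five or seven consecutive characters in that set of yours. Since the pattern isn't anchored, a single consonant anywhere on the line already satisfies the first alternative, so this is equivalent to looking for lines with at least one of these characters.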
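As a quick check of that claim, here is a minimal sketch that runs the pattern from the question against three sample words. The function name question_pattern and the sample input are made up for illustration; the consonant set is copied from the question.

```shell
# Consonant set copied from the question.
c='qwrtzpsdfghjklxcvbnmy'

# Hypothetical wrapper around the question's pattern, for illustration only.
question_pattern() {
    grep -E "([$c]{1})|([$c]{3})|([$c]{5})|([$c]{7})"
}

# "beta" (2 consonants) and "epsilon" (4) both match, because a single
# consonant anywhere on the line satisfies the {1} alternative.
# "aeiou" contains no consonant, so it is the only line rejected.
printf '%s\n' beta epsilon aeiou | question_pattern
```

The longer alternatives ({3}, {5}, {7}) never change the result, since any line they match also matches {1}.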

To look for Greek letters, the first step is to have a pattern match Greek letters, not Latin letters. To look for lines containing at least 13 Greek letters, look for 13 occurrences of the pattern “a Greek letter followed by some other stuff”. Here's a pattern that looks for lowercase unadorned Greek letters only:

<greek.txt grep -E '([αβγδεζηθικλμνξοπρςστυφχψω].*){13}'

If you want lines containing exactly 13 lowercase unadorned Greek letters (plus possibly other characters that aren't such letters), filter the results to eliminate lines containing 14 of them.

<greek.txt grep -E '([αβγδεζηθικλμνξοπρςστυφχψω].*){13}' |
grep -v -E '([αβγδεζηθικλμνξοπρςστυφχψω].*){14}'
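The same "at least N, then drop at least N+1" filter can be tried on plain Latin text. This is only a sketch under assumptions: the function name exactly_three, the count of 3, and the use of the question's Latin consonant set are all chosen here for illustration.

```shell
# Assumed consonant set (Latin letters, taken from the question).
c='qwrtzpsdfghjklxcvbnmy'

# Hypothetical helper: keep lines with exactly 3 consonants.
exactly_three() {
    # First grep keeps lines with at least 3 consonants,
    # second grep drops those that have at least 4.
    grep -E "([$c].*){3}" | grep -v -E "([$c].*){4}"
}

# alpha (3) and sigma (3) are kept; beta (2) and lambda (4) are dropped.
printf '%s\n' alpha beta lambda sigma | exactly_three
```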

If you want lines consisting of exactly 13 lowercase unadorned Greek letters and no other character:

<greek.txt grep -x -E '[αβγδεζηθικλμνξοπρςστυφχψω]{13}'

Now if you want an even number of consonants, look for lines consisting of “something that doesn't contain any consonant followed by an even number of (a consonant followed by something that doesn't contain any consonant)”. For an odd number, add another occurrence of that last subpattern.

cons="βγδζθκλμνξπρςστφχψ"
<greek.txt grep -E "^[^$cons]*([$cons][^$cons]*[$cons][^$cons]*)*[$cons][^$cons]*\$"
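For a greek.txt that spells the letter names in Latin characters, the same construction can be sketched with a Latin consonant set. The function name odd_consonants is hypothetical, and treating y as a consonant is an assumption carried over from the question. The -x option is used so the pattern has to cover the whole line.

```shell
# Assumed consonant set (Latin letters, y included as in the question).
cons='qwrtzpsdfghjklxcvbnmy'

# Hypothetical helper: print the lines of stdin that contain an odd
# number of consonants.
odd_consonants() {
    # Whole-line match (-x): leading non-consonants, then zero or more
    # pairs of consonants, then one final consonant, each followed by
    # any run of non-consonants.
    grep -x -E "[^$cons]*([$cons][^$cons]*[$cons][^$cons]*)*[$cons][^$cons]*"
}

# alpha (3), gamma (3) and delta (3) match; beta (2) and epsilon (4) do not.
printf '%s\n' alpha beta gamma delta epsilon | odd_consonants
```

With the file from the question, this would be run as odd_consonants <greek.txt.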

In Perl, you can match a Greek letter with the pattern \p{Greek}, and a lowercase letter in any alphabet with the pattern \p{Ll}. To look for a lowercase Greek letter, look for (?=\p{Ll})\p{Greek}. You must run your script under Unicode semantics; the easiest way to do this is to run it with the -C option. () = m/REGEXP/g is a Perl idiom to count the number of matches.

<greek.txt perl -C -l -ne 'print if (() = m/(?:(?=\p{Ll})\p{Greek})/g) == 13'

There's no built-in way to match Greek vowels, so a Perl solution to the second part of your problem will have to match them explicitly.
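If counting is easier to reason about than a single regex, a counting approach also works outside Perl. The sketch below swaps in awk, not the Perl idiom above: gsub returns the number of replacements it made, so replacing each consonant with itself yields the count. The function name count_odd and the Latin consonant set are assumptions for illustration.

```shell
# Hypothetical helper: print lines with an odd number of Latin consonants,
# by counting matches instead of encoding the parity in the regex.
count_odd() {
    awk '{
        line = $0
        # gsub returns how many matches it replaced; "&" puts each match back.
        n = gsub(/[qwrtzpsdfghjklxcvbnmy]/, "&", line)
        if (n % 2 == 1) print $0
    }'
}

# eta (1) and theta (3) are printed; beta (2) and rho (2) are not.
printf '%s\n' eta beta theta rho | count_odd
```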

  • 1
    I edited my question to make it clearer; I wasn't really looking for REAL Greek letters :x

    I feel bad because you put so much effort into this :/

    – Lucas Kauffman Aug 09 '11 at 08:54
  • Upvote!! I love this answer. It is like someone asked what the meaning of life is, you went and found "Q", came back with the equation to create the universe from a bowl of Jell-O, and the person asking says, "I was looking for the number 42." – 2bc Mar 26 '12 at 21:51