How do I use not in rx-to-string?

Question

I just started working with rx and was able to achieve the opposite of the regex I wanted:

(rx-to-string `(: (or
  (: line-start (not ,comment-start))
  (: line-start (zero-or-more whitespace) line-end))))

Unfortunately, negating this regex was not as straightforward as I had hoped. I have tried many different variations with not, none of which worked.

I would appreciate it if you could also show how to invert this simpler regex, so that people looking up this question get a simple example first:

(rx-to-string `(: line-start ,comment-start))

Thanks a bunch!

Whether you can even use `not` there is going to very much depend on the value of `comment-start`. How about showing us the specific value you're interested in, rather than a variable which may or may not be valid. — phils, Apr 23 '16 at 06:43
The notion of negation in regexps is not straightforward. In theory, regular expressions include or/and/not, but in practice most implementations only include `or` among those three. And this `not` doesn't really mean what you want anyway: instead of meaning "make sure there's no way to match RX", it means "try to find a way to fail to match RX". — Stefan, Apr 25 '16 at 12:49

phils · Accepted Answer · 2016-04-23T06:53:43.043

score 5

The documented possible uses of not are^*:

(not (any SET ...))
     matches any character not in SET ...
(not (syntax SYNTAX))
     matches a character that doesn't have syntax SYNTAX.
(not (category CATEGORY))
     matches a character that doesn't have category CATEGORY.

All of these operate on a single character, whereas comment-start is a string of arbitrary length (or nil), so it is not something you can necessarily pass to not directly.

^* There's also (not word-boundary) which is equivalent to not-word-boundary, and seems to be something of an anomaly. The latter is more consistent with the other rx forms.
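
As a rough illustration of the forms listed above (a sketch, not taken from the rx documentation itself), the snippet below uses each single-character form once. The sample strings are made up. The last form assumes that a multi-character string is rejected by not, which is what the documented argument list suggests; the exact error message may differ between Emacs versions.

```elisp
(require 'rx)

;; (not (any SET ...)): finds the first character that is not a, b or c.
(string-match-p (rx (not (any "abc"))) "abcd")

;; (not (syntax SYNTAX)): finds the first character without whitespace syntax.
(string-match-p (rx (not (syntax whitespace))) "  x")

;; (not word-boundary) is the documented spelling of not-word-boundary.
(string-match-p (rx (not word-boundary)) "ab")

;; A multi-character string is not one of the documented arguments, so this
;; is expected to signal an error instead of building a "not this string"
;; regexp.  The error is caught here so the form can be evaluated safely.
(condition-case err
    (rx-to-string '(not "abc"))
  (error (list 'rejected (error-message-string err))))
```

Evaluating the forms one at a time with C-x C-e shows what each returns in your Emacs version.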

edited Apr 23 '16 at 06:53

answered Apr 23 '16 at 06:47

phils


The point is that comment-start changes depending on mode. That is why I need to use rx-to-string. I want to find the point of the first line that does not start with a comment char or empty line :) Will need to think about this some more. – The Unfun Cat Apr 23 '16 at 07:39
`(while (comment-forward))` ? – phils Apr 23 '16 at 08:11
I knew there must be an easier way to do it. But when starting to learn a lang I always do things in the hardest way possible for some reason. – The Unfun Cat Apr 23 '16 at 13:43
Regular expressions are the wrong tool for *not* matching things. This is why some languages have picked up extensions to them that allow matching for everything that does *not* look like a certain word, such as Perl (but not Emacs Lisp). You'll have an easier time changing the code to contain that logic instead of encoding it as regex. – wasamasa Apr 23 '16 at 15:45
@wasamasa Thanks. Your comment is also very valuable. I'd upvote it if it were made into an answer. – The Unfun Cat Apr 24 '16 at 07:57
I'll consider doing that, but I would need to demonstrate what the negated regex would actually look like in a language that supports that mechanism, as opposed to one that doesn't (for the answer to be more than a comment). – wasamasa Apr 24 '16 at 09:34
@phils Sorry, what documentation? Did you unearth the fabled `rx` doc? – yPhil Aug 04 '17 at 01:20
I unearthed the fabled `C-h f rx` – phils Aug 04 '17 at 01:29

score 4 · Answer 2 · answered Apr 24 '16 at 22:16

rx-to-string is an alternate concrete syntax for regular expressions. It translates its argument to a regular expression piece by piece. There is no negation operator in regular expressions, so there is no general negation operator in rx-to-string either. The not operator only recognizes a few specific constructs; for example, character sets can be negated easily, e.g. [abcdef] to [^abcdef] and vice versa, so rx supports not on character sets. Similarly \sX can be negated to \SX, \b to \B, etc.
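
To make the "piece by piece" translation concrete, here is a small sketch of the simple positive case from the question. The variable my-comment-start is a stand-in for a mode's comment-start and is not part of any library. The second argument to rx-to-string is its NO-GROUP flag, which leaves off the surrounding shy group.

```elisp
(require 'rx)

;; Each rx form maps onto one piece of regexp syntax.
(rx-to-string '(: line-start "#") t)          ; => "^#"

;; Literal strings are regexp-quoted for you.
(rx-to-string '(: line-start "*") t)          ; => "^\\*"

;; Backquote lets a variable supply the literal, as in the question.
;; `my-comment-start' is a hypothetical stand-in for `comment-start'.
(let ((my-comment-start ";"))
  (rx-to-string `(: line-start ,my-comment-start) t))   ; => "^;"
```

Nothing in this translation knows how to turn "^;" into "a line that does not start with ;", which is why not is limited to the single-character constructs mentioned above.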

In principle, it would be possible to implement a general not, since the complement of the language recognized by a regular expression can also be recognized by a regular expression. However, this requires a complete structural change to the regular expression, and in the worst case the regular expression for the negation can be exponentially larger than the original, so building it may take a very long calculation and a large amount of memory, in addition to the coding effort. This is why regular expression engines either don't provide negation or provide only a restricted form of it (e.g. Perl's negative lookahead and lookbehind assertions).
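
To give a feel for that structural change, here is a hand-written complement for one tiny case: lines that do not start with the two-character prefix "//". This is only a sketch under assumed inputs. The prefix and the name my/not-slash-slash-line are made up for illustration, and leading whitespace is not handled.

```elisp
(require 'rx)

;; Hypothetical example: the start of a line that does NOT begin with "//".
;; Every way of failing to match "//" needs its own branch.
(defconst my/not-slash-slash-line
  (rx line-start
      (or line-end                              ; empty line
          (not (any "/\n"))                     ; first char is not "/"
          (seq "/" (or line-end                 ; a lone "/"
                       (not (any "/\n"))))))    ; "/" followed by another char
  "Regexp matching a line start that is not followed by \"//\".")

(string-match-p my/not-slash-slash-line "// comment")   ; no match
(string-match-p my/not-slash-slash-line "/ not a comment")
(string-match-p my/not-slash-slash-line "")
```

A two-character prefix already needs four branches, and the pattern has to be rebuilt by hand for every different prefix, so it does not generalize to an arbitrary comment-start.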

The usual workaround when you need to negate something is to put the part you want to match in a group, and use code to analyze the matched group afterwards.

(if (string-match "stuff and \\(.*\\) and more" mystring)
    (let ((could-be-anything (match-string 1 mystring)))
      ;; Reject the match when the captured group contains the unwanted text.
      (if (not (save-match-data (string-match "not this" could-be-anything)))
          …)))
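
Applied to the question, the same idea might look like the sketch below: build the easy positive regexp with rx-to-string, then do the negation in code by skipping lines for as long as it matches. The function name my/first-code-line-position is hypothetical. Trimming trailing whitespace from comment-start and allowing leading blanks are assumptions about what counts as a comment line, not something the question specifies.

```elisp
(require 'rx)
(require 'newcomment)   ; defines `comment-start'
(require 'subr-x)       ; `string-trim-right' on older Emacs versions

(defun my/first-code-line-position ()
  "Return the position of the first line that is neither blank nor a comment.
Hypothetical helper: the regexp is the positive pattern, and the loop does
the negation.  Returns `point-max' if every line is blank or a comment."
  (let* ((starter (and comment-start (string-trim-right comment-start)))
         (skip-re (rx-to-string
                   ;; " \t" instead of `whitespace' keeps the match on one line.
                   `(: line-start (zero-or-more (any " \t"))
                       (or line-end
                           ,@(and starter
                                  (not (string= starter ""))
                                  (list starter))))
                   t)))
    (save-excursion
      (goto-char (point-min))
      ;; Skip lines while the positive regexp matches; stop at the first miss.
      (while (and (not (eobp)) (looking-at-p skip-re))
        (forward-line 1))
      (point))))

;; Usage sketch on made-up buffer contents.
(with-temp-buffer
  (insert ";; header\n\n  \n(message \"hi\")\n;; trailing\n")
  (setq-local comment-start ";")
  (goto-char (my/first-code-line-position))
  (looking-at-p "(message"))
```

In a real buffer the `(while (comment-forward))` approach suggested in the comments is likely simpler, since it relies on the mode's own comment syntax instead of a regexp.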
