
How can I specify a regex pattern which matches any line without

  • "lecture" anywhere in it?

  • ".pdf" at the end?

Aaron Hall
Tim
  • With a true regular expression you cannot. What is it that you are really trying to do with such lines (or with the other, non-matching lines)? Tell us that and we can likely suggest how to accomplish what you want. (For example, if you just want to remove such lines then see commands `flush-lines` and `keep-lines`.) Typically, if you need the opposite of what a regexp matches you find what it matches and subtract that from the original search space to get the complement. But just what you're trying to do makes a difference in how you might want to proceed. – Drew Oct 01 '18 at 21:53
  • I want to remove lines without "lecture" anywhere in them. I want to remove lines without ".pdf" at the end of them. I was thinking of matching such lines and replacing them with nothing using `regex-replace`. – Tim Oct 01 '18 at 22:08
  • @phils When removing a line, I want to remove not just the content of the line but also the newline character at the end. – Tim Oct 01 '18 at 22:57
  • It's `M-x keep-lines` which you're wanting. This lets you specify the pattern `lecture\|\.pdf$` for the lines you want to keep, and all non-matching lines are deleted. – phils Oct 01 '18 at 23:10
  • @phils That last comment should be an answer instead. – Omar Oct 01 '18 at 23:34
  • The *stated question* isn't a duplicate, though (even though it precisely solves what turned out to be the actual problem). – phils Oct 02 '18 at 00:22
  • FWIW, it *is* possible to do it with a true regular expression. The problem is that this regexp is monstrous. You can construct it by turning `.*lecture.*` into a DFA, then negating that DFA, and then turning that negated DFA back into a regexp. – Stefan Oct 02 '18 at 02:41
  • @Stefan In principle, what kinds of patterns are not direct or easy to be represented in regex, and what are? – Tim Oct 02 '18 at 02:45

1 Answer


For your actual problem, per the comments, `M-x keep-lines` is what you're looking for; in your case you would keep lines matching the regexp `lecture\|\.pdf$`.
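
As a rough sketch of what that does, `keep-lines` can also be called from Lisp on a scratch buffer. The helper name `my/keep-lecture-or-pdf` and the sample file names are made up for illustration; note that the backslashes are doubled because the regexp is written as a Lisp string.

```elisp
(defun my/keep-lecture-or-pdf (text)
  "Return TEXT with only the lines matching lecture\\|\\.pdf$ kept.
Hypothetical helper; it runs `keep-lines' in a temporary buffer."
  (with-temp-buffer
    (insert text)
    ;; `keep-lines' works from point to the end of the buffer,
    ;; so start at the top.
    (goto-char (point-min))
    (keep-lines "lecture\\|\\.pdf$")
    (buffer-string)))

;; Sample input: four made-up file names, one per line.
(my/keep-lecture-or-pdf
 "lecture1.txt\nnotes.pdf\nreadme.md\nlecture2.pdf\n")
```

With this input only the `readme.md` line should be deleted, together with its trailing newline.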


As Drew comments, you can't do what you've actually asked for with a regular expression in Emacs, as PCRE-style arbitrary zero-width look-ahead assertions are not available.

If the pattern is anchored to the beginning of the string, then it becomes practical (but still not easy) to match "strings which do not start with X". The following is a regexp which matches lines which do not start with lecture (and are not, in their entirety, a prefix of that word):

^\([^l]\|l[^e]\|le[^c]\|lec[^t]\|lect[^u]\|lectu[^r]\|lectur[^e]\).*

You get the idea.
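
To check that regexp against a few inputs, here is a small sketch using `string-match-p`. The names `my/not-lecture-prefix-regexp` and `my/starts-without-lecture-p` and the sample strings are assumptions for illustration, and the strings are assumed to be single lines (in a multi-line string, `^` would also match after each newline).

```elisp
(defconst my/not-lecture-prefix-regexp
  "^\\([^l]\\|l[^e]\\|le[^c]\\|lec[^t]\\|lect[^u]\\|lectu[^r]\\|lectur[^e]\\).*"
  "The regexp above, written as a Lisp string (backslashes doubled).")

(defun my/starts-without-lecture-p (line)
  "Return t if the single-line string LINE matches the regexp above."
  (and (string-match-p my/not-lecture-prefix-regexp line) t))

;; Try a few sample lines: two that do not start with "lecture",
;; one that does, and one that is only a prefix of the word.
(mapcar #'my/starts-without-lecture-p
        '("notes.pdf" "lesson 1" "lecture notes" "lect"))
```

The first two samples are expected to match. The last two are not: one starts with "lecture", and "lect" is in its entirety a prefix of that word, which is the caveat mentioned above.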

In rare cases that's useful, but more often you would do what Drew outlined:

Typically, if you need the opposite of what a regexp matches you find what it matches and subtract that from the original search space to get the complement.
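
One way to read that in Lisp terms: test each line with the positive regexp and keep the lines for which the test fails. The following is only a sketch; `my/lines-not-matching` is a hypothetical helper, and it drops empty lines because of the `omit-nulls` argument to `split-string`.

```elisp
(defun my/lines-not-matching (regexp text)
  "Return a list of the non-empty lines of TEXT that do not match REGEXP."
  (let (result)
    ;; Split on newlines; the final t omits empty strings.
    (dolist (line (split-string text "\n" t))
      ;; The complement: collect a line only when the match fails.
      (unless (string-match-p regexp line)
        (push line result)))
    (nreverse result)))

;; Lines without "lecture" anywhere in them:
(my/lines-not-matching "lecture" "lecture1.txt\nnotes.pdf\nreadme.md\n")

;; Lines without ".pdf" at the end:
(my/lines-not-matching "\\.pdf$" "lecture1.txt\nnotes.pdf\nreadme.md\n")
```

The regexp stays simple because the negation lives in the surrounding code rather than in the pattern.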


You might also find this trick interesting:

`M-x query-replace-regexp` and replace `.*` with:
\,(if (string-match-p "lecture\\|\\.pdf$" \&) \& "")

Note that this is matching and replacing all lines -- but the ones which match the embedded regexp are 'replaced' with identical text, and the ones which don't are replaced with an empty string.

Adding a newline (`C-q C-j`) to the end of that `.*` search pattern would produce the same textual result as `keep-lines` with `lecture\|\.pdf$`.
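
For comparison, the same idea can be sketched non-interactively as a search-and-replace loop. This assumes the search pattern with the trailing newline, so a final line with no newline is left untouched; `my/keep-lines-via-replace` is a hypothetical name.

```elisp
(defun my/keep-lines-via-replace (text)
  "Return TEXT with lines not matching lecture\\|\\.pdf$ replaced by nothing.
Hypothetical helper mirroring the replacement trick in a temporary buffer."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    ;; ".*\n" always consumes at least the newline, so the loop advances.
    (while (re-search-forward ".*\n" nil t)
      (let ((line (match-string 0)))
        ;; `string-match-p' leaves the match data alone, so `replace-match'
        ;; still refers to the buffer match found above.
        (replace-match (if (string-match-p "lecture\\|\\.pdf$" line) line "")
                       t t)))
    (buffer-string)))

;; Same made-up sample input as a quick check.
(my/keep-lines-via-replace
 "lecture1.txt\nnotes.pdf\nreadme.md\nlecture2.pdf\n")
```

The two `t` arguments to `replace-match` keep the replacement's case fixed and treat it literally, which matters if a line contains backslashes.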

phils
  • Thanks. In principle, what kinds of patterns are not direct or easy to be represented in regex, and what are? – Tim Oct 02 '18 at 02:45
  • I'm not sufficiently familiar with the theory to be a good person to answer that properly. Trying to use regexp to parse HTML or other arbitrary balanced forms is one [classic](https://stackoverflow.com/a/1732454) mistake; and as already mentioned you can't have arbitrary zero-width look-ahead or look-behind assertions like PCRE provides, so you need to be consuming all the text that you're matching. I feel that the kinds of things you *can* match are indicated fairly well by the manual: `C-h i g (elisp)Syntax of Regexps` – phils Oct 02 '18 at 03:49