Matching non-# in Regexp

Question

I'm trying to use replace-regexp to change the following code

samples <- rep(10, 5)*1000  ## steps determined by samples * thin
adaptive.ratio <- c(rep(c(1,0), 2), 0) ## Alternate between adaptive and non-adaptive
...

to

print(paste0("samples: ", samples)) ## steps determined by samples * thin
print(paste0("adaptive.ratio: ", adaptive.ratio)) ## Alternate between adaptive and non-adaptive
...

I am using the following regexp to do so

^\(\b.*\b\)\([^#]*\) → print(paste0("\1: ", \1) ) \2

But this highlights

samples <- rep(10, 5)*1000  ## steps determined by samples * thin
adaptive.ratio <- c(rep(c(1,0), 2), 0)

for the first replacement rather than what I want, which is to highlight only

print(paste0("samples: ", samples)) ## steps determined by samples * thin

I've tried escaping the # sign (as \#) and adding a space between the two sets of parentheses, but neither helps.

I realize this is somehow related to the greedy nature of regexp matching, but I don't understand what I am doing wrong or, as a result, how to fix it.

I think I see why `\b` matches more than the first set of characters: the `.*` between them matches anything, so the match can run past intermediate word boundaries. I still don't understand, however, why the matching goes beyond the newline. – mikemtnbikes Dec 15 '21 at 20:37
Greedy `.*` matches the maximum. Non-greedy `.*?` matches the minimum. That said, you would want `\b.+?\b` rather than `\b.*?\b`, to avoid matching the same `\b` twice. – phils Dec 15 '21 at 21:26
As you've already established, while `.` does not match newlines, `[^#]` does. – phils Dec 15 '21 at 21:31
Try `M-x re-builder` btw. It's a good visualiser, especially when the regexp contains groups. See `M-x finder-commentary RET re-builder` for a description, and https://emacs.stackexchange.com/q/5568 may be useful reading as well. – phils Dec 15 '21 at 21:41
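
To make the two points in the comments above concrete, here is a small Emacs Lisp sketch that can be evaluated in `*scratch*`. The names `demo-groups` and `demo-text` are made up for illustration, and the standard syntax table is forced only so that `\b` behaves the same regardless of the current major mode.

```elisp
;; Sketch only: `demo-groups' and `demo-text' are hypothetical names.
(defun demo-groups (regexp text)
  "Return groups 1 and 2 of the first match of REGEXP in TEXT, or nil."
  (with-syntax-table (standard-syntax-table) ; keep word boundaries predictable
    (when (string-match regexp text)
      (list (match-string 1 text) (match-string 2 text)))))

(defvar demo-text
  (concat "samples <- rep(10, 5)*1000  ## steps determined by samples * thin\n"
          "adaptive.ratio <- c(rep(c(1,0), 2), 0) ## Alternate between adaptive and non-adaptive\n")
  "The two R lines from the question.")

;; Pattern from the question.  Greedy `.*' lets group 1 take the whole
;; first line, and `[^#]*' lets group 2 continue across the newline.
(demo-groups "^\\(\\b.*\\b\\)\\([^#]*\\)" demo-text)

;; Non-greedy `.+?' and a newline inside the negated class.  Group 1
;; stops at the first word boundary and group 2 stays on the first line.
(demo-groups "^\\(\\b.+?\\b\\)\\([^#\n]*\\)" demo-text)
```

One caveat with the second form: under the standard syntax table `.` is not a word constituent, so on the second line `\b.+?\b` would probably stop after `adaptive` rather than `adaptive.ratio`. An explicit character class for the variable name avoids depending on the syntax table.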

score 1 · Answer 1 · answered Dec 15 '21 at 19:46

Okay, I figured it out. I was misusing the word boundary markers (I don't understand exactly how, so a comment or two would be great).

This is the code I want to use

^\([A-Za-z0-9_.]+\) \([^# <C-q 012>]*\) → print(paste0("\1: ", \1))

Where <C-q 012> inserts the new line character.
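
For anyone who wants to try the same idea non-interactively, here is a rough Emacs Lisp sketch. It is a variation on the regexp above, not the exact command I typed: the function name `demo-wrap-in-print` is made up, and the second part of the pattern is assumed to be `[^#\n]*` (anything up to the comment or the end of the line).

```elisp
;; Sketch only: `demo-wrap-in-print' is a hypothetical name.
(defun demo-wrap-in-print (text)
  "Return TEXT with each `NAME <- VALUE' line turned into a print call.
Anything from the first `#' on a line onward is left untouched."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    ;; Group 1 is the variable name.  `[^#\n]*' consumes the assignment
    ;; and stops at the comment or the end of the line.
    (while (re-search-forward "^\\([A-Za-z0-9_.]+\\) [^#\n]*" nil t)
      ;; FIXEDCASE is t so the replacement text is inserted as written.
      (replace-match "print(paste0(\"\\1: \", \\1)) " t))
    (buffer-string)))

(demo-wrap-in-print
 "samples <- rep(10, 5)*1000  ## steps determined by samples * thin\n")
```

Because the variable name is matched with an explicit character class instead of `\b...\b`, the result does not depend on the buffer's syntax table. Lines without a trailing comment end up with one trailing space, which this sketch does not try to clean up.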

answered Dec 15 '21 at 19:46

mikemtnbikes

Interactively, you can just use `C-q C-j` to insert a newline char. – Drew Dec 15 '21 at 19:52
See `C-h i g (elisp)Regexp Backslash` regarding what `\b` matches. – phils Dec 15 '21 at 21:28

score 0 · Answer 2 · answered Dec 15 '21 at 19:48

Change your regexp to exclude not only # chars but also newline chars.

^\(\b.*\b\)\([^#
]*\) → print(paste0("\1: ", \1)) \2

That's what you enter when typing the regexp interactively; the newline shown is inserted with C-q C-j.

If you're defining the regexp in Lisp code, as a Lisp string, use this:

^\\(\\b.*\\b\\)\\([^#\n]*\\) → print(paste0("\\1: ", \\1)) \\2
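
As a quick check of how the two notations relate, the following sketch (with purely illustrative variable names) compares the Lisp-string form of the character class with the one built from the character that `C-q C-j` inserts, and shows that a doubled backslash in Lisp source is a single backslash in the resulting regexp.

```elisp
;; Sketch only: the variable names are illustrative.
(let ((lisp-form  "[^#\n]*")                           ; as written in Lisp source
      (typed-form (concat "[^#" (string ?\C-j) "]*"))) ; C-q C-j inserts char 10
  (list (string= lisp-form typed-form)  ; non-nil: both are the same regexp
        (length "\\(")))                ; 2: one backslash and one paren
```

So the only differences between the interactive and Lisp versions are the string escapes; the regexp that Emacs ends up matching is the same.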
