
I'm trying to use `replace-regexp` in Emacs to change the following R code

samples <- rep(10, 5)*1000  ## steps determined by samples * thin
adaptive.ratio <- c(rep(c(1,0), 2), 0) ## Alternate between adaptive and non-adaptive
...

to

print(paste0("samples: ", samples)) ## steps determined by samples * thin
print(paste0("adaptive.ratio: ", adaptive.ratio)) ## Alternate between adaptive and non-adaptive
...

I am using the following regexp to do so

^\(\b.*\b\)\([^#]*\) → print(paste0("\1: ", \1) ) \2

But this highlights

samples <- rep(10, 5)*1000  ## steps determined by samples * thin
adaptive.ratio <- c(rep(c(1,0), 2), 0) 

for the first replacement, rather than just the first line up to the `##` comment, which I want to turn into

print(paste0("samples: ", samples)) ## steps determined by samples * thin

I've tried escaping the # sign as `\#` and adding a space between the two sets of parentheses, but neither helps.

I realize this is somehow related to the greedy nature of regexp matching, but I don't understand what I am doing wrong or, as a result, how to fix it.
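The over-match can be reproduced outside Emacs. Below is a rough Python translation of the question's regexp, offered only as an illustration: it assumes Python's `re` behaves like Emacs for these constructs (`.` stops at a newline, `[^#]` does not, `*` is greedy), uses `re.MULTILINE` so that `^` matches at each line start as it does in Emacs, and uses Python's `\b` as an approximation of Emacs's syntax-table-based word boundary.

```python
import re

# The two R lines from the question (the "..." continuation is omitted).
text = (
    "samples <- rep(10, 5)*1000  ## steps determined by samples * thin\n"
    "adaptive.ratio <- c(rep(c(1,0), 2), 0) ## Alternate between adaptive and non-adaptive\n"
)

# Python spelling of the Emacs regexp ^\(\b.*\b\)\([^#]*\)
question_re = re.compile(r"^(\b.*\b)([^#]*)", re.MULTILINE)

m = question_re.search(text)
print(repr(m.group(1)))  # greedy .* runs to the last word boundary of line 1
print(repr(m.group(2)))  # [^#]* crosses the newline and stops at the next '#'
```

Group 1 swallows the whole first line, and group 2 starts with the newline and ends just before the `##` on the second line, which is the highlighted region described in the question.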

mikemtnbikes
  • I think I know my issue with `\b` matching more than the first set of characters: it's due to the `.*` between them, which matches anything, including characters that aren't word boundaries. I still don't understand, however, why the match goes beyond the newline. – mikemtnbikes Dec 15 '21 at 20:37
  • Greedy `.*` matches the maximum. Non-greedy `.*?` matches the minimum. Although you would want `\b.+?\b` rather than `\b.*?\b` to avoid matching the same `\b` twice. – phils Dec 15 '21 at 21:26
  • As you've already established, while `.` does not match newlines, `[^#]` does. – phils Dec 15 '21 at 21:31
  • Try `M-x re-builder` btw. It's a good visualiser, especially when the regexp contains groups. See `M-x finder-commentary RET re-builder` for a description, and https://emacs.stackexchange.com/q/5568 may be useful reading as well. – phils Dec 15 '21 at 21:41
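A small Python sketch of the two points made in these comments, again assuming Python's `re` is a fair stand-in for Emacs regexps here: greedy versus non-greedy matching between two `\b`s, and `[^#]` versus `[^#\n]`. The variable names are illustrative only.

```python
import re

line = "samples <- rep(10, 5)*1000  ## steps determined by samples * thin"

greedy = re.match(r"\b.*\b", line)      # .* grabs as much as it can
lazy = re.match(r"\b.+?\b", line)       # .+? stops at the next word boundary
lazy_star = re.match(r"\b.*?\b", line)  # .*? may match nothing: same \b twice

print(repr(greedy.group()))     # the whole line
print(repr(lazy.group()))       # 'samples'
print(repr(lazy_star.group()))  # ''

# [^#] happily matches a newline; adding \n to the set keeps the match on one line.
two_lines = "x <- 1\ny <- 2 ## comment"
up_to_hash = re.match(r"[^#]*", two_lines).group()
same_line = re.match(r"[^#\n]*", two_lines).group()
print(repr(up_to_hash))  # 'x <- 1\ny <- 2 '
print(repr(same_line))   # 'x <- 1'
```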

2 Answers


Okay, I figured it out. I was misusing the word-boundary markers (though I don't understand how, so a comment or two would be welcome).

This is the regexp I ended up using:

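^\([A-Za-z0-9_.]+\) \([^#<C-q 012>]*\) → print(paste0("\1: ", \1)) 

where <C-q 012> inserts the newline character (C-q followed by octal 012, equivalent to C-q C-j), and the replacement ends with a single space before the `##` comment, which is left untouched.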
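A rough Python equivalent of this approach, as an illustration only (Python's `re` standing in for Emacs, with `re.MULTILINE` for the per-line `^`). It spells the name class as `[A-Za-z0-9_.]`, since the range `A-z` would also admit `[`, `\`, `]`, `^`, `_` and the backtick; it excludes only `#` and newline from group 2, so group 2 runs up to the `##` comment; and its replacement closes both parentheses and ends with one space before the untouched comment.

```python
import re

text = (
    "samples <- rep(10, 5)*1000  ## steps determined by samples * thin\n"
    "adaptive.ratio <- c(rep(c(1,0), 2), 0) ## Alternate between adaptive and non-adaptive\n"
)

# Group 1: the variable name; group 2: the assignment, up to '#' or end of line.
pattern = re.compile(r"^([A-Za-z0-9_.]+) ([^#\n]*)", re.MULTILINE)

# Group 2 is dropped from the output; the comment after it is never matched,
# so it survives unchanged.
result = pattern.sub(r'print(paste0("\1: ", \1)) ', text)
print(result)
```

Because group 1 is restricted to name characters, it cannot run past the first space, so no `\b` markers are needed at all.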

mikemtnbikes

Change your regexp to exclude not only # chars but also newline chars.

^\(\b.*\b\)\([^#
]*\) → print(paste0("\1: ", \1)) \2

That's what you type when entering the regexp interactively; the newline shown is inserted with C-q C-j.

If you're defining the regexp in Lisp code, as a Lisp string, use this:

"^\\(\\b.*\\b\\)\\([^#\n]*\\)" → "print(paste0(\"\\1: \", \\1)) \\2"
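The effect of adding the newline to the excluded set can be checked with a rough Python translation, assuming as before that Python's `re` is a fair stand-in for Emacs here and that `re.MULTILINE` plays the role of Emacs's per-line `^`. The sketch counts matches in the two question lines with and without `\n` in the bracket expression.

```python
import re

text = (
    "samples <- rep(10, 5)*1000  ## steps determined by samples * thin\n"
    "adaptive.ratio <- c(rep(c(1,0), 2), 0) ## Alternate between adaptive and non-adaptive\n"
)

original = re.compile(r"^(\b.*\b)([^#]*)", re.MULTILINE)
with_newline_excluded = re.compile(r"^(\b.*\b)([^#\n]*)", re.MULTILINE)

# [^#] lets the first match swallow the newline, so only one match is found.
print(len(original.findall(text)))               # 1
# [^#\n] keeps every match on its own line.
print(len(with_newline_excluded.findall(text)))  # 2

first = with_newline_excluded.search(text)
print(repr(first.group(1)))  # still the entire first line: .* is greedy
```

With `\n` excluded, each line gets its own match. Group 1 still captures the entire line, though, because the greedy `.*` runs to the last word boundary on the line; that is the separate issue addressed by the comment suggesting `\b.+?\b`.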
Drew