
When I run C-u align-regexp in Emacs on the text below, answering the prompts with \(\s-*\)|, 1, 1, and y, the result is formatted incorrectly:

Foobar Foobar foobar| foobar foobar |
         ||

Incorrectly formatted result:

Foobar Foobar foobar                  | foobar foobar |
                                      |               |

Note the large amount of whitespace before the |.

Using align-regexp without the C-u prefix works as expected.

Can anyone explain what's going on? Can someone also explain the dash in the \(\s-*\) part of the regexp? Thanks in advance.

Gracjan Polak
Mike H-R
  • Good question! The excessive whitespace doesn't make sense. I can reproduce it, but I can't explain it. By the way, `\(\s-*\)` is a regular expression subgroup (bounded by `\(` and `\)`) that matches zero or more whitespace characters: `\s-` matches whitespace, and `*` matches zero or more. Details are in the [manual](https://www.gnu.org/software/emacs/manual/html_node/emacs/Regexp-Backslash.html#Regexp-Backslash) – Tyler Apr 19 '16 at 02:52
  • Thanks @Tyler. Ahhh, I guess I'm too used to PCREs as I always think of `\s` as being whitespace rather than `\s-`. – Mike H-R Apr 19 '16 at 09:02
  • One note, I have spacemacs, and the function `spacemacs/align-repeat-bar` does the correct thing. – Mike H-R Apr 19 '16 at 10:16
  • Another data point: I can reproduce in a text buffer, but I get the correct behavior when I try it in the `*scratch*` buffer. – nispio Apr 21 '16 at 17:38

1 Answer


It turns out that the behavior of a given regexp depends on the syntax table of the buffer in which it is run. In your case, \s- matches any character that belongs to the whitespace syntax class in that buffer. If the newline character is part of that class, matches can span multiple lines. Unfortunately, the function align-region seems to have a built-in assumption that each match is confined to a single line.
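To see the effect in isolation, here is a minimal sketch that runs the same `\s-*` pattern under two temporary syntax tables. The helper name `my-whitespace-match-end` and the sample text are made up for illustration; the only thing that differs between the two calls is the syntax class assigned to the newline character.

```elisp
;; Illustrative sketch only: `my-whitespace-match-end' is a hypothetical
;; helper, not part of Emacs or of align.el.
(defun my-whitespace-match-end (newline-syntax)
  "Return the position where a whitespace match stops in a sample text.
NEWLINE-SYNTAX is the syntax descriptor to give the newline character,
for example \" \" (whitespace) or \">\" (comment end)."
  (with-temp-buffer
    ;; Two lines: "a" plus trailing spaces, then indented "|".
    (insert "a  \n  |")
    (let ((table (make-syntax-table)))
      (modify-syntax-entry ?\n newline-syntax table)
      (with-syntax-table table
        ;; Start right after the "a" and match as much whitespace as possible.
        (goto-char 2)
        (looking-at "\\s-*")
        (match-end 0)))))

;; Newline counts as whitespace: the match continues onto the second line.
(my-whitespace-match-end " ")

;; Newline is a comment ender: the match stops at the end of the first line.
(my-whitespace-match-end ">")
```

With the newline in the whitespace class, the first call returns 7 (the match runs across the line break, right up to the `|`). With the newline as a comment ender, the second call returns 4 (the match ends with the first line). To check a real buffer, evaluate `(char-syntax ?\n)` in it with `M-:`; a result of 32 (the space character) means the newline is in the whitespace class there.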

The following function provides a wrapper around align-regexp which creates a temporary syntax table, if needed, based on the current syntax table of the buffer. It then modifies the temporary table to make sure that the newline character is not part of the whitespace syntax class:

;; Source: http://emacs.stackexchange.com/a/21776/93
(defun my-align-regexp ()
  "Wrapper around align-regexp which works around issues that
can occur when newlines are included in the whitespace syntax
class. [bug #23339]"
  (interactive)
  (setq this-command 'align-regexp)
  (if (eq ?\s (char-syntax ?\n))
      (let ((table (copy-syntax-table (syntax-table))))
        (modify-syntax-entry ?\n ">" table)
        (with-syntax-table table
          (call-interactively 'align-regexp)))
    (call-interactively 'align-regexp)))

I assume you could achieve the same result with advice, but I tend to avoid advice unless it is strictly necessary. I have submitted a bug report for this issue [#23339], so hopefully this will get fixed in a future release.
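For anyone who does prefer the advice route, the following is a rough sketch of what it might look like with `advice-add` (available since Emacs 24.4). The function name `my-align-regexp--newline-advice` is hypothetical, and this is only an assumed equivalent of the wrapper above, not something taken from the bug report.

```elisp
;; Sketch under assumptions: applies the same temporary syntax table trick
;; as the wrapper, but as :around advice on `align-regexp'.
(defun my-align-regexp--newline-advice (orig-fun &rest args)
  "Call ORIG-FUN with ARGS while newline is kept out of the whitespace class."
  (if (eq ?\s (char-syntax ?\n))
      ;; Newline is whitespace in this buffer: work on a modified copy
      ;; of the syntax table so the buffer's own table is left untouched.
      (let ((table (copy-syntax-table (syntax-table))))
        (modify-syntax-entry ?\n ">" table)
        (with-syntax-table table
          (apply orig-fun args)))
    ;; Nothing to work around; call the original directly.
    (apply orig-fun args)))

(advice-add 'align-regexp :around #'my-align-regexp--newline-advice)
```

The advice can be removed again with `(advice-remove 'align-regexp #'my-align-regexp--newline-advice)`. Unlike the wrapper, this changes the behavior of every caller of `align-regexp`, which is one reason to prefer the separate command.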

nispio