compilation-mode and next-error confusion

Question

I'm going crazy trying to understand why compilation mode and next-error recognize some errors and not others. It appears that we're long past the days of simple regexps for recognizing errors, but I can't find anything that explains what is or isn't used now, much less how to debug it.

Why is this recognized as an error:

/Users/kpixley/projects/src-head/cevo/junos/ui/tests/Makefile.inc:5:0 (41): no match found, expected: ":", [ \t] or [\p{Latin}-_.${}/%0123456789]

While this is not?

/Users/kpixley/projects/src-head/cevo/jdid/jdid-infra/build-files/evo/src/Makefile.inc:14:24 (268): rule include_dir: include
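A first debugging step is to find out which of the active rules, if any, matches a given line. The sketch below is a hypothetical helper (`my/compilation-matching-rules` is not part of Emacs): it walks `compilation-error-regexp-alist`, looks up each named rule in `compilation-error-regexp-alist-alist`, and tries that rule's regexp on a single line with `string-match-p`. It ignores everything else compile.el does, so treat a hit or a miss as a hint rather than a verdict.

```elisp
(require 'compile)

(defun my/compilation-matching-rules (line)
  "Return the active compilation rules whose regexp matches LINE.
Hypothetical debugging helper: it only tries each rule's REGEXP on a
single string with `string-match-p' and skips the rest of compile.el's
processing (buffer context, file/line extraction, highlighting)."
  (let ((case-fold-search nil)   ; assumed to mirror compile.el's default
        (hits nil))
    (dolist (rule compilation-error-regexp-alist)
      ;; A rule is either a symbol naming an entry of
      ;; `compilation-error-regexp-alist-alist' or a full (REGEXP ...) entry.
      (let ((regexp (if (symbolp rule)
                        (nth 1 (assq rule compilation-error-regexp-alist-alist))
                      (car rule))))
        (when (and (stringp regexp) (string-match-p regexp line))
          (push rule hits))))
    (nreverse hits)))

;; For example:
;; (my/compilation-matching-rules
;;  "Makefile.inc:14:24 (268): rule include_dir: include")
```

Because the built-in regexps differ between Emacs releases, the result for the question's lines depends on the Emacs version in use.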

Could you please provide exact details of the compiler you are using? If my other assumptions are correct, the errors which *are* being matched are only matched by accident, as this error syntax is not actually supported. Can you also confirm whether the two numbers (e.g. `5:0` or `14:24`) are indeed LINE:COLUMN numbers? — phils, Jul 25 '20 at 02:58
5:0 and 14:24 are indeed line & column numbers. The "compiler" here is one I'm developing. — kpixley, Jul 27 '20 at 16:15
Well in that case the solution seems easy -- add a colon after the column number in the output you're generating, and then it will be a format that Emacs recognises. Also consider making it `:LINE.COLUMN:` which seems to be a well-known format. — phils, Jul 27 '20 at 20:30
Thank you. That will probably be my fix, although I very much appreciate the elaborations below as well. Thank you all. — kpixley, Jul 28 '20 at 19:10
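The suggested fix can be checked against the live `gnu` rule. This sketch uses a hypothetical helper, `my/gnu-location`, which matches one line and returns the FILE, LINE and COLUMN groups, taking the group numbers from the rule entry itself rather than hard-coding them. With the trailing colon added, both `:LINE:COLUMN:` and `:LINE.COLUMN:` should yield the column as well as the line.

```elisp
(require 'compile)

(defun my/gnu-location (text)
  "Return (FILE LINE COLUMN) as strings if the live `gnu' rule matches TEXT.
Hypothetical helper.  The rule has the shape (gnu REGEXP FILE LINE COLUMN
TYPE); LINE and COLUMN may be (START . END) pairs, and only START is used."
  (let* ((rule (assq 'gnu compilation-error-regexp-alist-alist))
         (start-group (lambda (spec) (if (consp spec) (car spec) spec)))
         (case-fold-search nil))
    (when (string-match (nth 1 rule) text)
      (mapcar (lambda (spec) (match-string (funcall start-group spec) text))
              (list (nth 2 rule) (nth 3 rule) (nth 4 rule))))))

(my/gnu-location "Makefile.inc:14:24: (268): rule include_dir: include")
;; => ("Makefile.inc" "14" "24")
(my/gnu-location "Makefile.inc:14.24: (268): rule include_dir: include")
;; => ("Makefile.inc" "14" "24")
```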

lawlist · Answer 1 · 2020-07-24T06:24:27.570

In the following guess-timated answer (tested with Emacs 26.3), I have changed the fourth line from the bottom; i.e., (regexp "[0-9]?") to (regexp "[0-9]+?"). This permits Emacs to match the number 14 following the filename and the first : (colon). To try out this answer, evaluate the Lisp code under the section labeled THE CODE, then paste the working data into a scratch buffer and type: M-x compilation-mode

The following link provides other methods for changing an element of an alist: How to replace an element of an alist? I chose to use the solution provided by Dan to modify the alist at issue in this thread.

WORKING DATA:

/Users/kpixley/projects/src-head/cevo/junos/ui/tests/Makefile.inc:5:0 (41): no match found, expected: ":", [ \t] or [\p{Latin}-_.${}/%0123456789]

/Users/kpixley/projects/src-head/cevo/jdid/jdid-infra/build-files/evo/src/Makefile.inc:14:24 (268): rule include_dir: include

THE CODE:

;;;  Load the library before trying to change `compilation-error-regexp-alist-alist'
(require 'compile)

;;;  Replace the regexp (element 1) of the `gnu' entry in place.
(setf (nth 1 (assoc 'gnu compilation-error-regexp-alist-alist))
  (rx
    bol
    (? (| (regexp "[[:alpha:]][-[:alnum:].]+: ?")
          (regexp "[ \t]+\\(?:in \\|from\\)")))
    (group-n 1 (: (regexp "[0-9]*[^0-9\n]")
                  (*? (| (regexp "[^\n :]")
                         (regexp " [^-/\n]")
                         (regexp ":[^ \n]")))))
    (regexp ": ?")
    (group-n 2 (regexp "[0-9]+"))
    (? (| (: "-"
             (group-n 4 (regexp "[0-9]+"))
             (? "." (group-n 5 (regexp "[0-9]+"))))
          (: (in ".:")
             (group-n 3 (regexp "[0-9]+"))
             (? "-"
                (? (group-n 4 (regexp "[0-9]+")) ".")
                (group-n 5 (regexp "[0-9]+"))))))
    ":"
    (| (: (* " ")
          (group-n 6 (| "FutureWarning"
                        "RuntimeWarning"
                        "Warning"
                        "warning"
                        "W:")))
       (: (* " ")
          (group-n 7 (| (regexp "[Ii]nfo\\(?:\\>\\|rmationa?l?\\)")
                        "I:"
                        (: "[ skipping " (+ ".") " ]")
                        "instantiated from"
                        "required from"
                        (regexp "[Nn]ote"))))
       (: (* " ")
          (regexp "[Ee]rror"))
       (: (regexp "[0-9]+?") ;; originally (regexp "[0-9]?")
          (| (regexp "[^0-9\n]")
             eol))
       (regexp "[0-9][0-9][0-9]"))))
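Because the `setf` above rewrites the shared `gnu` entry in place, it can help to keep a copy of the original so the experiment can be undone without restarting Emacs. A minimal sketch with hypothetical names (`my/gnu-entry-backup`, `my/restore-gnu-entry`); the `defvar` has to be evaluated before THE CODE for the backup to hold the unmodified value.

```elisp
(require 'compile)

;; Hypothetical backup of the untouched `gnu' entry.  `copy-tree' is used
;; because the `setf' form edits the entry's list structure in place.
(defvar my/gnu-entry-backup
  (copy-tree (assq 'gnu compilation-error-regexp-alist-alist))
  "Copy of the `gnu' compilation rule taken before experimenting.")

(defun my/restore-gnu-entry ()
  "Put the backed-up regexp back into the live `gnu' entry."
  (setf (nth 1 (assq 'gnu compilation-error-regexp-alist-alist))
        (nth 1 my/gnu-entry-backup)))
```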

SHOW YOUR WORK

One of my old math teachers always used to say "SHOW YOUR WORK". I came up with this guess-timated answer by first placing a message within the function compilation-parse-errors, with an eye towards extracting the regexp used to process the relevant components of the working data, which yielded the following regexp:

"^ *\\(?:[[:alpha:]][-[:alnum:].]+: ?\\|[ \t]+\\(?:in \\|from\\)\\)?\\(?1:\\(?:[0-9]*[^0-9\n]\\)\\(?:[^\n :]\\| [^-/\n]\\|:[^ \n]\\)*?\\)\\(?:: ?\\)\\(?2:[0-9]+\\)\\(?:-\\(?4:[0-9]+\\)\\(?:\\.\\(?5:[0-9]+\\)\\)?\\|[.:]\\(?3:[0-9]+\\)\\(?:-\\(?:\\(?4:[0-9]+\\)\\.\\)?\\(?5:[0-9]+\\)\\)?\\)?:\\(?: *\\(?6:\\(?:FutureWarning\\|RuntimeWarning\\|W\\(?::\\|arning\\)\\|warning\\)\\)\\| *\\(?7:[Ii]nfo\\(?:\\>\\|rmationa?l?\\)\\|I:\\|\\[ skipping \\.+ ]\\|instantiated from\\|required from\\|[Nn]ote\\)\\| *\\(?:[Ee]rror\\)\\|[0-9]?\\(?:[^0-9\n]\\|$\\)\\|[0-9][0-9][0-9]\\)"

Then, I took the working data and used M-x re-builder to see what the above-mentioned regexp matched. I modified the second line of the working data by reducing the 14 to just one digit, and that helped me zero in on the relevant section of the regexp. From there, I looked at compilation-error-regexp-alist-alist to locate the corresponding section and found it in the gnu entry of that variable.
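A printing detail matters when copying a regexp out of Emacs this way. `message` or `format` with `%s` prints the regexp's raw characters, so each backslash appears only once, and re-adding backslashes by hand for re-builder's read syntax is easy to get wrong (inside a Lisp string, `\.` reads as a plain `.`). Printing with `%S` produces a string literal that reads back as the same regexp. A sketch; `my/gnu-regexp-as-literal` is a hypothetical name.

```elisp
(require 'compile)

(defun my/gnu-regexp-as-literal ()
  "Return the live `gnu' regexp printed as a Lisp string literal.
`%S' doubles backslashes and escapes quotes; binding
`print-escape-newlines' also turns newline characters into \\n."
  (let ((print-escape-newlines t))
    (format "%S" (nth 1 (assq 'gnu compilation-error-regexp-alist-alist)))))

;; Contrast on a tiny regexp:
(message "%s" "a\\.b")   ; echoes a\.b      (not safe to paste back as a string)
(message "%S" "a\\.b")   ; echoes "a\\.b"   (reads back as the same regexp)
```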

edited Jul 24 '20 at 06:24

answered Jul 24 '20 at 00:35

lawlist

Converting `[0-9]?` to `[0-9]+?` looks wrong. Surely it should be `[0-9]*`. (I.e. zero-or-more instead of the original zero-or-one.) I'm not sure what that number *is*, mind you, but I'm assuming we want to retain the support for it not existing at all, and I'm dubious that non-greedy behaviour could be correct. – phils Jul 24 '20 at 02:11
Most likely this needs an upstream bug report. – phils Jul 24 '20 at 02:12
@phils -- thank you for helping me to improve upon this answer. I used re-builder just now and see that `[0-9]+` matches any integer (e.g. 5, 14, 100, 1000, etc.), and Emacs uses the `compilation-line-number` to colorize both the numbers `5` and `14` in each of the respective lines of working data. Could you give me an example, please, of when Emacs would be looking for something other than an integer (line number) in this example? – lawlist Jul 24 '20 at 02:22
I don't know for sure what this should be. The comments for the definition of the `gnu` section of `compilation-error-regexp-alist-alist` indicate that the expected format is `PROGRAM:SOURCE-FILE-NAME:LINENO: MESSAGE`, but even the original working example `...Makefile.inc:5:0 (41): no match found...` doesn't fit that. My *guess* was that `:5:0` was :LINE:COLUMN and that COLUMN was intended to be optional, but at a glance I'm honestly unsure what is being matched. – phils Jul 24 '20 at 02:29
I have also assumed that this part of the regexp should only match numbers, though, and my suggestion was not at odds with that. I was just ensuring that, as with the original value, it was valid for it to be empty. – phils Jul 24 '20 at 02:31
@phils -- I'm still trying to wrap my head around the suggested revision ... By retaining support for the line number not existing at all, are we using the hypothetical line akin to the following, where we have `filename::column ...`?: `/Users/kpixley/projects/src-head/cevo/junos/ui/tests/Makefile.inc::0 (41): no match found, expected: ":", [ \t] or [\p{Latin}-_.${}/%0123456789]` If we use `[0-9]*` instead of `[0-9]+?`, then the hypothetical line described is not recognized as an error. – lawlist Jul 24 '20 at 02:55
It's entirely possible that I've misinterpreted this. I was looking at `:5:0` (working) and `:14:24` (not working) and assumed that the line number was `5` or `14`, and the problem was with matching the second number, `0` or `24`. I was *assuming* the existing regexp didn't fail to match multi-digit line numbers (which would be the norm), but that if the column was optional then there might have been a mistake made with that. You seem to be saying that the change you've made affects the *line* number though (in which case I'm not being helpful). – phils Jul 24 '20 at 04:36
@phils -- the face `compilation-line-number` that colorized just the `5` of the *working data* helped me zero-in on the `14` as a suspect. Your help throughout the years has been invaluable, and I have many citations in my own setup relating to various snippets of code that you have written along with links to the applicable threads. I was a little pressed for time this evening and did not take the time to show my work regarding how I arrived at my *guess*-timate. I have since updated the answer with the initial regexp that I derived using the *working data* to match using `M-x re-builder`. – lawlist Jul 24 '20 at 05:41
This part of the pattern dates back to commit 0ab31e4a9ffda94e1e741f9a4b0df5aff3c62570 Jul 19 2006 "(compilation-error-regexp-alist-alist) : Try to rule out false positives due to time stamps." so the message format seen here is seemingly conflicting with expectations. I can see that using `[0-9]+?` breaks some of the test cases in compile-tests.el whereas `[0-9]*` *appears* to support all the existing tests plus the new one; however there's so much going on here that I'm not yet convinced that it's that simple. – phils Jul 24 '20 at 22:27
I can also confirm that it is the second number in the `:14:24` sequence which is causing the problem. Although it's the `14` which gets highlighted as the line number, changing the `24` to just a single digit (and maybe `C-c C-u` in re-builder) allows it to match that line. – phils Jul 24 '20 at 22:30
For better context, I'll add an answer with the details from the test file. – phils Jul 24 '20 at 22:36
I don't think [0-9]* is ever right. That would match `::` with no line number, which is a pointless match since the goal here is to match file & line number so we can display the offending source code. – kpixley Jul 27 '20 at 16:20
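The disagreement in these comments concerns one alternative of the regexp: the digits allowed right after the colon that follows the line number. The sketch below tests that alternative on its own (it is not the full `gnu` regexp) against the text after `:14:` in the failing line and the text after `:88:` in the existing `foo.c:88: message` test case. `my/tail-matches-p` is a hypothetical helper.

```elisp
(defun my/tail-matches-p (digits text)
  "Non-nil if TEXT begins with DIGITS followed by a non-digit or end of line.
DIGITS is the quantified pattern under discussion, such as \"[0-9]?\".
This mirrors one alternative of the `gnu' regexp, not the whole rule."
  (let ((case-fold-search nil))
    (string-match-p (concat "\\`" digits "\\(?:[^0-9\n]\\|$\\)") text)))

;; Text after the line-number colon in the question's failing line,
;; and in the existing test case "foo.c:88: message".
(dolist (digits '("[0-9]?" "[0-9]+?" "[0-9]*"))
  (message "%-8s %-5s %s" digits
           (if (my/tail-matches-p digits "24 (268): rule include_dir: include")
               "yes" "no")
           (if (my/tail-matches-p digits " message") "yes" "no")))
```

On these two inputs, `[0-9]?` rejects `24 (268)...`, `[0-9]+?` rejects ` message` (which fits the observation that it breaks existing tests), and `[0-9]*` accepts both.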

phils · Answer 2 · 2020-07-25T01:53:54.873

This isn't a complete answer, but it provides more context.

As @lawlist has shown, this matching is determined by the gnu regexp in compilation-error-regexp-alist-alist, which is currently defined as follows:

    (gnu
     ;; The first line matches the program name for

     ;;     PROGRAM:SOURCE-FILE-NAME:LINENO: MESSAGE

     ;; format, which is used for non-interactive programs other than
     ;; compilers (e.g. the "jade:" entry in compilation.txt).

     ;; This first line makes things ambiguous with output such as
     ;; "foo:344:50:blabla" since the "foo" part can match this first
     ;; line (in which case the file name as "344").  To avoid this,
     ;; the second line disallows filenames exclusively composed of
     ;; digits.

     ;; Similarly, we get lots of false positives with messages including
     ;; times of the form "HH:MM:SS" where MM is taken as a line number, so
     ;; the last line tries to rule out message where the info after the
     ;; line number starts with "SS".  --Stef

     ;; The core of the regexp is the one with *?.  It says that a file name
     ;; can be composed of any non-newline char, but it also rules out some
     ;; valid but unlikely cases, such as a trailing space or a space
     ;; followed by a -, or a colon followed by a space.
     ;;
     ;; The "in \\|from " exception was added to handle messages from Ruby.
     ,(rx
       bol
       (? (| (regexp "[[:alpha:]][-[:alnum:].]+: ?")
             (regexp "[ \t]+\\(?:in \\|from\\)")))
       (group-n 1 (: (regexp "[0-9]*[^0-9\n]")
                     (*? (| (regexp "[^\n :]")
                            (regexp " [^-/\n]")
                            (regexp ":[^ \n]")))))
       (regexp ": ?")
       (group-n 2 (regexp "[0-9]+"))
       (? (| (: "-"
                (group-n 4 (regexp "[0-9]+"))
                (? "." (group-n 5 (regexp "[0-9]+"))))
             (: (in ".:")
                (group-n 3 (regexp "[0-9]+"))
                (? "-"
                   (? (group-n 4 (regexp "[0-9]+")) ".")
                   (group-n 5 (regexp "[0-9]+"))))))
       ":"
       (| (: (* " ")
             (group-n 6 (| "FutureWarning"
                           "RuntimeWarning"
                           "Warning"
                           "warning"
                           "W:")))
          (: (* " ")
             (group-n 7 (| (regexp "[Ii]nfo\\(?:\\>\\|rmationa?l?\\)")
                           "I:"
                           (: "[ skipping " (+ nonl) " ]")
                           "instantiated from"
                           "required from"
                           (regexp "[Nn]ote"))))
          (: (* " ")
             (regexp "[Ee]rror"))
          (: (regexp "[0-9]?")
             (| (regexp "[^0-9\n]")
                eol))
          (regexp "[0-9][0-9][0-9]")))
     1 (2 . 4) (3 . 5) (6 . 7))
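The numbers after the regexp map subexpressions to fields: FILE is group 1; LINE is group 2, with group 4 as an optional end line; COLUMN is group 3, with group 5 as an optional end column; and TYPE `(6 . 7)` classifies the message as a warning if group 6 matched, as info if group 7 matched, and otherwise as an error. The same entry shape can describe the question's current output directly, as an alternative to changing the compiler. A minimal sketch; the rule name `my-compiler` and its regexp are assumptions about that output format, and it assumes file names contain no colons.

```elisp
(require 'compile)
(require 'rx)

;; Hypothetical rule named `my-compiler' for lines shaped like
;;   Makefile.inc:14:24 (268): rule include_dir: include
;; i.e. FILE:LINE:COLUMN (N): MESSAGE, with no colon after COLUMN.
;; Entry shape: (NAME REGEXP FILE LINE COLUMN); with no TYPE given,
;; every match is treated as an error.
(add-to-list 'compilation-error-regexp-alist-alist
             `(my-compiler
               ,(rx bol
                    (group (+ (not (any ":\n"))))  ; 1: file name (no colons)
                    ":" (group (+ digit))          ; 2: line
                    ":" (group (+ digit))          ; 3: column
                    " (" (+ digit) "):")
               1 2 3))

;; Only rules listed in `compilation-error-regexp-alist' are used.
(add-to-list 'compilation-error-regexp-alist 'my-compiler)
```

If the compiler counts columns from 0, compile.el's `compilation-first-column` variable may also need attention.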

That alist is preceded with the comment:

;; If you make any changes to `compilation-error-regexp-alist-alist',
;; be sure to run the ERT test in test/lisp/progmodes/compile-tests.el.
;; emacs -batch -l compile-tests.el -f ert-run-tests-batch-and-exit

The current gnu test cases from compile-tests.el are:

;; gnu
foo.c:88: message
../foo.c:88: W: message
/tmp/foo.c:88:warning message
foo/bar.py:8: FutureWarning message
foo.py:88: RuntimeWarning message
foo.c:88:I: message
foo.c:88.23: note: message
foo.c:88.23: info: message
foo.c:88:23:information: message
foo.c:88.23-45: Informational: message
foo.c:88-23: message
;; The next one is not in the GNU standards AFAICS.
;; Here we seem to interpret it as LINE1-LINE2.COL2.
foo.c:88-45.37: message
foo.c:88.23-9.17: message
jade:dbcommon.dsl:133:17:E: missing argument for function call
G:/cygwin/dev/build-myproj.xml:54: Compiler Adapter 'javac' can't be found.
file:G:/cygwin/dev/build-myproj.xml:54: Compiler Adapter 'javac' can't be found.
{standard input}:27041: Warning: end of file not at end of a line; newline inserted
boost/container/detail/flat_tree.hpp:589:25:   [ skipping 5 instantiation contexts, use -ftemplate-backtrace-limit=0 to disable ]

to which we can add the two cases from this question (I've truncated the file paths, as that makes no difference).

Makefile.inc:5:0 (41): no match found, expected: ":", [ \t] or [\p{Latin}-_.${}/%0123456789]
Makefile.inc:14:24 (268): rule include_dir: include

We can then test these with M-x re-builder, either by switching it to rx mode to use the original form¹, or, for the default read syntax, by using the value of (cadr (assoc 'gnu compilation-error-regexp-alist-alist)):

"^\\(?:[[:alpha:]][-[:alnum:].]+: ?\\|[ \t]+\\(?:in \\|from\\)\\)?\\(?1:\\(?:[0-9]*[^0-9\n]\\)\\(?:[^\n :]\\| [^-/\n]\\|:[^ \n]\\)*?\\)\\(?:: ?\\)\\(?2:[0-9]+\\)\\(?:-\\(?4:[0-9]+\\)\\(?:\\.\\(?5:[0-9]+\\)\\)?\\|[.:]\\(?3:[0-9]+\\)\\(?:-\\(?:\\(?4:[0-9]+\\)\\.\\)?\\(?5:[0-9]+\\)\\)?\\)?:\\(?: *\\(?6:\\(?:FutureWarning\\|RuntimeWarning\\|W\\(?::\\|arning\\)\\|warning\\)\\)\\| *\\(?7:[Ii]nfo\\(?:\\>\\|rmationa?l?\\)\\|I:\\|\\[ skipping \\.+ ]\\|instantiated from\\|required from\\|[Nn]ote\\)\\| *\\(?:[Ee]rror\\)\\|[0-9]?\\(?:[^0-9\n]\\|$\\)\\|[0-9][0-9][0-9]\\)"

This confirms the issue: All of the original test cases match, but only one of the new cases matches.

As @lawlist identified, changing that [0-9]? makes a difference. If we change that to [0-9]* then all of the cases are now matched; however there's so much going on in this pattern that it's currently unclear to me whether or not that's the correct fix.
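The re-builder experiment can also be scripted. This sketch freezes the `gnu` regexp exactly as quoted above (newer Emacs releases may ship a different one), runs it over the compile-tests.el cases listed above plus the two lines from the question, and then repeats the run with the `[0-9]?` guard replaced by `[0-9]*`. The `my/` names are hypothetical.

```elisp
(require 'cl-lib)

;; Frozen copy of the `gnu' regexp as quoted in this answer (2020-era).
(defconst my/gnu-regexp-2020
  (concat
   "^\\(?:[[:alpha:]][-[:alnum:].]+: ?\\|[ \t]+\\(?:in \\|from\\)\\)?"
   "\\(?1:\\(?:[0-9]*[^0-9\n]\\)\\(?:[^\n :]\\| [^-/\n]\\|:[^ \n]\\)*?\\)"
   "\\(?:: ?\\)\\(?2:[0-9]+\\)"
   "\\(?:-\\(?4:[0-9]+\\)\\(?:\\.\\(?5:[0-9]+\\)\\)?"
   "\\|[.:]\\(?3:[0-9]+\\)\\(?:-\\(?:\\(?4:[0-9]+\\)\\.\\)?\\(?5:[0-9]+\\)\\)?\\)?"
   ":\\(?: *\\(?6:\\(?:FutureWarning\\|RuntimeWarning\\|W\\(?::\\|arning\\)\\|warning\\)\\)"
   "\\| *\\(?7:[Ii]nfo\\(?:\\>\\|rmationa?l?\\)\\|I:\\|\\[ skipping \\.+ ]"
   "\\|instantiated from\\|required from\\|[Nn]ote\\)"
   "\\| *\\(?:[Ee]rror\\)"
   "\\|[0-9]?\\(?:[^0-9\n]\\|$\\)"
   "\\|[0-9][0-9][0-9]\\)"))

;; Variant with the timestamp guard widened from [0-9]? to [0-9]*.
(defconst my/gnu-regexp-star
  (replace-regexp-in-string (regexp-quote "[0-9]?\\(?:[^0-9\n]")
                            "[0-9]*\\(?:[^0-9\n]"
                            my/gnu-regexp-2020 t t))

(defconst my/gnu-test-lines
  '("foo.c:88: message"
    "../foo.c:88: W: message"
    "/tmp/foo.c:88:warning message"
    "foo/bar.py:8: FutureWarning message"
    "foo.py:88: RuntimeWarning message"
    "foo.c:88:I: message"
    "foo.c:88.23: note: message"
    "foo.c:88.23: info: message"
    "foo.c:88:23:information: message"
    "foo.c:88.23-45: Informational: message"
    "foo.c:88-23: message"
    "foo.c:88-45.37: message"
    "foo.c:88.23-9.17: message"
    "jade:dbcommon.dsl:133:17:E: missing argument for function call"
    "G:/cygwin/dev/build-myproj.xml:54: Compiler Adapter 'javac' can't be found."
    "file:G:/cygwin/dev/build-myproj.xml:54: Compiler Adapter 'javac' can't be found."
    "{standard input}:27041: Warning: end of file not at end of a line; newline inserted"
    "boost/container/detail/flat_tree.hpp:589:25:   [ skipping 5 instantiation contexts, use -ftemplate-backtrace-limit=0 to disable ]"
    ;; The two lines from the question (paths truncated).
    "Makefile.inc:5:0 (41): no match found, expected: \":\", [ \\t] or [\\p{Latin}-_.${}/%0123456789]"
    "Makefile.inc:14:24 (268): rule include_dir: include")
  "gnu cases from compile-tests.el, as listed above, plus the question's lines.")

(defun my/unmatched (regexp lines)
  "Return the members of LINES that REGEXP fails to match."
  (let ((case-fold-search nil))
    (cl-remove-if (lambda (line) (string-match-p regexp line)) lines)))

(my/unmatched my/gnu-regexp-2020 my/gnu-test-lines)
;; => ("Makefile.inc:14:24 (268): rule include_dir: include")
(my/unmatched my/gnu-regexp-star my/gnu-test-lines)
;; => nil
```

Shortening `24` to `2` in the failing line also lets the frozen original regexp match it.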

In the failure case:

Makefile.inc:14:24 (268): rule include_dir: include

The line number is 14, but it's the subsequent 24 which is failing to match the zero-or-one-digit [0-9]?. Reducing that to a single digit (as seen in the case which worked) means the original regexp matches the line. (Use C-c C-u to ensure re-builder picks up the change, if necessary.)

That [0-9]? dates back to commit 0ab31e4a9ff from 2006, and was part of a change intended to "rule out false positives due to time stamps":

we get lots of false positives with messages including times of the form "HH:MM:SS" where MM is taken as a line number, so the last line tries to rule out message where the info after the line number starts with "SS".
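The timestamp concern can be illustrated with a deliberately simplified FILE:LINE: pattern (not the real `gnu` regexp) whose message guard is pluggable. With `[0-9]?`, a line containing `12:34:56` is rejected because two digits follow the second colon; with `[0-9]*`, it is accepted and `34` is reported as a line number. `my/toy-rule` and `my/toy-match` are hypothetical names.

```elisp
(defun my/toy-rule (digits)
  "Return a simplified FILE:LINE: regexp whose message guard is DIGITS.
Hypothetical and much simpler than the real `gnu' regexp."
  (concat "^\\(?1:[^:\n]+\\):\\(?2:[0-9]+\\):"
          digits "\\(?:[^0-9\n]\\|$\\)"))

(defun my/toy-match (digits text)
  "Return (FILE LINE) if the toy rule built from DIGITS matches TEXT."
  (let ((case-fold-search nil))
    (when (string-match (my/toy-rule digits) text)
      (list (match-string 1 text) (match-string 2 text)))))

(my/toy-match "[0-9]?" "Started 12:34:56 build")  ; => nil
(my/toy-match "[0-9]*" "Started 12:34:56 build")  ; => ("Started 12" "34")
(my/toy-match "[0-9]?" "foo.c:88: message")       ; => ("foo.c" "88")
```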

¹ You'll need to make the top-level sequence explicit. Refer to the discussion of this gotcha in https://emacs.stackexchange.com/a/5577/454
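The gotcha comes from the difference between the `rx` macro, which treats several arguments as an implicit sequence, and `rx-to-string`, which takes a single form; wrapping the elements in `seq` makes the sequence explicit. A small sketch on a toy pattern, assuming re-builder's rx syntax likewise expects one form:

```elisp
(require 'rx)

;; The `rx' macro accepts several forms and joins them in sequence.
(rx bol "foo" ":" (+ digit))

;; `rx-to-string' takes one form, so the sequence is spelled out with
;; `seq'; the second argument t omits an enclosing \(?: ... \) group.
(rx-to-string '(seq bol "foo" ":" (+ digit)) t)
```

Both forms return the regexp string `^foo:[[:digit:]]+`.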

It's possible that this was [bug 15944](https://debbugs.gnu.org/cgi/bugreport.cgi?bug=15944), but sadly the reporter never followed up the request for more information, so it never went anywhere. — phils, Jul 25 '20 at 01:38
After looking through the regexp, it seems to me that what Emacs is *expecting* from these errors is that there would be an additional colon `:` after the `:LINE:COLUMN` pairing. I.e. `:14:24` would be `:14:24:`, and consequently the earlier part of the regexp (the section preceding that `":"` separator, which is looking for line numbers and columns) would match. — phils, Jul 25 '20 at 02:52
I'll add that `:LINE.COLUMN:` seems like it might be the more common syntax; but in either case that trailing colon is expected to be there. — phils, Jul 25 '20 at 03:02
