SMIE: defining a build.ninja grammar

Question

I'm trying to write a major mode using SMIE, both to figure out how it works and to create some documentation.

build.ninja (the input file of Ninja, a build system used by Meson and others) is a perfect candidate because of its very simple syntax. So even though a ninja-mode already exists, I decided to write one based on SMIE and possibly get it included in upstream Emacs.

Syntax showcase (except that there are a few more keywords, and rule only allows special variables):

rule my_rule_title
  local_var_rule  = some text
  command         = cc -c $in -o $out

global_var = some text

build path/obj.o: my_rule_title path/obj.c
  local_var_build = some text

Basically, rule and build accept a few parameters and have a body. The body may only contain variable assignments and is distinguished by a non-zero indentation level. So local_var_rule is inside a rule region, but global_var is outside it.
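Because the body is defined purely by indentation, membership in a body can be checked line by line without any grammar. A minimal sketch, assuming (as an approximation of Ninja's actual rules) that a body is the run of indented lines following a `rule` or `build` line and that blank lines are skipped; `my-ninja-enclosing-block` is a hypothetical name, not part of SMIE or of the existing ninja-mode:

```elisp
(defun my-ninja-enclosing-block ()
  "Return \"rule\" or \"build\" if the current line is in the body of one.
Hypothetical helper: a line is treated as part of a body when it is
indented and the nearest preceding non-blank, non-indented line starts
with the keyword `rule' or `build'.  Blank lines are skipped (an
assumption of this sketch).  Returns nil otherwise."
  (save-excursion
    (beginning-of-line)
    (when (> (current-indentation) 0)
      (let ((found nil))
        ;; Walk backward until a top-level (non-indented) line is found
        ;; or the beginning of the buffer is reached.
        (while (and (not found) (= (forward-line -1) 0))
          (unless (or (looking-at "[ \t]*$")          ; blank line
                      (> (current-indentation) 0))     ; another body line
            (setq found (if (looking-at "\\(rule\\|build\\)[ \t]")
                            (match-string-no-properties 1)
                          'top-level))))
        ;; `top-level' means the nearest top-level line is neither a
        ;; rule nor a build, so report no enclosing block.
        (and (stringp found) found)))))
```

On the showcase above, this returns "rule" on the local_var_rule and command lines, nil on global_var, and "build" on local_var_build. Something like it could back an SMIE tokenizer or indentation rule, or replace them entirely.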

I have spent some time studying other SMIE-based modes, reading documentation, and writing code. At this point I've monkey-typed something that works, though not really properly, and I think the main reason is that I don't know whether my grammar is correct (it probably isn't). My current grammar is at the bottom.

So, here are the questions I couldn't find answers to:

1. Does a grammar have to cover the complete buffer, or only the interesting parts?

   To give an example: the build.ninja example above has rule and build paragraphs. Obviously that means I have to write at least two SMIE rules: one to cover an occurrence of rule and another for build. But once that's done, do I also write a rule that connects the two at the level of the entire buffer, i.e. one that says "the buffer is expected to be composed of rules and builds"? Or are the two enough?
2. How do I define which characters an identifier may contain? For example, a build title may contain slashes and escaped spaces, but variable and rule names are not allowed to have them.
3. How do I define a newline as a separator? E.g. a build line ends with a newline, which is followed by a region of assignments. I tried using "\n", but I'm not sure whether SMIE interprets the backslash, nor whether \n will work with other newline types.
   - Sub-question: a line is allowed to continue on the next one if it ends with a $ (i.e. the $ escapes the newline). I guess if "\n" works, then I just have to create a separate rule for "$\n". But I decided to ask about it explicitly in case the answer to 3 is more complicated than that.
4. How do I define a token for non-zero indentation, i.e. express that a variable assignment belongs to the preceding build or rule?
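Questions 2 and 3 look like the tokenizer's job rather than the grammar's. The default `smie-default-forward-token` skips all whitespace, newlines included, so a "\n" string in the BNF (which the Emacs Lisp reader turns into a literal newline character; SMIE does no escape processing of its own) never gets matched. A common approach in SMIE modes (sh-script and octave-mode do something similar) is a custom tokenizer, passed to `smie-setup` via `:forward-token`, that reports a statement-ending newline as a virtual ";" token and decides what counts as one word. A minimal sketch under those assumptions; `my-ninja-forward-token` is hypothetical, ignores comments and `$$` before a newline, and a real mode would also need the mirror-image `:backward-token` function:

```elisp
(defun my-ninja-forward-token ()
  "Move past the next token and return it as a string.
Hypothetical sketch: an unescaped newline (plus any blank lines after
it) is returned as the virtual token \";\"; `$' + newline is a line
continuation and is skipped; `:', `=', `|' and `||' are operators; any
other run of characters is one word, where `$' escapes the next
character (so \"my$ file.c\" is a single word).  Returns \"\" at the
end of the buffer."
  (skip-chars-forward " \t")
  (cond
   ;; `$' right before a newline: line continuation, keep scanning.
   ((looking-at "\\$\n")
    (goto-char (match-end 0))
    (my-ninja-forward-token))
   ;; Plain newline: end of statement, reported as ";".
   ((looking-at "\n")
    (skip-chars-forward " \t\n")
    ";")
   ;; Operators.
   ((looking-at "||\\|[=:|]")
    (goto-char (match-end 0))
    (match-string-no-properties 0))
   ;; Words: identifiers, paths, and keywords such as `rule' and `build'.
   ((looking-at "\\(?:\\$.\\|[^ \t\n:=|$]\\)+")
    (goto-char (match-end 0))
    (match-string-no-properties 0))
   ;; End of buffer (or a lone trailing `$').
   (t "")))
```

With a tokenizer like this, the BNF refers to ";" instead of "\n", and the question of which characters a rule or variable name may contain stays out of the grammar entirely: SMIE only needs to know where each token starts and ends.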

My latest attempt is the grammar below. I had some other variants that worked incorrectly, but they were also incomplete. For this post I wrote a more complete version, but it does not compile: it rejects the text definition with the error Adjacent non-terminals: id text.

(defvar test-mode-smie-grammar
  (smie-prec2->grammar
   (smie-bnf->prec2
    '((id)
      (path) ;; TODO: define how it's different from `id'
      (statements (statement)
                  (statement "\n" statements))
      (statement (top_decls) (variable))
      (text (id text)
            (text "\n"))
      (variable (id "=" text))
      (build_title (path build_title)
                   (path ":"))
      (top_decls
       ("rule" id)
       ("build" build_title ":" text)
       )
      ))))
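The Adjacent non-terminals error comes from `smie-bnf->prec2`, which only accepts operator-precedence grammars: no production may contain two non-terminals next to each other, and both `(id text)` and `(path build_title)` do. As far as I can tell, such rules aren't needed anyway, because SMIE skips over runs of non-keyword tokens between keywords as ordinary sexps, so `text` and `build_title` can simply be `id` (note also that the grammar above has ":" both at the end of build_title and again in the build production). A minimal sketch that compiles, assuming a tokenizer that reports statement-ending newlines as a virtual ";" token; the variable name is hypothetical, and the grammar says nothing yet about how rule/build bodies are indented:

```elisp
(require 'smie)

(defvar my-ninja-smie-grammar
  (smie-prec2->grammar
   (smie-bnf->prec2
    '((id)                          ; any non-keyword token
      ;; The whole buffer: declarations separated by the virtual ";"
      ;; token that the (assumed) tokenizer emits for newlines.
      (decls (decls ";" decls)
             (decl))
      (decl ("rule" id)             ; rule NAME
            ("build" id ":" id)     ; build OUTPUTS: RULE INPUTS
            (id "=" id)))           ; NAME = VALUE
    ;; Resolve the ";" vs ";" precedence conflict by making it associative.
    '((assoc ";"))))
  "Sketch of an SMIE grammar for build.ninja.")
```

Here the top-level `decls` rule is what ties rule, build and assignment lines together into one buffer-wide structure.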

@Drew the question is the one in the title: how to build a grammar for the described constraints. The questions in the body may be omitted if you manage to build a grammar in a way that sidesteps them, though such an answer would still have to make clear how it makes the grammar possible without answering them. IOW, I just don't see how one could possibly answer "how to build a grammar for the given language" without answering these questions, but it's okay if you do manage that. — Hi-Angel, Mar 31 '23 at 20:00
Please pose your multiple questions as separate posts. It's not about answering a complex question directly ("How to build a rocket to the moon?"). It's about posting specific how-to questions. After you've gotten answers to your N specific questions you may be able to answer your complex question yourself - or you may have other specific questions to get you further on your way. If you post a complex question that includes N other questions, that's not a good fit for this site - neither the question nor the likely answers. — Drew, Mar 31 '23 at 21:43
@Drew I don't really understand how you imagine this. First of all, the sub-questions seem trivial for someone knowledgeable about how grammars work. Now, you want me to create separate posts. But note that each of these sub-questions refers to the `build.ninja` syntax as an actual example of a problem *(which is rational, you wouldn't want some abstract question, you want to see why it's being asked)*. So do you imagine me just creating 4 complete copies of the post, each with the other 3 sub-questions removed, and with a note that it's the result of a split? — Hi-Angel, Mar 31 '23 at 22:42
…and, of course, with the one remaining question moved into the title of each such post. — Hi-Angel, Mar 31 '23 at 22:49
After spending lots of time working on this, researching, experimenting, etc., I think the best answer to "how to use SMIE" is "don't". I don't think anybody but the original author understands how it works, and it is veeery hard to figure out. The non-SMIE [indentation support](https://github.com/ninja-build/ninja/pull/2281/commits/74642c5a6fd68ae71545796f3f7a7ee3e641a2f7) is very short. — Hi-Angel, Jul 01 '23 at 17:14

0 Answers