Languages like Erlang and Elixir use << >> for binaries and bit-string syntax, but they also use the classical < and > for comparison operators as well as -> and <- in list comprehensions.

An Emacs syntax table can declare characters as matching open/close pairs. The ( ), [ ] and { } characters are normally declared this way.

This allows the following very handy behaviours:

  • forward-sexp and backward-sexp commands to navigate to the matching pair,
  • er/expand-region to quickly mark all text within the pair.

Adding the < > pair in the syntax table causes problems with the other uses of the < and > characters because Emacs will see unbalanced pairs in statements such as if (a < 3).
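
To make the problem concrete, here is a minimal sketch that scans the text (a < 3) with two throwaway syntax tables. The function name demo-angle-pair-problem is hypothetical and the tables are built only for this illustration; they are not the ones Erlang mode uses.

```elisp
;; Hypothetical demo: compare how `scan-sexps' handles "(a < 3)" with and
;; without paren syntax on < and >.
(defun demo-angle-pair-problem ()
  "Return a list of two results of scanning \"(a < 3)\".
The first uses a plain table, the second a table where < and > are a pair."
  (with-temp-buffer
    (insert "(a < 3)")
    (let ((plain (make-syntax-table))    ; inherits the standard table
          (paired (make-syntax-table)))
      ;; Declare < as an opener matched by >, and > as a closer matched by <.
      (modify-syntax-entry ?< "(>" paired)
      (modify-syntax-entry ?> ")<" paired)
      (list
       (with-syntax-table plain
         (condition-case nil
             (scan-sexps (point-min) 1)
           (scan-error 'unbalanced)))
       (with-syntax-table paired
         (condition-case nil
             (scan-sexps (point-min) 1)
           (scan-error 'unbalanced)))))))
```

Evaluating (demo-angle-pair-problem) should give (8 unbalanced): the plain table finds the end of the parenthesized expression, while the paired table treats the lone < as an opener that is never closed.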

Question:

Is there a way to make commands such as forward-sexp and er/expand-region treat << >> as a balanced pair when they operate on it, without giving the < and > characters paired-delimiter syntax in the syntax table?

For instance, to overcome the problem, would it be a good idea to dynamically change the syntax of < and > by modifying the syntax table just around the execution of these commands when they are applied on those characters (using advice or re-writing a function that calls them)?
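
A rough sketch of that idea, assuming a wrapper command rather than advice. The name my-forward-sexp-with-angle-pairs is hypothetical, and the approach is deliberately naive: inside the temporary table every < and > counts as a delimiter, so it will miscount as soon as the bit-string contains a comparison or a <- generator.

```elisp
;; Hypothetical wrapper: only when point is at "<<", run `forward-sexp'
;; with a temporary syntax table in which < and > are a matching pair.
(defun my-forward-sexp-with-angle-pairs (&optional arg)
  "Like `forward-sexp', but treat < > as a pair when point is at \"<<\"."
  (interactive "^p")
  (if (looking-at-p "<<")
      ;; The temporary table inherits everything else from the current one.
      (let ((table (make-syntax-table (syntax-table))))
        (modify-syntax-entry ?< "(>" table)
        (modify-syntax-entry ?> ")<" table)
        (with-syntax-table table
          (forward-sexp arg)))
    (forward-sexp arg)))
```

With point on the first < of <<16#4f>>, this moves point just past the closing >>. The buffer's real syntax table is never modified, which at least keeps the hack local to the command.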

PRouleau
  • I was afraid of that re. your previous question but it seemed you were happy so I didn't say anything about it :-). I think you'll need to bite the bullet and implement a `forward-sexp-function` that uses a (simplified) language parser - lexical analysis just cannot cope with such problems in general (although it might be able to deal with this situation with only a single-character look-ahead). – NickD Oct 01 '21 at 15:39
  • @NickD I was not thinking ahead enough... I wonder if it's possible to dynamically change the syntax table to help write the forward-sexp-function, or a wrapper around forward-sexp and backward-sexp that sets the syntax table conditionally depending on the characters around point. That would simplify the implementation. – PRouleau Oct 01 '21 at 15:43
  • Check icon.el, python.el (maybe), verilog-mode.el in the lisp/progmodes directory of the emacs sources, and tex-mode.el in lisp/textmodes for some examples. Perhaps the most enlightening example however might be in lisp/nxml/nxml-mode.el which implements just the minimal parser to deal with XML by dealing with the problematic cases and ignoring the rest. See the commentary in lisp/nxml/nxml-rap.el. – NickD Oct 01 '21 at 15:48
  • Thanks. I was thinking of doing that. All of this started with me trying to get smartparens work well with Erlang. I had to write a function to fix the problems caused by the slurp, barf transformation applied to Erlang lists. That's working now. Next is to get the << >> to work properly... – PRouleau Oct 01 '21 at 15:52
  • Modifying the syntax table dynamically might work (for some version of "work"). My knee-jerk reaction is that it would be a hack that I wouldn't want to touch with a ten-foot pole, but it's not based on any objective evidence (or any evidence at all really). I would wait to hear more experienced lispers weigh in before I'd dive into those (IMO, shark-infested) waters though. – NickD Oct 01 '21 at 15:53
  • Someone once told me she was fine swimming in waters where there were sharks as long as someone was further away from the beach than she was... since it's only computers I'll give it a try and see what happens :-) – PRouleau Oct 01 '21 at 15:55
  • However, I might have to use real grammar handling because I ran into issues with the following valid Erlang statements: `<< <> || Bin <- [<<3,7,5,4,7>>] >>.` and `<< <<(X+1)/integer>> || <> <= <<3,7,5,4,7>> >>.` Perhaps time to learn SMIE... – PRouleau Oct 01 '21 at 18:02
  • @NickD I ended up biting the bullet and implemented specialized forward-sexp and backward-sexp commands that handle the Erlang `<< >>` bit syntax blocks until I get the erlang.el syntax processing going. – PRouleau Oct 06 '21 at 04:21

1 Answer

Yes. The syntax of a character in a buffer defaults to the value indicated by the buffer's syntax table, but this can be overridden by setting the syntax-table text property on a character in a buffer, provided that parse-sexp-lookup-properties is true. And you can set syntax-propertize-function to a function that applies this property as needed to a stretch of text. Modes written for older versions of Emacs that didn't have syntax-propertize-function could use font locking or custom post-change hooks for that.
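
As a small illustration of the override mechanism itself, the sketch below sets the syntax-table property by hand on the two outer characters of a bit-string in a temporary buffer. The function name demo-syntax-table-property and the sample text are made up for this example; a real mode would apply the property from syntax-propertize-function instead.

```elisp
;; Hypothetical demo: override the syntax of two characters with the
;; `syntax-table' text property and let `scan-sexps' honor it.
(defun demo-syntax-table-property ()
  "Return the position after the sexp starting at the first < of the sample."
  (with-temp-buffer
    (insert "<<16#4f>> tail")
    ;; Position 1 holds the outer <, position 9 the outer >.
    (put-text-property 1 2 'syntax-table (string-to-syntax "(>"))
    (put-text-property 9 10 'syntax-table (string-to-syntax ")<"))
    ;; Without this binding the text properties are ignored.
    (let ((parse-sexp-lookup-properties t))
      (scan-sexps (point-min) 1))))
```

Evaluating (demo-syntax-table-property) should return 10, the position just after the closing >>. The buffer's syntax table is untouched, so a < used as a comparison elsewhere keeps its ordinary syntax.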

For simple cases, the syntax-propertize-rules macro can generate a suitable syntax-propertize-function. It's similar to font lock keywords, but sets character syntaxes rather than faces.

The example below (untested) gives the outer characters of <<…>> the syntax of a matching open/close pair.

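(defconst foo-mode-syntax-propertize-function
  (syntax-propertize-rules
   ;; Group numbers are relative to each rule's own regexp, so both are 1.
   ("\\(<\\)<" (1 "(>"))    ; first < of << opens, matched by >
   (">\\(>\\)" (1 ")<")))   ; second > of >> closes, matched by <
  "Function marking the outer characters of <<…>> as a balanced pair.")
(defun foo-mode ()
  …
  ;; Make the scanning commands honor the syntax-table text property.
  (make-local-variable 'parse-sexp-lookup-properties)
  (setq parse-sexp-lookup-properties t)
  (make-local-variable 'syntax-propertize-function)
  (setq syntax-propertize-function foo-mode-syntax-propertize-function))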
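
One way to check rules like these outside any major mode is to run them in a temporary buffer and list which characters received a syntax-table property. This is only a sketch: demo-propertize-bit-syntax is a hypothetical helper, and it uses group 1 in both rules because syntax-propertize-rules renumbers groups itself when it combines the regexps, as a comment on this answer points out.

```elisp
;; Hypothetical helper: apply the <<…>> rules to TEXT in a temp buffer and
;; report every character that received a `syntax-table' property.
(defun demo-propertize-bit-syntax (text)
  "Return a list of (POSITION . SYNTAX) for the propertized chars of TEXT."
  (with-temp-buffer
    (insert text)
    (setq-local syntax-propertize-function
                (syntax-propertize-rules
                 ("\\(<\\)<" (1 "(>"))
                 (">\\(>\\)" (1 ")<"))))
    ;; Force the rules to run over the whole buffer.
    (syntax-propertize (point-max))
    (let (result)
      (dolist (pos (number-sequence (point-min) (1- (point-max))))
        (let ((syn (get-text-property pos 'syntax-table)))
          (when syn
            (push (cons pos syn) result))))
      (nreverse result))))
```

For the input "<<16#4f>>" this should report the property (4 . 62) at position 1 and (5 . 60) at position 9, that is, an opener matched by > and a closer matched by <. If the same check shows nothing in a real mode buffer, the mode is probably replacing or bypassing syntax-propertize-function.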

For a working, real-world example of making <<…>> balanced, see Erlang mode. It only sets the syntax when inserting < or > explicitly, which is very limited. I guess this code was written a long time ago before syntax-propertize-function made it more convenient.

There are several examples of using syntax-propertize-function as intended in modes distributed with Emacs. For example, bat mode uses syntax-propertize-rules to make things like the word rem act as a multi-character comment starter. Fortran mode has a somewhat more complicated rule to recognize the C comment marker in the proper column. For a much more involved example, look at the various things Perl mode does to accommodate some of Perl's syntactic complexity.

If you want to see a really complex example, explore how CC mode makes <…> balanced where warranted in C++.

  • Thanks. I'll need to study erlang.el more. I was using erlang-mode, and with it forward-sexp does not work with `<< >>`, nor is er/expand-region able to mark the region. It does not activate pairing of `< >`. I was able to activate electric pair insertion by adding a smartparens pair "<<" and ">>". Smartparens has other issues with Erlang that I have overcome with postprocessing, wrapper functions and a function to fix issues in the formatting of comma-separated blocks. I need to study the erlang.el code and syntax-propertize-function more deeply. – PRouleau Oct 01 '21 at 21:18
  • @PRouleau Because Erlang mode uses electric commands to mark `<>` as pairs, things like `forward-sexp` will only work as desired if you've just inserted them, and not if you paste them or load a file. I mention it as an example for how to declare the syntax of the characters, but not how to apply this syntax. Use `syntax-propertize-function` to apply the syntax, as all the other examples I cite do. – Gilles 'SO- stop being evil' Oct 01 '21 at 21:39
  • OK, I see the matching behaviour after just typing something like `ABC = <<16#4f>>` while point is just past the last `>`. If I then type the period to complete the Erlang expression forward-sexp continues to work. But if I save the file, exit Emacs, start it again on the same file, the behaviour is gone. I'm looking to maintain this behaviour throughout, but also have to deal with things like `<< <> || Bin <- [<<3,7,5,4,7>>] >>`. – PRouleau Oct 01 '21 at 21:56
  • @PRouleau To maintain the behavior, don't set the property the way Erlang mode does it. Set it the modern way, with `syntax-propertize-function`, like in my small example. – Gilles 'SO- stop being evil' Oct 02 '21 at 08:14
  • I believe that the group number must be 1 for both rules: syntax-propertize-rules concatenates the regexps but also increments the second group number in the generated code. The code should work, but somehow the function is probably not running, because the syntax of the characters is not changing. I expanded the macro, turned it into an interactive command, stepped through it with the debugger in an Erlang buffer and saw it call put-text-property in the right spot. Yet I cannot see the syntax-table property on the character. I can see the syntax-table property on the 'r' of rem in a batch file, though. – PRouleau Oct 03 '21 at 03:14
  • ...and if I copy the content of the Erlang buffer into a .txt buffer and run the same code, I can get the < character to have the '(4 . 62) syntax-table property just fine. And if I (setq-local parse-sexp-lookup-properties t) in that .txt buffer I can use forward-sexp and backward-sexp just fine! But not in the erlang-mode buffer... I am missing something... – PRouleau Oct 03 '21 at 03:24