For what languages is `syntax-ppss` appropriate?

Question

I've been looking at a way of detecting whether point is on a comment by looking at how the current buffer is fontified.

Smartparens defines sp-point-in-comment, which relies on syntax-ppss. However, it seems that syntax-ppss and parse-partial-sexp can be used for arbitrary languages, even if they don't use s-expressions.

For example, this Python:

x = 1
# I'm a comment
y = 2

Placing point inside the comment and evaluating (if (nth 4 (syntax-ppss)) 'comment 'not-comment) works correctly.
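For reference, the check above can be wrapped in a small helper and run against that same Python snippet in a temporary buffer. This is only a sketch: `my-in-comment-p` and `my-demo-python-comment` are names made up for illustration, and it assumes the built-in `python-mode` from python.el is available.

```elisp
(require 'python)

;; Hypothetical helper, not part of Emacs or Smartparens.
(defun my-in-comment-p (&optional pos)
  "Return non-nil if POS (default: point) is inside a comment."
  (nth 4 (syntax-ppss pos)))

;; Hypothetical demo: classify two positions in the sample snippet.
(defun my-demo-python-comment ()
  "Return a list with the comment status inside and after the comment."
  (with-temp-buffer
    (insert "x = 1\n# I'm a comment\ny = 2\n")
    (python-mode)
    (goto-char (point-min))
    ;; Point ends up inside the comment, right after "I'm".
    (search-forward "I'm")
    (let ((inside (my-in-comment-p)))
      ;; Point ends up on the last code line, right after "y =".
      (search-forward "y =")
      (list (if inside 'comment 'not-comment)
            (if (my-in-comment-p) 'comment 'not-comment)))))
```

Calling (my-demo-python-comment) should give (comment not-comment).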

Does syntax-ppss work for any programming mode? Why do the docstrings discuss s-expressions?

Not that I've explored this thoroughly, but I’ve yet to find a language in which it doesn’t work. Even in text-derived modes such as latex it’s worked fine for me. — Malabarba, Oct 07 '14 at 13:13

score 14 · Accepted Answer · answered Oct 07 '14 at 13:27

Well, s-expressions are essentially “abstract syntax”, in the sense that they are merely a concrete syntax for abstract syntax trees, and thus any language can be represented as s-expressions, and manipulated with s-expression commands. Hence, syntax-ppss speaking of “Sexps” is simply the Lisp way to talk about abstract syntax trees.

Practically, though, syntax-ppss does not generally work for any mode. It's fundamentally targeted at Lisp-like languages, and if the concrete syntax of a language deviates from Sexps too much, it no longer makes much sense to use Sexp commands to manipulate the language. It'd work, but there'd be too big a gap between the abstract representation and the concrete syntax, which would make most commands counter-intuitive.

However, some of the underlying infrastructure of syntax-ppss is fairly generic. Major modes typically try hard to plug into it, because it makes them work nicely with many built-in Emacs features and provides a generic interface for third-party packages such as Smartparens.

Notably, syntax-ppss relies on Syntax Tables for strings and comments. Syntax Tables categorize individual characters by their syntactic class. There are classes for paired delimiters, string delimiters and comment characters.

The structure of strings and comments is fairly similar in almost all programming languages: Strings are normally enclosed in special delimiters. Comments can have special delimiters as well, or start with a certain character and extend to the end of the line. These structures can easily be captured in syntax tables, and almost all major modes define appropriate syntax tables, if only to profit from Emacs' syntactic fontification.

Hence, syntax-ppss works well for strings and comments in almost any language, but the support for, and usefulness of, its other features varies.
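To make the syntax table part concrete, here is a minimal sketch for an imaginary toy language with "..." strings and # line comments. The names `my-toy-syntax-table` and `my-toy-context-at` are invented for this example. No real major mode is involved, just a hand-built table and syntax-ppss.

```elisp
;; Hypothetical syntax table for an imaginary toy language.
(defvar my-toy-syntax-table
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?\" "\"" table)  ; string delimiter
    (modify-syntax-entry ?#  "<"  table)  ; starts a comment
    (modify-syntax-entry ?\n ">"  table)  ; newline ends the comment
    table)
  "Syntax table for a toy language (illustration only).")

;; Hypothetical helper: classify the position right after NEEDLE.
(defun my-toy-context-at (text needle)
  "Insert TEXT in a temp buffer and classify point just after NEEDLE.
Return the symbol `string', `comment' or `code'."
  (with-temp-buffer
    (set-syntax-table my-toy-syntax-table)
    (insert text)
    (goto-char (point-min))
    (search-forward needle)
    (let ((state (syntax-ppss)))
      (cond ((nth 3 state) 'string)    ; non-nil inside a string
            ((nth 4 state) 'comment)   ; non-nil inside a comment
            (t 'code)))))
```

With the text name = "a # b"  # trailing note, followed by a line next = 1, searching for "a #" lands inside the string (the # there is not taken as a comment starter), "trailing" lands inside the comment, and "next" lands in plain code.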

score 4 · Answer 2 · edited Mar 11 '16 at 22:50

Adding to @lunaryorn's answer, I think syntax-ppss just relies on the robustness of Emacs's syntax table system, which works for comments and strings in most languages. But if the language has syntax that a syntax table can't capture, and the mode didn't build a parser to add syntax properties in the right places, syntax-ppss will fail.

Try this in html-mode:

<p class="aa" id='bb'>"cc" 'dd'</p>

and call the following command:

(defun inside-string-p (&optional pos)
  "Return non-nil if inside string, else nil.
This depends on major mode having setup syntax table properly."
  (interactive)
  (let ((result (nth 3 (syntax-ppss pos))))
    (print result)
    result))

With point inside aa it returns non-nil, but inside bb it returns nil, although bb is a string as well.

In nxml-mode, it returns nil everywhere, although it should at least be non-nil inside aa.
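As a rough sketch of what "adding syntax properties to the right places" can look like: the code below uses a stand-in syntax table in which ' is plain punctuation (an assumption that mimics the behaviour described above, not html-mode's actual table), and then marks the single quotes as string delimiters with the `syntax-table` text property. `my-quote-demo` is a made-up name. Real modes normally do this kind of thing through `syntax-propertize-function` rather than by hand.

```elisp
;; Hypothetical demo, not taken from html-mode or nxml-mode.
(defun my-quote-demo (propertize-quotes)
  "Return the string state (nth 3 of `syntax-ppss') inside 'bb'.
If PROPERTIZE-QUOTES is non-nil, first mark every single quote as a
string delimiter via the `syntax-table' text property."
  (with-temp-buffer
    ;; Stand-in table: \" delimits strings, ' is just punctuation.
    (let ((table (make-syntax-table)))
      (modify-syntax-entry ?\" "\"" table)
      (modify-syntax-entry ?\' "."  table)
      (set-syntax-table table))
    (insert "<p class=\"aa\" id='bb'>")
    (when propertize-quotes
      ;; Make the parser honour `syntax-table' text properties.
      (setq-local parse-sexp-lookup-properties t)
      (goto-char (point-min))
      (while (search-forward "'" nil t)
        (put-text-property (1- (point)) (point)
                           'syntax-table (string-to-syntax "\""))))
    ;; Put point between the opening quote and the first b.
    (goto-char (point-min))
    (search-forward "'b")
    (nth 3 (syntax-ppss))))
```

(my-quote-demo nil) returns nil, while (my-quote-demo t) returns a non-nil value, so syntax-ppss only sees the second attribute as a string once the quotes carry the extra property.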
