2

I must be overlooking something obvious, but I can't find a way to encode a string/region into HTML entities, for example "Hölle" to "H&ouml;lle". Is there anything in Emacs, web-mode, or any other plugin that can do that?

jch
Max
  • Not answering the question: it is only mandatory to encode `&`, `>`, `<` and `"` (it is sometimes even possible to leave `&` and `>` as is). All other HTML entities are only needed if you use a single-byte encoding. So, if your goal is simply to have these characters in HTML, it's better to encode the HTML in a Unicode encoding (such as UTF-8). – wvxvw Feb 12 '15 at 06:31
  • Is there a reason to encode things this way? It makes the source hard to search. – Clément Feb 03 '16 at 19:14
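
If escaping only the mandatory characters mentioned in the comment above is enough, something along these lines might do. This is a minimal sketch, not a built-in command: the name `my-html-escape-region` is made up for illustration, and it assumes the buffer is saved as UTF-8 so that characters such as "ö" can stay as they are.

```elisp
(defun my-html-escape-region (start end)
  "Escape only &, <, > and \" between START and END.
Hypothetical helper; all other characters are left untouched."
  (interactive "r")
  (let ((end (copy-marker end t))     ; marker follows the growing region
        (map '(("&" . "&amp;") ("<" . "&lt;")
               (">" . "&gt;") ("\"" . "&quot;"))))
    (save-excursion
      (goto-char start)
      ;; Point ends up after each replacement, so the "&" of an
      ;; inserted entity is never escaped a second time.
      (while (re-search-forward "[&<>\"]" end t)
        (replace-match (cdr (assoc (match-string 0) map)) t t)))
    (set-marker end nil)))
```

Select a region and run `M-x my-html-escape-region RET`; accented characters are deliberately left alone.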

2 Answers

2

I just use an external utility (recode) for that:

(defun my-html-entity-encode (b e)
  (interactive "r")
  (call-process-region b e "recode" t t nil "..HTML_4.0"))

For example, select a region, then type `M-x my-html-entity-encode RET`.

  • In the same spirit, one could press `C-u M-|`, enter "`recode ..HTML`", and press `RET`. (`C-u M-|` calls `shell-command-on-region` and replaces the current region.) – Constantine Feb 03 '16 at 17:01
1

This solution is amazingly ugly, but as far as I can tell, it works. Call it on the active region.

(defun my-replace-symbols-with-entity-names (start end)
  "Replace characters between START and END with named HTML entities."
  (interactive "r")
  (let ((case-fold-search nil)      ; keep e.g. "ä" and "Ä" distinct
        (end (copy-marker end t)))  ; marker follows the growing region
    (save-excursion
      ;; Encode "&" first so the ampersands inserted below stay untouched.
      (goto-char start)
      (while (search-forward "&" end t)
        (replace-match "&amp;" t t))
      (dolist (pair web-mode-html-entities)
        (unless (= (cdr pair) 38)
          (goto-char start)
          (while (search-forward (char-to-string (cdr pair)) end t)
            (replace-match (concat "&" (car pair) ";") t t)))))
    (set-marker end nil)))

Unfortunately, this is limited to web-mode's list of entities (web-mode-html-entities), which only contains 41 entities! (And does not seem to include accented characters).

You can easily extend this list yourself. Here is a list of the printable named entities (essentially the HTML 4 set):

(setq web-mode-html-entities
  '(("quot" . 34)
     ("amp" . 38)
     ("apos" . 39)
     ("lt" . 60)
     ("gt" . 62)
     ("nbsp" . 160)
     ("iexcl" . 161)
     ("cent" . 162)
     ("pound" . 163)
     ("curren" . 164)
     ("yen" . 165)
     ("brvbar" . 166)
     ("sect" . 167)
     ("uml" . 168)
     ("copy" . 169)
     ("ordf" . 170)
     ("laquo" . 171)
     ("not" . 172)
     ("shy" . 173)
     ("reg" . 174)
     ("macr" . 175)
     ("deg" . 176)
     ("plusmn" . 177)
     ("sup2" . 178)
     ("sup3" . 179)
     ("acute" . 180)
     ("micro" . 181)
     ("para" . 182)
     ("middot" . 183)
     ("cedil" . 184)
     ("sup1" . 185)
     ("ordm" . 186)
     ("raquo" . 187)
     ("frac14" . 188)
     ("frac12" . 189)
     ("frac34" . 190)
     ("iquest" . 191)
     ("Agrave" . 192)
     ("Aacute" . 193)
     ("Acirc" . 194)
     ("Atilde" . 195)
     ("Auml" . 196)
     ("Aring" . 197)
     ("AElig" . 198)
     ("Ccedil" . 199)
     ("Egrave" . 200)
     ("Eacute" . 201)
     ("Ecirc" . 202)
     ("Euml" . 203)
     ("Igrave" . 204)
     ("Iacute" . 205)
     ("Icirc" . 206)
     ("Iuml" . 207)
     ("ETH" . 208)
     ("Ntilde" . 209)
     ("Ograve" . 210)
     ("Oacute" . 211)
     ("Ocirc" . 212)
     ("Otilde" . 213)
     ("Ouml" . 214)
     ("times" . 215)
     ("Oslash" . 216)
     ("Ugrave" . 217)
     ("Uacute" . 218)
     ("Ucirc" . 219)
     ("Uuml" . 220)
     ("Yacute" . 221)
     ("THORN" . 222)
     ("szlig" . 223)
     ("agrave" . 224)
     ("aacute" . 225)
     ("acirc" . 226)
     ("atilde" . 227)
     ("auml" . 228)
     ("aring" . 229)
     ("aelig" . 230)
     ("ccedil" . 231)
     ("egrave" . 232)
     ("eacute" . 233)
     ("ecirc" . 234)
     ("euml" . 235)
     ("igrave" . 236)
     ("iacute" . 237)
     ("icirc" . 238)
     ("iuml" . 239)
     ("eth" . 240)
     ("ntilde" . 241)
     ("ograve" . 242)
     ("oacute" . 243)
     ("ocirc" . 244)
     ("otilde" . 245)
     ("ouml" . 246)
     ("divide" . 247)
     ("oslash" . 248)
     ("ugrave" . 249)
     ("uacute" . 250)
     ("ucirc" . 251)
     ("uuml" . 252)
     ("yacute" . 253)
     ("thorn" . 254)
     ("yuml" . 255)
     ("OElig" . 338)
     ("oelig" . 339)
     ("Scaron" . 352)
     ("scaron" . 353)
     ("Yuml" . 376)
     ("fnof" . 402)
     ("circ" . 710)
     ("tilde" . 732)
     ("Alpha" . 913)
     ("Beta" . 914)
     ("Gamma" . 915)
     ("Delta" . 916)
     ("Epsilon" . 917)
     ("Zeta" . 918)
     ("Eta" . 919)
     ("Theta" . 920)
     ("Iota" . 921)
     ("Kappa" . 922)
     ("Lambda" . 923)
     ("Mu" . 924)
     ("Nu" . 925)
     ("Xi" . 926)
     ("Omicron" . 927)
     ("Pi" . 928)
     ("Rho" . 929)
     ("Sigma" . 931)
     ("Tau" . 932)
     ("Upsilon" . 933)
     ("Phi" . 934)
     ("Chi" . 935)
     ("Psi" . 936)
     ("Omega" . 937)
     ("alpha" . 945)
     ("beta" . 946)
     ("gamma" . 947)
     ("delta" . 948)
     ("epsilon" . 949)
     ("zeta" . 950)
     ("eta" . 951)
     ("theta" . 952)
     ("iota" . 953)
     ("kappa" . 954)
     ("lambda" . 955)
     ("mu" . 956)
     ("nu" . 957)
     ("xi" . 958)
     ("omicron" . 959)
     ("pi" . 960)
     ("rho" . 961)
     ("sigmaf" . 962)
     ("sigma" . 963)
     ("tau" . 964)
     ("upsilon" . 965)
     ("phi" . 966)
     ("chi" . 967)
     ("psi" . 968)
     ("omega" . 969)
     ("thetasym" . 977)
     ("upsih" . 978)
     ("piv" . 982)
     ("ensp" . 8194)
     ("emsp" . 8195)
     ("thinsp" . 8201)
     ("zwnj" . 8204)
     ("zwj" . 8205)
     ("lrm" . 8206)
     ("rlm" . 8207)
     ("ndash" . 8211)
     ("mdash" . 8212)
     ("lsquo" . 8216)
     ("rsquo" . 8217)
     ("sbquo" . 8218)
     ("ldquo" . 8220)
     ("rdquo" . 8221)
     ("bdquo" . 8222)
     ("dagger" . 8224)
     ("Dagger" . 8225)
     ("bull" . 8226)
     ("hellip" . 8230)
     ("permil" . 8240)
     ("prime" . 8242)
     ("Prime" . 8243)
     ("lsaquo" . 8249)
     ("rsaquo" . 8250)
     ("oline" . 8254)
     ("frasl" . 8260)
     ("euro" . 8364)
     ("image" . 8465)
     ("weierp" . 8472)
     ("real" . 8476)
     ("trade" . 8482)
     ("alefsym" . 8501)
     ("larr" . 8592)
     ("uarr" . 8593)
     ("rarr" . 8594)
     ("darr" . 8595)
     ("harr" . 8596)
     ("crarr" . 8629)
     ("lArr" . 8656)
     ("uArr" . 8657)
     ("rArr" . 8658)
     ("dArr" . 8659)
     ("hArr" . 8660)
     ("forall" . 8704)
     ("part" . 8706)
     ("exist" . 8707)
     ("empty" . 8709)
     ("nabla" . 8711)
     ("isin" . 8712)
     ("notin" . 8713)
     ("ni" . 8715)
     ("prod" . 8719)
     ("sum" . 8721)
     ("minus" . 8722)
     ("lowast" . 8727)
     ("radic" . 8730)
     ("prop" . 8733)
     ("infin" . 8734)
     ("ang" . 8736)
     ("and" . 8743)
     ("or" . 8744)
     ("cap" . 8745)
     ("cup" . 8746)
     ("int" . 8747)
     ("there4" . 8756)
     ("sim" . 8764)
     ("cong" . 8773)
     ("asymp" . 8776)
     ("ne" . 8800)
     ("equiv" . 8801)
     ("le" . 8804)
     ("ge" . 8805)
     ("sub" . 8834)
     ("sup" . 8835)
     ("nsub" . 8836)
     ("sube" . 8838)
     ("supe" . 8839)
     ("oplus" . 8853)
     ("otimes" . 8855)
     ("perp" . 8869)
     ("sdot" . 8901)
     ("lceil" . 8968)
     ("rceil" . 8969)
     ("lfloor" . 8970)
     ("rfloor" . 8971)
     ("lang" . 9001)
     ("rang" . 9002)
     ("loz" . 9674)
     ("spades" . 9824)
     ("clubs" . 9827)
     ("hearts" . 9829)
     ("diams" . 9830)))

Sorry about non-printable entities, I couldn't get my macros to format them correctly.
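
With a list this long, `my-replace-symbols-with-entity-names` scans the region once per entity. A possible alternative is to build a lookup table first and walk the region a single time. The following is only a sketch under assumptions: `my-demo-entities`, `my-entity-table` and `my-encode-region-single-pass` are hypothetical names, and the small sample alist merely mimics the `(NAME . CODEPOINT)` shape used above so the example runs without web-mode.

```elisp
(defvar my-demo-entities
  '(("amp" . 38) ("lt" . 60) ("gt" . 62) ("quot" . 34)
    ("Ouml" . 214) ("szlig" . 223) ("ouml" . 246))
  "Hypothetical sample alist in the (NAME . CODEPOINT) shape.")

(defun my-entity-table (alist)
  "Return a hash table mapping code points in ALIST to \"&NAME;\" strings."
  (let ((table (make-hash-table :test #'eql)))
    (dolist (pair alist table)
      (puthash (cdr pair) (concat "&" (car pair) ";") table))))

(defun my-encode-region-single-pass (start end alist)
  "Replace characters between START and END that are listed in ALIST.
Each character is looked up exactly, so case is never folded."
  (let ((table (my-entity-table alist))
        (end (copy-marker end t)))    ; marker follows the growing region
    (save-excursion
      (goto-char start)
      (while (< (point) end)
        (let ((entity (gethash (char-after) table)))
          (if entity
              ;; Point lands after the inserted entity, so its "&"
              ;; is not looked at again.
              (progn (delete-char 1) (insert entity))
            (forward-char 1)))))
    (set-marker end nil)))
```

Passing `web-mode-html-entities` as ALIST should work the same way, provided it keeps the shape shown above.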

PythonNut
  • It would be great to submit these entities to `web-mode`'s maintainers. Also see `lisp/org/org-entities.el` which is part of Emacs currently and has a much better list. It seems that this could be abstracted into a general list that all modes could use... – Ted Zlatanov Feb 12 '15 at 10:26
  • This is really cool! Thank you! How is something like this not part of standard emacs or at least something like web mode?! – Max Feb 12 '15 at 12:27
  • I would accept all the entities that web-mode users need. I really do not need to have "all" the entities – fxbois Feb 12 '15 at 13:46
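
Following up on the `org-entities` suggestion in the comments: a lookup table could probably be derived from that variable instead of maintaining a list by hand. This is a rough sketch, assuming each entry in `org-entities` is either a section-heading string or a list of the form (NAME LATEX MATHP HTML ASCII LATIN1 UTF-8); `my-org-entity-table` is a made-up name. ASCII characters are skipped on purpose, since `&`, `<`, `>` and `"` need to be handled separately and in the right order.

```elisp
(require 'org-entities)

(defun my-org-entity-table ()
  "Return a hash table mapping non-ASCII characters to HTML entities.
Built from `org-entities'.  Only entries whose UTF-8 field is a single
non-ASCII character and whose HTML field looks like \"&name;\" are
used; if several entries share a character, the first one wins."
  (let ((table (make-hash-table :test #'eql)))
    (dolist (entry org-entities table)
      (when (consp entry)               ; skip section-heading strings
        (let ((html (nth 3 entry))
              (utf8 (nth 6 entry)))
          (when (and (stringp html) (stringp utf8)
                     (= (length utf8) 1)
                     (> (aref utf8 0) 127)
                     (string-match-p "\\`&[A-Za-z0-9]+;\\'" html)
                     (not (gethash (aref utf8 0) table)))
            (puthash (aref utf8 0) html table)))))))
```

For example, `(gethash ?ö (my-org-entity-table))` should give the entity string for "ö".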