I must be overlooking something obvious, but I can't find a way to encode a string/region into HTML, for example "Hölle" to "H&ouml;lle". Is there anything in Emacs or web-mode or any other plugin that can do that?
-
Not answering the question: it is only mandatory to encode `&`, `>`, `<` and `"` (it is even possible to leave `&` and `>` as is sometimes). All other HTML entities are only needed if you use a single-byte encoding. So, if your goal is simply to have these characters in HTML, it's better to encode the HTML using a Unicode encoding (such as UTF-8). – wvxvw Feb 12 '15 at 06:31
-
Is there a reason to encode things this way? It makes the source hard to search. – Clément Feb 03 '16 at 19:14
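To illustrate the first comment: if the page is served as UTF-8, escaping the four characters it lists is enough. A minimal sketch, assuming a hypothetical command name `my-html-escape-region` (it is not part of Emacs or web-mode):

```elisp
(defun my-html-escape-region (start end)
  "Escape &, <, > and \" between START and END.
Hypothetical helper for illustration only."
  (interactive "r")
  (let ((map '(("&" . "&amp;") ("<" . "&lt;")
               (">" . "&gt;") ("\"" . "&quot;")))
        ;; A marker keeps tracking the region end as the text grows.
        (end (copy-marker end)))
    (save-excursion
      (goto-char start)
      ;; Single pass: point ends up after each inserted entity, so the
      ;; "&" of an entity we just wrote is never escaped a second time.
      (while (re-search-forward "[&<>\"]" end t)
        (replace-match (cdr (assoc (match-string 0) map)) t t)))
    (set-marker end nil)))
```

Accented characters such as "ö" are left untouched, which is fine as long as the document declares a Unicode encoding.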
2 Answers
I just use an external utility (`recode`) for that:

    (defun my-html-entity-encode (b e)
      (interactive "r")
      ;; Replace the region (DELETE = t) with recode's output (BUFFER = t).
      (call-process-region b e "recode" t t nil "..HTML_4.0"))

E.g. select a region, then `M-x my-html-entity-encode` `RET`.

Alexander Gromnitsky
-
In the same spirit, one could press `C-u M-|`, enter "`recode ..HTML`", and press `RET`. (`C-u M-|` calls `shell-command-on-region` and replaces the current region.) – Constantine Feb 03 '16 at 17:01
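The `C-u M-|` route from this comment can also be wrapped in a command. This is only a sketch: the name `my-html-entity-encode-shell` is made up, and it assumes `recode` is installed and on `PATH`.

```elisp
(defun my-html-entity-encode-shell (start end)
  "Pipe the region from START to END through `recode ..HTML'.
Hypothetical wrapper; it needs the external recode program."
  (interactive "r")
  ;; Fail early with a readable message if recode is missing.
  (unless (executable-find "recode")
    (user-error "The recode program was not found on PATH"))
  ;; OUTPUT-BUFFER nil, REPLACE t: the region is replaced by the output,
  ;; which is what `C-u M-|' does interactively.
  (shell-command-on-region start end "recode ..HTML" nil t))
```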
This solution is amazingly ugly, but as far as I can tell, it works. Call it on the active region.
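    (defun my-replace-symbols-with-entity-names (start end)
      (interactive "r")
      ;; Search case-sensitively so that e.g. "Ö" and "ö" stay distinct.
      (let ((case-fold-search nil))
        ;; Encode "&" first; each "&" -> "&amp;" grows the region by 4.
        (let ((count (count-matches "&" start end)))
          (replace-string "&" "&amp;" nil start end)
          (setq end (+ end (* count 4))))
        (dolist (pair web-mode-html-entities)
          (unless (= (cdr pair) 38)     ; 38 is "&", already handled
            (let* ((str (char-to-string (cdr pair)))
                   (count (count-matches (regexp-quote str) start end)))
              (replace-string str
                              (concat "&" (car pair) ";")
                              nil start end)
              ;; "&name;" replaces one character: growth is (length name) + 1.
              (setq end (+ end (* count (1+ (length (car pair)))))))))))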
Unfortunately, this is limited to `web-mode`'s list of entities (`web-mode-html-entities`), which only contains 41 entities! (And does not seem to include accented characters.)
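If named entities are not strictly required, one way around the size of the list is to emit numeric character references, which need no lookup table at all. A rough sketch, assuming a made-up command name `my-html-numeric-encode-region`; this is a different technique from the one above, not something web-mode provides:

```elisp
(defun my-html-numeric-encode-region (start end)
  "Replace non-ASCII characters between START and END with &#NNN;.
Hypothetical helper for illustration only."
  (interactive "r")
  (let ((end (copy-marker end)))        ; tracks the region end as text grows
    (save-excursion
      (goto-char start)
      (while (re-search-forward "[^[:ascii:]]" end t)
        ;; Point is just after the match, so `char-before' is the
        ;; character that was matched.
        (replace-match (format "&#%d;" (char-before)) t t)))
    (set-marker end nil)))
```

ASCII characters, including `&` and `<`, are deliberately left alone here, so markup in the region survives.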
You can easily add to this list yourself. Here is a list of all printable HTML entities that are named:
(setq web-mode-html-entities
'(("quot" . 34)
("amp" . 38)
("apos" . 39)
("lt" . 60)
("gt" . 62)
("nbsp" . 160)
("iexcl" . 161)
("cent" . 162)
("pound" . 163)
("curren" . 164)
("yen" . 165)
("brvbar" . 166)
("sect" . 167)
("uml" . 168)
("copy" . 169)
("ordf" . 170)
("laquo" . 171)
("not" . 172)
("shy" . 173)
("reg" . 174)
("macr" . 175)
("deg" . 176)
("plusmn" . 177)
("sup2" . 178)
("sup3" . 179)
("acute" . 180)
("micro" . 181)
("para" . 182)
("middot" . 183)
("cedil" . 184)
("sup1" . 185)
("ordm" . 186)
("raquo" . 187)
("frac14" . 188)
("frac12" . 189)
("frac34" . 190)
("iquest" . 191)
("Agrave" . 192)
("Aacute" . 193)
("Acirc" . 194)
("Atilde" . 195)
("Auml" . 196)
("Aring" . 197)
("AElig" . 198)
("Ccedil" . 199)
("Egrave" . 200)
("Eacute" . 201)
("Ecirc" . 202)
("Euml" . 203)
("Igrave" . 204)
("Iacute" . 205)
("Icirc" . 206)
("Iuml" . 207)
("ETH" . 208)
("Ntilde" . 209)
("Ograve" . 210)
("Oacute" . 211)
("Ocirc" . 212)
("Otilde" . 213)
("Ouml" . 214)
("times" . 215)
("Oslash" . 216)
("Ugrave" . 217)
("Uacute" . 218)
("Ucirc" . 219)
("Uuml" . 220)
("Yacute" . 221)
("THORN" . 222)
("szlig" . 223)
("agrave" . 224)
("aacute" . 225)
("acirc" . 226)
("atilde" . 227)
("auml" . 228)
("aring" . 229)
("aelig" . 230)
("ccedil" . 231)
("egrave" . 232)
("eacute" . 233)
("ecirc" . 234)
("euml" . 235)
("igrave" . 236)
("iacute" . 237)
("icirc" . 238)
("iuml" . 239)
("eth" . 240)
("ntilde" . 241)
("ograve" . 242)
("oacute" . 243)
("ocirc" . 244)
("otilde" . 245)
("ouml" . 246)
("divide" . 247)
("oslash" . 248)
("ugrave" . 249)
("uacute" . 250)
("ucirc" . 251)
("uuml" . 252)
("yacute" . 253)
("thorn" . 254)
("yuml" . 255)
("OElig" . 338)
("oelig" . 339)
("Scaron" . 352)
("scaron" . 353)
("Yuml" . 376)
("fnof" . 402)
("circ" . 710)
("tilde" . 732)
("Alpha" . 913)
("Beta" . 914)
("Gamma" . 915)
("Delta" . 916)
("Epsilon" . 917)
("Zeta" . 918)
("Eta" . 919)
("Theta" . 920)
("Iota" . 921)
("Kappa" . 922)
("Lambda" . 923)
("Mu" . 924)
("Nu" . 925)
("Xi" . 926)
("Omicron" . 927)
("Pi" . 928)
("Rho" . 929)
("Sigma" . 931)
("Tau" . 932)
("Upsilon" . 933)
("Phi" . 934)
("Chi" . 935)
("Psi" . 936)
("Omega" . 937)
("alpha" . 945)
("beta" . 946)
("gamma" . 947)
("delta" . 948)
("epsilon" . 949)
("zeta" . 950)
("eta" . 951)
("theta" . 952)
("iota" . 953)
("kappa" . 954)
("lambda" . 955)
("mu" . 956)
("nu" . 957)
("xi" . 958)
("omicron" . 959)
("pi" . 960)
("rho" . 961)
("sigmaf" . 962)
("sigma" . 963)
("tau" . 964)
("upsilon" . 965)
("phi" . 966)
("chi" . 967)
("psi" . 968)
("omega" . 969)
("thetasym" . 977)
("upsih" . 978)
("piv" . 982)
("ensp" . 8194)
("emsp" . 8195)
("thinsp" . 8201)
("zwnj" . 8204)
("zwj" . 8205)
("lrm" . 8206)
("rlm" . 8207)
("ndash" . 8211)
("mdash" . 8212)
("lsquo" . 8216)
("rsquo" . 8217)
("sbquo" . 8218)
("ldquo" . 8220)
("rdquo" . 8221)
("bdquo" . 8222)
("dagger" . 8224)
("Dagger" . 8225)
("bull" . 8226)
("hellip" . 8230)
("permil" . 8240)
("prime" . 8242)
("Prime" . 8243)
("lsaquo" . 8249)
("rsaquo" . 8250)
("oline" . 8254)
("frasl" . 8260)
("euro" . 8364)
("image" . 8465)
("weierp" . 8472)
("real" . 8476)
("trade" . 8482)
("alefsym" . 8501)
("larr" . 8592)
("uarr" . 8593)
("rarr" . 8594)
("darr" . 8595)
("harr" . 8596)
("crarr" . 8629)
("lArr" . 8656)
("uArr" . 8657)
("rArr" . 8658)
("dArr" . 8659)
("hArr" . 8660)
("forall" . 8704)
("part" . 8706)
("exist" . 8707)
("empty" . 8709)
("nabla" . 8711)
("isin" . 8712)
("notin" . 8713)
("ni" . 8715)
("prod" . 8719)
("sum" . 8721)
("minus" . 8722)
("lowast" . 8727)
("radic" . 8730)
("prop" . 8733)
("infin" . 8734)
("ang" . 8736)
("and" . 8743)
("or" . 8744)
("cap" . 8745)
("cup" . 8746)
("int" . 8747)
("there4" . 8756)
("sim" . 8764)
("cong" . 8773)
("asymp" . 8776)
("ne" . 8800)
("equiv" . 8801)
("le" . 8804)
("ge" . 8805)
("sub" . 8834)
("sup" . 8835)
("nsub" . 8836)
("sube" . 8838)
("supe" . 8839)
("oplus" . 8853)
("otimes" . 8855)
("perp" . 8869)
("sdot" . 8901)
("lceil" . 8968)
("rceil" . 8969)
("lfloor" . 8970)
("rfloor" . 8971)
("lang" . 9001)
("rang" . 9002)
("loz" . 9674)
("spades" . 9824)
("clubs" . 9827)
("hearts" . 9829)
("diams" . 9830)))
Sorry about non-printable entities, I couldn't get my macros to format them correctly.
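If you only need a handful of extra characters, it may be simpler to append them rather than replace the whole variable. A small config sketch, assuming `web-mode-html-entities` keeps the `("name" . codepoint)` shape shown above; the four pairs are just examples:

```elisp
;; Runs once web-mode is loaded, so the variable already exists.
(with-eval-after-load 'web-mode
  (dolist (pair '(("auml" . 228) ("ouml" . 246)
                  ("uuml" . 252) ("szlig" . 223)))
    ;; The third argument appends; pairs already present are not duplicated.
    (add-to-list 'web-mode-html-entities pair t)))
```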

PythonNut
-
It would be great to submit these entities to `web-mode`'s maintainers. Also see `lisp/org/org-entities.el` which is part of Emacs currently and has a much better list. It seems that this could be abstracted into a general list that all modes could use... – Ted Zlatanov Feb 12 '15 at 10:26
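Following up on the `org-entities` suggestion, the same kind of alist could probably be derived from Org's table instead of being typed by hand. This sketch assumes each entry in `org-entities` is a list of the form (name LaTeX math-p HTML ASCII Latin1 UTF-8), with plain strings as section headings, as described in that variable's docstring; `my-org-entities-to-alist` is a hypothetical name.

```elisp
(require 'org-entities)

(defun my-org-entities-to-alist ()
  "Return an alist of (NAME . CODEPOINT) built from `org-entities'.
Hypothetical helper; it relies on the entry layout described above."
  (let (result)
    (dolist (entry org-entities)
      (when (consp entry)               ; skip the string section headings
        (let ((name (nth 0 entry))
              (html (nth 3 entry))
              (utf8 (nth 6 entry)))
          ;; Keep only entries whose HTML form is exactly "&name;" and
          ;; whose UTF-8 form is a single character.
          (when (and (stringp html) (stringp utf8)
                     (string= html (concat "&" name ";"))
                     (= (length utf8) 1))
            (push (cons name (aref utf8 0)) result)))))
    (nreverse result)))
```

The result could then be assigned to `web-mode-html-entities`, though whether every Org entity name is also a valid HTML entity name is worth checking before relying on it.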
-
This is really cool! Thank you! How is something like this not part of standard emacs or at least something like web mode?! – Max Feb 12 '15 at 12:27
-
I would accept all the entities that web-mode users need. I really do not need to have "all" the entities – fxbois Feb 12 '15 at 13:46