is it feasible to make one?
Since this is Emacs, yes.
My approach is to use third-party tools that can take HTML and convert it to plain text or even directly to Org format. I think this is an ugly hack, and there may be better ways to do it, but it seems to work for my test cases.
(defun kdm/html2org-clipboard ()
  "Convert clipboard contents from HTML to Org and then paste (yank)."
  (interactive)
  ;; \u00A0 in the sed expression is a non-breaking space; sed replaces it with a plain space.
  (kill-new (shell-command-to-string "osascript -e 'the clipboard as \"HTML\"' | perl -ne 'print chr foreach unpack(\"C*\",pack(\"H*\",substr($_,11,-3)))' | pandoc -f html -t json | pandoc -f json -t org | sed 's/\u00A0/ /g'"))
  (yank))
Unfortunately, HTML is incredibly complex now; it is no longer a handful of simple hand-written tags. That complexity is why the shell command above is so complicated. It does the following:
- `osascript` gets the HTML text from the clipboard. It is hex-encoded, so
- `perl` converts the hex to a string.
- We could convert that HTML to Org directly with `pandoc`, but the HTML is full of complicated tags and therefore produces a ton of Org code. To simplify the HTML to the minimal set of tags needed to capture the formatting, I
  - convert the HTML to JSON, and then
  - convert the JSON to Org (these two steps simplify the HTML).
- `sed` replaces non-standard spaces with standard ones.
Note that `osascript` is for macOS. To adapt the first two steps (`osascript` and `perl`) for Linux, replace the argument of `shell-command-to-string` with
"xclip -o -t text/html | pandoc -f html -t json | pandoc -f json -t org"
In either case, the output of the `pandoc` pipeline is returned to Emacs and inserted into the buffer.
Bind the new Emacs command to a key that is similar to your usual "paste" binding but that means "paste-and-convert-from-html" to you, and it should work.
Alternatively, if you don't want to think about which paste command to use, here is a Linux version that will convert HTML when that is available on the clipboard and will otherwise fall back to plain text:
"xclip -o -t TARGETS | grep -q text/html && (xclip -o -t text/html | pandoc -f html -t json | pandoc -f json -t org) || xclip -o"
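For readability, the same fallback logic can be spelled out as a shell function. This is a sketch rather than a drop-in replacement: the name `clipboard_to_org` is hypothetical, and the behaviour differs slightly from the one-liner. With `A && B || C`, the plain-text fallback also runs when the `pandoc` pipeline exits with an error; the `if`/`else` form below only falls back when `text/html` is not offered at all.

```shell
#!/bin/sh
# Hypothetical wrapper (name chosen for illustration) around the
# one-liner above: convert HTML when the selection offers it,
# otherwise return the plain text unchanged.
clipboard_to_org() {
    # TARGETS lists the formats the current selection owner can provide.
    if xclip -o -t TARGETS | grep -q text/html; then
        # HTML is available: simplify it via JSON, then emit Org.
        xclip -o -t text/html | pandoc -f html -t json | pandoc -f json -t org
    else
        # No HTML on offer: fall back to plain text.
        xclip -o
    fi
}
```

Saved as a script that ends by calling `clipboard_to_org`, it could be passed to `shell-command-to-string` by path. Note that `xclip` reads the PRIMARY selection by default; if you copy with Ctrl-C rather than by selecting text, adding `-selection clipboard` to each `xclip` call may be what you want.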
I've seen somebody suggest using `eww` to browse the web and copy the content via `eww-org`. However, that is really tedious: I don't think many people browse the web with `eww` instead of a modern browser nowadays, so I would have to open the link again in `eww` and do the copying there, not to mention that `eww` sometimes doesn't render the contents nicely.
If I copy this paragraph, I want to be able to reproduce its formatting in `orgmode`. – xji May 05 '15 at 11:22