Is it feasible to make one?
Since this is Emacs, yes.
My approach is to use third-party tools that can take HTML and convert it to plain text or even directly to Org format. I think this is an ugly hack, and there may be better ways to do this, but it works for my test cases.
(defun kdm/html2org-clipboard ()
  "Convert clipboard contents from HTML to Org and then paste (yank)."
  (interactive)
  ;; In the final sed expression, the character being replaced is meant to be a
  ;; non-standard space (e.g. a non-breaking space, U+00A0), not a plain one.
  (kill-new (shell-command-to-string "osascript -e 'the clipboard as \"HTML\"' | perl -ne 'print chr foreach unpack(\"C*\",pack(\"H*\",substr($_,11,-3)))' | pandoc -f html -t json | pandoc -f json -t org | sed 's/ / /g'"))
  (yank))
Unfortunately, HTML is incredibly complex now - no longer some simple hand-written tags. This complex HTML tagging requires the complicated shell command above. It does the following:
1. osascript gets the HTML text from the clipboard. It is hex encoded, so
2. perl converts the hex to a string.
3. We could convert that HTML to Org directly with pandoc, but the HTML is full of complicated tags and therefore produces a ton of Org code. In order to simplify the HTML to the minimal set of tags needed to capture the formatting, I
   1. convert the HTML to json, and then
   2. convert the json to Org (these two steps simplify the HTML).
4. Replace non-standard spaces with standard ones.
Note that osascript is for macOS. To modify steps 1-2 for Linux, replace the argument of shell-command-to-string with
"xclip -o -t text/html | pandoc -f html -t json | pandoc -f json -t org"
In any case, the output of the pandoc command is returned to emacs, and inserted into the buffer.
Bind the new Emacs command to a key that is similar to your "paste" key but that means "paste-and-convert-from-html" to you, and it should work.
Alternatively, if you don't want to think about which paste command to use, here is a Linux version that will convert HTML when that is available on the clipboard and will otherwise fall back to plain text:
"xclip -o -t TARGETS | grep -q text/html && (xclip -o -t text/html | pandoc -f html -t json | pandoc -f json -t org) || xclip -o"
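One caveat with the `A && B || C` form: the plain-text fallback also runs when HTML is available but the pandoc pipeline exits with an error, so the two outputs can end up mixed. Here is a sketch of the same logic with an explicit if/else, wrapped in a hypothetical shell function `clipboard_as_org` (the name is made up). It uses only the xclip and pandoc invocations already shown.

```shell
#!/bin/sh
# Hypothetical helper: print the selection as Org if HTML is offered, else as plain text.
clipboard_as_org() {
  # Ask xclip which targets (formats) the selection offers.
  if xclip -o -t TARGETS | grep -q text/html; then
    # HTML is available: same two-pass pandoc conversion as in the answer.
    xclip -o -t text/html | pandoc -f html -t json | pandoc -f json -t org
  else
    # No HTML target: return the plain-text selection unchanged.
    xclip -o
  fi
}
```

To use it from Emacs, one option is to put the body in a small script on your PATH and pass that script's name to shell-command-to-string. Also note that xclip reads the PRIMARY selection by default; if you copy with the browser's Copy command, adding `-selection clipboard` to each xclip call may be what you want.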
I've seen somebody suggest using `eww` to browse the web and copy the content via `eww-org`. However, that is really tedious (I don't think there would be a lot of people browsing the web using `eww` instead of modern browsers nowadays. I'll have to open that link again in `eww` and do the copying, not to mention sometimes `eww` doesn't render the contents nicely).
If I copy this paragraph, I want to be able to reproduce its formatting in `orgmode`. – xji May 05 '15 at 11:22