From the solution linked to by ista (direct link to the solution), you can create a pandoc filter, say in the file nodivs-filter.hs:
import Text.Pandoc.JSON

main = toJSONFilter nodivs
  where nodivs (Div _ bs) = bs
        nodivs b          = [b]
You then compile the filter with ghc:
ghc nodivs-filter.hs
Finally, you use the filter when converting, as follows:
pandoc --filter ./nodivs-filter input-file.html -o output.org
In order to compile the pandoc filter, you need to have the relevant libraries. For instance, on Ubuntu, you'd need the libghc-pandoc-types-dev package (sudo apt-get install libghc-pandoc-types-dev). More generally, you could also try installing via cabal (cabal install pandoc).
To understand the Haskell filter
The relevant Hackage documentation is here and here.
Rewriting the program in long form, and adding comments (starting with -- and hopefully useful for somebody not used to Haskell):
import Text.Pandoc.JSON
main = toJSONFilter nodivs
-- Type signature (convert a block to a list of blocks)
nodivs :: Block -> [Block]
--- Case when our input block is a Div
-- Div constructors have the form
-- Div Attr [Block]
-- _ means we ignore the attribute (Attr)
nodivs (Div _ bs) = bs
--- Fall through (any other type of block)
-- bs (above) is a list of blocks, so to have consistent types
-- we must convert our fall-through block into a one-member list of blocks
nodivs b = [b]
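It may help to see why returning a list matters. As far as I understand it, toJSONFilter applies a Block -> [Block] function to every block in the document, children first, and concatenates the resulting lists in place, so nested divs get unwrapped too. The sketch below mimics that behaviour on a toy Block type with only two constructors. The toy type and the unwrap and descend helpers are made up for illustration and are not pandoc's API; the real traversal lives in pandoc-types.

```haskell
-- Toy stand-in for pandoc's Block type (assumption: just two constructors,
-- and Div carries no attributes here).
data Block = Para String | Div [Block]
  deriving (Eq, Show)

-- Same shape as the filter above: a Div is replaced by its contents,
-- anything else becomes a one-element list.
nodivs :: Block -> [Block]
nodivs (Div bs) = bs
nodivs b        = [b]

-- Hypothetical helper: process the children of each block first,
-- then splice the result of nodivs into the surrounding list.
unwrap :: [Block] -> [Block]
unwrap = concatMap (nodivs . descend)
  where
    descend (Div bs) = Div (unwrap bs)
    descend b        = b

main :: IO ()
main = print (unwrap [Para "a", Div [Para "b", Div [Para "c"]], Para "d"])
-- prints [Para "a",Para "b",Para "c",Para "d"]
```

Because each result is spliced with concatMap, a Div that expands to several blocks does not need a wrapper, which is why the fall-through case has to return [b] rather than b.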
Some alternatives
These all come from this thread on pandoc's github.
Disable the native_divs extension
In your case:
pandoc -f html-native_divs -t org -o output.org R\ Seminar:\ Introduction\ to\ ggplot2.htm
(-f html-native_divs means from html, without native_divs)
Use pandoc 2.0
AFAICT from the above-mentioned thread, the defaults will become slightly more convenient.