
I like to read the Emacs manual in hard copy, but I would also like a copy in Org format so that I can kill the parts I have understood and keep only what's important to remember (mostly titles, functions, and keybindings).

I tried converting the HTML version to Org with pandoc, but the resulting file contains a lot of garbage. I suppose the solution is to convert the texi files to HTML myself and then to Org with pandoc to get a cleaner result.

I have seen that there are many HTML Customization Variables that can be passed to the texi2any translator. Which ones should I use to get output that pandoc handles well?

Thx.

Drew
  • I think you'll get better results if you go from `.texi` to `.docbook` and then via `pandoc` to `.org`. – Zeta Feb 06 '19 at 21:03
  • Consider @Zeta's suggestion. `pandoc` can take TeXinfo as input, so you don't need to convert from HTML, and it might yield better results. –  Feb 06 '19 at 21:13
  • @DoMiNeLa10 Unfortunately `pandoc` doesn't have a TeXinfo *reader*. It only supports TeXinfo as an output format; see https://pandoc.org/MANUAL.html#general-options for more information. An intermediate step is still necessary. – Zeta Feb 06 '19 at 21:24
  • My bad then, sorry. –  Feb 06 '19 at 22:02
  • @gusbrs That's the *Org* manual, but OP wants the *Emacs* manual. – Zeta Feb 07 '19 at 07:40
  • Can you give some more details on which parts of pandoc's output you'd like to avoid? Pandoc allows a lot of customizations. E.g., if you disable div-parsing by writing `pandoc -f html-native_divs -t org`, then you won't be bothered with anchors and the like. – tarleb Feb 07 '19 at 19:35

2 Answers

  1. Download the Emacs source from a GNU mirror and unpack it with `tar xf`.
  2. Run `makeinfo --docbook doc/emacs/emacs.texi -o emacs.docbook` to create an intermediate DocBook file.
  3. Run `pandoc --from docbook --to org -o emacs.org emacs.docbook` to create your Org file. Note that you should specify the formats explicitly, at least the input format; otherwise Pandoc will use a lot of memory as it tries to guess the correct type. In my virtual machine (4 GB of memory), Pandoc 2.6 crashed without `--from`.

Unfortunately, there will still be some noise, as the `@cindex` and `@kindex` directives aren't filtered out by Pandoc. You can remove them beforehand with `sed` or similar tools:

sed -i 's#^@[ck]index .*$##g' doc/emacs/*.texi
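To see what this substitution does before editing any files in place, you can run the same expression without `-i` on a small sample. The following is only a sketch: the function name `preview_index_removal` and the sample text are made up for illustration and are not taken from the Emacs manual.

```shell
#!/bin/sh
# Hypothetical helper (not part of the answer): apply the same substitution
# to stdin and print the result, so no file is modified.
preview_index_removal() {
    sed 's#^@[ck]index .*$##'
}

# Made-up sample input, for illustration only.
sample='@node Saving
@cindex saving files
@kindex C-x C-s
Some body text.'

# The @cindex and @kindex lines come out as empty lines;
# the other lines pass through unchanged.
printf '%s\n' "$sample" | preview_index_removal
```

Note that the substitution empties the matching lines rather than deleting them, so blank lines remain where the index entries were.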

All in all, you should get your desired results with the following script.

#!/bin/sh
EMACS_VERSION=26.1
EMACS_DIRECTORY=emacs-${EMACS_VERSION}
EMACS_DL_FILE=${EMACS_DIRECTORY}.tar.xz
EMACS_PACKAGE_URL=http://ftpmirror.gnu.org/emacs/${EMACS_DL_FILE}

# Get Emacs source
wget ${EMACS_PACKAGE_URL}
tar xf ${EMACS_DL_FILE}

# Remove index entries (@cindex, @kindex, and any other @?index lines)
sed -i 's#^@.index .*$##g' ${EMACS_DIRECTORY}/doc/emacs/*.texi

# Create the DocBook
makeinfo --docbook ${EMACS_DIRECTORY}/doc/emacs/emacs.texi -o emacs.docbook

# Create the Org file
pandoc --from docbook --to org -o emacs.org emacs.docbook
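The script above assumes that `wget`, `tar`, `sed`, `makeinfo`, and `pandoc` are all on your `PATH`. As a rough sketch, you could check for them first; the function name `require_tools` is hypothetical and only used here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report any of the named commands missing from PATH.
# Returns 0 if all are found, 1 otherwise.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# The tools the conversion script calls.
if require_tools wget tar sed makeinfo pandoc; then
    echo "all tools found"
else
    echo "install the missing tools before running the script"
fi
```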

Keep in mind that you can heavily customize Pandoc's output with Lua, so if you don't like the output, try your hand at a Lua filter.

Zeta

...only keep what's important to remember...

This doesn't answer your question directly, but it suggests some alternatives you might want to consider.

  1. If you use library Info+ then you can easily create a virtual manual of nodes you want to "save". In Info mode:

    • `.` (`Info-save-current-node`) adds the name of the current node to the list value of option `Info-saved-nodes`.
    • `v` (`Info-virtual-book`) opens a virtual Info manual of the nodes (from any number of manuals) you saved using `.`. With prefix arg `C-u` the virtual book includes bookmarked Info nodes.
  2. If you use library Info+ then you can use minor mode Info-persist-history-mode to save the list of your visited Info nodes between Emacs sessions (i.e., persist them). Together with L (Info-history), this gives you a persistent virtual manual of the nodes you have visited in the past.

  3. If you use library Info+ then you can use C-x DEL (Info-change-visited-status) to toggle or set the visited status of the node at point or the nodes in the active region.

    This is useful if you use non-nil Info-fontify-visited-nodes to show you which nodes you have visited (and thus to control the content of the virtual book that L shows you). No prefix arg: toggle. Non-negative prefix arg: set to visited. Negative prefix arg: set to unvisited. Use it to not consider some nodes as already visited.

  4. If you use library Bookmark+ then you can bookmark Info nodes, including automatically. This records how many times you have visited each bookmarked node and when you last did so (which can give you an idea of how important given nodes are to you).

Drew