I use Pandoc to convert from HTML and Markdown to org-mode. The conversion leaves lots of extraneous org-mode PROPERTIES drawers. How can I direct org-mode to delete all PROPERTIES drawers?

ADDITION: I'm using this shell script to run the conversion.

function pandoc2org () {
    basename=${1%%.*}                       # remove old file extension
    pandoc -s -S "$1" -o "$basename.org"    # name file as oldfile.org
}
incandescentman
  • Why not a shell script to remove all lines starting with `:`? You can pipe the output of pandoc directly into it. Or are there other drawers after the conversion too? – Daniel Apr 03 '17 at 23:24
  • @dangom How would I do that? – incandescentman Apr 03 '17 at 23:26
  • Tried `pandoc -t org file.md | sed '/^:/ d'` to do that right now. Problem is the indentation (pandoc adding whitespaces). I'll get it working and post as an answer in a min – Daniel Apr 03 '17 at 23:32

2 Answers

If you don't mind using a shell script to accomplish what you want, you can try:

pandoc -t org file.md | sed -E "/^[[:space:]]+:/ d" > file.org

This will remove all lines that start with one or more whitespace characters followed by a colon.
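
To see what the filter does without running Pandoc, here is a small sketch. The function name `strip_indented_colon_lines` and the sample input are made up for illustration; the sample assumes Pandoc indents the drawer lines, as mentioned in the comments.

```shell
# Hypothetical helper: the same sed filter as above, reading from stdin.
strip_indented_colon_lines() {
    # Delete every line that starts with whitespace followed by a colon.
    sed -E '/^[[:space:]]+:/d'
}

# Assumed sample of Pandoc's org output with an indented PROPERTIES drawer.
printf '%s\n' \
    '* Heading' \
    '  :PROPERTIES:' \
    '  :CUSTOM_ID: heading' \
    '  :END:' \
    'Body text: with a colon.' \
  | strip_indented_colon_lines
```

Only the heading and the body line should remain. Note that the pattern is not specific to drawers: any indented line beginning with a colon is deleted, while unindented lines are left alone.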

I believe you can fit it into your script by doing the following:

function pandoc2org () {
    basename=${1%%.*}
    pandoc -s -S "$1" -t org | sed -E "/^[[:space:]]+:/ d" > "$basename.org"
}

You can find an explanation of the function here.

Daniel
  • Let me know if it doesn't work – Daniel Apr 03 '17 at 23:37
  • Thanks! How would I add this to my existing script? – incandescentman Apr 03 '17 at 23:44
  • @incandescentman just added an edit – Daniel Apr 03 '17 at 23:57
  • Do you also know how to get the sed command to replace non-breaking spaces (`codepoint 160, #o240, #xa0`) with regular spaces? I tried adding `sed 's/\xA0/ /g'` but it didn't work. – incandescentman Apr 04 '17 at 03:08
  • Here's the script. The `;` format is correct but it's not finding the non-breaking space. https://gist.github.com/2aa13fbda1c12c7253bbedd6e6333a7a – incandescentman Apr 04 '17 at 03:13
  • Maybe have a look [at this superuser thread](https://superuser.com/questions/517847/use-sed-to-replace-nbsp-160-hex-00a0-octal-240-non-breaking-space)? – Daniel Apr 04 '17 at 09:08
  • Reading the [documentation](http://pandoc.org/MANUAL.html) you'll see that the `-S` option you are using in your script forces the insertion of non-breaking spaces. If there is no particular reason for you to use it, remove it. – Daniel Apr 04 '17 at 09:14
  • Wow you're right. `-S` is definitely the wrong option for converting to org-mode, which already handles all that typographic stuff when exporting. – incandescentman Apr 04 '17 at 17:01
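
On the non-breaking-space question from the comments, in case dropping `-S` is not an option: a minimal sketch, assuming the org file is UTF-8, where U+00A0 is stored as the two bytes 0xC2 0xA0. That may be why matching a lone `\xA0` found nothing. The helper name `nbsp_to_space` is hypothetical.

```shell
# Hypothetical helper: replace UTF-8 non-breaking spaces with plain spaces.
nbsp_to_space() {
    # Build the two-byte UTF-8 sequence for U+00A0 (octal 302 240) with printf,
    # so the sed expression does not depend on \x escape support.
    nbsp=$(printf '\302\240')
    sed "s/$nbsp/ /g"
}

# Assumed sample: "Mr." followed by a non-breaking space.
printf 'Mr.\302\240Smith\n' | nbsp_to_space
```

The filter can be appended to the existing pipeline after the drawer-removing `sed`.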

I use

perl -0777 -pe 's/^\s*(:PROPERTIES:(.|\n)*?:END:)|(<<.*?>>)\s*$//gm'

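to remove the extra stuff. With `-0777` Perl reads the whole input at once, and the substitution deletes each `:PROPERTIES:` drawer up to its `:END:` line as well as `<<target>>` anchors.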
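
For comparison, a drawer-only filter can also be sketched with a sed address range. This is a different technique from the Perl one-liner, it leaves `<<target>>` anchors untouched, and the function name `strip_property_drawers` and the sample input are assumptions for illustration.

```shell
# Hypothetical helper: delete only PROPERTIES drawers, indented or not.
strip_property_drawers() {
    # Remove every line from a :PROPERTIES: line through the next :END: line.
    # Caveat: a drawer with no :END: would delete everything to end of input.
    sed -E '/^[[:space:]]*:PROPERTIES:[[:space:]]*$/,/^[[:space:]]*:END:[[:space:]]*$/d'
}

# Assumed sample: one drawer plus an indented fixed-width line that should survive.
printf '%s\n' \
    '* Heading' \
    '  :PROPERTIES:' \
    '  :CUSTOM_ID: heading' \
    '  :END:' \
    '  : fixed-width example line' \
    'Body text.' \
  | strip_property_drawers
```

Unlike a filter that drops every indented line starting with a colon, this one keeps lines such as the fixed-width example above.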

Also see this question.

HappyFace