I think this is best solved on the level of pandoc conversion, with a pandoc filter.
Create a file, say noattrs-filter.hs
containing:
import Text.Pandoc.JSON
main = toJSONFilter noAttrs
noAttrs :: Block -> Block
noAttrs (Header n _ i) = Header n nullAttr i
noAttrs (Div _ b) = Div nullAttr b
noAttrs b = b
Compile the file with ghc:
ghc noattrs-filter.hs
and run your conversion with:
pandoc -s --filter ./noattrs-filter test.docx -o test.org
In order to compile the pandoc filter, you need to have the relevant libraries. For instance, on Ubuntu, you'd need the libghc-pandoc-types-dev
package (sudo apt-get install libghc-pandoc-types-dev
). More generally, you could also try installing via cabal
(cabal install pandoc
).
To understand the haskell filter
The relevant hackage documentation is here and here.
Adding comments to the code (starting with --
and hopefully useful for somebody not used to haskell):
import Text.Pandoc.JSON
main = toJSONFilter noAttrs
-- Type signature (convert a block into a slightly modified block)
noAttrs :: Block -> Block
-- Header constructors have the form:
-- Header Int Attr [Inline]
-- _ means we ignore the attribute (Attr), since we're discarding it, anyway
-- nullAttr is, as the name suggests, an attribute containing no info
noAttrs (Header n _ i) = Header n nullAttr i
-- Div constructors have the form:
-- Div Attr [Block]
noAttrs (Div _ b) = Div nullAttr b
-- for completeness could also deal with 'CodeBlock's, which can also
-- have an attribute — left as an exercise for the reader.
--
-- we need a fallthrough
noAttrs b = b
(This is based on my own answer to a relatively similar question here.)