
Many tools, such as pandoc, can convert a text or HTML file to a DOCX file. My problem is that the output files must span multiple pages, so there has to be something in the input file that indicates where to insert page breaks.

Is there a utility I can run from the terminal that converts .TXT or .HTML files to .DOCX and uses a marker in the input (or any other method) to split the output into pages?

I have a system that extracts text from other sources. It has no DOCX generator available, but it can create text files. So my idea is to generate text files like this:

Page 1 from 2:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.

________________________ [NEWLINE_HERE]

Page 2 from 2:

Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.

I would then send them to another server, which acts as an API that converts the file to DOCX. Instead of putting the entire text file on a single page, it should produce one file with multiple pages, with a page break wherever the [NEWLINE_HERE] marker appears.
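To make the idea concrete, here is a minimal sketch in Python of what the converting side might do. It is not an existing tool: the function names (`split_pages`, `build_document_xml`, `write_docx`) are hypothetical, it assumes any line containing `[NEWLINE_HERE]` marks a page boundary, and it writes only the bare minimum of parts a DOCX package needs (no styles, fonts, or headers), using nothing beyond the standard library.

```python
import zipfile
from xml.sax.saxutils import escape

PAGE_MARK = "[NEWLINE_HERE]"  # assumed marker, taken from the sample text file
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)

RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships '
    'xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" '
    'Target="word/document.xml"/>'
    '</Relationships>'
)

# A paragraph holding only a page break, as WordprocessingML expresses it.
PAGE_BREAK = '<w:p><w:r><w:br w:type="page"/></w:r></w:p>'


def split_pages(text, mark=PAGE_MARK):
    """Split text into pages (lists of lines); marker lines are dropped."""
    pages, current = [], []
    for line in text.splitlines():
        if mark in line:
            pages.append(current)
            current = []
        else:
            current.append(line)
    pages.append(current)
    return pages


def paragraph(line):
    """Wrap one line of plain text in a WordprocessingML paragraph."""
    return ('<w:p><w:r><w:t xml:space="preserve">%s</w:t></w:r></w:p>'
            % escape(line))


def build_document_xml(text):
    """Build word/document.xml with a page break between consecutive pages."""
    parts = []
    for index, page in enumerate(split_pages(text)):
        if index > 0:
            parts.append(PAGE_BREAK)
        # Blank lines are skipped; every other line becomes a paragraph.
        parts.extend(paragraph(line) for line in page if line.strip())
    return ('<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
            '<w:document xmlns:w="%s"><w:body>%s</w:body></w:document>'
            % (W_NS, "".join(parts)))


def write_docx(text, path):
    """Write a minimal DOCX package to path (a file name or file object)."""
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", RELS)
        package.writestr("word/document.xml", build_document_xml(text))
```

With this sketch, the server would read the uploaded text and call something like `write_docx(text, "output.docx")`. The same splitting step could instead feed a DOCX library or a converter, if one is available on that server.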

Please let me know if anything is unclear.

joyaware
  • http://stackoverflow.com/a/16972170/5207302 – Julie Pelletier Aug 23 '16 at 15:48
  • What do you mean by “those files must be multi-page”? You can convert a text file with a single page just fine and of course HTML files don't even have a notion of pages. What is wrong with the output of pandoc? – Gilles 'SO- stop being evil' Aug 23 '16 at 22:16
  • @Gilles I have updated my question, please read the updated section. – joyaware Aug 24 '16 at 01:53
  • @joyaware I still don't understand “single file with multiple pages marked with a NEWLINE feed”. Hmmm. Do you mean that you want to have something in the text file to force a page break in the output document at specific locations? E.g. in your example, is the difficulty forcing a page break between the mock Latin and the English? – Gilles 'SO- stop being evil' Aug 24 '16 at 06:45
  • @Gilles Yes, this is exactly what I want. The text file will contain some marks (it is not really a multi-page file, because .TXT can't be multi-page), and the conversion should turn those marks into real page breaks in the DOCX output. – joyaware Aug 24 '16 at 10:26
  • Ok, this clarifies it. I've voted to reopen. This isn't possible in pandoc; the author discussed some patches, but they weren't applied. – Gilles 'SO- stop being evil' Aug 24 '16 at 11:03

0 Answers