Using tools like pandoc, we can convert a text or HTML file to a DOCX file. The problem is that the output files must be multi-page, so there must be something in the input file that indicates where to insert page breaks.
Is there a command-line utility that converts .TXT or .HTML files to .DOCX and supports a markup (or any other method) for splitting the output into pages?
I have a system that extracts text from other sources. I don't have access to any DOCX generators on that system, but I can create text files. So my idea is to generate text files like this:
Page 1 from 2:
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
________________________ [NEWLINE_HERE]
Page 2 from 2:
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.
I would then send them to another server, which acts as an API that converts the file to DOCX format. Instead of putting the entire text file on a single page, it should produce a single DOCX file with multiple pages, starting a new page at each [NEWLINE_HERE] marker.
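To make the idea concrete, here is a minimal sketch of what the conversion step on the API server might look like. It assumes pandoc is installed there and relies on pandoc's raw-attribute blocks (`{=openxml}`) to pass a WordprocessingML page break through to the DOCX writer. The function names `to_pandoc_markdown` and `convert` are hypothetical, and the marker string is simply the one from my sample above. Because the text is fed to pandoc as Markdown, any Markdown-special characters in the extracted text would be interpreted, which may or may not be acceptable.

```python
import subprocess

# Assumption: any line containing this marker means "start a new page here".
MARKER = "[NEWLINE_HERE]"

# Raw OpenXML block that pandoc's docx writer passes through unchanged.
# The fence is built from single backticks to keep this source readable.
FENCE = "`" * 3
PAGE_BREAK = (
    FENCE + "{=openxml}\n"
    '<w:p><w:r><w:br w:type="page"/></w:r></w:p>\n'
    + FENCE
)


def to_pandoc_markdown(text):
    """Turn marker-delimited plain text into pandoc Markdown with page breaks."""
    blocks = []
    for line in text.splitlines():
        if MARKER in line:
            # Replace the whole marker line (underscores included) with a page break.
            blocks.append(PAGE_BREAK)
        elif line.strip():
            # Keep every non-empty input line as its own paragraph.
            blocks.append(line)
    return "\n\n".join(blocks) + "\n"


def convert(txt_path, docx_path):
    """Hypothetical helper: read a text file and write a DOCX file via pandoc."""
    with open(txt_path, encoding="utf-8") as f:
        markdown = to_pandoc_markdown(f.read())
    # pandoc reads the Markdown from stdin and writes the DOCX to docx_path.
    subprocess.run(
        ["pandoc", "-f", "markdown", "-o", docx_path],
        input=markdown.encode("utf-8"),
        check=True,
    )
```

Calling `convert("input.txt", "output.docx")` would then be the whole job of the API endpoint. Joining lines with blank lines is a deliberate choice in this sketch so that a heading line such as "Page 1 from 2:" does not get merged into the paragraph that follows it.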
Please let me know if anything is unclear.