
On Ubuntu, how can I reformat text to fit a given width, justifying every line except the final one by adding spaces where necessary? The closest I can get is with fmt --width=64, but it does not add spaces between words.

Input

  • snippet taken from man zip with all line breaks removed and double spaces turned into single spaces
Do not operate on files modified prior to the specified date, where mm is the month (00-12), dd is the day of the month (01-31), and yyyy is the year. The ISO 8601 date format yyyy-mm-dd is also accepted. For example:

fold --width=64 output

  • breaks words, which is undesirable
Do not operate on files modified prior to the specified date, wh
ere mm is the month (00-12), dd is the day of the month (01-31),
 and yyyy is the year. The ISO 8601 date format yyyy-mm-dd is al
so accepted. For example:

fmt --width=65 output

  • almost perfect, but also need to add spaces between words
Do not operate on files modified prior to the specified date,
where mm is the month (00-12), dd is the day of the month
(01-31), and yyyy is the year. The ISO 8601 date format
yyyy-mm-dd is also accepted. For example:

Wanted output

  • snippet taken from man zip
  • where the double/triple spaces are inserted is not important to me, as long as the line fits the specified width and the words are more or less evenly spaced
Do not operate on files modified prior to  the  specified  date,
where  mm  is  the  month  (00-12),  dd  is the day of the month
(01-31), and  yyyy  is  the  year.   The  ISO 8601  date  format
yyyy-mm-dd is also accepted.  For example:
  • That's such an old problem, and there are solutions using built-in language features, but it seems nobody has wrapped them in a tool. Hm! – Marcus Müller Mar 13 '24 at 00:14
  • Related? https://unix.stackexchange.com/a/766277/227738 – jubilatious1 Mar 13 '24 at 02:59
  • Maybe closer? https://blogs.perl.org/users/damian_conway/2019/08/greed-is-good-balance-is-better-beauty-is-best.html – jubilatious1 Mar 13 '24 at 03:03
  • GNU awk is a possibility. (a) Use RS="" to read whole paragraphs at once. (b) Trim off the first 65 chars, and backtrack to a word gap. (c) Insert the required n spaces by replacing one space in front of fields (NF - n + 1) to (NF). (d) Repeat {a..c}. (e) Avoid changing the last short line of a paragraph. – Paul_Pedant Mar 13 '24 at 09:04
  • @Paul_Pedant well, the art is in actually implementing that :) I think your comment makes a nice answer, but you'd have to provide a bit of a skeleton for that script – Marcus Müller Mar 13 '24 at 12:20
  • @MarcusMüller That is the skeleton -- the rest is just implementation details ;-) I am wondering how to put in more than one space per word -- some more arithmetic needed there. I read the perl link above -- very deep analysis there. – Paul_Pedant Mar 13 '24 at 14:10
  • @Paul_Pedant yeah, that's why I opted for a python implementation; basically, how (I think) you'd approach that is 1) if the "deficit" in characters on this line is larger than the number of existing spaces, simply multiply these spaces by an integer factor so that the deficit is between 0 and (number of space - 1). – Marcus Müller Mar 13 '24 at 14:16
  • @Paul_Pedant 2) After that, you find the ratio of spaces to deficit; if it's larger than an integer N, you first add another space to every Nth space. You choose an offset for the first so that long runs of spaces on this line don't align with long runs on the previous line. – Marcus Müller Mar 13 '24 at 14:18
  • @Paul_Pedant 3) After that, you randomly distribute the remaining deficit on the shortest runs of spaces. – Marcus Müller Mar 13 '24 at 14:19
  • @Paul_Pedant I found that 2) indeed makes the whole thing look prettier, but that skipping 2) after doing 1) was good enough, seeing we're not going to do context-aware spacing nor any kind of ligature / kerning on a monospace font console. – Marcus Müller Mar 13 '24 at 14:21
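The space-distribution arithmetic from these comments (step 1, then step 3, skipping step 2 as the last comment suggests) might look like this in Python. It is a minimal sketch that assumes the words of one output line are already known; justify_line and its rng parameter are hypothetical names, not part of any existing tool.

```python
import random


def justify_line(words, width, rng=None):
    """Join `words` with padded gaps so the result is exactly `width` chars.

    Hypothetical helper illustrating the approach from the comments:
      step 1: widen every gap by the same whole number of spaces;
      step 3: give the leftover spaces, one each, to randomly chosen gaps.
    Step 2 (the "every Nth gap" pass) is skipped.
    """
    if rng is None:
        rng = random.Random()
    if len(words) <= 1:
        # Zero or one word: no gaps to stretch, so pad on the right instead.
        return " ".join(words).ljust(width)
    gaps = len(words) - 1
    # Deficit = spaces still needed beyond the one space already in each gap.
    deficit = width - sum(len(w) for w in words) - gaps
    if deficit < 0:
        raise ValueError("words do not fit in the given width")
    # Step 1: every gap grows by deficit // gaps, leaving 0..gaps-1 spaces over.
    sizes = [1 + deficit // gaps] * gaps
    # Step 3: after step 1 all gaps are equally short, so the remainder goes
    # to distinct gaps picked at random.
    for i in rng.sample(range(gaps), deficit % gaps):
        sizes[i] += 1
    line = words[0]
    for size, word in zip(sizes, words[1:]):
        line += " " * size + word
    return line
```

Passing a seeded generator, e.g. justify_line(words, 64, random.Random(1)), makes the choice of widened gaps reproducible. Because step 2 is skipped, the wider gaps on consecutive lines may occasionally line up.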

2 Answers


You could use nroff for this job. Here is an example of how to use it with your text in a file named infile:

$ 2>/dev/null nroff <(echo .pl 1 ; echo .ll 40) infile
Do  not  operate on files modified prior
to the specified date, where mm  is  the
month  (00‐12),  dd  is  the  day of the
month (01‐31), and yyyy is the year. The
ISO  8601 date format yyyy‐mm‐dd is also
accepted. For example:

.pl 1 sets the page length to a single line, which disables pagination.

.ll 40 sets the line length to 40 characters.

nroff formats roff markup for terminal output, and its requests leave a great deal of room for customization.

user9101329
  • really nice! I must admit I've never gotten used to roff, but this is exactly what it's for! – Marcus Müller Mar 13 '24 at 13:17
  • Is there any reason to use 2>/dev/null? I see no difference with and without it. – FromTheStackAndBack Mar 13 '24 at 16:01
  • Is there also a way to avoid breaking at hyphens (so that yyyy‐mm‐dd stays on one line at a width of 64 characters per line)? I will check the man page for nroff in a bit to see if I can answer this myself. – FromTheStackAndBack Mar 13 '24 at 16:02
  • 2>/dev/null is not strictly necessary. It simply redirects the standard error to /dev/null thus suppressing any possible error messages. – user9101329 Mar 13 '24 at 18:32
  • I believe you can add the .nh directive to disable hyphenation. Try this: 2>/dev/null nroff <(echo .pl 1 ; echo .ll 64 ; echo .nh) infile – user9101329 Mar 13 '24 at 18:42
  • @user9101329 the .nh directive prevents adding additional hyphenation (for example, breaking acknowledging into acknowledg-<newline>ing), and still breaks yyyy-mm-dd into yyyy-mm-<newline>dd – FromTheStackAndBack Mar 13 '24 at 21:36

You won't have much joy with this if you don't add support for localized hyphenation; 65 letters is just too short, especially if your text, unlike the excerpt you've used, consists of longer composite words. German and Finnish speakers will hate you if you try to justify a line with 10 letters on it, because it's two words between a 35-letter and a 40-letter word.

Anyway, assuming you don't care that much about typography, it's not that hard: Python ships the textwrap module, which handles breaking the text into lines of at most 65 characters, and all you need to do is add the missing spaces.
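A minimal sketch of that idea, assuming the whole paragraph arrives as one string: textwrap.wrap does the line breaking, and the missing spaces are then added to every line except the last. This is not the linked justify.py (its contents aren't reproduced here); the function name justify is hypothetical, and unlike the approach in the question's comments it hands out leftover spaces left to right instead of randomly.

```python
import textwrap


def justify(text, width, break_on_hyphens=True):
    """Wrap `text` at `width` columns and pad all lines except the last.

    Hypothetical stand-in for justify.py (not the original script):
    textwrap.wrap does the line breaking, then each line's gaps are widened
    until the line is exactly `width` characters long.
    """
    lines = textwrap.wrap(text, width=width, break_on_hyphens=break_on_hyphens)
    result = []
    for line in lines[:-1]:
        words = line.split()
        gaps = len(words) - 1
        if gaps == 0:
            result.append(line)  # a lone word cannot be spread out
            continue
        # Total number of spaces the line needs to reach `width`.
        spaces = width - sum(len(w) for w in words)
        base, extra = divmod(spaces, gaps)
        padded = words[0]
        for i, word in enumerate(words[1:]):
            # The first `extra` gaps get one space more than the rest.
            padded += " " * (base + (1 if i < extra else 0)) + word
        result.append(padded)
    # The final line keeps its single spaces, as in the wanted output.
    result.extend(lines[-1:])
    return "\n".join(result)
```

For the zip snippet, print(justify(text, 65)) pads the first three lines to exactly 65 characters and leaves "dd is also accepted. For example:" untouched. Handing out extra spaces from the left is the simplest choice, but it can make the left part of a line look looser than the right.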

Something like this script. (You're welcome!)

Download the script justify.py (put it somewhere in your $PATH if you don't want to type the full path to it) and run chmod 755 /path/to/justify.py. Then you can run

echo 'Do not operate on files modified prior to the specified date, where mm is the month (00-12), dd is the day of the month (01-31), and yyyy is the year. The ISO 8601 date format yyyy-mm-dd is also accepted. For example:' \
     | /path/to/justify.py 65 \
     | cowsay -n

to get

 ___________________________________________________________________ 
/ Do  not operate  on files modified  prior to the specified  date, \
| where  mm  is  the  month (00-12),  dd  is the  day  of the month |
| (01-31), and yyyy is the year. The ISO 8601 date format  yyyy-mm- |
\ dd is also accepted. For example:                                 /
 ------------------------------------------------------------------- 
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

I chose to allow breaking on -; if you don't want that, modify the wrapper = textwrap.TextWrapper(… line to include break_on_hyphens=False.
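To see what break_on_hyphens changes, here is a small check that uses only the standard textwrap module; the helper name keeps_whole is hypothetical. It wraps the zip snippet and reports whether yyyy-mm-dd survives on a single line.

```python
import textwrap

TEXT = ("Do not operate on files modified prior to the specified date, where mm "
        "is the month (00-12), dd is the day of the month (01-31), and yyyy is "
        "the year. The ISO 8601 date format yyyy-mm-dd is also accepted. "
        "For example:")


def keeps_whole(width, break_on_hyphens):
    """Hypothetical helper: True if 'yyyy-mm-dd' lands intact on one line."""
    lines = textwrap.wrap(TEXT, width=width, break_on_hyphens=break_on_hyphens)
    return any("yyyy-mm-dd" in line for line in lines)


for width in (64, 65):
    print(width, keeps_whole(width, True), keeps_whole(width, False))
```

With the default break_on_hyphens=True, textwrap splits the date after yyyy-mm- at both widths, which matches the break in the cowsay output; with break_on_hyphens=False the date stays on one line.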