How can I wrap text at a certain column size?

Question

I know that I can use something like cat test.txt | pr -w 80 to wrap lines to 80 characters wide, but that puts a lot of space at the top and bottom of the printed output, and it does not work right on some systems.

What's the best way to force a text file with long lines to be wrapped at a certain width?

Bonus points if you can keep it from breaking words.

score 265 · Accepted Answer · edited Feb 26 '15 at 15:44

You are looking for

fold -w 80 -s text.txt

-w sets the maximum line width in columns; 80 is the default.
-s breaks lines at spaces rather than in the middle of words (a word longer than the width is still split).

This is the standard (POSIX) way, but some other systems need -c instead of -w.
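
As a rough illustration of what the two options do, here is a minimal Python sketch of the folding rule. It is not fold's actual implementation: the name fold_line is made up for this example, and it simply counts Python characters, ignoring tabs, backspaces, and double-width characters.

```python
def fold_line(line, width=80, break_at_spaces=False):
    """Split one line into pieces of at most `width` characters.

    Hypothetical helper, loosely modelled on `fold -w WIDTH [-s]`.
    """
    pieces = []
    rest = line
    while len(rest) > width:
        cut = width  # default: hard break exactly at the width
        if break_at_spaces:
            # Prefer to cut just after the last space inside the window.
            last_space = rest.rfind(" ", 0, width)
            if last_space != -1:
                cut = last_space + 1
        pieces.append(rest[:cut])
        rest = rest[cut:]
    pieces.append(rest)
    return pieces
```

With break_at_spaces=True each piece keeps the space it was cut after, which would be consistent with the trailing whitespace some commenters report from fold -s.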

answered Nov 23 '11 at 06:07 by Rainer Bendig · edited Feb 26 '15 at 15:44 by Stéphane Chazelas

Works on OS X, too, but filename needs to be after args. Thanks! – rdrey Sep 02 '14 at 22:13
On a side note, to nicely format e-mails for text-only reply, I use: fold -s -w 80 email.txt | sed 's/^.*$/> &/' – Marcello Romani Feb 10 '15 at 21:10
@MarcelloRomani, shouldn't you use a width of 78 since you're prepending two characters? – nanny Feb 26 '15 at 14:59
Hmm... I guess so. Thanks for pointing that out :) – Marcello Romani Feb 27 '15 at 12:35
Is there something like fold that lets you specify a string to wrap on? – will Feb 01 '17 at 02:30
Note that fold breaks urls, while fmt does not. – Skippy le Grand Gourou Mar 28 '17 at 11:05
any idea what happens when the folded file is markdown and contains URLs that are longer than width specified. – Richard Dec 13 '19 at 12:32
Works nicely in mingw, but it leaves trailing whitespace at the end of some lines. Is there a neat way of fixing this? – Max Barraclough Feb 09 '20 at 16:32
@Richard if you're asking what it does with really long words, it would seem to force breaks on non-spaces where necessary. Try fold -w 5 -s <<< 123456 – mwfearnley Mar 12 '21 at 12:25
@MaxBarraclough fmt doesn't leave trailing whitespace. If you need to use fold instead of fmt, you can add a bit of Perl at the end to strip out trailing whitespace. fold -s -w 80 file.txt | perl -pe 's/ +$//' – Jonathan Jul 05 '22 at 12:28
limitation: fold -w80 -s fails on unicode text. better: pandoc input.txt -t plain --wrap=auto --columns=80. but pandoc modifies the text: strips xml tags, replaces ascii quotes with unicode quotes, ... see pandoc issue: add input format plain – milahu Oct 05 '23 at 08:54
Know a way to use fold but with a delimiter that I can create columns (not using column). I want to have multilined columns.. So almost using fold to set the width of columns with multiple lines within each of the column spaces – ikwyl6 Oct 25 '23 at 23:14
@milahu if you've found a Unicode solution, could you post a full answer? I've put up a bare-bones, Unicode-aware answer written in Raku (a.k.a. Perl6), and would like to compare output (See: https://unix.stackexchange.com/a/766277/227738 ). Thx! – jubilatious1 Jan 08 '24 at 07:04
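
The e-mail quoting idiom from the comments above (fold piped through sed to prepend "> ", and the question of whether the width should then be 78) can be sketched in Python. This is only an illustration: quote_reply is a hypothetical helper, and unlike fold, textwrap re-flows the whole input as a single paragraph and drops trailing whitespace. Its indent options count toward the width, so no manual subtraction is needed.

```python
import textwrap


def quote_reply(text, width=80, prefix="> "):
    """Wrap `text` and prepend `prefix`, keeping every line within `width`."""
    return textwrap.fill(
        text,
        width=width,                # total width, prefix included
        initial_indent=prefix,      # prefix for the first line
        subsequent_indent=prefix,   # prefix for all following lines
        break_long_words=False,     # leave long words (e.g. URLs) intact
    )
```

Because break_long_words is off, a single word longer than the available width (a URL, for instance) will still overflow the line rather than be split.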

score 80 · Answer 2 · edited Sep 14 '15 at 08:09

In addition to fold, take a look at fmt. fmt tries to choose line breaks intelligently to make text look good. It doesn't break long words; it wraps only at spaces. It will also join adjacent lines, which is good for prose but bad for log files or other formatted text.
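
For comparison, the same two properties (never splitting a word, and joining adjacent lines) can be reproduced with Python's textwrap module. This is a sketch under assumptions, not a reimplementation of fmt: fmt_like is a made-up name, textwrap fills lines greedily instead of evening them out, and it treats the whole input as one paragraph.

```python
import textwrap


def fmt_like(text, width=80):
    """Refill `text` to at most `width` columns without splitting words.

    Rough stand-in for `fmt -w WIDTH`; over-long words are left as they are,
    so a line can exceed `width`.
    """
    return textwrap.fill(
        text,
        width=width,
        break_long_words=False,  # never cut inside a word
        break_on_hyphens=False,  # only break at whitespace
    )
```

Note how the newline between the two input lines disappears in the result, which is exactly the behaviour that makes this approach unsuitable for log files.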

answered Nov 23 '11 at 18:28 by Jonathan · edited Sep 14 '15 at 08:09 by user2683246

I especially like fmt -t compared to fold – lkraav Dec 24 '12 at 21:26
fmt works well with markdown-style paragraphs as well (though I would recommend manual wrapping on sub-sentence borders) – hoijui Nov 26 '21 at 20:04
It also looks like fmt does not leave trailing spaces behind like fold does. – Taylor D. Edmiston Jan 25 '22 at 17:57
limitation: fmt -w80 fails on unicode text – milahu Oct 05 '23 at 08:18

user2683246 · Answer 3 · 2024-03-22T20:37:02.403

$ cat shxp.txt

O, they have lived long on the alms-basket of words, I marvel thy
master hath not eaten thee for a word; for thou art not so long by the
head as honorificabilitudinitatibus: thou art easier swallowed than a
flap-dragon.

1) Guaranteed fixed line width, breaking words wherever the limit falls:

fold -w 20 <shxp.txt
O, they have lived l
ong on the alms-bask
et of words, I marve
l thy master hath no
t eaten thee for a w
ord; for thou art no
t so long by the hea
d as honorificabilit
udinitatibus: thou a
rt easier swallowed
than a flap-dragon.

2) Guaranteed maximum line width, breaking words only as an exception. A word is broken only if it is too long to fit on a line:

fold -sw 20 <shxp.txt
O, they have lived
long on the
alms-basket of
words, I marvel thy
master hath not
eaten thee for a
word; for thou art
not so long by the
head as
honorificabilitudini
tatibus: thou art
easier swallowed
than a flap-dragon.

3) Target line width without any word breaking. A word that is too long to fit on a line is left intact, so some lines may end up longer than requested:

fmt -w 20 <shxp.txt
O, they have
lived long on the
alms-basket of
words, I marvel
thy master hath
not eaten thee
for a word; for
thou art not so
long by the head as
honorificabilitudinitatibus:
thou art easier
swallowed than
a flap-dragon.

Note that fmt also tries to balance ragged paragraph lines unlike fold -s.

4) Perhaps the most typographically sophisticated solution, because it relies on the markup language and formatting utility (roff) that the man program uses under the hood. It also offers plenty of room for further customization:

2>/dev/null nroff <(echo .pl 1 ; echo .ll 20) shxp.txt
O,  they  have lived
long  on  the  alms‐
basket  of  words, I
marvel  thy   master
hath  not eaten thee
for a word; for thou
art  not  so long by
the head as  honori‐
ficabilitudinitati‐
bus: thou art easier
swallowed   than   a
flap‐dragon.

.pl 1 roff markup sets the page height to a single line, effectively disabling pagination.

.ll 20 sets the line length to 20 characters.

Putting the markup in a separate file will simplify the command:

$ cat markup.roff
.pl 1
.ll 20
$ 2>/dev/null nroff markup.roff shxp.txt

In order for nroff to work with Unicode, the text can be pre-converted using preconv:

$ 2>/dev/null nroff markup.roff <(preconv shxp.txt)

I really appreciate seeing a real text example with different options. I have been trying to write a Python version of wrap, but was unsatisfied with the long-word handling. Having the wrap and the fwt option for longer-than-specified words is very nice. — bballdave025, Sep 04 '20 at 18:31
nroff looks nice in your example, but turns å into Ã¥ and mangles ansi colouring. Can the nroff thing be made to deal with unicode etc. or is this a hack where the input actually should be formatted like a man page source file? — unhammer, Jan 03 '24 at 14:46
@unhammer Using preconv of the text makes nroff unicode enabled. — user2683246, Mar 22 '24 at 20:39
Nice, thanks @user2683246 . preconv+nroff seems like the winner for human-readable word wrapping with standard unix tools :) — unhammer, Mar 24 '24 at 21:23

score 15 · Answer 4 · answered Jun 24 '13 at 15:34

Another (lesser-known) tool that does what you want is wrap from GNU Talkfilters:

wrap -w 80 < textfile

Also (off topic):

but that puts a lot of space on the top and bottom of the printed lines

add -t when invoking pr to omit headers/trailers:

   -t, --omit-header
          omit page headers and trailers

answered Jun 24 '13 at 15:34 by don_crissti

Does not appear to be a part of Ubuntu 20.04. – Dave Apr 27 '22 at 01:02
limitation: wrap -w80 fails on unicode text – milahu Oct 05 '23 at 09:02

score 11 · Answer 5 · answered Apr 21 '13 at 13:45

And for more formatting options, look at par -- http://www.nicemice.net/par/

answered Apr 21 '13 at 13:45 by sendmoreinfo

Currently the web site is down, there is the Internet Archive and Google's cache but still this shows why it's important to post more than just links, you could have at least posted the examples from the official documentation. – phk Dec 27 '16 at 16:31

jubilatious1 · Answer 6 · 2024-01-09T06:29:03.787

Using Raku (formerly known as Perl_6)

[ Posting this because a number of U&L users have commented that some previous answers don't work with Unicode ].

Raku is a programming language in the Perl-family that features high-level support for Unicode. Raku normalizes all non-filename/non-filepath text to Normalization Form C (NFC) by default. Thus "graphemes, which are user-visible forms of the characters, will use a normalized representation" (i.e. normalized codepoints/width, see Unicode links at bottom for details).

Immediately below is an approach to solving the easier of the OP's requests (i.e. break text exactly at a desired column-width, irrespective of words/whitespace). The code is based on Raku's comb routine, and is written such that paragraphs (\n\n-separated or greater) are kept separate with a single blank line in between. (Thanks to @user2683246 for the example text.)

1. Break text/words at a desired column-width:

Sample Input:

~$ cat shxp_X2.txt
O, they have lived long on the alms-basket of words, I marvel thy
master hath not eaten thee for a word; for thou art not so long by the
head as honorificabilitudinitatibus: thou art easier swallowed than a
flap-dragon.

O, they have lived long on the alms-basket of words, I marvel thy
master hath not eaten thee for a word; for thou art not so long by the
head as honorificabilitudinitatibus: thou art easier swallowed than a
flap-dragon.

Code with Sample Output (wrapped to <= 40 characters wide):

~$ raku -e 'my $wrap = 40; for slurp.split(/ \n**2..* /) { .subst(:global, / \n /, " ") andthen .put for $_.comb($wrap); put ""; };'   shxp_X2.txt
O, they have lived long on the alms-bask
et of words, I marvel thy master hath no
t eaten thee for a word; for thou art no
t so long by the head as honorificabilit
udinitatibus: thou art easier swallowed 
than a flap-dragon.

O, they have lived long on the alms-bask
et of words, I marvel thy master hath no
t eaten thee for a word; for thou art no
t so long by the head as honorificabilit
udinitatibus: thou art easier swallowed 
than a flap-dragon.
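
For readers who do not have Raku installed, the same idea (unwrap each paragraph, then cut it into fixed-size chunks) can be sketched in Python. This is an approximation rather than a port: hard_wrap_paragraphs is a hypothetical name, and Python slices by code point whereas Raku's comb counts graphemes, so results may differ for text with combining characters.

```python
import re


def hard_wrap_paragraphs(text, width=40):
    """Cut each paragraph into `width`-character chunks, ignoring word boundaries."""
    blocks = []
    # Paragraphs are separated by one or more blank lines.
    for paragraph in re.split(r"\n{2,}", text.strip("\n")):
        joined = paragraph.replace("\n", " ")  # unwrap the paragraph first
        chunks = [joined[i:i + width] for i in range(0, len(joined), width)]
        blocks.append("\n".join(chunks))
    # Keep exactly one blank line between paragraphs.
    return "\n\n".join(blocks)
```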

2. Break between words (i.e. on whitespace) at desired column-width:

The code immediately below uses Raku's words routine which breaks on whitespace. Below are example lines in over 30 Unicode Scripts, wrapped to <= 72 characters wide:

~$ raku -e 'my  $wrap = 72; my   $tmp = 0; 
            for lines() {   my $ln-ch = $_.chars;  
                if  $ln-ch == 0 { "\n".say; $tmp = 0; next };    
                for $_.words -> $w {   my  $w-ch = $w.chars;  
                    $wrap >=  ($tmp + $w-ch)        
                    ?? (   "$w".print andthen $tmp += $w-ch )  
                    !! ( "\n$w".print andthen $tmp  = $w-ch );  
                    if ($wrap > $tmp) { " ".print andthen ++$tmp };  
                }   
            };'   file
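
The greedy loop above can be approximated in Python as follows. This is a sketch, not a line-for-line translation: wrap_words is a made-up name, lengths are counted in code points rather than graphemes, and it does not leave the trailing space that the Raku version produces.

```python
import re


def wrap_words(text, width=72):
    """Greedy word wrap; blank lines separate paragraphs and are preserved."""
    paragraphs = []
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        lines, current = [], ""
        for word in paragraph.split():  # str.split() splits on Unicode whitespace
            if not current:
                current = word
            elif len(current) + 1 + len(word) <= width:
                current += " " + word   # the word still fits on this line
            else:
                lines.append(current)   # start a new line with this word
                current = word
        if current:
            lines.append(current)
        paragraphs.append("\n".join(lines))
    return "\n\n".join(paragraphs)
```

As in the Raku code, a word longer than the requested width is never split; it simply ends up on a line of its own.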

Sample Input (from The Kermit Project):

English: The quick brown fox jumps over the lazy dog.
Jamaican: Chruu, a kwik di kwik brong fox a jomp huova di liezi daag de, yu no siit?
Irish: "An ḃfuil do ċroí ag bualaḋ ó ḟaitíos an ġrá a ṁeall lena ṗóg éada ó ṡlí do leasa ṫú?" "D'ḟuascail Íosa Úrṁac na hÓiġe Beannaiṫe pór Éava agus Áḋaiṁ."
Dutch: Pa's wĳze lynx bezag vroom het fikse aquaduct.
German: Falsches Üben von Xylophonmusik quält jeden größeren Zwerg. (1)
German: Im finſteren Jagdſchloß am offenen Felsquellwaſſer patzte der affig-flatterhafte kauzig-höf‌liche Bäcker über ſeinem verſifften kniffligen C-Xylophon. (2)
Norwegian: Blåbærsyltetøy ("blueberry jam", includes every extra letter used in Norwegian).
Swedish: Flygande bäckasiner söka strax hwila på mjuka tuvor.
Icelandic: Sævör grét áðan því úlpan var ónýt.
Finnish: (5) Törkylempijävongahdus (This is a perfect pangram, every letter appears only once. Translating it is an art on its own, but I'll say "rude lover's yelp". :-D)
Finnish: (5) Albert osti fagotin ja töräytti puhkuvan melodian. (Albert bought a bassoon and hooted an impressive melody.)
Finnish: (5) On sangen hauskaa, että polkupyörä on maanteiden jokapäiväinen ilmiö. (It's pleasantly amusing, that the bicycle is an everyday sight on the roads.)
Polish: Pchnąć w tę łódź jeża lub osiem skrzyń fig.
Czech: Příliš žluťoučký kůň úpěl ďábelské ódy.
Slovak: Starý kôň na hŕbe kníh žuje tíško povädnuté ruže, na stĺpe sa ďateľ učí kvákať novú ódu o živote.
Slovenian: Šerif bo za domačo vajo spet kuhal žgance.
Greek (monotonic): ξεσκεπάζω την ψυχοφθόρα βδελυγμία
Greek (polytonic): ξεσκεπάζω τὴν ψυχοφθόρα βδελυγμία
Russian: Съешь же ещё этих мягких французских булок да выпей чаю.
Russian: В чащах юга жил-был цитрус? Да, но фальшивый экземпляр! ёъ.
Bulgarian: Жълтата дюля беше щастлива, че пухът, който цъфна, замръзна като гьон.
Sami (Northern): Vuol Ruoŧa geđggiid leat máŋga luosa ja čuovžža.
Hungarian: Árvíztűrő tükörfúrógép.
Spanish: El pingüino Wenceslao hizo kilómetros bajo exhaustiva lluvia y frío, añoraba a su querido cachorro.
Spanish: Volé cigüeña que jamás cruzó París, exhibe flor de kiwi y atún.
Portuguese: O próximo vôo à noite sobre o Atlântico, põe freqüentemente o único médico. (3)
French: Les naïfs ægithales hâtifs pondant à Noël où il gèle sont sûrs d'être déçus en voyant leurs drôles d'œufs abîmés.
Esperanto: Eĥoŝanĝo ĉiuĵaŭde
Esperanto: Laŭ Ludoviko Zamenhof bongustas freŝa ĉeĥa manĝaĵo kun spicoj.
Hebrew: זה כיף סתם לשמוע איך תנצח קרפד עץ טוב בגן.
Japanese (Hiragana):
いろはにほへど　ちりぬるを
わがよたれぞ　つねならむ
うゐのおくやま　けふこえて
あさきゆめみじ　ゑひもせず (4)
Japanese (Kanji):
色は匂へど 散りぬるを
我が世誰ぞ 常ならむ
有為の奥山 今日越えて
浅き夢見じ 酔ひもせず

Sample Output (wrapped to 72 characters):

English: The quick brown fox jumps over the lazy dog. Jamaican: Chruu, a
kwik di kwik brong fox a jomp huova di liezi daag de, yu no siit? Irish:
"An ḃfuil do ċroí ag bualaḋ ó ḟaitíos an ġrá a ṁeall lena ṗóg éada ó ṡlí
do leasa ṫú?" "D'ḟuascail Íosa Úrṁac na hÓiġe Beannaiṫe pór Éava agus
Áḋaiṁ." Dutch: Pa's wĳze lynx bezag vroom het fikse aquaduct. German:
Falsches Üben von Xylophonmusik quält jeden größeren Zwerg. (1) German:
Im finſteren Jagdſchloß am offenen Felsquellwaſſer patzte der
affig-flatterhafte kauzig-höf‌liche Bäcker über ſeinem verſifften
kniffligen C-Xylophon. (2) Norwegian: Blåbærsyltetøy ("blueberry jam",
includes every extra letter used in Norwegian). Swedish: Flygande
bäckasiner söka strax hwila på mjuka tuvor. Icelandic: Sævör grét áðan
því úlpan var ónýt. Finnish: (5) Törkylempijävongahdus (This is a
perfect pangram, every letter appears only once. Translating it is an
art on its own, but I'll say "rude lover's yelp". :-D) Finnish: (5)
Albert osti fagotin ja töräytti puhkuvan melodian. (Albert bought a
bassoon and hooted an impressive melody.) Finnish: (5) On sangen
hauskaa, että polkupyörä on maanteiden jokapäiväinen ilmiö. (It's
pleasantly amusing, that the bicycle is an everyday sight on the roads.)
Polish: Pchnąć w tę łódź jeża lub osiem skrzyń fig. Czech: Příliš
žluťoučký kůň úpěl ďábelské ódy. Slovak: Starý kôň na hŕbe kníh žuje
tíško povädnuté ruže, na stĺpe sa ďateľ učí kvákať novú ódu o živote.
Slovenian: Šerif bo za domačo vajo spet kuhal žgance. Greek (monotonic):
ξεσκεπάζω την ψυχοφθόρα βδελυγμία Greek (polytonic): ξεσκεπάζω τὴν
ψυχοφθόρα βδελυγμία Russian: Съешь же ещё этих мягких французских булок
да выпей чаю. Russian: В чащах юга жил-был цитрус? Да, но фальшивый
экземпляр! ёъ. Bulgarian: Жълтата дюля беше щастлива, че пухът, който
цъфна, замръзна като гьон. Sami (Northern): Vuol Ruoŧa geđggiid leat
máŋga luosa ja čuovžža. Hungarian: Árvíztűrő tükörfúrógép. Spanish: El
pingüino Wenceslao hizo kilómetros bajo exhaustiva lluvia y frío,
añoraba a su querido cachorro. Spanish: Volé cigüeña que jamás cruzó
París, exhibe flor de kiwi y atún. Portuguese: O próximo vôo à noite
sobre o Atlântico, põe freqüentemente o único médico. (3) French: Les
naïfs ægithales hâtifs pondant à Noël où il gèle sont sûrs d'être déçus
en voyant leurs drôles d'œufs abîmés. Esperanto: Eĥoŝanĝo ĉiuĵaŭde
Esperanto: Laŭ Ludoviko Zamenhof bongustas freŝa ĉeĥa manĝaĵo kun
spicoj. Hebrew: זה כיף סתם לשמוע איך תנצח קרפד עץ טוב בגן. Japanese
(Hiragana): いろはにほへど ちりぬるを わがよたれぞ つねならむ うゐのおくやま けふこえて あさきゆめみじ ゑひもせず (4)
Japanese (Kanji): 色は匂へど 散りぬるを 我が世誰ぞ 常ならむ 有為の奥山 今日越えて 浅き夢見じ 酔ひもせず

Paragraphs (\n\n-separated or greater) are maintained separate with a single blank line in between. All lines in the Sample Output wrap to 72 characters or less. The only visual problem is with Japanese Hiragana/Kanji, but in fact the last two lines of the "wrapped" output contain 71 and 65 characters, respectively.
Custom words can be defined, based upon Unicode properties. For example, the .words routine can be replaced by .comb(/ <-:Zs>+ /) to split on Unicode 'Space-Separator' as defined in Unicode® Standard Annex #44.
Right now the code doesn't hyphenate or otherwise break individual words that are longer than the desired $wrap column width. (This may be the desired behavior, otherwise you indeed might see issues with excessively long words and/or short column-widths).
A single trailing whitespace character is left at the end of lines shorter than $wrap. This can be corrected by running ~$ raku -ne '.trim-trailing.put;' over the wrapped output.
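
The visual overflow of the Japanese lines is most likely a matter of display width rather than character count: many terminals draw Hiragana and Kanji two columns wide. The following Python sketch estimates display width from the Unicode East Asian Width property. It is an approximation under stated assumptions: display_width is a hypothetical helper, and zero-width format characters, emoji sequences, and terminal-specific behaviour are not handled.

```python
import unicodedata


def display_width(text):
    """Estimate how many terminal columns `text` occupies."""
    columns = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks stack on the previous character
        # East Asian Wide (W) and Fullwidth (F) characters take two columns.
        columns += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return columns
```

A wrapper that has to respect a visual margin could compare display_width(line) against the limit instead of the plain character count.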

https://unicode.org/reports/tr15/#Canon_Compat_Equivalence
https://docs.raku.org/language/unicode
https://docs.raku.org/type/Str#routine_words
https://docs.raku.org/type/Str#routine_comb
https://raku.org

instead of testing different human languages, it would make more sense to test different unicode whitespace characters — milahu, Jan 08 '24 at 08:42
@mihalu Edited, thanks. Raku's .words routine is basically the same as (Unicode-aware) $input.comb(/ \S+ /, $limit) where \S+ is one-or-more non-whitespace character and $limit equals Inf. So Raku .combs on the Unicode definition of whitespace (.comb is essentially the inverse of .split). If a user needs to create their own .words definition then they can use Unicode properties to .comb on a delimiter of their choice. Cheers. — jubilatious1, Jan 08 '24 at 10:20

score 1 · Answer 7 · answered Jan 08 '24 at 08:28

pandoc can wrap Unicode text:

pandoc -f plain.lua -t plain \
  --wrap=auto --columns=78 input.txt

You only need a plain-text reader in plain.lua, because by default pandoc cannot parse plain text:

-- A sample custom reader that just parses text into blankline-separated
-- paragraphs with space-separated words.
-- For better performance we put these functions in local variables:
local P, S, R, Cf, Cc, Ct, V, Cs, Cg, Cb, B, C, Cmt =
  lpeg.P, lpeg.S, lpeg.R, lpeg.Cf, lpeg.Cc, lpeg.Ct, lpeg.V,
  lpeg.Cs, lpeg.Cg, lpeg.Cb, lpeg.B, lpeg.C, lpeg.Cmt
local whitespacechar = S(" \t\r\n")
local wordchar = (1 - whitespacechar)
local spacechar = S(" \t")
local newline = P"\r"^-1 * P"\n"
local blanklines = newline * (spacechar^0 * newline)^1
local endline = newline - blanklines
-- Grammar
G = P{ "Pandoc",
  Pandoc = Ct(V"Block"^0) / pandoc.Pandoc;
  Block = blanklines^0 * V"Para" ;
  Para = Ct(V"Inline"^1) / pandoc.Para;
  Inline = V"Str" + V"Space" + V"SoftBreak" ;
  Str = wordchar^1 / pandoc.Str;
  Space = spacechar^1 / pandoc.Space;
  SoftBreak = endline / pandoc.SoftBreak;
}
function Reader(input)
  return lpeg.match(G, tostring(input))
end
