7

How can I search for bold or underlined text in a manpage? This would be useful for finding keywords, which are usually highlighted that way. For example, in this excerpt from bash(1):

[screenshot]

I might want to search for read or timeout, but searching for only that will give me dozens of useless results which I will have to n past. There are some "tricks" you can use to reduce this (e.g. searching for <Space>read<Space> or read \[), but that doesn't always work for every manpage or keyword.

Note that I'm not particularly attached to less as such; using a different pager is fine. less just happens to be the default pager.

Martin Tournoij
  • Use their escape codes: 1 is bold and 4 is underline. – PersianGulf Mar 22 '16 at 21:12
  • However, nroff or [tg]roff may store it another way; find out how nroff or [tg]roff does it. – PersianGulf Mar 22 '16 at 21:15
  • Use info instead of man and use the index. i – Stéphane Chazelas Mar 22 '16 at 21:30
  • @StéphaneChazelas I've never been able to use info without getting seriously lost all over the place. e.g. typing info ls and searching for hello brings me to ... I have no idea where ... but it's not the ls documentation ... Pressing Pageup a few times brings me to ... somewhere else ... but also not the ls documentation ... I find it seriously confusing :-/ But perhaps I need to re-investigate (it's been years since I seriously looked at info) – Martin Tournoij Mar 22 '16 at 21:41
  • You could start with the tutorial. The index with completion is really a killer feature, especially for a manual the size of bash's – Stéphane Chazelas Mar 22 '16 at 21:42

5 Answers

5

Use Vim as MANPAGER. With some creative use of conceal and iskeyword, this can be done:

setlocal nowrap
setlocal conceallevel=3
setlocal concealcursor=nvic
exe "setlocal iskeyword+=\b,_"
syntax match BACKHIDE '.\b' conceal contained
syntax match BOLD '\(.\)\b\1' contains=BACKHIDE
syntax match Underlined '_\b.' contains=BACKHIDE
highlight BOLD cterm=bold

[screenshot]

Since the backspaces and overstruck characters are still in the buffer (only concealed), searching for the word under the cursor with * matches only words with the same highlighting:

[screenshot]

Note how the bold man is found, but the normal man in the current line is not.

With some more settings (shameless plug), Vim provides a comfortable pager for man, and a better one than less.

To get Vim to apply the relevant settings, here's what I do:

  1. In a suitable place for environment variables, set MANPAGER='vim -'.
  2. In ~/.vim/vimrc, have a minimum of:

    set nocompatible
    filetype plugin on
    syntax on
    
    if !empty($MAN_PN)
        autocmd StdinReadPost * set ft=man | file $MAN_PN
    endif
    

    For a command started by man using MANPAGER, the manpage name is provided in the MAN_PN environment variable. We can take advantage of this to detect when Vim is being used as MANPAGER and for finding out the manpage name.

  3. In ~/.vim/ftplugin/man.vim:

    setlocal nolist 
    setlocal buftype=nofile
    setlocal bufhidden=hide
    setlocal noswapfile
    
    setlocal readonly
    setlocal nomodifiable
    
    setlocal nowrap
    setlocal conceallevel=3
    exe "setlocal iskeyword+=\b,_"
    setlocal concealcursor=nvic
    
    nnoremap q :q!<CR>
    nnoremap <Space> <PageDown>
    

    The options create a read-only, unmodifiable, scratch buffer (see How is a scratch buffer created? in Vim Wikia), disabling swap files. Then it applies the settings listed at the start of this post, and adds some mappings for convenience - q will close the current manpage, and Spacebar will move one page down, like in less.

  4. In ~/.vim/after/syntax/man.vim:

    syntax match BACKHIDE '.\b' conceal contained
    syntax match BOLD '\(.\)\b\1' contains=BACKHIDE
    syntax match Underlined '_\b.' contains=BACKHIDE
    highlight BOLD cterm=bold
    

    These are the syntax and highlighting commands from the start of the post.

With just these minimum settings:

[screenshot]

Note how the top line is highlighted: Vim itself ships with some manpage syntax highlighting, which you can see if you remove the backspaces (using col -b -x, for example). However, you lose a lot more than you gain that way, since Vim has no way to know everything that might have been underlined or bold.

My own personal settings use the molokai colorscheme, set number, the airline plugin, and a different highlighting for BOLD:

highlight link BOLD Constant

[screenshot]

And because I have transparency enabled in the terminal settings (not visible in the screenshot), the colours are softer and more pleasing than seen here.

If you enable line numbers (:set number) like I have, set MANWIDTH to a value less than COLUMNS so that you won't have to scroll sideways to see the entire text. MANWIDTH=75 works well for 80-column terminals. I use a drop-down terminal as wide as the screen (160-240 columns depending on the resolution), so a fixed MANWIDTH=80 works fine for me.
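
The environment settings described in this answer can be collected in a shell profile. The following is only a minimal sketch: the helper name manwidth_for and its 5-column gutter are assumptions, chosen so that an 80-column terminal yields the MANWIDTH=75 mentioned above. Nothing in man or Vim prescribes them.

```shell
#!/bin/sh
# Hypothetical helper (not part of man or Vim): derive a MANWIDTH that leaves
# room for Vim's line-number column. The 5-column gutter is an assumption
# that reproduces "MANWIDTH=75 for an 80-column terminal".
manwidth_for() {
    # $1: terminal width in columns (defaults to 80)
    echo $(( ${1:-80} - 5 ))
}

# Use Vim as the pager for man, reading the page from standard input.
MANPAGER='vim -'
# COLUMNS is often unset in non-interactive shells, so fall back to 80.
MANWIDTH=$(manwidth_for "${COLUMNS:-80}")
export MANPAGER MANWIDTH
```

On a very wide terminal a fixed value such as MANWIDTH=80 may be preferable, as noted above.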

muru
  • That's great! Thanks a lot for the effort. (shouldn't that be ~/.vim/after/syntax/man.vim ?) I suppose one downside is that if you search for /man, you only see the unformatted ones. Maybe an addition could be a key binding to search for the text in the col -b-like processed version of the text. – Stéphane Chazelas Mar 24 '16 at 14:28
  • @StéphaneChazelas yes, that should be after/syntax/man.vim, and, oddly, my settings do use just \(.\) without repetition. I'll correct both. \b didn't work with iskeyword the first time I tried it, I'll test that again when I get the chance. There are other things for which the presence of backspaces causes problems - Vim's manpage settings include a mapping to open the current word's manpages using Ctrl-], like with tag definitions. – muru Mar 24 '16 at 14:36
  • \(.+\) might have been to cover for bold+underline. – Stéphane Chazelas Mar 24 '16 at 15:30
4

There's no clean way I know of.

You can turn off the special handling of backspaces which less uses to display underline/bold, and then use escaped backspaces to search for the string you want:

  1. Open less (e.g. man less)
  2. Turn off UNDERLINE-SPECIAL (type -U<Enter>)
  3. Type in your search string, using <C-v> to escape the backspace characters.

For underlined text, for example, you could type

/_<C-v><C-h>l_<C-v><C-h>e_<C-v><C-h>s_<C-v><C-h>s<Enter>

...to search for the underlined word "less".

For bold text, you could type

/l<C-v><C-h>le<C-v><C-h>es<C-v><C-h>ss<C-v><C-h>s<Enter>

...to search for the bold word "less".

As I said, there's no clean way.


EDIT: As Stephane points out in the comments, you can use a dot (which matches any character) instead of a literal <C-h>, which makes typing it easier.

/l.le.es.ss.s

to search for bold, and

/_.l_.e_.s_.s

to search for underlined.

You still have to turn off UNDERLINE-SPECIAL first, which makes the underlined/bold text not very readable.
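
Typing these patterns by hand gets tedious for longer words. As a rough sketch, a small helper can generate them; the function name overstrike_pattern is made up for this example, and it assumes the word contains no regular-expression metacharacters (nothing is escaped).

```shell
#!/bin/sh
# Hypothetical helper: print the less search pattern for WORD as it appears
# with overstrikes once -U is in effect.
#   bold:      each character c is stored as c BS c   -> pattern "c.c"
#   underline: each character c is stored as _ BS c   -> pattern "_.c"
# The "." stands in for the backspace, as in the patterns above.
overstrike_pattern() {
    case $1 in
        bold)      printf '%s\n' "$2" | sed 's/./&.&/g' ;;
        underline) printf '%s\n' "$2" | sed 's/./_.&/g' ;;
        *)         echo "usage: overstrike_pattern bold|underline WORD" >&2
                   return 1 ;;
    esac
}

overstrike_pattern bold less        # prints l.le.es.ss.s
overstrike_pattern underline less   # prints _.l_.e_.s_.s
```

The printed pattern can then be pasted after / in less.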

Wildcard
  • Or /r.re.ea.ad.d to search for read a bit more easily. Interestingly when you do -U again, read is still highlighted – Stéphane Chazelas Mar 22 '16 at 20:23
  • See also GROFF_SGR=1 man bash, use -R to see the escape sequences if need be and search for 1mread for read – Stéphane Chazelas Mar 22 '16 at 21:43
  • I always assumed that man uses ANSI escape codes, but apparently not (roff/man seems to pre-date that). Meh. – Martin Tournoij Mar 24 '16 at 07:26
  • @Carpetsmoker, on GNU systems, it does when GROFF_SGR=1 is in the environment. The problem with ANSI escape codes is that pagers have to be configured to recognise them (like with -R for less). – Stéphane Chazelas Mar 24 '16 at 07:53
  • @StéphaneChazelas another problem is that the codes aren't matched properly. The sequence could look like: ^[[1mfoo ^[[4m^[[22mbar^[[24m, which doesn't cause problems for less, but it does cause problems if you try to process it elsewhere (in Vim, for example). – muru Mar 24 '16 at 11:45
3

If you can see it as bold or underlined, you probably cannot search for it, because what you see is rendered.

For instance, if the text comes from a formatted manpage, the rendering is produced by interpreting overstruck characters as

  • bold (when each character is overstruck by backspacing over it and repeating) or
  • underlined (when each character is written over an underline character).

The less pager FAQ comments that it interprets the bold/underline. It uses terminal video attributes to show the actual bold/underline.

In the process of rendering, a typical pager such as less pretends that it only holds the text (the bold/underline parts are not text).

In a text-editor, you can search for the backspace patterns. Perhaps some specific editor (such as emacs) has the ability to do this, i.e., search for text (while it is rendered as bold/underline) but taking the bold/underline into account as an attribute of the search.

Reading the raw backspaces is less pleasant. Here is the start of the passage shown in the question:

       r^Hre^Hea^Had^Hd [-^H-e^Her^Hrs^Hs] [-^H-a^Ha _^Ha_^Hn_^Ha_^Hm_^He] [-^H>
       _^Hp_^Hr_^Ho_^Hm_^Hp_^Ht] [-^H-t^Ht _^Ht_^Hi_^Hm_^He_^Ho_^Hu_^Ht] [-^H-u>
              One  line  is  read  from  the  standard input, or from the file  
              descriptor _^Hf_^Hd supplied as an argument to the -^H-u^Hu optio>
              first word is assigned to the first _^Hn_^Ha_^Hm_^He, the second >
              second _^Hn_^Ha_^Hm_^He, and so on, with leftover words and their>
              ing  separators  assigned  to the last _^Hn_^Ha_^Hm_^He.  If ther>
              words read from the input stream than names, the remaining names
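
To experiment with the encoding without opening a manpage, something like the following can be used. It is a sketch with made-up function names, and it assumes a cat that supports -v (GNU, BSD and BusyBox do, although POSIX does not require it).

```shell
#!/bin/sh
# Show backspaces as ^H, the notation used in the excerpt above.
show_overstrikes() {
    cat -v
}

# Remove the first character of every "x BS" pair, leaving the plain text.
# This works for both bold (c BS c) and underline (_ BS c).
strip_overstrikes() {
    bs=$(printf '\b')
    sed "s/.$bs//g"
}

# A hand-made sample: bold "read" followed by underlined "fd".
printf 'r\bre\bea\bad\bd _\bf_\bd\n' | show_overstrikes    # r^Hre^Hea^Had^Hd _^Hf_^Hd
printf 'r\bre\bea\bad\bd _\bf_\bd\n' | strip_overstrikes   # read fd
```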
Thomas Dickey
3

less -U makes the text pretty illegible. An alternative could be to pre-process the text and use Unicode combining characters to implement bold and underline: for instance, U+0332 (combining low line) for underline, and U+0333 (combining double low line) to represent bold as a double underline.

You could create a upager script like:

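#! /bin/bash -
# U+0332 marks underline, U+0333 (double underline) marks bold; bs is backspace
ul=$'\u0332' bold=$'\u0333' bs=$'\b'
# c BS c becomes c + U+0333; _ BS c becomes c + U+0332
sed "s/\(.\)$bs\1/\1$bold/g;s/_$bs\(.\)/\1$ul/g" | less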

Then:

export MANPAGER=upager

man bash would render as:

   r̳e̳a̳d̳  [-̳e̳r̳s̳]  [-̳a̳  a̲n̲a̲m̲e̲] [-̳d̳ d̲e̲l̲i̲m̲] [-̳i̳ t̲e̲x̲t̲] [-̳n̳ n̲c̲h̲a̲r̲s̲]
   [-̳N̳ n̲c̲h̲a̲r̲s̲] [-̳p̳ p̲r̲o̲m̲p̲t̲] [-̳t̳ t̲i̲m̲e̲o̲u̲t̲] [-̳u̳ f̲d̲] [n̲a̲m̲e̲ ...]
          One line is read from the standard input,  or  from
          the  file  descriptor f̲d̲ supplied as an argument to

And you could search for a bold or underline read with /r.e.a.d for instance.

Not all terminal emulators seem to render those combining characters correctly though. I found konsole gave the best results so far.
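
To see what the filter does without involving a pager or a particular terminal, the same sed expression can be run on a hand-made sample. This is only a sketch: the function name to_combining is made up, and the combining characters are spelled out as UTF-8 bytes (an assumption that the output is meant to be UTF-8) so that it also runs in shells without the $'\u...' syntax.

```shell
#!/bin/sh
# Same substitutions as the upager script, minus the pager.
to_combining() {
    bs=$(printf '\b')
    ul=$(printf '\314\262')     # U+0332 COMBINING LOW LINE, as UTF-8 bytes
    bold=$(printf '\314\263')   # U+0333 COMBINING DOUBLE LOW LINE, as UTF-8 bytes
    sed "s/\(.\)$bs\1/\1$bold/g;s/_$bs\(.\)/\1$ul/g"
}

# Bold "read" followed by underlined "fd", encoded with overstrikes.
printf 'r\bre\bea\bad\bd _\bf_\bd\n' | to_combining
```

Each letter of the output is followed by exactly one combining character, which is why a pattern with one dot per letter, such as r.e.a.d, finds it in a UTF-8 locale.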


A different and maybe better approach that also works in xterm could be to keep the roff formatting but postfix every bold or underline character with an invisible character. That way you can search for a normal read with /read and a bold/underline read with /r.e.a.d like above while the formatting is unaffected.

Invisible characters such as U+200B (the zero-width space) are rendered by less as <U+200B>, so they are not an option. One character that does seem to work is U+034F, the Combining Grapheme Joiner. It is an invisible combining character and generally has no effect, at least on English text.

So you could make a gcjpager pager like:

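#! /bin/bash -
# U+034F is the Combining Grapheme Joiner; bs is backspace
cgj=$'\u34f' bs=$'\b'
# Append a CGJ after every overstruck pair (c BS c or _ BS c)
sed "s/.$bs./&$cgj/g" | less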

(and export MANPAGER=/path/to/gcjpager).

  • This doesn't seem to work well in xterm but it's an interesting idea and I might be able to base something off of it... Thanks. – Martin Tournoij Mar 24 '16 at 07:26
  • @Carpetsmoker, yes, AFAICT (and Thomas would be able to confirm), xterm only supports combining characters by converting sequences with them to their combined form (like e\u301 -> \ue9, which can be confusing btw as what you copy-paste is different from what was being output) so it only works for sequences that have a combined form so generally not for underline combining characters. Another approach could be to insert invisible characters and keep the roff formatting, I'll add an example. – Stéphane Chazelas Mar 24 '16 at 07:50
  • short answer: yes, xterm converts, which affects copy/paste - see patch #279. – Thomas Dickey Mar 24 '16 at 08:15
3

Not a direct answer to your question, but to more easily find documentation in a large manual like bash's, you could try these alternatives:

Using a different format like info

The bash manual, like the manuals of most GNU software, is written in Texinfo, from which several formats are derived (man, info, pdf, html...).

A man page is called a page for a reason. It's just one flat text file where the only structuring is done via font formatting (indentation, bold, underline, all-caps).

For a manual this size, you'd rather want a book than a page.

While man implements the page paradigm, info implements the book paradigm. It has the concepts of chapters and sections, a table of contents, cross-references and an index, all of which are searchable with completion.

In a book about bash, to learn about the read builtin, you'd look at the index. In info, you type i, and then enter read (completion available) which will bring you directly to the documentation of the read builtin (use , to jump to the next index entry that contains read). You can also start info as info bash read.

In a book, if you wanted to see the section about builtins, you'd check the index again, or look at the table of contents. Same in info with i and g.

Search the web

HTML is another hypertext format (note that info predates the web and HTML) well fitted for larger manuals. Web browsers can usually only search in a single page at a time which makes it not as good as info, but if you're online, you can make use of search engines like duckduckgo or google to search manuals.

bash read builtin site:gnu.org

would likely take you to the section that contains the documentation for read. Or you can use the index: https://www.gnu.org/software/bash/manual/html_node/Builtin-Index.html#Builtin-Index

Search the man page based on other formatting

Instead of searching for bold/underline text which is not easy to do with current man pagers, you could also try:

  • search for read at the beginning of the line: /^\s*read
  • also as a whole word: /^\s*read\>
  • you can also use the fact that section headers are less indented, to get a form of table of contents.

    In the most pager, that can be done with 1:od to hide text that is indented, 4:od to hide text indented by at least 4 columns.

    With less, you can do the same with &^\S and &^ {,3}\S, which would show something like:

    [...]
    RESERVED WORDS
    SHELL GRAMMAR
       Simple Commands
       Pipelines
       Lists
       Compound Commands
       Coprocesses
       Shell Function Definitions
    COMMENTS
    QUOTING
    [...]
    

    and let you navigate more easily to a section of interest (and then just enter an empty & to see the full text again, or :od in most).
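
The same indentation filter can be applied outside the pager, for example to print an outline to the terminal. This is a sketch with a made-up function name (toc); it expects plain text on standard input, so overstrikes should be removed first, for instance with col -b.

```shell
#!/bin/sh
# Keep only lines whose first non-blank character is indented by at most
# $1 spaces (default 3), i.e. section and subsection headers.
# Mirrors the less filter &^ {,3}\S using a portable ERE.
toc() {
    grep -E "^ {0,${1:-3}}[^[:space:]]"
}
```

For example, man bash | col -b | toc should print something close to the listing above, and toc 0 would restrict it to top-level headers.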