I'm trying to extract all command synopses from manpages in /usr/share/man/man1 using:

#!/usr/bin/env bash
## synopses - extract all synopses in /usr/share/man/man1

cd /usr/share/man/man1
for i in *.gz; do
    echo "$i:" | sed -E 's/\.1\.gz|\.gz//g'
    man "./$i" | sed -n '/^SYNOPSIS/,/^[A-Z][A-Z][A-Z]/p' | sed -e '1d; $d' | tr -s '[:space:]'
done

...which provides some measure of success: I get complete output for commands from a to z. But I'm also getting many errors on stderr when I redirect the output to a file (synopses > file), whether I loop with for i in ./*.gz; do man "$i" or with for i in *.gz; do man "./$i" [1]:

<standard input>:27: expected `;' after scale-indicator (got `o')
<standard input>:29: expected `;' after scale-indicator (got `o')
<standard input>:283: name expected (got `\{'): treated as missing
<standard input>:674: warning: macro `as',' not defined (possibly missing space after `as')
<standard input>:174: name expected (got `\{'): treated as missing
<standard input>:161: warning [p 1, 5.5i]: can't break line
<standard input>:594: warning [p 5, 3.8i, div `an-div', 0.0i]: can't break line
<standard input>:569: warning [p 6, 0.0i]: can't break line
<standard input>:147: warning [p 1, 1.8i]: can't break line
<standard input>:205: warning [p 2, 0.2i]: can't break line
<standard input>:525: warning [p 5, 4.5i]: can't break line
<standard input>:157: warning [p 1, 4.8i]: can't break line
<standard input>:351: warning [p 3, 1.8i, div `an-div', 0.0i]: can't break line
<standard input>:147: a space character is not allowed in an escape name
man: can't open man1/zshmisc.1: No such file or directory
man: -:423: warning: failed .so request
man: can't open man1/zshexpn.1: No such file or directory
man: -:423: warning: failed .so request
man: can't open man1/zshparam.1: No such file or directory
man: -:423: warning: failed .so request
man: can't open man1/zshoptions.1: No such file or directory
man: -:423: warning: failed .so request
man: can't open man1/zshbuiltins.1: No such file or directory
man: -:423: warning: failed .so request
man: can't open man1/zshzle.1: No such file or directory
man: -:423: warning: failed .so request
man: can't open man1/zshcompwid.1: No such file or directory
man: -:423: warning: failed .so request
man: can't open man1/zshcompsys.1: No such file or directory
man: -:423: warning: failed .so request
man: can't open man1/zshcompctl.1: No such file or directory
man: -:423: warning: failed .so request
man: can't open man1/zshmodules.1: No such file or directory
man: -:423: warning: failed .so request
man: can't open man1/zshcalsys.1: No such file or directory
man: -:423: warning: failed .so request
man: can't open man1/zshtcpsys.1: No such file or directory
man: -:423: warning: failed .so request
man: can't open man1/zshzftpsys.1: No such file or directory
man: -:423: warning: failed .so request
man: can't open man1/zshcontrib.1: No such file or directory
man: -:423: warning: failed .so request
<standard input>:423: can't open `man1/zshmisc.1': No such file or directory
<standard input>:424: can't open `man1/zshexpn.1': No such file or directory
<standard input>:425: can't open `man1/zshparam.1': No such file or directory
<standard input>:426: can't open `man1/zshoptions.1': No such file or directory
<standard input>:427: can't open `man1/zshbuiltins.1': No such file or directory
<standard input>:428: can't open `man1/zshzle.1': No such file or directory
<standard input>:429: can't open `man1/zshcompwid.1': No such file or directory
<standard input>:430: can't open `man1/zshcompsys.1': No such file or directory
<standard input>:431: can't open `man1/zshcompctl.1': No such file or directory
<standard input>:432: can't open `man1/zshmodules.1': No such file or directory
<standard input>:433: can't open `man1/zshcalsys.1': No such file or directory
<standard input>:434: can't open `man1/zshtcpsys.1': No such file or directory
<standard input>:435: can't open `man1/zshzftpsys.1': No such file or directory
<standard input>:436: can't open `man1/zshcontrib.1': No such file or directory

What are those <standard input> errors about (something escaped?) and why is man ending up not finding some files? How could I make this more robust/efficient?


1. It seems the errors on stderr are the same whatever the implementation/solution I use for the same data. It is striking.

  • For busybox: for i in $(busybox --list); do busybox "$i" --help; done 2>&1 –  Jun 17 '14 at 07:09

2 Answers


You can't just run man foo.gz. It looks like you can run man foo.1.gz, but using -l seems cleaner. From man man:

   -l, --local-file
          Activate `local' mode.  Format and display  local  manual  files
          instead  of  searching  through  the system's manual collection.
          Each manual page argument will be interpreted as an nroff source
          file in the correct format.  No cat file is produced.  If '-' is
          listed as one of the arguments, input will be taken from  stdin.
          When  this  option  is  not used, and man fails to find the page
          required, before displaying the error message,  it  attempts  to
          act as if this option was supplied, using the name as a filename
          and looking for an exact match.

So, your script should be something like:

#!/usr/bin/env bash
## synopses - extract all synopses in /usr/share/man/man1

## No need to cd into the directory, you can just use globs
for i in /usr/share/man/man1/*.gz; do
    ## This will print the name of the command.
    basename "$i" .1.gz
    man -l "$i" |
       awk '/^SYNOPSIS/{a=1; getline}
            (/^[a-zA-Z0-9_]/ && a==1){a=0}
            (a==1 && /./){print}' | tr -s '[:space:]'

done

The awk command I give works better than your approach (test it on man ajc, for example) and now also works on multi-line synopses. Most of the errors you see are irrelevant; the others were due to the way you were handling file names. Let me know if this one works better.
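To make the filter easier to try without a full man library, here is a minimal sketch that wraps the same awk logic in a function reading formatted page text on stdin. The function name extract_synopsis and the sample page text are made up for illustration; they are not from the script above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (the name is illustrative): reads formatted man page
# text on stdin and prints the non-empty lines under the SYNOPSIS header.
extract_synopsis() {
    awk '/^SYNOPSIS/ { in_syn = 1; next }              # start after the header
         in_syn && /^[A-Za-z0-9_]/ { in_syn = 0 }      # next unindented line ends the section
         in_syn && /./ { print }' |                    # keep non-empty lines only
    tr -s '[:space:]'                                  # squeeze runs of whitespace
}

# Made-up sample input standing in for the output of: man -l some-page.1.gz
extract_synopsis <<'EOF'
NAME
       foo - do things

SYNOPSIS
       foo [-a] file
       foo --help

DESCRIPTION
       Text.
EOF
```

In the loop above it would be used as man -l "$i" | extract_synopsis. Pages that use a different header (bc's "SYNTAX", for instance) would still be missed unless the first pattern is widened.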

terdon
  • @illuminÉ ah! You want those? OK, hang on. – terdon Jun 14 '14 at 13:10
  • @illuminÉ I think those are just issues with formatting the output and will depend on things like the size of your terminal. I wouldn't worry about it. Anyway, try the updated version. – terdon Jun 14 '14 at 13:26
  • Works nicely! Your intuition was right - man formatting and the size of the terminal. Furthermore, as someone explained to me synopses need not follow a strict grammar, and manuals vary, so there are exceptions such as bc for instance with a "syntax" header instead of "synopsis". I find your solution is quite robust otherwise. I was able to figure out that on my system the as, objcopy and cpio commands have the longest synopses. With development and perl related doc. Thanks again! –  Jun 20 '14 at 11:55

Regarding the errors you encounter, those are all addressed here:

man man

MANWIDTH - If $MANWIDTH is set, its value is used as the line length for which manual pages should be formatted. If it is not set, manual pages will be formatted with a line length appropriate to the current terminal (using an ioctl(2) if available, the value of $COLUMNS, or falling back to 80 characters if neither is available). Cat pages will only be saved when the default formatting can be used, that is when the terminal line length is between 66 and 80 characters.

MAN_KEEP_FORMATTING - Normally, when output is not being directed to a terminal (such as to a file or a pipe), formatting characters are discarded to make it easier to read the result without special tools. However, if $MAN_KEEP_FORMATTING is set to any non-empty value, these formatting characters are retained. This may be useful for wrappers around man that can interpret formatting characters.

MAN_KEEP_STDERR - Normally, when output is being directed to a terminal (usually to a pager), any error output from the command used to produce formatted versions of manual pages is discarded to avoid interfering with the pager's display. Programs such as groff often produce relatively minor error messages about typographical problems such as poor alignment, which are unsightly and generally confusing when displayed along with the manual page. However, some users want to see them anyway, so, if $MAN_KEEP_STDERR is set to any non-empty value, error output will be displayed as usual.
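Putting those variables to work, a rough sketch might render each page at a fixed, generous width and send the formatter's messages to a log rather than the terminal. The function name render_page, the width of 200 and the log path are arbitrary choices for illustration. The cd to /usr/share/man rests on an assumption the thread does not confirm: that the failed .so requests (man1/zshmisc.1 and so on) are relative paths resolved against the current directory.

```shell
#!/usr/bin/env bash
# Sketch only: render_page, the width and the log location are illustrative.
# Pass an absolute path, since the function changes directory before calling man.
render_page() {
    local page=$1 log=${2:-/tmp/synopses.err}
    (
        # Assumption: ".so man1/..." requests resolve relative to the current
        # directory, so run from the root of the man hierarchy.
        cd /usr/share/man 2>/dev/null || exit 1
        # A wide MANWIDTH may reduce "can't break line" warnings; whatever the
        # formatter still prints on stderr is appended to the log.
        MANWIDTH=200 man -l "$page" 2>>"$log"
    )
}
```

For example: render_page /usr/share/man/man1/ls.1.gz | less. Whether the remaining warnings disappear will depend on the pages themselves.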

And now, about how you might make this more robust and efficient:

I think this does what you want:

for f in /usr/share/man/man1/*gz ; do
    man -P "sed -ne '1,/^[Nn]/d;/^ /{H;b}
    /^[Ss]..[Yy]..[Nn]/{g;:n
    N;/\n\(\n\)[^ ].*/!bn;s//\1/
    s/.\x08//g;s/\(\n\)  */\1/g;
    w /dev/stderr' -ne '};/./q'" -l "$f"
done 2>~/file

It specifies that sed be the PAGER and then outputs only the line following NAME and those following SYNOPSIS until it encounters any other line beginning with anything other than a <space>. It prints nothing if the first line not beginning with <space> that follows NAME does not match ^[Ss]..[Yy]..[Nn]. In every case it quits reading the file altogether on the second line it encounters following NAME that does not begin with <space>. It clears leading <spaces> and every character-plus-backspace pair (s/.\x08//g) from the output.
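The overstrike cleanup can be tried on its own. This small sketch uses a hypothetical function name, strip_overstrike, and assumes that bold and underlined text arrive as a character followed by a backspace, as the s/.\x08//g expression above implies. It builds the backspace with printf so it does not rely on sed understanding \x08.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove "character + backspace" pairs from stdin,
# the same substitution as s/.\x08//g but without needing \x08 support in sed.
strip_overstrike() {
    local bs
    bs=$(printf '\b')          # a literal backspace character
    sed "s/.${bs}//g"
}

# "S\bSY\bYN\bN" is how an overstruck "SYN" looks in formatted output.
printf 'S\bSY\bYN\bNOPSIS\n' | strip_overstrike   # prints SYNOPSIS
```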

I ran it in the for loop just now and it looped over my entire man library in only a minute.

man adjusts its output based on whether it writes to a terminal or to a pipe/file, so if you send its output to a pipe or a file it forgoes the PAGER altogether. That was unexpected. But I tricked it: I used sed's w command to write to /dev/stderr and redirected that, so man was none the wiser.

A note, though: it may be that @terdon's is the better way to go. While you can tailor this more easily because you get a sed per file, and the formatting is a little better because it doesn't try to fit a terminal width, apparently man doesn't write those backspace sequences to a |pipe.

mikeserv
  • @illuminÉ What you might find useful is nl. It's handy because it works kind of like grep but, depending on your regex, rather than returning only the lines you specify it indents the entire file and numbers only the regex matches. It's also very fast. It's an excellent way to anchor some more in-depth regexes on a second pass at it. – mikeserv Jun 17 '14 at 00:23