31

Often, when manually grepping through a file, there are so many comments that your eyes glaze over, and you start to wish there were a way to display only those lines which have no comments.

Is there a way to skip comments with cat or another tool? I am guessing there is, and that it involves a regular expression. I want it just to change the display, not actually remove any lines from the file.

Comments are in the form of # and I'm using zsh in xterm.

shirish
  • 12,356

7 Answers

35

Well, that depends on what you mean by comments. If just lines without a #, then a simple:

grep -v '#'

might suffice (but this will treat lines like echo '#' as comments). If comment lines are lines starting with #, then you might need:

grep -v '^#'

And if comment lines are lines starting with # after some optional whitespace, then you could use:

grep -v '^ *#'

And if the comment format is something else altogether, this answer will not help you.
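As a quick illustration of the difference between the three patterns (the file name and contents here are made up):

```shell
# Create a small sample file (hypothetical contents, for illustration only)
cat > sample.txt <<'EOF'
# full-line comment
   # indented comment
echo '#'
plain line
EOF

grep -v '#' sample.txt      # drops every line containing '#', including echo '#'
grep -v '^#' sample.txt     # keeps the indented comment and echo '#'
grep -v '^ *#' sample.txt   # drops both comment lines, keeps echo '#'
```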

14

Grepping alone will never be able to remove all comments (or only comments), because grep does not understand the language it is going through. To tell what is a comment and what isn't, you need a lexer that understands that particular language.

There are several answers on SO about how to remove all comments from specific programming languages. I'll add two examples here.

For C, the answer by Josh Lee suggests:

gcc -fpreprocessed -dD -E test.c

This runs the preprocessor but keeps the macros.
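As a quick sanity check (the test.c contents here are made up for illustration), comments disappear from the output while the macro definition survives:

```shell
# Hypothetical C file mixing macros with block and line comments
cat > test.c <<'EOF'
#define N 3 /* a macro */
// a line comment
int main(void) { return N; /* inline comment */ }
EOF

# Strip comments but keep the #define thanks to -dD
gcc -fpreprocessed -dD -E test.c
```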

For Python, the answer by unutbu (with a small adaptation by myself) writes a small lexer using tokenize:

import io
import sys
import tokenize

def nocomment(s):
    result = []
    # generate_tokens expects a readline callable that returns str,
    # so wrap the input in StringIO (BytesIO would hand it bytes and fail)
    g = tokenize.generate_tokens(io.StringIO(s).readline)
    for toknum, tokval, _, _, _ in g:
        if toknum != tokenize.COMMENT:
            result.append((toknum, tokval))
    return tokenize.untokenize(result)

sys.stdout.write(nocomment(sys.stdin.read()))

You can then write one of these for each programming language and dispatch between them with a case statement. Assuming that the Python lexer above is saved as remove-comments.py:

#!/bin/sh
case "$1" in
  *.py)
    remove-comments.py < "$1"
    ;;
  *.c|*.C|*.cc)
    gcc -fpreprocessed -dD -E "$1"
    ;;
  *)
    echo "I do not know how to remove comments from $1, sorry" >&2
    ;;
esac

Give the script a name and add lexers for the languages you need/use. This should be a more-or-less robust design for comment removal from different file types. (Using file instead of a case on filenames would be more robust still.)
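A sketch of that file-based variant might look like this (the MIME type names here are assumptions; check what file --brief --mime-type reports on your system):

```shell
#!/bin/sh
# Sketch: dispatch on content type instead of extension.
# The MIME type strings below are assumptions and may differ
# between versions of file(1); verify them on your system.
f="${1:-/dev/null}"
mime=$(file --brief --mime-type "$f")
case "$mime" in
  text/x-python|text/x-script.python)
    remove-comments.py < "$f"
    ;;
  text/x-c)
    gcc -fpreprocessed -dD -E "$f"
    ;;
  *)
    echo "I do not know how to remove comments from $f ($mime), sorry"
    ;;
esac
```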

grochmal
  • 8,657
  • Though I can understand where you are coming from with this answer, a lexer won't be necessary as the OP clearly states that the comments are strictly formatted with # and exist in a shell script file. So a solitary grep solution will be perfectly fine in this case. – Yokai Jan 12 '18 at 08:16
  • @Yokai - Shell scripts may be complex; just one example is ${VAR#*/}, which is a way to achieve the same as basename. To strip a space you would have the string # (hash plus space), and that would not be a comment. – grochmal Jan 15 '18 at 10:57
14
grep -v "^#" your_file | grep -v "^$" | less

Remove the lines that start with "#", also remove the empty lines, then send the result to less for a better display.
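If your grep supports -E, the two invocations can also be collapsed into a single pattern; a small sketch (the demo.conf file and its contents are made up):

```shell
# Sample file for illustration only
cat > demo.conf <<'EOF'
# a comment

setting=1
# another comment
other=2
EOF

# Drop lines that start with '#' or are empty, in a single grep
grep -Ev '^(#|$)' demo.conf
```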

4

In the case of bash scripts, it is possible via the set -vn command. -v tells bash to enter verbose mode, where commands are printed as they are read; -n tells bash to only read the script file without executing anything.

Example:

$ cat ./testscript.sh                                                                                                    
#!/bin/bash

# comment
set -vn
echo "Hello World" # another comment
$ ./testscript.sh                                                                                                        
echo "Hello World" # another comment

As you can see, it ignores lines that start with #, but the in-line comments are still printed. This is of course not ideal, but at least it doesn't require any external tools such as grep. I'm not aware of such a feature in other scripting languages.

3

As mentioned in the comments above, what format 'comments' take in your use case makes a difference. Still, for several cases, this may be enough, without having to create a script.

The solution:

Reading the question suggests you're already using grep to search the files anyway, so pipe that through another grep, like this:

grep your_pattern your_file | grep --perl-regexp --invert-match '(?:^;)|(?:^\s*/\*.*\*/)|(?:^\s*(?:#|//|\*))'

What is not trapped:

This will still allow lines that have a 'trigger' character elsewhere in the line, that have comments at the end, as in echo "Hello World" # another comment, or that are part of a multi-line comment (except as noted in the explanation below).

If this is used as a post-filter to your grep, these limitations should be negligible: most of the comments will still be filtered out, and you won't worry "that your eyes glaze over" anymore.

The Explanation:

There are three patterns, which you can modify to suit your use case if needed. The first, (?:^;), catches lines beginning with the ; character; it must be first, without white space. The second catches lines that begin with a /* ... */ style comment, with or without leading white space. The third catches lines, with or without leading white space, that begin with #, //, or *. The * in the last pattern helps to catch the lines inside a multi-line /* ... */ comment, where a common style is to run a column of * characters connecting the first and last lines together. For example:

/************
 *
 * This is my
 * multi-line
 * comment.
 *
 ************/

The (?: ... ) notation around each pattern makes it a 'non-capturing' group, which should increase speed and reduce resource consumption. The --perl-regexp option tells grep to use Perl regular expression rules, which allow the non-capturing grouping and the | alternation operator, neither of which works in plain grep. The grep man page does warn that the -P option is experimental, so do test before relying on it on your system. The --invert-match option tells grep to invert the match, returning the lines that fail the pattern. The two can be combined and shortened to -vP.

The reason to use this as a post-filter to your normal grep is three-fold. First, you can do your normal grepping, and only add the extra work of this filter when you run into your problem of too many comments in the output (less typing and fewer resources used). Second, you have probably already developed the patterns you commonly use, and the habits that go with them; adding more complexity to them could break them, and debugging patterns when you don't have to is wasted work. Third, it doesn't handle multi-line comments well at all, but if you've already grepped the file for what you want, then it'll remove most, if not all, comments from the results, and serve your purpose.
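As a rough check, a filter along these lines can be tried on some made-up sample lines (note again that -P requires a grep built with PCRE support):

```shell
# Sample lines in several comment styles (made up, for illustration)
cat > mixed.txt <<'EOF'
; an ini-style comment
  /* a C comment */
  # a hash comment
  // a C++ comment
  * inside a block comment
int x = 1;
EOF

# Only the non-comment line should survive the inverted match
grep --perl-regexp --invert-match '(?:^;)|(?:^\s*/\*.*\*/)|(?:^\s*(?:#|//|\*))' mixed.txt
```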

Chindraba
  • 1,478
  • 1
    @StephenRauch If the commenting # is preceded by white space it is still a comment, most of the time, but it is not stripped by ^#; you need to allow for the white space with ^\s*#. – Chindraba Jan 17 '17 at 10:12
3

To do this for bash (or Bourne shell) files: you can take advantage of bash's declare -f functionname, which displays functionname with both proper indentation and comments removed (so you'd get your comments removed, and as a bonus the indentation would be good too):

BEAUTIFIER () {
  for f in "$@"; do
    printf "%s" "
      F_from_sh () {
        $(cat "$f")
      }
      echo ___ beautified version of $f : _________________
      declare -f F_from_sh | awk ' (NR > 2) && length>2' | sed -e 's/^  //'
    " | bash
  done
}

Then use as:

BEAUTIFIER script1.sh  script2.bash  etc

Please note: it will get rid of all comments in the script, even the "shebang" first line! You may want to also display the first line of $f.
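A minimal sketch of re-printing that shebang, assuming it is on the first line (the file name here is hypothetical):

```shell
# Hypothetical script for illustration
f=script1.sh
printf '#!/bin/sh\n# a comment\necho hi\n' > "$f"

# Emit the first line only when it is a shebang, before the cleaned body
head -n 1 "$f" | grep '^#!'
```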

  • Indeed, the best parser is the shell itself. You could simplify this to a one-liner:

    echo "f() { $(< ~/Documents/shellscripts/avidemux2ffmpeg.sh); }; type f | tail -n +4 | head -n -1;" | bash

    The tail and head commands just serve to remove the function envelope from the output. You can prepend a shebang-type comment with an extra echo '#!/bin/bash' that is not piped to the bash command.

    – db-inf Nov 16 '21 at 23:43
2

Here is a simple way to remove comments, i.e. everything that comes after '#', using sed and awk.

[root@master]# cat hash
This is a program to remove comments from this file
#!/bin/bash
# comment
set -vn # comment
echo "Hello World" # another comment
echo "testscript for removing comments"
echo "Hello World" # another comment
echo 'This is a # sign' #comment
echo "This is a # sign" #comment

[root@master]# awk -F '#' 'BEGIN{OFS="#";} { if (!/#/) ;else $NF="";print $0}' hash | sed -n 's/#$//g;p'
This is a program to remove comments from this file
set -vn
echo "Hello World"
echo "testscript for removing comments"
echo "Hello World"
echo 'This is a # sign'
echo "This is a # sign"
terdon
  • 242,166
  • 1
    What if there's echo "This is a # sign" in the script ? – Sergiy Kolodyazhnyy Jan 17 '17 at 07:40
  • It's not a script, I just showed it by editing a file. You only need this much: cat | sed -n 's/#.*$//g;p' – Aljo Antony Jan 17 '17 at 07:43
  • you misunderstood my question. What your sed command does is strip # and anything after it till the end of the line. If there is a legitimate command, such as echo "Hello # world", it will chop off a portion of the command, thus introducing bugs if the user wants to copy the uncommented version of the script somewhere else. See this: http://paste.ubuntu.com/23815189/ – Sergiy Kolodyazhnyy Jan 17 '17 at 08:08
  • 1
    In other words, this approach will work, but only if there's no # sign within commands themselves – Sergiy Kolodyazhnyy Jan 17 '17 at 08:10
  • I updated my query, I use zsh. Also, please format your code a bit, as right now it's a bit hard to parse. I have edited it a bit so that it's easier to parse now. – shirish Jan 17 '17 at 10:33
  • I have updated my answer and now it will remove everything after # up to the end of the line. – Aljo Antony Jan 17 '17 at 10:41