How to check if the first line of a file contains a specific string?

Question

I need to write a shell script that finds and prints all files in a directory whose first line starts with the string #include. I know how to check whether a string appears anywhere in a file, by using:

for f in `ls`; do
    if grep -q 'MyString' "$f"; then
        : # DO SOMETHING
    fi
done

But how can I apply this to the first line only? I thought of storing the first line in a variable and checking whether it starts with #include, but I'm not sure how to do this. I tried the read command, but I failed to read the line into a variable.

I'd like to hear other approaches to this problem; maybe awk? Anyway, remember, I need to check if the first line starts with #include, not if it contains that string. That's why the questions I found, such as "How to print file content only if the first line matches a certain pattern?" and https://stackoverflow.com/questions/5536018/how-to-print-matched-regex-pattern-using-awk, do not completely help.

Tip: don't use ls in scripts, it invariably leads to problems. — ctrl-alt-delor, Nov 09 '18 at 10:02
I think https://unix.stackexchange.com/a/232655/117549 is reasonably close -- just anchor the pattern. — Jeff Schaller, Nov 09 '18 at 11:28
@ctrl-alt-delor so I should use "in *" instead? No problem... but what kind of problems can 'ls' cause? — Z E Nir, Nov 10 '18 at 15:55
It is just that ls output is hard to parse; it was designed for humans to read, not for programs. It is often used when not needed. There are other tools that are designed for the job: *, find, … — ctrl-alt-delor, Nov 10 '18 at 19:31
It's a fine line for me on the duplicate, since it's 99% the same idea, with this question requiring the anchor. — Jeff Schaller, Nov 11 '18 at 17:26

score 22 · Accepted Answer · 2018-12-23T18:57:56.590

It is easy to check if the first line starts with #include in (GNU and AT&T) sed:

sed -n '1{/^#include/p};q'   file

Or simplified (and POSIX compatible):

sed -n '/^#include/p;q'   file

That produces output only if the first line of the file starts with #include. sed only needs to read the first line to make the check, so it will be very fast.

So, a shell loop for all files (with sed) should be like this:

for file in *
do
    [ "$(sed -n '/^#include/p;q' "$file")" ] && printf '%s\n' "$file"
done

This assumes there are only files (not directories) in the current directory.
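If the directory may also contain subdirectories, the loop above can be guarded with a regular-file test. The following is a minimal sketch, not part of the original answer: the function name list_include_files is hypothetical, and the ./ prefix passed to sed is an added precaution for file names that start with a dash.

```shell
# Hypothetical helper: print the names of regular files in directory $1
# whose first line starts with #include.
# The function body is a subshell, so the cd does not affect the caller.
list_include_files() (
    cd -- "$1" || exit 1
    for file in *
    do
        # Skip directories and other non-regular files
        # (this also skips the literal "*" left by an empty directory).
        [ -f "$file" ] || continue
        # sed prints line 1 only if it matches, then quits.
        [ -n "$(sed -n '/^#include/p;q' "./$file")" ] && printf '%s\n' "$file"
    done
    # Report success even when the last file tested did not match.
    exit 0
)
```

It could be called as list_include_files /path/to/src, where the path is whatever directory you want to scan.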

If what you need is to print all lines of the file, a solution similar to the first code posted will work (GNU & AT&T version):

sed -n '1{/^#include/!q};p'  file

Or, (BSD compatible POSIXfied version):

sed -ne '1{/^#include/!q;}' -e p  file

Or:

sed -n '1{
           /^#include/!q
         }
         p
       '  file

edited Dec 23 '18 at 18:57

answered Nov 09 '18 at 09:53

it prints the name of the file/s, not its content... I just replaced the 'printf' with "cat $file" and it worked! Thanks! – Z E Nir Nov 09 '18 at 09:58
@ZENir - so, you've found a question that has answers that show how to print the whole content, you've even included a link to that question in your post, you ask about something slightly different and then you complain in the comment above that the solution prints the file name instead of its content ? I mean, really... I have to bookmark this page... What is your actual question ? Do you want the file names or their content ? – don_crissti Nov 09 '18 at 17:58

Stéphane Chazelas · Answer 2 · 2018-12-23T09:25:38.490

for file in *; do
  [ -f "$file" ] || continue
  IFS= read -r line < "$file" || [ -n "$line" ] || continue
  case $line in
    ("#include"*) printf '%s\n' "$file"
  esac
done

To print the content of the file instead of its name, replace the printf command with cat < "$file".
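The same read-and-case test can be wrapped in a reusable predicate. This is a sketch under stated assumptions: the function name first_line_starts_with and its argument order are hypothetical, and it leaves the first line in the global variable line, as the loop above does.

```shell
# Hypothetical helper: succeed if the first line of file $2 starts with
# the literal string $1.
first_line_starts_with() {
    # Only regular files (or symlinks to them) are considered.
    [ -f "$2" ] || return 1
    # read fails on a last line without a trailing newline but still
    # fills "line", hence the extra non-empty check.
    IFS= read -r line < "$2" || [ -n "$line" ] || return 1
    case $line in
        # Quoting "$1" makes it match literally, not as a glob pattern.
        ("$1"*) return 0 ;;
        (*) return 1 ;;
    esac
}
```

Inside a for file in * loop it could be used as first_line_starts_with '#include' "$file" && printf '%s\n' "$file".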

If your awk supports the nextfile extension, and you don't care about the potential side effects of opening non-regular files:

awk '/^#include/{print substr(FILENAME, 3)}; {nextfile}' ./*

Above, we're adding a ./ prefix which we're stripping afterwards in FILENAME to avoid problems with file names containing = characters (or a file called -).
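To see why the ./ prefix matters, the following sketch compares the two invocations. The function names names_unprefixed and names_prefixed are hypothetical, and the FNR == 1 test is used in place of nextfile so that the sketch does not depend on that extension.

```shell
# Hypothetical helper: operands are passed to awk exactly as given.
# An operand such as a=b.c is parsed by awk as the assignment a="b.c",
# not as a file name; stdin is redirected so that awk cannot block on
# the terminal when no file operands remain.
names_unprefixed() {
    awk 'FNR == 1 && /^#include/ {print FILENAME}' "$@" < /dev/null
}

# Hypothetical helper: expects operands of the form ./name.
# ./a=b.c cannot be an assignment (a variable name cannot start with "."),
# so awk opens it as a file; substr drops the leading "./".
names_prefixed() {
    awk 'FNR == 1 && /^#include/ {print substr(FILENAME, 3)}' "$@"
}
```

In a directory whose only entry is a file named a=b.c starting with #include, names_unprefixed * reports nothing, while names_prefixed ./* prints a=b.c.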

With zsh, you can replace ./* with ./*(-.) to only pass regular files (or symlinks to regular files like for the [ -f ... ] approach above) to awk.

Or to print the file contents instead of name:

awk 'FNR == 1 {found = /^#include/}; found' ./*

(that one is portable).

I tried this script, but for some reason it does nothing. I have a file that starts with '#include' and still nothing is printed. If I press Enter nine times, I return to bash. — Z E Nir, Nov 09 '18 at 09:53
@ZENir, in my initial version of the script, I had forgotten the < "$file". Try reloading the page. — Stéphane Chazelas, Nov 09 '18 at 10:44

score 2 · Answer 3 · edited Nov 09 '18 at 12:05

for file in *
do
  [ -f "$file" ] && head -n 1 < "$file" | grep -q '^#include' && cat < "$file"
done

Beware that, with the -q option enabled, grep will exit with a zero status if a match is found, even if an error occurred.

edited Nov 09 '18 at 12:05

Stéphane Chazelas

answered Nov 09 '18 at 09:56

francescop21

score 0 · Answer 4 · answered Nov 09 '18 at 17:04

This question is a perfect example where bash and sed solutions are quite complex, but the task can be much simpler with (GNU) awk:

gawk 'FNR==1 && /^#include/{print FILENAME}{nextfile}' *

user000001

A similar solution has already been given. Yours has issues with file names that contain = characters. Note that a few other awk implementations besides GNU awk support nextfile these days. Here, because you're using FNR == 1, it would also work in implementations that don't support nextfile. – Stéphane Chazelas Nov 09 '18 at 17:14
@StéphaneChazelas You're right, I read your answer too fast and didn't notice your awk examples, sorry about that – user000001 Nov 09 '18 at 18:29
