
I want to match a string or word in a file, but without using any external tools (grep, sed, etc.), with only pure Bash...

Essentially I want the equivalent of:

grep "string" file

or

grep -w "string" file

in pure bash.

PS: I only care about matching an exact string (with or without a newline) in a file, so full regex support (which external tools may offer) isn't needed.

  • What do you want as output? The line, the word, or if there's a match or not? – schrodingerscatcuriosity Mar 23 '21 at 18:45
  • Mainly whether there is a match or not :) @schrodingerscatcuriosity – Nordine Lotfi Mar 23 '21 at 18:46
  • May I ask why? This means you would need to write an actual program in bash (rarely a good idea) that will open the file and run a regex match on each line. This will be incredibly slow and just worse than grep in all ways. Are you sure you really want to do this? If this is part of a larger issue, I suspect it may be an XY problem. – terdon Mar 23 '21 at 18:49
  • This isn't part of any issue, really. I was just curious to know how to do it, since I've seen some pretty complex regex scripts made in pure bash, and noticed that no one has made any (afaik) project or post that does it, so yeah @terdon – Nordine Lotfi Mar 23 '21 at 18:59
  • Nobody in their right mind would make any kind of complex project in any shell language! The only reason you would do something like this is because you were forced to. There is a very good reason why you haven't seen such projects: bash is the wrong tool for the job. It would be like using a screwdriver to hammer a nail. Sure, it might eventually work, but it will be very hard, very slow, and the end result will be not as good as if you had just used a hammer in the first place. – terdon Mar 23 '21 at 19:01
  • I mean, that's only if you take into account that it would be "slower", or "less safe" but, the sole reason that it can be done is enough for it to exist. that though is beside the point of this post i think @terdon – Nordine Lotfi Mar 23 '21 at 19:02
  • Many things can be done but should not be done. You're free to do it, and I showed you one way, but the more important message here is that the shell is not a general programming language and should not be used as a programming language for arbitrary problems. If you want to play around with this sort of thing (and you should!) please use an actual programming language and don't try to force the shell into things it was never designed for. – terdon Mar 23 '21 at 19:08
  • I'm well aware of that. I appreciate your concern, really, but I never mentioned anything about using "only bash" for everything. I didn't deny other tools were faster either :) I did say "I'm just curious" so yeah @terdon – Nordine Lotfi Mar 23 '21 at 19:09
  • TBH, looking for a line matching a regex is probably the silliest thing to do manually in the shell, since there are no fewer than three standard tools that can already do it rather trivially: grep, awk and sed. (With caveats on RE variants and the exact functionality.) – ilkkachu Mar 23 '21 at 19:32
  • Not really, I've seen far worse, e.g. cat, base64, lz4, and so on made in pure bash. OF COURSE, it's probably inadvisable to use them, but it doesn't hurt to make them, be it as a learning exercise, code golf, or out of curiosity (as this post tried to convey)... @ilkkachu – Nordine Lotfi Mar 23 '21 at 19:35
  • @NordineLotfi, I know you're doing it out of curiosity, and that's a fine reason. I don't mean to argue about the downsides really, but just can't help the feeling that redoing it in the shell, or in any language is a bit redundant. Base64 and lz4 are slightly more understandable in that they might not exist as standard tools. (though POSIX has compress, and Busybox probably could help too.) Of course, even if not silly, esp. doing compression in the shell would be totally hideous. – ilkkachu Mar 23 '21 at 19:43
  • Yeah, again I don't mind when people argue about my somewhat questionable posts; I'm actually thankful there are always some who do their best to convey what they think, whether it's advice or a thought. It's better than silence TBH @ilkkachu. That aside, yeah, compression in pure bash would be way too slow... – Nordine Lotfi Mar 23 '21 at 19:45

1 Answer


You can do it. But it is a really, really bad idea. It will be far slower (as in orders of magnitude slower) than grep and less portable since it depends on features of a specific shell (Bash).

This would print out lines matching a regex pattern given as the first argument, similarly to grep pattern:

#!/bin/bash -

regexp="$1"
ret=1
# "|| [ -n "$line" ]" also handles a last line with no trailing newline
while IFS= read -r line || [ -n "$line" ]; do
  if [[ $line =~ $regexp ]]; then
    printf '%s\n' "$line"
    ret=0
  fi
done
exit "$ret"

Save that as foo.bash and run like this:

foo.bash pattern < inputFile

Or using standard sh syntax, looking for a fixed string and not a regex:

#!/bin/sh -

string="$1"
ret=1
while IFS= read -r line || [ -n "$line" ]; do
  case $line in
    # the quoted "$string" is taken literally; the * on each side allow it anywhere in the line
    (*"$string"*)
      printf '%s\n' "$line"
      ret=0
  esac
done
exit "$ret"

(Replace the printf with exit 0 to get behaviour similar to grep -q.)
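As a rough illustration of that grep -q idea, here is a minimal sketch written as a function instead of a script, so that it can stop at the first match with return. The name contains_string is made up for this example, and it assumes the file is fed on standard input, as with the scripts above.

```shell
#!/bin/bash
# contains_string STRING < FILE
# Hypothetical helper: succeeds (status 0) as soon as one input line
# contains STRING as a fixed substring, and prints nothing.
contains_string() {
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      # "$1" is quoted, so glob characters in STRING are matched literally
      (*"$1"*) return 0 ;;
    esac
  done
  return 1
}
```

It would be used as: if contains_string bar < file; then echo found; fi. Returning at the first match avoids reading the rest of the file, which is where most of the time goes.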

Just to give you an idea of how slow it is, I created a file with just 10001 lines: 5000 lines of foo, then a single bar, and then another 5000 lines of foo:

perl -e 'print "foo\n" x 5000; print "bar\n"; print "foo\n" x 5000;' > file
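If perl is not at hand, the same kind of test file can be produced with the shell alone. This is only a sketch: make_test_file is a name invented here, and it will be far slower than the perl one-liner for large counts.

```shell
#!/bin/bash
# make_test_file N FILE
# Hypothetical helper: writes N lines of "foo", one line of "bar",
# then N more lines of "foo" to FILE, using only shell builtins.
make_test_file() {
  n=$1
  {
    i=0
    while [ "$i" -lt "$n" ]; do echo foo; i=$((i + 1)); done
    echo bar
    i=0
    while [ "$i" -lt "$n" ]; do echo foo; i=$((i + 1)); done
  } > "$2"
}
```

For the small file used here, that would be: make_test_file 5000 file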

Now, compare the times for grep and the script above:

$ time grep bar < file
bar

real 0m0.002s
user 0m0.002s
sys 0m0.000s

$ time ./foo.bash bar < file
bar

real 0m0.116s
user 0m0.101s
sys 0m0.016s

As you can see, even with this tiny file, the difference is noticeable. With a more substantial one, the time the script takes becomes almost unbearable:

$ perl -e 'print "foo\n" x 500000; print "bar\n"; print "foo\n" x 500000;' > file

$ time grep bar < file
bar

real 0m0.004s
user 0m0.000s
sys 0m0.004s

$ time ./foo.bash bar < file
bar

real 0m11.306s
user 0m10.117s
sys 0m1.188s

This is partly because Bash itself is slow. The standard sh version (saved here as foo2.sh) runs about three times faster with Dash:

$ time dash foo2.sh bar < file
bar

real 0m3.467s
user 0m2.113s
sys 0m1.353s

Even so, that is still a difference of roughly three orders of magnitude: multiple seconds for the scripts, against the near-instant grep. And this is a file with only about a million lines and ~4MB in size. I hope you see the problem...
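The question also asks about grep -w, which neither script above covers. Here is a minimal sketch of one way to approximate it for a fixed string, building on the case approach of the sh script. The name word_match is invented for this example, and it assumes "word" means what it roughly does for grep -w: the match must not be directly preceded or followed by a letter, a digit or an underscore. Padding the line with a space on each side lets a single pattern also cover matches at the very start or end of the line.

```shell
#!/bin/bash
# word_match WORD < FILE
# Hypothetical helper: prints the lines that contain WORD as a whole word
# (not adjacent to a letter, digit or underscore), roughly like grep -w
# with a fixed string. Returns 0 if any line matched, 1 otherwise.
word_match() {
  ret=1
  while IFS= read -r line || [ -n "$line" ]; do
    # The added spaces act as non-word characters at both line edges
    case " $line " in
      (*[![:alnum:]_]"$1"[![:alnum:]_]*)
        printf '%s\n' "$line"
        ret=0 ;;
    esac
  done
  return "$ret"
}
```

For example, word_match bar < file would print a line such as "foo bar" or "(bar)", but not "foobar" or "bar_x". It is just as slow as the scripts above, so the same warnings apply.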

ilkkachu
terdon
  • not that new, Bash 3.2 already has regex matching in [[ ]]. Also it works on Ksh and Zsh too, but I didn't check if there's differences between the shells here. – ilkkachu Mar 23 '21 at 19:09
  • @ilkkachu yes, that's why I wrote "newer", not "new". Point is it won't even be portable across bash, let alone POSIX shells. – terdon Mar 23 '21 at 19:10
  • @ilkkachu, regexps were added in 3.1 but changed in 3.2. zsh behaviour is closer to bash 3.1's. ksh93's is pretty broken. yash as well in [[...]], though it is getting better. – Stéphane Chazelas Mar 23 '21 at 19:11
  • @StéphaneChazelas, the CHANGES file seems to say they were added in 3.0-alpha and the quoting behaviour changed in 3.2. – ilkkachu Mar 23 '21 at 19:13
  • @terdon, "newer" than what? 3.2 is 14+ years old now. Somehow I suspect they're not running a 15-year old Solaris if they're playing around with this sort of thing. – ilkkachu Mar 23 '21 at 19:22
  • I can confirm I'm not! Using Bash version 5.0.17 here @ilkkachu – Nordine Lotfi Mar 23 '21 at 19:24
  • If the need is just to check whether a match exists, [[ $(<file) =~ pattern ]] is also going to work - but I'm not posting this as a separate answer to avoid repeating all the caveats and warnings. – fra-san Mar 23 '21 at 19:27
  • @fra-san, would match the pattern over more than one line, though. – ilkkachu Mar 23 '21 at 19:44
  • @ilkkachu newer than bash versions that don't support it. I just wanted to clarify that one shouldn't expect this to work on any bash. I am not too tied to the newer/older terms, feel free to change them. – terdon Mar 23 '21 at 19:45
  • FWIW, I'd rip off the line number counting from the script. grep doesn't do that by default, and if you're comparing against grep -q, might as well have the script do the same thing. (The biggest difference does come from exiting early, of course, but in any case) – ilkkachu Mar 23 '21 at 19:46
  • @fra-san awww. But that's much cooler than mine! Mine's completely boring. – terdon Mar 23 '21 at 19:46
  • @ilkkachu not my doing. I just wrote a quick and dirty proof of concept. Stéphane made it robust. – terdon Mar 23 '21 at 19:46
  • @terdon, I'm just wondering what the chances are for anyone to realistically run across a Bash older than 3.2... Now, I appreciate being pointed out that Bash 4.x for x < 4 still exist in some long-term supported systems, but going over 10 years and doing it for a not-so-serious purpose would seem to make it a bit different. – ilkkachu Mar 23 '21 at 19:51
  • On small datasets, a pure bash solution might easily beat grep due to the overhead of forking a new process. – 1N4001 Jul 14 '21 at 21:00