
I have a rather long case statement (about 150 patterns to match).

case "$line" in
  pattern1) ret=result1;;
  pattern2) ret=result2;;
  pattern3) ret=result3;;
   ...
   ...
esac

However, this is not really appropriate, as the patterns are really user-modifiable data. So every time a new pattern is needed, I have to go in and modify the case statement.

What I would like to do is create a separate text file, something like:

pattern1=result1
pattern2=result2
pattern3=result3
...
...

The text file would be read and used to create the case statement. This has the advantage of not needing to modify the code every time a new pattern is required.

Is there a way to automatically create a case statement from input from a text file? I'm open to other options, but am looking for a solution that can do the pattern matching to the input line very fast as I also have a large number of input lines to match against.

daniel
    Are the patterns to match against actual patterns or are they fixed strings? (Either way—I suspect you should just be using awk.) – Wildcard Jan 12 '16 at 07:14
  • From the comments below you mention you have about 10,000 lines to check, each with 150 patterns. I recommend two things: (1) Use awk rather than a shell case switch—see Why is using a shell loop to process text considered bad practice?. (2) If the list of patterns doesn't change very often, write a code generator to accept the pattern list and output an awk script with the patterns hard-coded. This is very likely the most efficient given the numbers of lines you mention. – Wildcard Jan 12 '16 at 07:37
  • I'd like to post this as an answer, but I'm missing one thing: What do you want to actually do with the values result1, result2, etc.? – Wildcard Jan 12 '16 at 07:38
  • They are about 80% fixed strings and the rest are patterns. The list does not change so often. I do not really understand this suggestion: "write a code generator to accept the pattern list and output an awk script with the patterns hard-coded" Can you explain the suggestion in more detail? – daniel Jan 12 '16 at 18:16
  • "What do you want to actually do with the values result1, result2, etc.?" ...The best way to describe it .. this is really a sort of translation table .. I'm matching a line based upon the string/pattern and then the line is replaced by the result. – daniel Jan 12 '16 at 18:19
  • Also see man bash regarding the performance issue. At the very bottom: BUGS It’s too big and too slow. – Wildcard Jan 14 '16 at 20:45

3 Answers


It's a fairly odd thing to do, dynamic case statement generation...but yes, it's possible.

I would do it something like the following (if I had explored all other approaches I could think of; I don't know that I would recommend this):

. <( awk -F= 'BEGIN { print "case \"$line\" in" }
                    { print $1 ") ret=" $2 ";;" }
              END   { print "esac" }' patternfile )

If patternfile contains

pattern1=result1
pattern2=result2
pattern3=result3

the awk command by itself would output

case "$line" in
pattern1) ret=result1;;
pattern2) ret=result2;;
pattern3) ret=result3;;
esac

and with the . <( awk ... ) construct this will be sourced into your script just as though you had written the case switch directly in your script file.


That answers "dynamic case statement creation" in a shell script. However, since you're planning on doing this within a loop, the above would be a very bad way to do it—since you would run awk 10,000 times on your patternfile, thus outweighing any possible performance benefit.

Instead it would be better to write the code generator to create a case switch within a function definition—create and source a whole function definition, in other words:

# At the top of your script file:
. <( awk -F= 'BEGIN { print "my_case_switch_function() {"
                      print "case \"$1\" in" }
                    { print $1 ") ret=" $2 ";;" }
              END   { print "esac"
                      print "}"   }' patternfile )

Within your while loop (or wherever else you want):

my_case_switch_function "$line"

This would allow you to re-use the generated case switch over and over (10,000 times if you like) after only processing patternfile once. So the performance is as good as you would get if you manually created the case switch from the patternfile and hard-coded it in your script (except for the small overhead of running awk once on a 150 line file, which is negligible next to processing 10,000 lines).


However, it must be reiterated: shell script case switches are not the tool for processing a 10,000 line file line by line. So even though this solution closely approaches the performance you could get with a hardcoded case switch, it will probably still be slow.

To quote Stéphane Chazelas:

As said earlier, running one command has a cost. A huge cost if that command is not builtin, but even if they are builtin, the cost is big.

And shells have not been designed to run like that, they have no pretension to being performant programming languages. They are not, they're just command line interpreters. So, little optimisation has been done on this front.
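For comparison, here is a minimal sketch of the approach recommended in the comments: read the pattern file once into awk arrays and match every input line against them in a single awk process. The file names `patterns.txt` and `input.txt` are assumptions, and note that awk matches with extended regular expressions, not shell globs, so case-style glob patterns would first need translating (the sample pattern below uses `a.c`, the ERE spelling of the glob `a?c`):

```shell
#!/bin/bash
# Sketch with assumed file names: patterns.txt holds pattern=result lines,
# input.txt holds the lines to translate.
cat > patterns.txt <<'EOF'
a.c=ABC
def=DEF
EOF
cat > input.txt <<'EOF'
abc
def
xyz
EOF

# awk reads patterns.txt first (NR==FNR), storing each pattern and result;
# for every input line it prints the result of the first matching pattern,
# or passes the line through unchanged if nothing matches.
awk -F= '
NR == FNR { pats[++n] = $1; res[n] = $2; next }
{
    for (i = 1; i <= n; i++)
        if ($0 ~ "^" pats[i] "$") { print res[i]; next }
    print        # no match: pass the line through
}' patterns.txt input.txt
# prints: ABC, DEF, xyz (one per line)
```

All 10,000 input lines are handled by one awk process, which is the performance property the quoted advice is about.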

Wildcard
  • I accept your critique about using the case statement vs speed .... Right now I'm thinking the: ret=$(grep "^$line" file.txt | cut -d = -f 2) is simple and may indeed be the best solution for me .. I'm going to give that a try. Thanks for the direction. – daniel Jan 12 '16 at 18:29
  • I have to recant support for the grep solution ... I don't think that it will work .. I can not grep the $line because it does not exist in the string/pattern file. The string/pattern in the file is a substring of $line. It is this that drove me to look at the case statement. – daniel Jan 12 '16 at 18:43
  • @daniel, I think this is a case where you should actually look into Perl. Practical Extraction and Reporting Language. That's almost definitely the best tool for this job. – Wildcard Jan 13 '16 at 04:16
  • ok. I've not used Perl but this can be a learning experience. – daniel Jan 13 '16 at 09:35

I think this is an XY problem. You don't really need to generate a dynamic case statement. You just want to use a file as a simple key-value store. One solution is to search through the file with grep and then extract the value with cut:

ret=$(grep "^$line" file.txt | cut -d = -f 2)
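One caveat worth noting (a sketch, not part of the answer): grep treats `$line` as a regular expression and matches prefixes, so `line=pattern1` would also match a stored key `pattern10`, and any regex metacharacters in `$line` would be interpreted. If the keys are fixed strings, an exact-key lookup avoids both problems:

```shell
#!/bin/bash
# Sketch of an exact-key lookup; the file name file.txt follows the answer.
cat > file.txt <<'EOF'
pattern1=result1
pattern10=result10
EOF

lookup() {
    # awk compares the key as a plain string, so regex metacharacters
    # in "$1" are harmless and pattern1 cannot match pattern10.
    awk -F= -v key="$1" '$1 == key { print $2; exit }' file.txt
}

lookup pattern1    # prints result1, not result10
```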
gardenhead

With this pattern file:

a*c=abc
def=def

and assuming that there is only one = per line:

#! /bin/bash

line=abc
while IFS= read -r patternline; do
        [[ $patternline =~ .=. ]] || continue
        pattern="${patternline%=*}"
        result="${patternline##*=}"
        if [[ "$line" == $pattern ]]; then
                ret="$result"
                echo "$patternline"
                break
        fi
done <patterns.txt

If many lines have to be checked then the file content would be read into an array, though:

#! /bin/bash

patterns_file="patterns.txt"

patterns=()
results=()
patternindex=0


### BEGIN: init
while IFS= read -r patternline; do
        [[ $patternline =~ .=. ]] || continue
        patterns[patternindex]="${patternline%=*}"
        results[patternindex]="${patternline##*=}"
        ((patternindex++))
done <"$patterns_file"
pattern_count="$patternindex"
### END:   init

patterncheck () {
        local line="$1" i=0 pattern= result=
        for((i=0;i<pattern_count;i++)); do
                pattern="${patterns[i]}"
                result="${results[i]}"
                if [[ "$line" == $pattern ]]; then
                        ret="$result"
                        echo "$line"
                        break
                fi
        done
} # patterncheck ()

# tests

patterncheck abc
patterncheck def
Hauke Laging
  • I have about 10,000 lines to check and the patterns are a little long, so I am concerned about speed .... Using your suggested approach, I would end up with nested while/do loops as I read through the 10,000 lines ... I was hoping to avoid this by dynamically creating the case statement because I thought the case statement would be immensely faster than the nested while/do statements. – daniel Jan 12 '16 at 06:40
  • @Wildcard I haven't suggested using the shell; that was the question. – Hauke Laging Jan 12 '16 at 07:24
  • @daniel See the update. – Hauke Laging Jan 12 '16 at 07:26
  • Although the proposed solution will technically process the data, it is just too slow ... and that is the reason that I was looking for a way to dynamically build a case statement. For example, I thought I could do something like: case "$line" in "${case_statements[@]}" esac. Where I have an array pulled from the text file that would expand into the separate case statements. But I've not been able to get this working. – daniel Jan 12 '16 at 12:59