Find everything between FROM and WHERE words excluding the words itself, pattern can have multiple occurrences

Question

I need to extract everything between the SQL keywords FROM and WHERE from a file. There can be multiple combination of FROM and WHERE. Also this has to be case insensitive match. The SQL file is as below:

    SELECT col1 as column1,
    Col2 as column2,
    Col3 as column3,
      (SELECT t1.col4
      from table_1 t1, table_3 t3
      WHERE t1.col5 = t2.col6
      AND t1.col2 = t3.col11
      ) as column4
    FROM
    table_2 t2,
    table_4 t4
    where t2.col7 ='Active'
    AND t2.col12 = t4.col13
    AND t2.col8 IN ('abc','def','ghi')
    AND t2.col8||''||t2.col9 <> 'jkl'
    AND t2.col10 IS NULL;

Output desired should be

table_1 t1
table_3 t3
table_2 t2
table_4 t4

I have tried the following which almost solves the problem, except that if the table name is present in the next line where "FROM" is, this breaks and doesn't prints as output.

#!/bin/sh
    awk '
    BEGIN {IGNORECASE=1} { found = 0; }
    /FROM/ {
        if (!found) {
            found = 1;
            $0 = substr($0, index($0, "FROM") + 4);
        }
    }
    /WHERE/ {
        if (found) {
            found = 2;
            $0 = substr($0, 0, index($0, "WHERE") - 1);
        }
    }   
        { if (found) {
            print;
            if (found == 2)
                found = 0;
        }
    }
    '

Sequence of the output doesn't matter much. I tried using Thomas Dickey's answer here : https://unix.stackexchange.com/questions/273496/how-do-i-display-all-the-characters-between-two-specific-strings
The only problem with the that solution is it doesn't work if the FROM keyword and the table names are in different line like in the example sql above. — Mohammed Arif, Jun 03 '20 at 10:45
Welcome to U&L SE, I'm sorry, but this is not a code writing service website. We're glad to help though if you have issues with an own approach. So, please add relevant information (e.g. what you have tried) to the Q, not in the comments. — pLumo, Jun 03 '20 at 10:48
@pLumo If you are referring to the SQL code in the question, it's actually a dummy code and i just wanted to post the contents of the file so that the solutions can be tested against this file. — Mohammed Arif, Jun 03 '20 at 11:17
No i'm referring to the non-existant code that solves your issue, not the SQL code. Posting an example input is just fine :-). — pLumo, Jun 03 '20 at 11:26
@AdminBee : That solution is too simple to solve this problem. I actually went through almost all the pattern matching questions on this platform but couldn't find a similar question and solution for this one. — Mohammed Arif, Jun 03 '20 at 17:24

bu5hman · Answer 1 · 2020-06-04T08:48:48.683

0

How about .... with gawk

awk -F"from|FROM|where|WHERE" 'BEGIN{RS=""}{for (i=2;i<=NF;i+=2) print $i}' file |
    tr "," "\n" | column -t

table_1  t1
table_3  t3
table_2  t2
table_4  t4

As long as there are no mixed case like From/Where, although you could then

awk -F"[fF]rom|FROM|[wW]here|WHERE"

Or ultimately

awk -F"[fF][rR][oO][mM]|[wW][hH][eE][rR][eE]"

But then I would have to go and seek professional medical attention

Update The default column separator is 2 spaces (see the man page). If you only want one space then just use

column -o" " -t

edited Jun 04 '20 at 08:48

answered Jun 03 '20 at 20:42

bu5hman

4,756

Excellent! This works. How can i get the output separated by single space and not by tab character as this code does? – Mohammed Arif Jun 04 '20 at 05:49

Ed Morton · Answer 2 · 2020-06-04T01:09:01.480

0

With GNU awk for various extensions (IGNORECASE, \s, \<, \>, multi-char RS, and gensub()):

$ cat tst.awk
BEGIN {
    IGNORECASE = 1
    RS = "\\s*\\<where\\>\\s*"
}
sub(/.*\<from\>\s*/,"") {
    print gensub(/\s*,\s*/,ORS,"g")
}

$ awk -f tst.awk file
table_1 t1
table_3 t3
table_2 t2
table_4 t4

edited Jun 04 '20 at 01:09

answered Jun 04 '20 at 01:03

Ed Morton

31,617

1

Thanks, i will try out this one also and let you know. – Mohammed Arif Jun 05 '20 at 15:19

Find everything between FROM and WHERE words excluding the words itself, pattern can have multiple occurrences

2 Answers2