
I'm trying to port this Python code to Bash:

import re
w = open("filetest.txt")
for item in re.findall(r'STRING:\s*(.+)"', w.read()):
  print item

This is my Bash attempt, but I don't know whether it's correct; it returns nothing:

while read line; do
    if [[ $line =~ r'STRING:\s*(.+)"' ]]; then
        echo $line
    fi
done < filetest.txt

Contents of filetest.txt:

iso.3.6.1.4.1.25355.3.2.6.3.2.1.11.1.1.1 = STRING: "785c7208dcf0"

Desired output: 785c7208dcf0

Chris Davies
  • (1) It looks like all that code could be replaced by a simple grep command. (2) If you provide some representative sample input and corresponding desired output, you will get better answers. – John1024 Apr 04 '19 at 18:54
  • Before applying the regex, I have a function that populates this text file via SNMP. Are you saying that you need this for loop? – Shinomoto Asakura Apr 04 '19 at 18:56
  • It looks to me like you do not need a loop at all. So that we can be sure that answers here really solve your problem, provide a brief but representative sample of that input text file and your corresponding desired output. We don't need the SNMP code, just a representative part of filetest.txt. – John1024 Apr 04 '19 at 19:00
  • Please provide the info John1024 requests so we can be sure. To be perfectly clear, it sounds like you are trying to recreate a basic function of GREP by reversing a piece of Python that does more or less what GREP fundamentally does by design already. Desired output and intention are important, if this isn't just a "I did not know GREP existed" question. – 0xSheepdog Apr 04 '19 at 19:04
  • The output from your python code is different from the desired output that you show. Which output do you really want? – John1024 Apr 04 '19 at 19:17
  • No, I have not posted the Python output; filetest is the string to be analysed and the output is the result. – Shinomoto Asakura Apr 04 '19 at 19:45
  • The existing sample python code includes the leading double-quote in the output (i.e. using your sample input, I get "785c7208dcf0 as the output), but the desired output does not. Which is required? – Chris Davies Apr 04 '19 at 21:09

1 Answer

sed -n 's/.*STRING:[[:blank:]]*\(..*\)/\1/p' filetest.txt

You wouldn't do it in a shell loop as these are generally not ideal for parsing text (see "Why is using a shell loop to process text considered bad practice?").

Instead, the single command above uses sed to match the regular expression (here rewritten as a basic regular expression rather than as a PCRE, a Perl-compatible regular expression). The s command replaces each matching line with the captured text, and the p flag together with -n prints only those lines.
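If the surrounding double quotes should be dropped, as the desired output in the question suggests, one possible variation is to match the quotes outside the capture group. This is only a sketch, and it assumes that the value is always wrapped in double quotes and ends the line; the sample variable is introduced here purely so that the example runs without a file.

```shell
# Sample line from the question, fed on stdin so that no file is needed.
sample='iso.3.6.1.4.1.25355.3.2.6.3.2.1.11.1.1.1 = STRING: "785c7208dcf0"'

# The quotes sit outside \( ... \), so they are matched but not captured.
result=$(printf '%s\n' "$sample" |
    sed -n 's/.*STRING:[[:blank:]]*"\(.*\)"$/\1/p')

printf '%s\n' "$result"    # prints 785c7208dcf0
```

To run this against the real file, drop the printf pipeline and pass filetest.txt as the last argument to sed instead.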

Another way:

awk -F ':[[:blank:]]*' '/STRING/ { print $2 }' filetest.txt

This treats each line of the file as a record with fields delimited by : followed by any number of spaces or tabs. When the STRING pattern is found on a line, the second such field is printed.
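To see how that field separator splits the sample line, the following sketch prints every field with its number. The sample and fields variables are names made up for this illustration, and the loop is just a debugging aid rather than part of the solution.

```shell
# Sample line from the question, fed on stdin so that no file is needed.
sample='iso.3.6.1.4.1.25355.3.2.6.3.2.1.11.1.1.1 = STRING: "785c7208dcf0"'

# Print each field on its own line, numbered, using the same -F value.
fields=$(printf '%s\n' "$sample" |
    awk -F ':[[:blank:]]*' '{ for (i = 1; i <= NF; i++) printf "field %d: %s\n", i, $i }')

printf '%s\n' "$fields"
```

The only colon on the line is the one after STRING, so there are two fields: everything up to and including STRING, and the quoted value.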

If you nonetheless want to do it with a bash loop:

while IFS= read -r line; do
    if [[ $line =~ 'STRING:'[[:blank:]]*(.+) ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    fi
done <filetest.txt

The BASH_REMATCH array will contain the various captured bits from the match. The regular expression itself (which should be an extended regular expression) should not be quoted, apart from the bits that need to be interpreted literally. Note: This is where you went wrong; you quoted the regular expression and did not look in BASH_REMATCH for the captured data. You also tried to use the regular expression exactly the way you would write the expression in Python. bash is not Python.
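A small sketch of what ends up in BASH_REMATCH, and of the effect of quoting, may make this clearer. It assumes bash 3.2 or later, keeps the pattern in a variable (a common way to sidestep quoting problems), and, unlike the loop above, leaves the double quotes out of the capture group. The variable names are only for illustration.

```shell
line='iso.3.6.1.4.1.25355.3.2.6.3.2.1.11.1.1.1 = STRING: "785c7208dcf0"'

# Extended regular expression stored in a variable; expand it unquoted.
re='STRING:[[:blank:]]*"([^"]+)"'

if [[ $line =~ $re ]]; then
    whole=${BASH_REMATCH[0]}    # the entire matched text
    group=${BASH_REMATCH[1]}    # the first parenthesised group
    printf 'whole match: %s\n' "$whole"
    printf 'group 1: %s\n' "$group"
fi

# Quoting the expansion makes bash compare against the literal string,
# so this test does not match the sample line.
if [[ $line =~ "$re" ]]; then
    quoted_matched=yes
else
    quoted_matched=no
fi
printf 'quoted pattern matched: %s\n' "$quoted_matched"
```

Element 0 holds the whole match (STRING: "785c7208dcf0") and element 1 holds the first group (785c7208dcf0); the quoted test reports no.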

Or,

while IFS= read -r line; do
    match=$(expr "$line" : '.*STRING:[[:blank:]]*\(..*\)')
    if [ -n "$match" ]; then
        printf '%s\n' "$match"
    fi
done <filetest.txt

Given the input in the question, all of the variations above output the following, double quotes included:

"785c7208dcf0"


Kusalananda