I want to validate a text file with a script.

The file to validate is:

FDFHDK JKL
1545665 152
HDKFHDK UHG
YRYRUBH DFG
867HDKE WER

Valid lines must match the regex '[A-Z]{7}+[[:space:]]+[A-Z]{3}'.

If all the lines are valid, the script shows a message saying that the file is OK.

If there is at least one line that doesn't match the regex, the script should show a message and display the lines that don't match.

The script is:

#!/usr/bin/env bash
result=""
output=$(grep -vE '[A-Z]{7}+[[:space:]]+[A-Z]{3}' "$1" |wc -l)
if [[ $output > 0 ]]
then
  echo "These lines don't match:"
  result="${resultado} $(grep -vE '[A-Z]{7}+[[:space:]]+[A-Z]{3}' "$1") \n"
  echo -e $result
else
  echo "The text file is valid"
fi  

The expected output is

These lines don't match
FDFHDK JKL
1545665 152
867HDKE WER

But I'm getting

These lines don't match:
FDFHDK JKL 1545665 152 867HDKE WER

So the script as written does not preserve the line breaks between the non-matching lines.

Jeff Schaller

2 Answers

There is absolutely no reason to use an intermediate variable to store output of commands just to perform a test or output that data.

#!/bin/sh -

if grep -q -v -x -E -e '[A-Z]{7}[[:space:]]+[A-Z]{3}' -- "$1"
then
    echo 'Does not verify. Bad lines follow...'
    grep -v -x -E -e '[A-Z]{7}[[:space:]]+[A-Z]{3}' -- "$1"
fi

The regular expression has been corrected by removing the extra + after {7}. The if statement tests the exit status of grep directly. Both grep commands use -x to force a whole-line match, and the first one also uses -q so that it stops at the first selected (that is, non-matching) line without printing anything.
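To see why -x matters, here is a small sketch. The sample line is made up for illustration (it is not from the question's file): a valid-looking record with extra characters after it. The counts in the comments are what I would expect from a POSIX grep.

```shell
#!/bin/sh
pattern='[A-Z]{7}[[:space:]]+[A-Z]{3}'

# Hypothetical input: a valid record followed by trailing junk.
line='HDKFHDK UHGXYZ123'

# Without -x the pattern may match anywhere in the line,
# so this line counts as a match.
printf '%s\n' "$line" | grep -c -E -e "$pattern"       # prints 1

# With -x the whole line has to match, so it is rejected.
printf '%s\n' "$line" | grep -c -x -E -e "$pattern"    # prints 0
```

Without -x, such a line would slip through the validation even though it is not of the form "seven letters, whitespace, three letters".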

The actual issue in your code is that $result is used unquoted in echo -e $result. This causes the shell to split the value on spaces, tabs, and newlines, and then to perform filename globbing on the resulting words. The final set of words is then given as separate arguments to echo, which prints them with single spaces as delimiters.
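A minimal sketch of that effect, using the three bad lines from the question as a literal value (the variable is set by hand here rather than from grep, purely for illustration):

```shell
#!/bin/sh
# Three lines stored in a single variable, newlines included.
result='FDFHDK JKL
1545665 152
867HDKE WER'

# Unquoted: the shell splits the value into six words, and echo
# joins them with single spaces, so the newlines are lost.
echo $result
# prints: FDFHDK JKL 1545665 152 867HDKE WER

# Quoted: the value is passed as one argument, newlines intact,
# so the three lines are printed on separate lines.
echo "$result"
```

So in the original script, quoting the expansion as "$result" would already keep the line breaks.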


If you are concerned about running grep twice, run it only once and save its output, for example in a temporary file:

#!/bin/sh -

tmpfile=$(mktemp)

if grep -v -x -E -e '[A-Z]{7}[[:space:]]+[A-Z]{3}' -- "$1" >"$tmpfile"
then
    echo 'Does not verify. Bad lines follow...'
    cat -- "$tmpfile"
fi

rm -f -- "$tmpfile"
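The question also asks for a message when the file is valid. One possible way to extend the temporary-file approach is sketched below. The function name check_file and the return codes are my own choices, not part of the scripts above. It relies on grep's documented exit statuses: 0 when at least one line was selected, 1 when none was, and greater than 1 on error.

```shell
#!/bin/sh
# check_file is a hypothetical helper name used only in this sketch.
check_file() {
    tmpfile=$(mktemp) || return 2

    # With -v, "selected" lines are the ones that do NOT match.
    grep -v -x -E -e '[A-Z]{7}[[:space:]]+[A-Z]{3}' -- "$1" >"$tmpfile"

    case $? in
        0)  # At least one non-matching line was found.
            echo "These lines don't match:"
            cat -- "$tmpfile"
            rc=1 ;;
        1)  # Nothing selected: every line matched.
            echo 'The text file is valid'
            rc=0 ;;
        *)  # grep itself failed, e.g. the file could not be read.
            rc=2 ;;
    esac

    rm -f -- "$tmpfile"
    return "$rc"
}
```

In a script this would be called as check_file "$1". Treating the error case separately avoids reporting a missing or unreadable file as valid.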

Kusalananda

I propose this alternative:

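# Collect the lines that do not match the whole-line pattern.
match=$(grep -vEx '[A-Z]{7}[[:space:]]+[A-Z]{3}' "$1")
# Print them only if there are any; %s prints the data verbatim.
[[ -n $match ]] && printf 'Bad lines:\n%s\n' "$match"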
Bad lines:
FDFHDK JKL
1545665 152
867HDKE WER


Note from @they's answer:

The regular expression has been corrected to delete the extra + after {7}