
I have a script that reads a local log file line by line, identifies the card and the I/O type, extracts the relevant data string, compares that string against definition files, and assigns a T or F value to each bit. It then writes a new log file with human-readable bit names and their T/F values.

Here are some sample lines from my input:

[09:00:15] STA8   09:58:47 28DEC23  I/O In  07 0000  0000  0000  0000
[09:00:15] STA8   09:58:47 28DEC23  I/O In  08 0000  0010  0000  0000
[09:00:15] STA8   09:58:47 28DEC23  I/O Out 07 --00  ++++  ++++  ++00
[09:00:15] STA8   09:58:47 28DEC23  I/O Out 08 ++++  ++0+  ++++  0000

Here are some sample lines of output:

[09:00:15] STA8   09:58:47 28DEC23  I/O In  07 1:F 2:F 3:F ...
[09:00:15] STA8   09:58:47 28DEC23  I/O In  08 1:F 2:F 3:F ...
[09:00:15] STA8   09:58:47 28DEC23  I/O Out 07 1:F 2:F 3:F 4:F 5:T 6:T 7:T  8:T 9:T 10:T 11:T 12:T ...
[09:00:15] STA8   09:58:47 28DEC23  I/O Out 08 1:T 2:T 3:T 4:T 5:T 6:T 7:F  8:T ...

The script works fine, but for an input file that is not very long it can take about 18 seconds to complete. Running sed on the same input file with a long list of find-and-replace changes is almost instant, and I don't know why this script takes so much longer. It does reference text files I made that contain the names for each bit, so I'm wondering whether it is doing too many unnecessary reads each time, and whether I can load them once instead.

Here is my script:

#!/bin/bash

# Variables
d=$(date +%Y-%m-%d_%H. -d '1 hour ago')

# Function to read label files and store labels in arrays
read_label_files() {
    local station_id="$1"
    local io_type="$2"
    local card_number="$3"

    local label_file="s${station_id:3}${io_type}${card_number}.txt"
    local label_array=()

    if [ -f "$label_file" ]; then
        IFS=',' read -ra label_array < "$label_file"
    fi

    echo "${label_array[@]}"
}

# Function to add extracted string into an array
data_string_array=()

for ((i=0; i<${#data_string}; i++)); do
    data_string_array[i]="${data_string:i:1}"
done

# Function to convert input values to T or F
convert_input_values() {
    local data_string="$1"
    local output_string=""

    for (( i=0; i<${#data_string}; i++ )); do
        if [ "${data_string:$i:1}" == "1" ]; then
            output_string+="T "
        else
            output_string+="F "
        fi
    done

    echo "$output_string"
}

# Function to convert output values to T or F
convert_output_values() {
    local data_string="$1"
    local output_string=""

    for (( i=0; i<${#data_string}; i++ )); do
        if [ "${data_string:$i:1}" == "+" ]; then
            output_string+="T "
        else
            output_string+="F "
        fi
    done

    echo "$output_string"
}

# Initialize arrays for inputs and outputs
declare -A input_changes
declare -A output_changes

# Process the input file
while read -r line; do
    timestamp=$(echo "$line" | awk '{print $1}')
    station_id=$(echo "$line" | awk '{print $2}')
    io_type=$(echo "$line" | awk '{print $6}')
    card_number=$(echo "$line" | awk '{print $7}')
    logdate=$(echo "$line" | awk '{print $4}')

    # Remove spaces from the data string
    data_string=$(echo "$line" | awk '{$1=$2=$3=$4=$5=$6=$7=""; print $0}' | tr -d ' ')

    if [ "$io_type" = "In" ]; then
        # Read input label file and store labels in an array
        input_labels=($(read_label_files "$station_id" "i" "$card_number"))

        # Convert input values to T or F
        converted_values=($(convert_input_values "$data_string"))

        # Store current values for comparison
        current_values="${input_changes[$station_id,$card_number]}"
        input_changes[$station_id,$card_number]="$data_string"

        # Print the output line with labels and T/F values
        output_line="$timestamp $station_id $logdate I/O In  $card_number -"
        if [ -z "$current_values" ]; then
            echo "NEW"
            for ((i=0; i<${#input_labels[@]}; i++)); do
                output_line+=" ${input_labels[i]}:${converted_values[i]}"
            done
        else
            echo "UPDATE"
            for ((i=0; i<${#current_values}; i++)); do
                current_array[i]="${current_values:i:1}"
            done
            #current_array=($current_values)
            for ((i=0; i<${#data_string}; i++)); do
                data_string_array[i]="${data_string:i:1}"
            done
            #echo "old ${current_array[@]}"
            #echo "new ${data_string_array[@]}"
            for ((i=0; i<${#input_labels[@]}; i++)); do
                #echo "Comparing index $i: data_string_array[${i}] = ${data_string_array[i]}, current_array[${i}] = ${current_array[i]}"
                if [ "${data_string_array[i]}" != "${current_array[i]}" ]; then
                    output_line+=" ${input_labels[i]}:${converted_values[i]}"
                fi
            done
            #echo "Output Line: $output_line"
        fi
        echo "$output_line" >> /home/logs/IO.$d.txt

    elif [ "$io_type" = "Out" ]; then
        # Read output label file and store labels in an array
        output_labels=($(read_label_files "$station_id" "o" "$card_number"))

        # Convert output values to T or F
        converted_values=($(convert_output_values "$data_string"))

        # Store current values for comparison
        current_values="${output_changes[$station_id,$card_number]}"
        output_changes[$station_id,$card_number]="$data_string"

        # Print the output line with labels and T/F values
        output_line="$timestamp $station_id $logdate I/O Out $card_number -"
        if [ -z "$current_values" ]; then
            echo NEW
            for ((i=0; i<${#output_labels[@]}; i++)); do
                output_line+=" ${output_labels[i]}:${converted_values[i]}"
            done
        else
            echo UPDATE
            for ((i=0; i<${#current_values}; i++)); do
                current_array[i]="${current_values:i:1}"
            done
            #current_array=($current_values)
            for ((i=0; i<${#data_string}; i++)); do
                data_string_array[i]="${data_string:i:1}"
            done
            #echo "old ${current_array[@]}"
            #echo "new ${data_string_array[@]}"
            for ((i=0; i<${#output_labels[@]}; i++)); do
                #echo "Comparing index $i: data_string_array[${i}] = ${data_string_array[i]}, current_array[${i}] = ${current_array[i]}"
                if [ "${data_string_array[i]}" != "${current_array[i]}" ]; then
                    output_line+=" ${output_labels[i]}:${converted_values[i]}"
                fi
            done
        fi
        echo "$output_line" >> /home/logs/IO.$d.txt
    fi
done < /home/logs/DataLog.$d.txt
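
On the idea of loading the label files only once: the following is a minimal sketch, not a drop-in fix. It assumes the label files are single-line, comma-separated lists named as in `read_label_files` above. The names `label_cache`, `load_labels` and `labels` are hypothetical and do not appear in the script. Each file is read the first time it is needed and kept in an associative array, and the function fills a global array instead of echoing, so no `$(...)` subshell is forked per log line.

```shell
#!/bin/bash
# Sketch only: label_cache, load_labels and labels are hypothetical names.

declare -A label_cache   # key: label file name, value: raw comma-separated line

# load_labels STATION_ID IO_TYPE CARD_NUMBER
# Fills the global array "labels". The label file is read from disk at most
# once; later calls for the same card are served from label_cache.
load_labels() {
    local label_file="s${1:3}${2}${3}.txt"

    # ${var+set} expands to "set" only if the key already exists in the cache
    if [[ -z ${label_cache[$label_file]+set} ]]; then
        local raw=""
        if [[ -f $label_file ]]; then
            # "|| true": read returns non-zero if the file has no final newline
            IFS= read -r raw < "$label_file" || true
        fi
        label_cache[$label_file]=$raw
    fi

    # Split the cached line on commas without starting another process
    IFS=',' read -ra labels <<< "${label_cache[$label_file]}"
}
```

In the main loop this would be called as `load_labels "$station_id" i "$card_number"`, after which the labels are available as `"${labels[@]}"`. One difference from the original: because the result is never passed through `echo` and re-split, labels that contain spaces stay intact.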

ditch
  • your question belongs at https://codereview.stackexchange.com/ – jsotola Dec 28 '23 at 19:08
  • It seems you're just processing text there, so the answer is "stop using Bash for that". The shell is made for running commands, there are other tools for text processing, like AWK and Perl. Something like that could be done in a single process, instead of launching, umm. six or more? copies of awk for each. individual. input. line. See "Why is using a shell loop to process text considered bad practice?" – ilkkachu Dec 28 '23 at 19:15
  • @ilkkachu I see what you mean now. Thanks for the link, it makes sense now why it's slow. I'll have to learn some programming in something that will work for this. I'm only familiar with bash (and not too much, obviously). – ditch Dec 28 '23 at 19:39
  • If it was me, I would add a timestamp to each block before and after, this way you can see where the bottleneck is. Then work from there. – TechLoom Dec 28 '23 at 19:49
  • Consider learning the basics of awk. It made my life far easier in parsing text. Here is a link: https://opensource.com/article/20/9/awk-ebook – td211 Dec 29 '23 at 13:51
  • Maybe explain the logic behind what you want to do? – td211 Dec 29 '23 at 13:54
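
To make the point in the comments about per-line processes concrete, here is a hedged sketch of the field extraction and T/F conversion done with bash builtins only, so no `awk`, `tr` or `$(...)` subshell runs per input line. It assumes the line layout shown in the sample input. The function name `parse_line` and the variable `tf_string` are hypothetical and not part of the original script.

```shell
#!/bin/bash
# Sketch only: parse_line and tf_string are hypothetical names.

# parse_line LINE
# Splits a log line into the globals timestamp, station_id, logdate, io_type,
# card_number and data_string, and sets tf_string to one T or F per bit.
parse_line() {
    local _logtime _io rest

    # read splits on runs of whitespace; the last variable gets the remainder
    read -r timestamp station_id _logtime logdate _io io_type card_number rest <<< "$1"

    # Drop the spaces between the groups of bits
    data_string=${rest//[[:space:]]/}

    # Convert every character at once with pattern substitution:
    # first everything that is not the "on" character becomes F, then "on" becomes T
    if [[ $io_type == Out ]]; then
        tf_string=${data_string//[!+]/F}
        tf_string=${tf_string//[+]/T}
    else
        tf_string=${data_string//[!1]/F}
        tf_string=${tf_string//1/T}
    fi
}

# Small demonstration on two of the sample lines from the question
while IFS= read -r line; do
    parse_line "$line"
    printf '%s %s card %s: %s\n' "$station_id" "$io_type" "$card_number" "$tf_string"
done <<'EOF'
[09:00:15] STA8   09:58:47 28DEC23  I/O In  08 0000  0010  0000  0000
[09:00:15] STA8   09:58:47 28DEC23  I/O Out 07 --00  ++++  ++++  ++00
EOF
```

The value for bit `i` is then `${tf_string:i:1}`, which replaces the `converted_values` array. This stays in bash only to show where the time goes; a single awk or Perl process, as the comments suggest, would most likely still be faster on large files.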

1 Answer

The answer is not to use bash for this kind of thing. I'm going to learn Python for it instead.

ditch
  • This isn't much of an answer to your question as asked. I've closed the question since you appear to be going in a different direction. (One that I agree with, by the way!). Good luck! – Jeff Schaller Dec 30 '23 at 20:32