
I created my first shell script a few days ago, and while testing it on a few files during development, it worked flawlessly. Now in practice, however, I have over 12,000 files to edit with it, and it is going VERY slowly. Is it possible to make it faster? I tried to shorten this part:

grep -rl "${id[$j]}" ../usage --exclude-dir="*/.git*" --exclude=*.{png,jpg,pdf} --include=*.dita | xargs sed -i "s/_[0-9]\+\"/_$apps.$title\"/g";

grep -rl "${id[$j]}" ../usage --exclude-dir="*/.git*" --exclude=*.{png,jpg,pdf} --include=*.dita | xargs sed -i "s/_[0-9]\+\//_$apps.$title\//g";

But I wasn't able to make the two work chained together with operators:

grep -rl "${id[$j]}" ../usage --exclude-dir="*/.git*" --exclude=*.{png,jpg,pdf} --include=*.dita | xargs sed -i "s/_[0-9]\+\"/_$apps.$title\"/g" | xargs sed -i "s/_[0-9]\+\//_$apps.$title\//g";

I also tried the && operator; it works on files containing both cases, but I need the second sed to run even if the first one fails.
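One way to avoid piping `xargs` into `xargs` is to give a single `sed` both expressions with `-e`: every matched file gets both substitutions attempted, so the second runs even where the first matched nothing. A minimal sketch, assuming GNU sed; the sample file name and the values for `$apps`/`$title` are made up:

```shell
# Made-up sample values standing in for $apps and $title.
apps="myapp"; title="intro"
printf 'ref_123"\nref_456/\n' > sample.dita
# One sed process, two expressions; each expression is tried on every line,
# independently of whether the other one matched.
sed -i -e "s/_[0-9]\+\"/_$apps.$title\"/g" \
       -e "s/_[0-9]\+\//_$apps.$title\//g" sample.dita
cat sample.dita
```

In the script this would collapse each pair of `grep … | xargs sed` pipelines into one, halving the number of scans over `../usage`.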

I would appreciate your suggestions. Here is my script:

len_1=($(find . -name "*.dita" -not -path "*/.git*"))
len=${#len_1[@]}
echo -e "${CYAN}Found $len objects for modifying...${OUTPUT}"
#echo $len

for ((i=0; i<len; i++)); do
    id=($(grep -Po 'id="\K[^"]+' ${len_1[$i]}))
    echo -e "${CYAN}Modifying ${len_1[$i]}${OUTPUT}"
    apps=$(grep -Po 'appname="\K[^"]+' ${len_1[$i]}) && title=$(grep -Po '<title>\K.*?(?=</title>)' ${len_1[$i]} | head -1) && sed -i "s/_[0-9]\+/_$apps.$title/g" ${len_1[$i]} && sed -i "s/id=\"[0-9]\+\"\+/id=\"$apps.$title\"/g" ${len_1[$i]};

    if [ ${#id[@]} -gt 0 ]
    then
        for ((j=0; j<${#id[@]}; j++)); do
            echo -e "${RED}Searching for ${id[$j]}...${OUTPUT}"
            grep -rl "${id[$j]}" ../usage --exclude-dir="*/.git*" --exclude=*.{png,jpg,pdf} --include=*.dita | xargs sed -i "s/_[0-9]\+\"/_$apps.$title\"/g" ;
            grep -rl "${id[$j]}" ../usage --exclude-dir="*/.git*" --exclude=*.{png,jpg,pdf} --include=*.dita | xargs sed -i "s/_[0-9]\+\//_$apps.$title\//g";
        done
    else
        echo -e "${RED}Didn't find IDs...${OUTPUT}";
    fi
done
revaljilji
  • How many files are there in the ../usage and . directories, excluding files in .git? Note that you will be reading the files in ../usage twice for each file in . so if there are 6,000 files in each and there is one line matching id=".*" in each you will be reading 72,000,000 files. You can also be launching a very large number of sed processes. Cutting this number in half is good, but 36,000,000 is still a lot! Can you edit the question to show some typical input files and desired output? – icarus Sep 02 '19 at 10:27
  • if you want it not to be abysmally slow, don't do it in a shell loop. use awk or perl or python (or almost anything except shell) for the entire job. See Why is using a shell loop to process text considered bad practice? – cas Sep 02 '19 at 11:23
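Building on the comments about the number of passes: instead of re-scanning `../usage` once per ID, the IDs can be joined into a single extended-regex alternation so each file is read only once. A rough sketch with made-up IDs and a sample file (assumes `grep -E`):

```shell
# Made-up sample data: two IDs that would otherwise need two grep passes.
printf 'see id_123 and ref_456 here\n' > usage_sample.dita
ids="id_123 ref_456"
# Join the IDs into one alternation: id_123|ref_456
pattern=$(printf '%s|' $ids); pattern=${pattern%|}
# A single scan now finds files matching any of the IDs.
grep -lE "$pattern" usage_sample.dita
```

In the real script, `$pattern` would be built from the `id` array and passed to one `grep -rlE` over `../usage` instead of looping `grep -rl` per element.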

1 Answer


What about matching the " or / and capturing it?

sed -i "s/_[0-9]\+\([\"\/]\)/_$apps.$title\1/g"

or, more readably, as

sed -i "s=_[0-9]\+\([\"/]\)=_$apps.$title\1=g"
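A self-contained demo of that combined substitution, assuming GNU sed (the file name and the `$apps`/`$title` values below are hypothetical):

```shell
# Made-up sample values standing in for $apps and $title.
apps="myapp"; title="intro"
printf 'id_123"\npath_456/\n' > demo.dita
# Match the trailing " or /, capture it, and put it back with \1,
# so one expression covers both cases.
sed -i "s=_[0-9]\+\([\"/]\)=_$apps.$title\1=g" demo.dita
cat demo.dita
```

Using `=` as the delimiter avoids having to escape the `/` inside the bracket expression.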
choroba