I created my first shell script a few days ago, and while testing it during development on a few files, it worked flawlessly. Now in practice, however, I have over 12,000 files to edit with it and it runs VERY slowly. Is it possible to make it faster? I tried to shorten this part:
grep -rl "${id[$j]}" ../usage --exclude-dir="*/.git*" --exclude=*.{png,jpg,pdf} --include=*.dita | xargs sed -i "s/_[0-9]\+\"/_$apps.$title\"/g";
grep -rl "${id[$j]}" ../usage --exclude-dir="*/.git*" --exclude=*.{png,jpg,pdf} --include=*.dita | xargs sed -i "s/_[0-9]\+\//_$apps.$title\//g";
But I wasn't able to make it work by chaining the commands with operators:
grep -rl "${id[$j]}" ../usage --exclude-dir="*/.git*" --exclude=*.{png,jpg,pdf} --include=*.dita | xargs sed -i "s/_[0-9]\+\"/_$apps.$title\"/g" | xargs sed -i "s/_[0-9]\+\//_$apps.$title\//g";
I also tried the && operator; that works on files where both cases occur, but I need the second sed to run even if the first one made no changes.
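For what it's worth, GNU sed accepts several -e expressions in a single invocation, and each expression is applied to every line independently, so one pipeline could carry out both substitutions even on files where only one of them matches. A minimal, self-contained sketch (the placeholder `_app.title` stands in for `_$apps.$title`; GNU sed assumed for `-i` and `\+`):

```shell
#!/bin/sh
# Demo: two -e expressions in one sed call. The second substitution still
# runs on lines where the first one matches nothing.
f=$(mktemp)
printf 'ref_123"\nref_456/\n' > "$f"
sed -i -e 's/_[0-9]\+"/_app.title"/g' \
       -e 's/_[0-9]\+\//_app.title\//g' "$f"
cat "$f"    # ref_app.title" then ref_app.title/
rm -f "$f"
```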
I would appreciate your suggestions. Here is my script:
len_1=($(find . -name "*.dita" -not -path "*/.git*"))
len=${#len_1[@]}
echo -e "${CYAN}Found $len objects for modifying...${OUTPUT}"
#echo $len
for ((i=0; i<len; i++)); do
    id=($(grep -Po 'id="\K[^"]+' "${len_1[$i]}"))
    echo -e "${CYAN}Modifying ${len_1[$i]}${OUTPUT}"
    apps=$(grep -Po 'appname="\K[^"]+' "${len_1[$i]}") && title=$(grep -Po '<title>\K.*?(?=</title>)' "${len_1[$i]}" | head -1) && sed -i "s/_[0-9]\+/_$apps.$title/g" "${len_1[$i]}" && sed -i "s/id=\"[0-9]\+\"\+/id=\"$apps.$title\"/g" "${len_1[$i]}"
    if [ ${#id[@]} -gt 0 ]
    then
        for ((j=0; j<${#id[@]}; j++)); do
            echo -e "${RED}Searching for ${id[$j]}...${OUTPUT}"
            grep -rl "${id[$j]}" ../usage --exclude-dir="*/.git*" --exclude=*.{png,jpg,pdf} --include=*.dita | xargs sed -i "s/_[0-9]\+\"/_$apps.$title\"/g"
            grep -rl "${id[$j]}" ../usage --exclude-dir="*/.git*" --exclude=*.{png,jpg,pdf} --include=*.dita | xargs sed -i "s/_[0-9]\+\//_$apps.$title\//g"
        done
    else
        echo -e "${RED}Didn't find IDs...${OUTPUT}"
    fi
done
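One idea I am considering: since the inner loop walks ../usage twice for every id, the ids of a file could instead be OR-ed into a single grep -E pattern, so the tree is scanned once per source file, and both substitutions could run in the same sed invocation. A sketch of that idea on a throwaway directory (the ids, $apps and $title here are placeholder values standing in for the variables built in the loop above; GNU grep/sed/xargs assumed):

```shell
#!/bin/sh
# One grep pass for all ids (joined with |) instead of one pass per id,
# and both sed expressions in the same invocation.
dir=$(mktemp -d)
printf 'topic id_42" more\n' > "$dir/a.dita"
printf 'topic id_7/ more\n'  > "$dir/b.dita"
ids='id_42|id_7'          # in bash e.g.: ids=$(IFS='|'; echo "${id[*]}")
apps=app; title=title
grep -rlE --include='*.dita' "$ids" "$dir" \
  | xargs -r sed -i -e "s/_[0-9]\+\"/_$apps.$title\"/g" \
                    -e "s/_[0-9]\+\//_$apps.$title\//g"
cat "$dir/a.dita" "$dir/b.dita"
rm -rf "$dir"
```

The -r flag keeps xargs from running sed at all when grep finds no matching files.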
So you are modifying files in the ../usage and . directories, excluding files in .git? Note that you will be reading the files in ../usage twice for each file in ., so if there are 6,000 files in each and there is one line matching id=".*" in each, you will be reading 72,000,000 files. You may also be launching a very large number of sed processes. Cutting this number in half is good, but 36,000,000 is still a lot! Can you edit the question to show some typical input files and desired output? – icarus Sep 02 '19 at 10:27