How to parse a string containing multiple hyphens and/or whitespaces for line-by-line processing with grep etc?

Question

I'm working with auditd rules on RHEL 7 and 8. Considering these example files...

file2.txt:

-a always,exit -S unlink -S unlinkat -S rename -S renameat -F auid>=1000 -F auid!=4294967295 -k delete
-a always,exit -F arch=b32 -S chmod,fchmod,fchmodat -F auid>=1000 -F auid!=unset -F key=perm_mod
-a always,exit -F arch=b64 -S chmod,fchmod,fchmodat -F auid>=1000 -F auid!=unset -F key=perm_mod
-a always,exit -F arch=b32 -S lchown,fchown,chown,fchownat -F auid>=1000 -F auid!=unset -F key=perm_mod
-a always,exit -F arch=b64 -S chown,fchown,lchown,fchownat -F auid>=1000 -F auid!=unset -F key=perm_mod
-a always,exit -F arch=b32 -S setxattr,lsetxattr,fsetxattr,removexattr,lremovexattr,fremovexattr -F auid>=1000 -F auid!=unset -F key=perm_mod
-a always,exit -F arch=b64 -S setxattr,lsetxattr,fsetxattr,removexattr,lremovexattr,fremovexattr -F auid>=1000 -F auid!=unset -F key=perm_mod
-w /etc/sudoers -p wa -k actions
-w /etc/sudoers.d/ -p wa -k actions

file1.txt:

-a always,exit -S unlink -S unlinkat -S rename -S renameat -F auid>=1000 -F auid!=4294967295 -k delete
-a always,exit -F arch=b32 -S chmod,fchmod,fchmodat -F auid>=1000 -F auid!=unset -F key=perm_mod
-a always,exit -F arch=b64 -S chmod,fchmod,fchmodat -F auid>=1000 -F auid!=unset -F key=perm_mod

I'm trying to parse these files programmatically with bash such that file2.txt is checked to see if it contains any of the lines in file1.txt; if it does, those lines should be deleted from file2.txt. I do not want to modify file1.txt in this process.

Desired output:

file2.txt:

-a always,exit -F arch=b32 -S lchown,fchown,chown,fchownat -F auid>=1000 -F auid!=unset -F key=perm_mod
-a always,exit -F arch=b64 -S chown,fchown,lchown,fchownat -F auid>=1000 -F auid!=unset -F key=perm_mod
-a always,exit -F arch=b32 -S setxattr,lsetxattr,fsetxattr,removexattr,lremovexattr,fremovexattr -F auid>=1000 -F auid!=unset -F key=perm_mod
-a always,exit -F arch=b64 -S setxattr,lsetxattr,fsetxattr,removexattr,lremovexattr,fremovexattr -F auid>=1000 -F auid!=unset -F key=perm_mod
-w /etc/sudoers -p wa -k actions
-w /etc/sudoers.d/ -p wa -k actions

file1.txt (unchanged):

-a always,exit -S unlink -S unlinkat -S rename -S renameat -F auid>=1000 -F auid!=4294967295 -k delete
-a always,exit -F arch=b32 -S chmod,fchmod,fchmodat -F auid>=1000 -F auid!=unset -F key=perm_mod
-a always,exit -F arch=b64 -S chmod,fchmod,fchmodat -F auid>=1000 -F auid!=unset -F key=perm_mod

I've tried a few different approaches, but this is probably the closest I've gotten (excuse minor syntactical errors, as these are transposed by hand).

# Write deltas to a temporary file
grep -f file2.txt file1.txt >> temp_file.txt
# For each line in the temporary delta file, delete that line from file2.txt
for i in $(cat temp_file.txt); do 
sed -i /"$i"/d file2.txt
done;

This gets the deltas into a temp file, but the sed deletion step then fails with errors like these:

sed -e expression #1: expected newer version of sed
sed -e expression #1: unknown command 'u'

Double-dash escaping seems to make no difference, e.g.:

sed -i -- /"$i"/d foo.txt

For the heck of it, I've also tried unquoted:

sed -i /$i/d foo.txt

I feel like I'm probably missing something simple, but I've bashed my head against this for a few hours and I haven't unraveled it. Any idea what I'm doing wrong?

So you want to treat the lines in file2.txt as patterns and then remove the patterns in file2.txt that match anything in file1.txt? — Kusalananda, May 06 '22 at 13:22
Welcome to the site. When asking questions about text processing, please be sure to add a representative example of all input files/text, along with the desired output, so that contributors have test data they can copy-and-paste to check proposed solutions. — AdminBee, May 06 '22 at 13:23
You can just grep -vFf (or awk or sed at once), no need to parse their delta. But could you have lines with the options in different order, that will not be considered as duplicates? Or this can't be. — thanasisp, May 06 '22 at 13:24
@Kusalananda basically, yes. To be more specific, I'm trying to compare my custom.rules file with my main audit.rules file, and then delete any differences from custom.rules (since auditd won't start up with duplicate rules defined). I just tried to keep it generic for simplicity's sake. — pants_towel, May 06 '22 at 13:30
@thanasisp yeah, in theory the lines could be different, but in practice I'm modifying a default set with consistent values, so I don't expect to run into any issues like that. The one I did run into previously (key= versus -k ) I've already solved for. — pants_towel, May 06 '22 at 13:32
Thanks for the edit, but you still haven't shown us the expected output of this, and that is the single most important piece of information. So please [edit] your question and show us the output you expect after processing these two files. — terdon, May 06 '22 at 13:57
I added some representative inputs and outputs to show more clearly what I'm trying to accomplish. Hopefully that clears it up. — pants_towel, May 06 '22 at 14:10
Re. sed -i /"$i"/d vs. sed -i -- /"$i"/d vs. sed -i /$i/d one issue you get there is that your data contains slashes, so the $i will at some point expand to /etc/sudoers, so sed sees //etc/sudoers/d. Quotes are a shell thing, and what sed does with what it gets is different. See Why does my shell script choke on whitespace or other special characters?, https://mywiki.wooledge.org/Quotes and/or https://mywiki.wooledge.org/WordSplitting for the thing about the quotes. — ilkkachu, May 06 '22 at 14:56
Also for i in $(cat temp_file.txt) is seldom what you want, because you get splitting on whitespace by default, not lines (see the above links). while IFS= read -r line; do ... done < file might be better, see e.g. Understanding "IFS= read -r line" and Busy box Read file line by line — ilkkachu, May 06 '22 at 14:58
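The splitting difference described in the last comment can be seen with a small sketch. The temporary sample file and the counter names are assumptions made for illustration; the rule line is taken from the question.

```shell
#!/usr/bin/env bash
# Sketch: word splitting vs. line-by-line reading (the sample file is hypothetical).
sample=$(mktemp)
printf '%s\n' '-w /etc/sudoers -p wa -k actions' > "$sample"

# for + $(cat ...) splits on whitespace, so one rule becomes six separate words.
words=0
for i in $(cat "$sample"); do
  words=$((words + 1))
done

# while + IFS= read -r consumes one whole line per iteration.
lines=0
while IFS= read -r line; do
  lines=$((lines + 1))
done < "$sample"

echo "words=$words lines=$lines"   # prints: words=6 lines=1
rm -f "$sample"
```

This is why the sed loop in the question receives fragments such as -w or /etc/sudoers instead of complete rules.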

thanasisp · Accepted Answer · 2022-05-06T16:53:47.563

grep -vxFf file1.txt file2.txt > tmp.txt && mv tmp.txt file2.txt

-F treats each pattern as a fixed string rather than a regular expression, so special characters cause no issues.

-v inverts the match, the opposite of your existing command: it prints the lines that don't match.

-x is needed so that only whole-line matches count.

-f file1.txt reads the patterns from file1.txt, one per line.
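A minimal sketch of what -x changes, using throwaway files in a temporary directory. The file names and the shortened rule line are assumptions chosen to make the effect visible; they are not from the question.

```shell
#!/usr/bin/env bash
# Sketch: effect of -x on fixed-string matching (names and contents are illustrative).
workdir=$(mktemp -d)

# Line to remove (plays the role of file1.txt).
printf '%s\n' '-w /etc/sudoers -p wa' > "$workdir/remove.txt"

# File to clean (plays the role of file2.txt).
printf '%s\n' \
  '-w /etc/sudoers -p wa' \
  '-w /etc/sudoers -p wa -k actions' > "$workdir/rules.txt"

# Without -x, the fixed string also matches as a substring of the longer rule,
# so both lines are dropped and nothing is printed.
without_x=$(grep -vFf "$workdir/remove.txt" "$workdir/rules.txt")

# With -x, only the identical line is dropped.
with_x=$(grep -vxFf "$workdir/remove.txt" "$workdir/rules.txt")

printf 'without -x: [%s]\n' "$without_x"
printf 'with -x:    [%s]\n' "$with_x"
rm -rf "$workdir"
```

With the files in the question no rule is a substring of another, so the difference would not show up there, but -x keeps the command safe if such a pair ever appears.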

The awk equivalent is like this:

awk 'FNR==NR{a[$0]; next} !($0 in a)' file1.txt file2.txt
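The awk command prints to standard output, so file2.txt still has to be rewritten separately. There are two edge cases worth knowing. First, grep exits with status 1 when it prints no lines, so in the grep version `&& mv` is skipped if every line of file2.txt is removed. Second, FNR==NR is also true for the second file when the first file is empty, which would make awk print nothing. The sketch below guards against both; the helper name remove_listed_lines is hypothetical, not part of the answer.

```shell
#!/usr/bin/env bash
# remove_listed_lines is a hypothetical helper name used for illustration.
# Usage: remove_listed_lines LIST_FILE TARGET_FILE
remove_listed_lines() {
  local list=$1 target=$2 tmp

  # An empty list means there is nothing to remove. Returning early also avoids
  # the FNR==NR pitfall, where awk would treat TARGET_FILE as the first file.
  [ -s "$list" ] || return 0

  tmp=$(mktemp) || return 1

  # First file: remember every line. Second file: print lines not remembered.
  if awk 'FNR==NR{seen[$0]; next} !($0 in seen)' "$list" "$target" > "$tmp"; then
    # Overwrite the contents instead of using mv, so the target keeps its
    # owner and permissions.
    cat "$tmp" > "$target"
  fi
  rm -f "$tmp"
}
```

For the files in the question this would be called as `remove_listed_lines file1.txt file2.txt`; file1.txt is only read, never written.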

edited May 06 '22 at 16:53

answered May 06 '22 at 14:12

That appears to do the trick! I knew I was overthinking this. Thanks much! – pants_towel May 06 '22 at 14:17
