
As part of my security job, I analyze dozens of Google Chrome history files each day using sqlite3 over SSH.

There are a few dozen authorized "safe" sites that each user is allowed to visit. For my purposes, I don't care about these safe sites. To list the URLs in each history file while ignoring the safe sites, I chain one grep -v per safe site, as follows:

sqlite3 /home/me/HistoryDatabaseFile.db "select * from urls order by url;" | grep -v safesite1.com | grep -v safesite2.com | grep -v safesite3.com | grep -v safesite4.com

and on and on. My command has grown to at least 20 lines and is becoming unmanageable. Is there any way to show each user's URLs while excluding the safe sites, with the safe sites kept in a simple list? I'm imagining something like:

safesite1.com
safesite2.com
safesite3.com

and then bringing that list into the command. The list can be internal or external; I don't really care, as long as the result is something I can run in bash.

Thanks for any help you can give me!

4 Answers

1

I think what you are looking for is something like

grep -vf safe_websites inputfile

-v inverts the match, as you already know, and -f reads the patterns from the file safe_websites.
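
Applied to the pipeline from the question, that might look like this (a sketch, assuming the safe list is saved one site per line at /home/me/safesites.txt):

sqlite3 /home/me/HistoryDatabaseFile.db "select * from urls order by url;" | \
  grep -vf /home/me/safesites.txt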

pfnuesel
  • Awesome! Can you clarify the syntax in this example though? Trying "grep -vf /home/me/safesites.txt" only worked to exclude the first site in my list, and "grep -vf /home/me/safesites.txt inputfile" gave me the error: "grep: inputfile: No such file or directory" – Jared Dalton Feb 02 '16 at 00:56
  • @JaredDalton inputfile is the input file, that you would like to grep. In your question you get the input from the sqlite3 command, so you don't need it. grep -vf /home/me/safesites.txt should work, though. What's the content of /home/me/safesites.txt? – pfnuesel Feb 02 '16 at 01:00
  • In my current test, on the first line of safesites.txt I have www.silversneakers.com and then I pressed Enter for a new line, and on the second line I have www.medicare.gov – Jared Dalton Feb 02 '16 at 01:06
  • Don't forget that unless you force a fixed-string match (-F or --fixed-strings), . will match any single character, potentially letting nefarious sites of the form safesite2Xcom.com slip through the filter; and unless you enforce a whole-word or whole-line match, things like unsafesite2.com will be excluded as well. – steeldriver Feb 02 '16 at 01:06
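
A toy demonstration of that pitfall, on bare hostnames rather than real history rows:

$ printf '%s\n' safesite2Xcom.com unsafesite2.com safesite2.com | grep -v safesite2.com
(no output: the unescaped . makes the pattern match all three lines)
$ printf '%s\n' safesite2Xcom.com unsafesite2.com safesite2.com | grep -vF safesite2.com
safesite2Xcom.com
$ printf '%s\n' safesite2Xcom.com unsafesite2.com safesite2.com | grep -vFx safesite2.com
safesite2Xcom.com
unsafesite2.com

With -F the lookalike domain is no longer hidden, and with -x (whole-line match) the superstring unsafesite2.com survives too. Note that -x is only illustrative here: real history rows contain full URLs plus other columns, so anchored patterns would be the middle ground.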
0

Another option you may consider is the egrep form of grep (equivalent to grep -E), which allows extended regular expressions, so you can put multiple targets in a single pattern:

egrep -v "safesite1\.com|safesite2\.com|safesite3\.com"

Details of these and other extended REs can be found in man 7 re_format.
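
Plugged into the pipeline from the question, that might look like the following sketch (grep -Ev is the modern spelling of egrep -v):

sqlite3 /home/me/HistoryDatabaseFile.db "select * from urls order by url;" | \
  grep -Ev "safesite1\.com|safesite2\.com|safesite3\.com"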

0

As mentioned, you should use the -f option to grep, and provide a list of patterns for grep to use.

However, you also mention having special characters in your URLs, which makes sense. The correct fix is to add the -F flag so that grep treats the patterns as fixed strings rather than regular expressions.

So to accomplish what you want:

First, put your list of safe websites in a file, for example /tmp/safelist.txt. This should look something like:

safesite1.com
safesite2.com
safesite3.com

Next, call grep on that file like so:

sqlite3 /home/me/HistoryDatabaseFile.db "select * from urls order by url;" | grep -vFf /tmp/safelist.txt
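
Since this runs against dozens of history files a day, the pipeline could also live in a small wrapper script. This is just a sketch; the script name, argument handling, and safelist path are placeholders:

#!/bin/sh
# check_history.sh - print the non-safelisted URLs from one Chrome history DB
# usage: check_history.sh /path/to/HistoryDatabaseFile.db
db="$1"
sqlite3 "$db" "select * from urls order by url;" | grep -vFf /tmp/safelist.txt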
Wildcard
-1

It turns out my problem had to do with the data I was trying to parse. When I tested with the output

a
b
c
d
e

and then used grep -vf file.txt to remove a, b, and c, it worked like a charm. Since I was actually trying to ignore a bunch of websites containing a variety of special characters, it never worked on my real data, even when I dumped my SQL query output to a .txt file and ran grep on that.

Ultimately, my solution for readability was to use the backslash (\) to split the command across multiple lines:

sqlite3 /home/me/HistoryDatabaseFile.db "select * from urls order by url;" | \
grep -v safesite1.com | \
grep -v safesite2.com | \
grep -v safesite3.com | \
grep -v safesite4.com | \
grep -v safesite5.com
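
A variant that keeps the readability but runs a single grep, passing one -e pattern per site and using -F to sidestep the special-character problem (same placeholder site names):

sqlite3 /home/me/HistoryDatabaseFile.db "select * from urls order by url;" | \
grep -vF \
  -e safesite1.com \
  -e safesite2.com \
  -e safesite3.com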

Thanks for your help everyone!