Below are two example file listings. I need to compare two files that each contain a list of file paths, matching only on the file name: the last "X" characters of each record, to the right of the last "/".
If a file NAME from the first listing is not found in the second, I need the entire row sent to a third file as output.
These are file listings; there might be three files in the second listing and two thousand in the first.
ONE:
1 /home/dev/share/Datafiles/cases.dbf
2 /home/dev/share/Datafiles/cells.csv
3 /home/dev/share/Datafiles/clusters.db
4 /home/dev/share/Datafiles/competition.csv
5 /home/dev/share/Datafiles/coplot.csv
6 /home/dev/share/Datafiles/daphnia.csv
7 /home/dev/share/Datafiles/das.txt
8 /home/dev/share/Datafiles/deaths.sas7bdat
9 /home/dev/share/Datafiles/decay.csv
10 /home/dev/share/Datafiles/example.db
11 /home/dev/share/Datafiles/fertyield.lst
12 /home/dev/share/Datafiles/fisher.csv
TWO:
1 /test/kitchen/cooks/transfer/cases.dbf
2 /test/kitchen/cooks/transfer/cells.csv
3 /test/kitchen/cooks/transfer/clusters.db
4 /test/kitchen/cooks/transfer/coplot.csv
5 /test/kitchen/cooks/transfer/das.txt
6 /test/kitchen/cooks/transfer/deaths.sas7bdat
7 /test/kitchen/cooks/transfer/decay.csv
8 /test/kitchen/cooks/transfer/example.db
9 /test/kitchen/cooks/transfer/fertyield.lst
10 /test/kitchen/cooks/transfer/fisher.csv
Two files that exist in listing ONE are not found in listing TWO: "competition.csv" (#4) and "daphnia.csv" (#6).
Sorting the files does not work: file paths can be very short or very long, and multiple copies of a file can be found in multiple directories.
comm, diff and cmp produced unsatisfactory results, as I'm only looking at the last "X" characters (the file name and extension) at the RIGHT of each row.
(In Microsoft Excel I would simply extract everything to the right of the last "/", row by row, save it to another list, and VLOOKUP that list against the first list.)
But this is not a Microsoft installation.
Could an awk script read in the contents of list (file) two, search through list (file) one, and write the non-matching rows to file three?
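For illustration, here is a minimal sketch of that awk idea. It assumes each row ends with the file name (no trailing spaces or carriage returns) and that file names contain no "/". The function name missing_rows and the list1.txt, list2.txt and list3.txt names in the usage comment are placeholders, not anything from the listings above.

```shell
#!/bin/sh
# Sketch: print the rows of one listing whose file name (the text after the
# last "/") does not occur in another listing.
missing_rows() {
    # $1 = listing whose names are looked up (e.g. list two)
    # $2 = listing to filter (e.g. list one)
    awk -F/ '
        # NR == FNR holds only while the first file is being read
        # (assuming that file is not empty): remember each file name.
        NR == FNR { seen[$NF] = 1; next }
        # Second file: print the whole row when its name was never seen.
        !($NF in seen)
    ' "$1" "$2"
}

# Usage (placeholder file names):
#   missing_rows list2.txt list1.txt > list3.txt
```

With "/" as the field separator, the last field ($NF) is the file name, so the column where the name starts does not matter. The comparison is case-sensitive.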
Also, parsing out the directory names with sed to leave only two lists of file names has been difficult: I don't know which paths I'd be replacing, as they would differ every time. I played around with cut, but the start of the file name could be anywhere from column 10 to column 150. My intuition is that there has to be a way to isolate all characters to the right of that last "/" in the file path.
Then again, I could be wrong.
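As a sketch of that intuition: a greedy match up to the last "/" can be deleted without knowing the directory names in advance. The helper names names_only and name_of are made up for this example, and it assumes file names themselves never contain "/".

```shell
#!/bin/sh
# Sketch: keep only the text to the right of the last "/" on each line.
names_only() {
    # ".*/" is greedy, so it consumes everything through the LAST slash.
    sed 's|.*/||' "$1"
}

# Same idea for a single string, using POSIX parameter expansion:
# "##*/" removes the longest prefix that ends in "/".
name_of() {
    printf '%s\n' "${1##*/}"
}
```

Because the match is anchored on the slash rather than on a column or a known path, it works whether the name starts at column 10 or column 150.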
-x
and omit the second inner grep. This also makes the whole command easier to read: grep -F -v -f <(grep -o '[^/]*$' file2) file1
– Freddy Nov 18 '19 at 02:51
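A variation on the command in that comment, offered only as a sketch: it writes the patterns to a temporary file instead of using <( ), so it does not depend on bash process substitution, and it keeps the leading "/" in each pattern so that a name such as das.txt does not also match adas.txt. It assumes a grep that supports -o (GNU and BSD grep do); the function name rows_not_in is invented for the example.

```shell
#!/bin/sh
# Sketch: print rows of one listing whose "/name" does not occur in another.
rows_not_in() {
    # $1 = listing to filter, $2 = listing whose file names are excluded
    pats=$(mktemp) || return 1
    # Extract "/name" from the end of every row of the second listing.
    grep -o '/[^/]*$' "$2" > "$pats"
    # -F: fixed strings, -v: invert the match, -f: read patterns from a file.
    grep -F -v -f "$pats" "$1"
    rm -f "$pats"
}

# Usage (placeholder file names):
#   rows_not_in file1 file2 > file3
```

Fixed-string patterns are still matched anywhere in the row, so "/das.txt" would also exclude a row ending in "/das.txt.bak" or one with a directory whose name starts with "das.txt". If that matters, comparing the extracted names exactly is safer.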