You could set up a SQLite database and run SQL SELECTs against it, which would probably be cleaner to implement and would make the solution more portable later on.
But here's a rough idea. Say I have 2 files:
$ more index.txt new_vals.txt
::::::::::::::
index.txt
::::::::::::::
1_,2_,4_,5_
::::::::::::::
new_vals.txt
::::::::::::::
5_,2_,1_,4
2_,5_,1_,4
With this command we can match:
$ for i in $(<new_vals.txt); do nums=${i//_,/}; \
grep -oE "[${nums}_,]+" index.txt; done
1_,2_,4_,5_
1_,2_,4_,5_
This demonstrates that we can match each line from new_vals.txt to an existing line in index.txt.
UPDATE #1
Based on the OP's edit, the following modification of the above approach would do what they want.
$ for i in $(<new_vals.txt); do
    nums=${i//_,/}
    printf "# to check: [%s]" "$i"
    k=$(grep -oE "[${nums}_,]+" index.txt | grep "[[:digit:]]_$")
    printf " ==> match: [%s]\n" "$k"
  done
With a modified version of test data:
$ more index.txt new_vals.txt
::::::::::::::
index.txt
::::::::::::::
1_,2_,4_,5_
0_,2_,3_,9_
::::::::::::::
new_vals.txt
::::::::::::::
5_,2_,1_,4_
2_,5_,1_,4_
1_,1_,1_,1_
1_,2_,4_,4_
Now when we run the above (put inside a script, parser.bash, for simplicity):
$ ./parser.bash
# to check: [5_,2_,1_,4_] ==> match: [1_,2_,4_,5_]
# to check: [2_,5_,1_,4_] ==> match: [1_,2_,4_,5_]
# to check: [1_,1_,1_,1_] ==> match: []
# to check: [1_,2_,4_,4_] ==> match: []
How it works
The above method works by exploiting some key characteristics of your data. For example, only matches will end with a digit followed by an underscore. The grep "[[:digit:]]_$" picks only these results out.
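The other part of the script, grep -oE "[${nums}_,]+" index.txt, prints every run of characters in index.txt that is made up solely of characters from the current string in new_vals.txt (its digits, plus _ and ,). A line of index.txt that uses only those characters comes out whole; any other line comes out in fragments.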
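To make the two stages concrete, here is a minimal sketch that prints the intermediate values for a single input string. The temporary file and the variable names candidates and match are only for illustration; the file stands in for index.txt from the test data above.

```bash
#!/usr/bin/env bash
# Sketch: show what each stage of the loop produces for one input string.
# The temporary file stands in for index.txt from the example above.
index=$(mktemp)
printf '%s\n' '1_,2_,4_,5_' '0_,2_,3_,9_' > "$index"

i='5_,2_,1_,4_'

# Step 1: delete every "_," separator, leaving the digits plus the final "_".
nums=${i//_,/}
echo "nums: $nums"

# Step 2: print every run of characters drawn from the set [digits _ ,].
candidates=$(grep -oE "[${nums}_,]+" "$index")
echo "candidates:"
echo "$candidates"

# Step 3: keep only the runs that end in a digit followed by "_".
match=$(echo "$candidates" | grep '[[:digit:]]_$')
echo "match: $match"

rm -f "$index"
```

For this input, nums is 5214_, the first index line comes through step 2 whole, and the second line only yields fragments such as _,2_, which step 3 discards because they do not end in a digit followed by an underscore.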
Additional adjustments
If the strings may vary in length, then the second grep will need to be expanded to guarantee that we're only picking out strings of sufficient length. There are several ways to accomplish this: either by expanding the pattern, or by making use of a counter (perhaps using wc, or some other means) to confirm that the matches have the expected length.
Expanding it like so:
k=$(grep -oE "[${nums}_,]+" index.txt | \
grep "[[:digit:]]_,[[:digit:]]_,[[:digit:]]_,[[:digit:]]_$")
This would allow for the elimination of short strings like the last one here:
$ ./parser2.bash
# to check: [5_,2_,1_,4_] ==> match: [1_,2_,4_,5_]
# to check: [2_,5_,1_,4_] ==> match: [1_,2_,4_,5_]
# to check: [1_,1_,1_,1_] ==> match: []
# to check: [1_,2_,4_,4_] ==> match: []
# to check: [1_,2_,5_] ==> match: []
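The counter-based alternative mentioned above could be sketched roughly as follows. The helper names count_fields and find_match are hypothetical, not part of the original script, and the sketch assumes single-digit fields as in the test data. It checks only that a candidate has as many digit-underscore fields as the string being checked, so it is a length check and nothing more.

```bash
#!/usr/bin/env bash
# count_fields (hypothetical helper): number of "<digit>_" fields in a string.
count_fields() {
    local n
    n=$(printf '%s\n' "$1" | grep -o '[[:digit:]]_' | wc -l)
    echo $(( n ))   # arithmetic expansion strips any padding wc adds
}

# find_match (hypothetical helper): print the candidates from the index
# file ($2) that have as many fields as the string being checked ($1).
find_match() {
    local i=$1 index=$2 nums want cand
    nums=${i//_,/}
    want=$(count_fields "$i")
    grep -oE "[${nums}_,]+" "$index" | grep '[[:digit:]]_$' |
    while IFS= read -r cand; do
        if [ "$(count_fields "$cand")" -eq "$want" ]; then
            printf '%s\n' "$cand"
        fi
    done
}

# Temporary file standing in for index.txt.
index=$(mktemp)
printf '%s\n' '1_,2_,4_,5_' '0_,2_,3_,9_' > "$index"
for i in '5_,2_,1_,4_' '1_,2_,5_'; do
    printf '# to check: [%s] ==> match: [%s]\n' "$i" "$(find_match "$i" "$index")"
done
rm -f "$index"
```

Compared with spelling out the four-field pattern, this takes the required length from the input string itself, so the same script can handle strings with different numbers of fields.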