I have a CSV file with ~4000 lines, each containing between 2 and 30 names separated by commas. The names include titles (for example mr. X Adams or ms. Y Sanders). Some names appear multiple times within the same line, and I would like those repeats removed so that each name occurs only once per line. The data is in a file "input.csv", and the result should be written to another file, "output.csv".
For example, I have:
mr. 1,mr. 2,mr. 3,mr. 1,mr. 4
prof. x,prof. y,prof. x
mr. 1,prof y
which should become
mr. 1,mr. 2,mr. 3,mr. 4 (mr. 1 was already mentioned so it should be removed)
prof. x,prof. y (prof. x was already mentioned so it should be removed)
mr. 1,prof y (even though both were already mentioned in the same file, they were not mentioned within this line so they may remain)
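One possible approach, sketched in Python since the question does not name a tool: read each line as a list of fields, drop later repeats while keeping the first occurrence in its original position, and write the result out. The function names `dedupe_row` and `dedupe_file` are made up for this illustration. The sketch assumes names match only when they are exactly identical (same case, same spacing) and that "seen" names are tracked per line, not per file.

```python
import csv


def dedupe_row(names):
    """Drop repeated names from one row, keeping first-seen order.

    dict.fromkeys keeps insertion order, so the first occurrence of
    each name stays where it was. Matching is exact and case-sensitive
    (an assumption of this sketch).
    """
    return list(dict.fromkeys(names))


def dedupe_file(src="input.csv", dst="output.csv"):
    """Copy src to dst, removing duplicate fields within each line."""
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        writer = csv.writer(fout, lineterminator="\n")
        for row in csv.reader(fin):
            # Each row is handled independently, so a name may still
            # appear on several different lines of the output.
            writer.writerow(dedupe_row(row))
```

Calling `dedupe_file()` with no arguments would read "input.csv" and write "output.csv" in the current directory. If names should instead match regardless of case or surrounding whitespace, the key used for comparison would need to be normalized first.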
Mr X and mR x as duplicates. This one would not. Also, the code is necessarily much more convoluted. – Sparhawk Oct 08 '18 at 11:39
duplicated pattern/entries within each **field** is clearly not the same as duplicated field within each **row**. – Oct 14 '18 at 14:29