I would like to merge a variable from one file into another on Linux. The first column (variable) contains the name I want to merge the files on.
I have sorted both files using both -f and -k:
sort -f -k 1,1 SCZ.N.tmp > SCZ.N.tmp.sorted
sort -f -k 1,1 1kg.tmp > 1kG.ref_file.sorted
However, when I join both files with this command:
join -1 1 -2 1 SCZ.N.tmp.sorted 1kG.ref_file.sorted > SCZ.freq.joined
I keep getting the error 'join: SCZ.N.tmp.sorted:112855: is not sorted: chr1_100002155_D D I6 0.995112 0.0184 0.7897 87016'. Nevertheless, the join continues and most lines are merged. However, I am not sure whether I am losing a small proportion of cases because of a genuine mismatch between the files, or because something goes wrong when sorting them.
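One way to separate genuine key mismatches from sorting problems, sketched here assuming GNU coreutils: sort both files with identical options and locale, then ask join for only the lines of the first file that found no partner (-v 1). The file names a.txt and b.txt and the key rs_only_in_a are hypothetical sample data, not taken from the question.

```shell
#!/bin/sh
# Count lines of file 1 that find no partner in file 2 (GNU coreutils assumed).
# File names and keys are hypothetical; rs_only_in_a is deliberately unmatched.
set -e
tmp=$(mktemp -d)
printf 'chr1_100002155_D D I6\nMERGED_DEL_2_4660 D I5\nrs_only_in_a A G\n' > "$tmp/a.txt"
printf 'MERGED_DEL_2_4660 0.3\nchr1_100002155_D 0.5\n' > "$tmp/b.txt"

# Sort both inputs with identical options and locale, as join requires.
export LC_ALL=C
sort -k 1,1 "$tmp/a.txt" > "$tmp/a.sorted"
sort -k 1,1 "$tmp/b.txt" > "$tmp/b.sorted"

# -v 1 prints only the unpaired lines from file 1.
# --check-order (GNU) makes join exit with an error if either input is
# out of order, rather than only warning and carrying on.
join -v 1 --check-order "$tmp/a.sorted" "$tmp/b.sorted" > "$tmp/unpaired.txt"

echo "unpaired lines in file 1: $(wc -l < "$tmp/unpaired.txt")"
cat "$tmp/unpaired.txt"   # prints: rs_only_in_a A G
```

If the inputs pass --check-order, every line reported by -v 1 is a key that really has no counterpart in the other file, rather than one skipped because of ordering.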
Does anybody know what I am doing wrong, and what I can do to avoid this error? Thank you!
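A likely cause, assuming GNU coreutils: sort -f folds lower case to upper case when comparing keys, but join compares keys case-sensitively unless it is given -i (--ignore-case). Under -f a key such as chr1_... is compared as CHR1_... and lands before MERGED_..., while a case-sensitive join in the C locale sees c (0x63) as greater than M (0x4D) and reports the file as unsorted. The sketch below reproduces this on two hypothetical two-line files; --check-order, a GNU join option, turns the usual warning into a hard failure so it is easy to detect.

```shell
#!/bin/sh
# Reproduces "is not sorted" with hypothetical sample data (GNU coreutils assumed).
tmp=$(mktemp -d)
printf 'chr1_100002155_D D I6\nMERGED_DEL_2_4660 D I5\n' > "$tmp/a.txt"
printf 'chr1_100002155_D 0.5\nMERGED_DEL_2_4660 0.3\n' > "$tmp/b.txt"

export LC_ALL=C

# sort -f compares CHR1_... with MERGED_..., so the chr1_... line comes first.
sort -f -k 1,1 "$tmp/a.txt" > "$tmp/a.sorted"
sort -f -k 1,1 "$tmp/b.txt" > "$tmp/b.sorted"

# join without -i compares case-sensitively: 'c' (0x63) > 'M' (0x4D),
# so MERGED_... following chr1_... looks out of order to join.
if join --check-order "$tmp/a.sorted" "$tmp/b.sorted" \
        > "$tmp/joined.txt" 2> "$tmp/err.txt"; then
    status=sorted
else
    status=unsorted
fi
echo "$status"       # prints: unsorted (with GNU join)
cat "$tmp/err.txt"   # a message containing "is not sorted"
```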
I have also tried:
LANG=en_EN sort -f -k 1,1 SCZ.N.tmp > SCZ.N.tmp.sorted2
LANG=en_EN sort -f -k 1,1 1kg.tmp > 1kg.tmp.sorted2
and then joining with:
LANG=en_EN join -1 1 -2 1 SCZ.N.tmp.sorted2 1kg.tmp.sorted2 > SCZ.freq.joined
But that did not solve it.
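Changing only the locale cannot reconcile sort -f with a case-sensitive join, because the two commands still disagree about case. A sketch of two consistent alternatives, assuming GNU coreutils and hypothetical sample files: either drop -f from sort, or keep it and pass -i to join. LC_ALL is exported once so that every step uses the same collation.

```shell
#!/bin/sh
# Two consistent ways to sort and join (GNU coreutils assumed; sample data hypothetical).
set -e
tmp=$(mktemp -d)
printf 'chr1_100002155_D D I6\nMERGED_DEL_2_4660 D I5\n' > "$tmp/a.txt"
printf 'MERGED_DEL_2_4660 0.3\nchr1_100002155_D 0.5\n' > "$tmp/b.txt"

# One locale for every step, so sort and join agree on collation.
export LC_ALL=C

# Option 1: no case folding anywhere (join is case-sensitive by default).
sort -k 1,1 "$tmp/a.txt" > "$tmp/a.cs"
sort -k 1,1 "$tmp/b.txt" > "$tmp/b.cs"
join --check-order -1 1 -2 1 "$tmp/a.cs" "$tmp/b.cs" > "$tmp/joined.cs"

# Option 2: keep sort -f, and make join fold case as well with -i.
sort -f -k 1,1 "$tmp/a.txt" > "$tmp/a.ci"
sort -f -k 1,1 "$tmp/b.txt" > "$tmp/b.ci"
join -i --check-order -1 1 -2 1 "$tmp/a.ci" "$tmp/b.ci" > "$tmp/joined.ci"

cat "$tmp/joined.cs" "$tmp/joined.ci"
```

Applied to the files in the question, option 1 would look like LC_ALL=C sort -k 1,1 SCZ.N.tmp > SCZ.N.tmp.sorted (likewise for 1kg.tmp), followed by LC_ALL=C join -1 1 -2 1 on the two sorted files. Option 2 also treats keys that differ only in case (for example chr1_... and CHR1_...) as the same key, which may or may not be wanted.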
Try LC_COLLATE=en_EN. I suspect LANG only affects presentation, not sequencing. Failing that, try LC_ALL=C, which is the ultimate sanction. – Paul_Pedant Aug 07 '20 at 14:24
I tried LC_COLLATE=en_EN and LC_ALL=C, which changed the error to join: SCZ.N.tmp.sorted2:317251: is not sorted: MERGED_DEL_2_4660 D I5 0.98738 0.0113 0.2611 87016, but it comes down to the same final joined sample size... – LauraW Aug 07 '20 at 14:38
I also tried LOCALE=C, and that gave the earlier error again: join: SCZ.N.tmp.sorted2:112855: is not sorted: – LauraW Aug 07 '20 at 15:00
Use the locale command to see valid names and current settings. Run LC_ALL=C locale to see what it changes. – Paul_Pedant Aug 07 '20 at 15:40
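On the question of which variable matters: POSIX gives LC_ALL precedence over LC_COLLATE, which in turn takes precedence over LANG, so LANG does influence sort order, but only when the other two are unset. LOCALE is not one of these variables, so LOCALE=C leaves sort and join on the default locale. locale -a lists the locale names actually installed; en_EN is not a name most systems ship, which is worth checking there. The minimal sketch below sets LANG and LC_COLLATE to en_US.UTF-8 (an assumption; it need not be installed) and LC_ALL=C, and the result follows plain byte order.

```shell
#!/bin/sh
# LANG and LC_COLLATE name a different locale (en_US.UTF-8 is an assumption
# and need not be installed), but LC_ALL=C takes precedence over both,
# so sort uses plain byte order: upper case before lower case.
result=$(printf 'b\nA\na\nB\n' |
    LANG=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_ALL=C sort |
    paste -s -d ' ' -)
echo "$result"   # prints: A B a b
```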