I have a large file that I need to filter by its first field, whose values are never repeated. An example:
NC_056429.1_398 2 3 0.333333 0.333333 0.333333 0.941178
NC_056429.1_1199 2 0 0.333333 0.333333 0.333333 0.941178
NC_056442.1_7754500 0 3 0.800003 0.199997 0.000000 0.000001
NC_056442.1_7754657 1 2 0.000000 0.199997 0.800003 0.888891
NC_056442.1_7754711 2 0 0.888891 0.111109 0.000000 0.800002
NC_056442.1_7982565 0 1 0.800003 0.199997 0.000000 0.666580
NC_056442.1_7982610 1 0 0.800003 0.199997 0.000000 0.000000
NC_056442.1_7985311 2 0 0.888891 0.111109 0.000000 0.000000
I am trying to use awk to filter a file in a shell script by the first column, and I need to use a variable because it's in a while loop. The while loop calls in a text file such as:
NC_056442.1 7870000 # 1st field = $chrname, 2nd field = $pos
NC_056443.1 1570000
Earlier in the script, I calculate $startpos and $endpos from $pos, as shown below:
chrname="NC_056442.1" # column 1 in pulled file
startpos=7754657 # calculated in prior script
endpos=7982610 # calculated in prior script
start=${chrname}_${startpos} # this was an attempt to simplify the awk command
end=${chrname}_${endpos}
awk -v s="$start" -v e="$end" '/s/,/e/' file.txt > cut_file.txt
If I manually type in the values, like below, I get a file that includes lines 5-8 only.
awk '/NC_056442.1_7754657/,/NC_056442.1_7982610/' file.txt > cut_file.txt
Output File
NC_056442.1_7754657 1 2 0.000000 0.199997 0.800003 0.888891
NC_056442.1_7754711 2 0 0.888891 0.111109 0.000000 0.800002
NC_056442.1_7982565 0 1 0.800003 0.199997 0.000000 0.666580
NC_056442.1_7982610 1 0 0.800003 0.199997 0.000000 0.000000
I am struggling because I do not know how to get awk to actually use the s and e variables. I have tried a variety of options, including "ENVIRON[]". I am relatively new to bash (and this is my first post here), so I do not know how to troubleshoot this. I am open to answers outside of awk. Please let me know if I need to rephrase my question or add more information.
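One likely cause, offered as a sketch rather than a definitive fix: inside an awk program, /s/ and /e/ are regex literals that match the letters "s" and "e", not the variables passed with -v. Comparing the first field to the variables directly might look like the following. It uses the sample data and values from the question; the sample file is recreated here only so the snippet runs on its own.

```shell
# Sample data copied from the question.
cat > file.txt <<'EOF'
NC_056429.1_398 2 3 0.333333 0.333333 0.333333 0.941178
NC_056429.1_1199 2 0 0.333333 0.333333 0.333333 0.941178
NC_056442.1_7754500 0 3 0.800003 0.199997 0.000000 0.000001
NC_056442.1_7754657 1 2 0.000000 0.199997 0.800003 0.888891
NC_056442.1_7754711 2 0 0.888891 0.111109 0.000000 0.800002
NC_056442.1_7982565 0 1 0.800003 0.199997 0.000000 0.666580
NC_056442.1_7982610 1 0 0.800003 0.199997 0.000000 0.000000
NC_056442.1_7985311 2 0 0.888891 0.111109 0.000000 0.000000
EOF

chrname="NC_056442.1"
startpos=7754657
endpos=7982610
start="${chrname}_${startpos}"
end="${chrname}_${endpos}"

# The range starts on the line whose first field equals s and ends on the
# line whose first field equals e. These are string comparisons, so the
# dots in the names are not treated as regex wildcards.
awk -v s="$start" -v e="$end" '$1 == s, $1 == e' file.txt > cut_file.txt
```

Note that a range pattern like this relies on both boundary lines actually being present in the file. If the start line is missing, nothing is printed; if the end line is missing, the range runs to the end of the file.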
"I need to use a variable because it's in a while loop" - that's probably a bad starting point, see why-is-using-a-shell-loop-to-process-text-considered-bad-practice. – Ed Morton Jan 16 '23 at 13:49
"The while loop calls in a text file such as:" - in this part you give values that don't match in your other files. – Marius_Couet Jan 17 '23 at 07:54
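Following the comment about shell loops, a single awk pass could stand in for the while loop. This is only a sketch under stated assumptions: ranges.txt is a hypothetical file holding a chromosome name, start position and end position per line (the question does not show how the start and end are calculated, so they are assumed to be already computed), there is at most one range per chromosome, and positions are compared numerically rather than by matching exact boundary lines.

```shell
# Sample data copied from the question.
cat > file.txt <<'EOF'
NC_056429.1_398 2 3 0.333333 0.333333 0.333333 0.941178
NC_056429.1_1199 2 0 0.333333 0.333333 0.333333 0.941178
NC_056442.1_7754500 0 3 0.800003 0.199997 0.000000 0.000001
NC_056442.1_7754657 1 2 0.000000 0.199997 0.800003 0.888891
NC_056442.1_7754711 2 0 0.888891 0.111109 0.000000 0.800002
NC_056442.1_7982565 0 1 0.800003 0.199997 0.000000 0.666580
NC_056442.1_7982610 1 0 0.800003 0.199997 0.000000 0.000000
NC_056442.1_7985311 2 0 0.888891 0.111109 0.000000 0.000000
EOF

# Hypothetical ranges file: chromosome name, start position, end position.
cat > ranges.txt <<'EOF'
NC_056442.1 7754657 7982610
EOF

awk '
    # First file (ranges.txt): remember the start and end for each chromosome.
    NR == FNR { lo[$1] = $2; hi[$1] = $3; next }

    # Second file (file.txt): split the first field at its last underscore
    # into a chromosome name and a numeric position.
    {
        i = match($1, /_[0-9]+$/)
        if (i == 0) next
        chr = substr($1, 1, i - 1)
        pos = substr($1, i + 1) + 0
        # Print the line when the position lies inside the stored range.
        if ((chr in lo) && pos >= lo[chr] && pos <= hi[chr]) print
    }
' ranges.txt file.txt > cut_file.txt
```

Because the comparison is numeric, this version still works when no line sits exactly on the start or end position, and the data file is read only once regardless of how many ranges are listed.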