I need to filter a very large CSV file as efficiently as possible. I already tried csvgrep, but it is too slow, so I am now trying to use AWK for this task.
The CSV file uses `;` as the separator, and I need to keep all rows whose 48th column contains a string that starts with a certain pattern, which I will pass in from a script. So it will be something along the lines of:
pattern='59350'
awk -F ";" '$48 ~ /^59350/' input.csv > output.csv # This works
However, I need to pass `$pattern` inside the regex instead of writing the pattern explicitly.
I have tried several combinations, but all of them give me an empty output.csv file.
Here are some of my failed attempts:
awk -F ";" -v var="$pattern" '$48 ~ /^var/' input.csv > output.csv
awk -F ";" -v var="$pattern" '$48 ~ /^$var/ {print}' input.csv > output.csv
awk -F ";" -v var="$pattern" '$48 ~ /^${var}/ {print}' input.csv > output.csv
How do I do this?
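Why the attempts above come out empty: in awk, text between slashes is a regex constant, so `/^var/` looks for the literal characters `var` at the start of the field. In `/^$var/` and `/^${var}/` the `$` is a regex metacharacter, not a variable reference, because awk variables take no `$` and the shell expands nothing inside single quotes. To use a variable, build the regex as a string and put it on the right side of `~`. A minimal sketch, assuming a `;`-separated file with no quoted fields containing `;`; the sample rows, the `make_row` helper and the temporary paths are hypothetical:

```shell
#!/bin/sh
# Sketch: keep rows whose 48th ';'-separated field starts with $pattern,
# using a dynamic regex built from an awk variable.
# Assumes no field contains a quoted ';'. make_row and the paths are hypothetical.

workdir=$(mktemp -d)          # left in place so the output can be inspected
in="$workdir/input.csv"
out="$workdir/output.csv"

# Hypothetical helper: 47 filler fields followed by the given 48th field.
make_row() {
  i=1
  row=""
  while [ "$i" -le 47 ]; do
    row="${row}f${i};"
    i=$((i + 1))
  done
  printf '%s%s\n' "$row" "$1"
}

make_row 59350123 >  "$in"   # starts with 59350: kept
make_row 12359350 >> "$in"   # contains 59350 but not at the start: dropped
make_row 59350    >> "$in"   # exactly 59350: kept

pattern='59350'

# Double quotes let the shell expand $pattern before awk sees it.
# ("^" var) concatenates into a string, which ~ treats as a dynamic regex.
awk -F ';' -v var="$pattern" '$48 ~ ("^" var)' "$in" > "$out"

cat "$out"
```

Because `var` is interpreted as a regex here, a pattern containing metacharacters such as `.` or `*` would not be matched literally.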
Also, if you know a more efficient approach, one that does not load the whole CSV file into memory or is simply faster, please let me know. I was thinking of grep, but I am not sure whether it is suitable or how to implement it.
Thank you in advance
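On memory: awk reads its input one record at a time, so the awk commands above do not load the whole CSV into memory. grep can express the same filter if the regex skips the first 47 fields explicitly: `^([^;]*;){47}` consumes exactly 47 fields and their separators, so whatever follows must sit at the start of field 48. A sketch under the same assumptions (no quoted `;`, and `$pattern` containing no ERE metacharacters); the helper and paths are hypothetical. Setting `LC_ALL=C` often speeds grep up by matching bytes rather than multibyte characters, but whether grep actually beats awk on a given file is something to measure, not assume.

```shell
#!/bin/sh
# Sketch: the same prefix filter on field 48 with grep -E instead of awk.
# Assumes no field contains a quoted ';' and $pattern has no ERE metacharacters.
# make_row and the paths are hypothetical.

workdir=$(mktemp -d)
in="$workdir/input.csv"
out="$workdir/output.csv"

# Hypothetical helper: 47 filler fields followed by the given 48th field.
make_row() {
  i=1
  row=""
  while [ "$i" -le 47 ]; do
    row="${row}f${i};"
    i=$((i + 1))
  done
  printf '%s%s\n' "$row" "$1"
}

make_row 59350123 >  "$in"
make_row 12359350 >> "$in"
make_row 59350    >> "$in"

pattern='59350'

# ^([^;]*;){47} matches exactly 47 "field;" groups from the start of the line,
# so $pattern must appear at the very beginning of field 48.
# LC_ALL=C makes grep compare bytes, which is often faster in UTF-8 locales.
LC_ALL=C grep -E "^([^;]*;){47}$pattern" "$in" > "$out"

cat "$out"
```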
… `^59350`, and now you know that the way to pass a regexp in a variable is `awk -v var='<regexp>' '$42 ~ var'`, so just do that: `awk -v var='^59350' '$42 ~ var'`. – Ed Morton May 26 '21 at 15:20

`awk -v var='59350' 'index($42,var) == 1'`. – Ed Morton May 26 '21 at 15:25

`awk -F ";" -v var='^$pattern' '$48 ~ var' input.csv > output.csv`, but output.csv is empty. I don't mind if you want to call it differently, but I need to pass it as a variable in my workflow, not hardcoded. Thanks – Margherita Di Leo May 26 '21 at 15:29

`awk -F ";" -v var=$pattern 'index($48,var) == 1'` worked! Thank you! PS: I still believe this question is totally different from the one linked and deserves to be reopened so that more people can read the correct answer. Thanks again for your time. – Margherita Di Leo May 26 '21 at 15:37

`pattern` is a shell variable, so you need double quotes rather than single quotes to allow the value to be expanded before it is passed to awk: `-v var="^$pattern"`. When you do `-v var='^$pattern'`, awk receives the literal text `^$pattern` rather than the value of the variable. – steeldriver May 26 '21 at 15:42
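Putting the comments together for column 48: `index(s, t)` returns the 1-based position of `t` in `s` (0 if `t` does not occur), so `index($48, var) == 1` means "field 48 starts with `var`" with no regex interpretation at all. Quoting `"$pattern"` still matters: the unquoted `-v var=$pattern` works for `59350`, but a value containing spaces would be split into separate words by the shell. A sketch with a hypothetical pattern containing a dot, chosen to show the difference from the regex version; the helper and paths are illustrative only:

```shell
#!/bin/sh
# Sketch: literal prefix match on field 48 with index(), so the pattern is
# never treated as a regex. make_row, the paths and the pattern are hypothetical.

workdir=$(mktemp -d)
in="$workdir/input.csv"
out="$workdir/output.csv"

# Hypothetical helper: 47 filler fields followed by the given 48th field.
make_row() {
  i=1
  row=""
  while [ "$i" -le 47 ]; do
    row="${row}f${i};"
    i=$((i + 1))
  done
  printf '%s%s\n' "$row" "$1"
}

make_row 5.350abc >  "$in"   # literally starts with "5.350": kept
make_row 5x350abc >> "$in"   # matches the regex ^5.350 but not literally: dropped
make_row abc5.350 >> "$in"   # contains "5.350" at position 4: dropped

pattern='5.350'

# index($48, var) == 1 is true only when field 48 begins with var.
# "$pattern" is double-quoted so the shell passes it to -v as one word.
awk -F ';' -v var="$pattern" 'index($48, var) == 1' "$in" > "$out"

cat "$out"
```

With the regex form `$48 ~ ("^" var)`, the `5x350abc` row would also be kept, because `.` matches any character.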