6

I have a problem with my script.

First, I have a list: a 100-line file that looks like this:

100;TEST ONE
101;TEST TWO
...
200;TEST HUNDRED

Each line has two fields. For example, the first line's fields are "100" and "TEST ONE", so the semicolon is the delimiter.

I need to put both fields into two variables, say $id and $name. The $id and $name values will differ for each line; for example, for the second line $id = "101" and $name = "TEST TWO".

After that I need to take a sample file and replace the predefined keywords with the $id and $name values. The sample file looks like this:

xxx is yyy

As a result I want to have 100 files with different content: each file must contain the $id and $name data from one line of the list, and it must be named after its $name value.

Here is my script:

#!/bin/bash -x
rm -f output/*

for i in $(cat list)
    do

        id="$(printf "$i" | awk -F ';' '{print $1}')"
        name="$(printf "$i" | awk -F ';' '{print $2}')"

        cp sample.xml output/input.tmp

        sed -i -e "s/xxx/$id/g" output/input.tmp
        sed -i -e "s/yyy/$name/g" output/input.tmp

        mv output/input.tmp output/$name.xml


    done

So I'm just trying to read my list file line by line. For every line I get the two variables, use them to replace the keywords (xxx and yyy) in the sample file, and then save the result.

But something went wrong.

As a result I get only one output file, and the debug output looks wrong.

Here is the debug output with only two lines in my list file. I got only one output file; its name is just "TEST" and it contains the string "101 is TEST".

I expected two files, "TEST ONE" and "TEST TWO", containing "100 is TEST ONE" and "101 is TEST TWO" respectively.

[debug screenshot]

As you can see, the second variable has a space in it ("TEST ONE", for example). I think the issue is related to the space character, but I don't know why: I set awk's -F parameter to ";", so awk should treat only the semicolon as a separator!

What did I do wrong?

rGA145
  • Apart from splitting the input file into separate words on whitespace before looping over those individual words, you seem to be reading data that comes from a DOS text file. You may want to convert your for loop into a while IFS= read -r line loop and your input file to a Unix text file. – Kusalananda Apr 08 '20 at 18:22
  • @Kusalananda Can we make a wiki on Unix & Linux SE with bash no-goes, mentioning looping over the output of cat and ls, as these ugly (and error-prone) things pop up nearly every day? – rexkogitans Apr 09 '20 at 06:48
  • @rexkogitans We don't have a wiki here. We do have some "canonical Q/A" though, i.e. good answers to common questions that people often refer to. In this case, it's not an issue with cat but with the way the shell splits the result of a command substitution, and the relevant Q/A is https://unix.stackexchange.com/questions/108963 For the case with ls, there is https://unix.stackexchange.com/questions/128985 The Q/A at https://unix.stackexchange.com/questions/131766 is a good overall reference. Many users also refer to Greg Wooledge's wiki: https://mywiki.wooledge.org/BashPitfalls – Kusalananda Apr 09 '20 at 07:15

4 Answers

7

If I understand you correctly, you can use a while loop and variable expansion:

while IFS= read -r line; do 
  id="${line%;*}"
  name="${line#*;}"
  cp sample.xml output/input.tmp
  sed -i -e "s/xxx/$id/g" output/input.tmp
  sed -i -e "s/yyy/$name/g" output/input.tmp
  mv output/input.tmp output/"$name".xml
done < file
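
To see what those expansions produce, here is a quick check with a hypothetical line value (results shown as comments):

line='101;TEST TWO'
printf '%s\n' "${line%;*}"   # 101       - shortest suffix matching ';*' removed, i.e. everything from the last ';' on
printf '%s\n' "${line#*;}"   # TEST TWO  - shortest prefix matching '*;' removed, i.e. everything up to the first ';'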

As proposed by @steeldriver, here's a (more elegant) option:

while IFS=';' read -r id name; do 
  cp sample.xml output/input.tmp
  sed -i -e "s/xxx/$id/g" output/input.tmp
  sed -i -e "s/yyy/$name/g" output/input.tmp
  mv output/input.tmp output/"$name".xml
done < file
  • Unless there's a need to preserve leading/trailing whitespace, couldn't one just do while IFS=';' read -r id name; do ? – steeldriver Apr 08 '20 at 19:09
  • @steeldriver added to the answer – schrodingerscatcuriosity Apr 08 '20 at 19:29
  • Nice answer, but I’ve started thinking we need to stop using sed to edit files in place. It’s fundamentally a stream editor. The ed command is almost identical and far more portable (given that -i is... tricky). – D. Ben Knoble Apr 09 '20 at 13:27
  • It probably won't make much difference, but if you have a lot of files to process, it might be better to convert the second sed into a second use of -e on the first sed so each file is processed only once. – Joe Apr 11 '20 at 10:34
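
For reference, the single-pass sed suggested in the last comment would look something like this (same variables as in the answer above):

sed -i -e "s/xxx/$id/g" -e "s/yyy/$name/g" output/input.tmp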
4

Quoting!! The quoting is missing on this line:

mv output/input.tmp output/$name.xml

It should be:

mv output/input.tmp output/"$name".xml

to avoid issues with file names that contain spaces.
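
A small demonstration of the difference, using printf in place of mv and a hypothetical name value:

name='TEST ONE'
printf '%s\n' output/$name.xml     # unquoted: split into two words
# output/TEST
# ONE.xml
printf '%s\n' output/"$name".xml   # quoted: stays one word
# output/TEST ONE.xml

With the unquoted form, mv would be called with three arguments instead of two and either fail or move the file somewhere unexpected.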

Also, the expansion of $(cat list) is split (and globbed) by the shell, which likewise breaks on spaces.
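
A quick way to see that splitting, using the two-line list from the question (assuming Unix line endings):

for i in $(cat list); do printf '<%s>\n' "$i"; done
# <100;TEST>
# <ONE>
# <101;TEST>
# <TWO>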

Maybe you can change to this script:

#!/bin/bash -x
rm -f output/*

inputfile=output/input.tmp

while read -r line
do
    id=${line%%;*}
    name=${line##*;}

    cp sample.xml "$inputfile"
    sed -i -e "s/xxx/$id/g" "$inputfile"
    sed -i -e "s/yyy/$name/g" "$inputfile"
    mv "$inputfile" output/"$name".xml

done <list
2

The reason your awk is not producing the expected results is the way you are iterating over the file. When you iterate using for i in $(cat file), you iterate over words (split according to IFS), not over lines. To read a file line by line, use while read:

while read -r line; do
    ...
done < file

For further reading, see the following bash FAQ: How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?
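
For comparison, with the two-line list from the question (here called file), the while read loop hands you whole lines rather than whitespace-separated words:

while read -r line; do printf '<%s>\n' "$line"; done < file
# <100;TEST ONE>
# <101;TEST TWO>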

jordanm
1

As an alternative approach, you can do this job with awk in one process rather than four per line. This is most likely to be beneficial if there are many lines in list but sample.xml is small.

awk -F';' 'FNR==NR{x=x $0 RS; next} 
{t=x; gsub(/xxx/,$1,t); gsub(/yyy/,$2,t); f="output/"$2".xml"; printf "%s",t >f; close(f)}
' sample.xml list
# shown with unnecessary linebreaks for clarity, but you can put it all on one line

If list has CRLF line endings (aka DOS or Windows format) as commented on your Q, and you can't (easily) or don't want to remove them first, awk can handle that also; just after the second { insert sub(/\r$/,"",$0); (or $2 if you prefer).
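
Spelled out, that variant would be as follows (assigning back to $0 also makes awk re-split the fields, so $2 loses the trailing carriage return):

awk -F';' 'FNR==NR{x=x $0 RS; next}
{sub(/\r$/,"",$0); t=x; gsub(/xxx/,$1,t); gsub(/yyy/,$2,t); f="output/"$2".xml"; printf "%s",t >f; close(f)}
' sample.xml list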

perl can also do this (perl can do almost everything awk can do), but a little more verbosely; and although perl is commonly available, it is not specified by POSIX as awk is.