4

I have a file, words.txt on a Linux machine, containing the following lines. How can I repeat each of the strings user, apple and banana, appending a number from 1 to 4 to them?

user
apple
banana

Expected output:

user1
user2
user3
user4
apple1
apple2
apple3
apple4
banana1
banana2
banana3
banana4

I tried the following, but it only works with one string.

seq 1 4 | awk {'print $0 "user"'}
terdon
  • 242,166
alisa
  • 41
  • Please show us the actual expected output. Do the .. represent user2 and user3? Are there any other strings in the original input file? Do we need to skip those? – terdon Dec 02 '22 at 14:29
  • yes they represent user2 and user3 in new line . I updated the post. – alisa Dec 02 '22 at 14:30
  • Thanks. So there are no other lines in words.txt? All you want is to make 4 copies of every line in the file adding a number? – terdon Dec 02 '22 at 14:33
  • yes there are no other lines in words.txt and i want to add numbers from 1 to 4 on each words in new line. – alisa Dec 02 '22 at 14:35
  • Regarding awk {'print $0 "user"'}, we see that once in a while and I'm extremely curious - where did you get the idea to put the ' script delimiters inside the body of the script (i.e. inside the {...}) rather than outside of it awk '{print $0 "user"}'? Is there a book or a tutorial somewhere suggesting that's the right syntax? – Ed Morton Dec 03 '22 at 22:13

7 Answers

5

awk in the standard toolchest is probably your best bet here.

awk -v min=1 -v max=4 -v increment=1 '
  {for (i = min; i <= max; i += increment) print $0 i}' words.txt
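
As a minimal sketch, the same loop can be wrapped in a shell function so that the range is passed as arguments. The name number_lines and its argument order are hypothetical, not part of the answer; the function reads the named files, or standard input when none are given.

```shell
# Hypothetical wrapper around the awk loop above.
# Usage: number_lines MIN MAX INCREMENT [FILE...]
number_lines() {
  min=$1 max=$2 increment=$3
  shift 3
  # Remaining arguments (if any) are passed to awk as input files.
  awk -v min="$min" -v max="$max" -v increment="$increment" '
    {for (i = min; i <= max; i += increment) print $0 i}' "$@"
}

# Example: number_lines 1 4 1 words.txt
```

Changing the third argument to 2, for instance, would append only every other number in the range.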

With GNU tools, taking inspiration from @JJoao's approach to taking the Cartesian product of the lines of two files:

join -t $'\n' -j2 -o1.1,2.1 words.txt <(seq 4) | paste -d '\0' - -

This joins words.txt with the output of seq 4 on the second field. Because the field delimiter is defined as newline, no line can have a second field; in other words, the second field is empty for every line of both files, so every line of one file ends up paired with every line of the other. paste -d '\0' - - then glues each pair of output lines together with no separator.

2
sed 's/.*/&1\n&2\n&3\n&4/' words.txt

We are replacing (s command) everything on each line (.*), with the whole match (&) appearing multiple times with the literal numbers and newlines added.
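
Note that \n in the replacement text is understood by GNU sed, but POSIX leaves its meaning unspecified, so other sed implementations may not produce a newline. A more portable sketch, assuming only POSIX sed, uses a backslash followed by a real newline instead. The function name repeat_sed is hypothetical.

```shell
# Hypothetical helper: the same substitution, written with backslash-newline
# so it does not depend on \n being understood in the replacement.
repeat_sed() {
  sed 's/.*/&1\
&2\
&3\
&4/' "$@"
}

# Example: repeat_sed words.txt
```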

Ángel
  • 3,589
1

With plain bash:

while IFS= read -r word; do printf "${word}%d\\n" {1..4}; done < words.txt

However, putting a variable in the printf format string makes it vulnerable to unexpected characters. For example:

$ cat words.txt
with \n newline
with %s directive

$ while IFS= read -r word; do printf "${word}%d\n" {1..4}; done < words.txt
with 
 newline1
with 
 newline2
with 
 newline3
with 
 newline4
with 1 directive2
with 3 directive4

Backslash sequences will be interpreted, and % directives will be obeyed. To protect against this, the simple one-line solution becomes:

while IFS= read -r word; do
    tmp1=${word//%/%%}
    tmp2=${tmp1//\\/\\\\}
    printf "${tmp2}%d\\n" {1..4}
done < words.txt

which outputs

with \n newline1
with \n newline2
with \n newline3
with \n newline4
with %s directive1
with %s directive2
with %s directive3
with %s directive4
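
Another way to sidestep the escaping, sketched here as an alternative rather than as part of the answer above, is to keep the format string constant and pass the word to printf as an argument, so nothing in the word is ever read as a format. The function name repeat_safe is hypothetical, and the explicit inner loop replaces the {1..4} brace expansion only to keep the sketch runnable in any POSIX shell.

```shell
# Hypothetical helper: reads lines from standard input.
repeat_safe() {
  while IFS= read -r word; do
    for i in 1 2 3 4; do
      # The format is a fixed string; the word is data for %s,
      # so backslashes and % characters in it are printed literally.
      printf '%s%d\n' "$word" "$i"
    done
  done
}

# Example: repeat_safe < words.txt
```
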
glenn jackman
  • 85,964
1
% perl -nE 'chomp; for $c (1..4) { say "$_$c"}' words.txt

Explanation:

  • perl -n .... words.txt will loop over all lines in words.txt, setting the variable $_ to the current line
  • -E '....' specifies the code to run for each line of input
    • chomp removes the newline at the end of $_
    • for $c (1..4) will iterate variable $c from 1 to 4, running the code inside { .... }
      • say "$_$c" will print our input line (the word user, apple, etc.) immediately followed by the counter $c.

so running it would result in:

user1
user2
user3
user4
apple1
apple2
apple3
apple4
banana1
banana2
banana3
banana4
Matija Nalis
  • 3,111
0

If your file really is just a few lines long, you could do something simple like:

$ while read word; do seq 1 4 | awk -v w="$word" '{print w$0}'; done < words.txt 
user1
user2
user3
user4
apple1
apple2
apple3
apple4
banana1
banana2
banana3
banana4

But it is not a good idea to use the shell for things like this, so here's a solution in native GNU awk (which kept the original order in this run):

$ gawk '{ words[$0] }END{for (word in words){ for(i=1;i<5;i++){printf "%s%d\n",word,i}}}' words.txt 
user1
user2
user3
user4
apple1
apple2
apple3
apple4
banana1
banana2
banana3
banana4

This awk approach needs to read the whole file into memory. Stéphane's answer is a much better solution; I recommend you use that one instead.
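
If buffering in awk is wanted anyway, one sketch that keeps the input order and any duplicate lines is to index the array by line number rather than by line content, since an array keyed by content holds each distinct line only once and for (word in words) visits keys in an unspecified order. The function name repeat_buffered is hypothetical, and the streaming loop remains the simpler choice.

```shell
# Hypothetical helper: buffer lines by record number, then replay them in order.
repeat_buffered() {
  awk '
    { lines[NR] = $0 }            # one slot per input line, duplicates included
    END {
      for (n = 1; n <= NR; n++)   # walk the indices explicitly, in input order
        for (i = 1; i <= 4; i++)
          print lines[n] i
    }' "$@"
}

# Example: repeat_buffered words.txt
```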

terdon
  • 242,166
  • @alisa you're welcome! If one of the answers here solved your issue, please take a moment and accept it by clicking on the checkmark on the left. That is the best way to express your thanks on the Stack Exchange sites. I strongly urge you to accept Stéphane's answer because it is a much better approach. – terdon Dec 02 '22 at 14:56
  • 2
    That words[$0] also means that you only get unique words. for (word in words) loops over them in some unspecified order. Note that the behaviour of read word depends on the current value of $IFS. read also does some special backslash processing if you omit -r. – Stéphane Chazelas Dec 02 '22 at 19:20
0
for j in $(cat words.txt); do for ((z=1;z<=4;z++)); do echo "$j$z" >>final.txt; done; done

Output (contents of final.txt):

user1
user2
user3
user4
apple1
apple2
apple3
apple4
banana1
banana2
banana3
banana4
-2

Pipe the file to xargs and use bash to execute your command for every word in it:

$ cat words.txt | xargs -I % bash -c "seq 1 4 | awk {'print \"%\"\$0'}"
user1
user2
user3
user4
apple1
apple2
apple3
apple4
banana1
banana2
banana3
banana4

Note that you need to backslash-escape the double quote and dollar sign characters inside the command's double quotes.
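
A sketch that avoids the nested quoting, assuming an xargs that supports -I and an awk that supports -v, is to let xargs place the word into an awk variable assignment instead of into shell code. The function name repeat_xargs is hypothetical. Quotes and backslashes in the input are still treated specially by xargs and by awk -v, so this suits plain words only.

```shell
# Hypothetical helper: reads words from standard input, one per line.
repeat_xargs() {
  # xargs runs awk once per input line, replacing % in w=% with that line;
  # the awk program only has a BEGIN block, so it reads no input itself.
  xargs -I % awk -v w=% 'BEGIN { for (i = 1; i <= 4; i++) print w i }'
}

# Example: repeat_xargs < words.txt
```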

woodengod
  • 493