3

I have a file with a few million lines, all of them the same. For example:

Known
Known
Known
Known
Known
Known
...

I have another file with a few thousand line numbers, for example:

3
5
6
...

Is there a fast way to use a bash command to replace the lines at those line numbers with another string, for example UnKnown? Based on the example above, I want to generate:

Known
Known
UnKnown
Known
UnKnown
UnKnown
...
Jeff Schaller
Sagi

3 Answers

2

An awk solution:

$ awk 'NR==FNR{a[$1]++;next}
       { 
        if(FNR in a){
            print "UnKnown"
        }
        else{
            print
        }
       }' nums file
Known
Known
UnKnown
Known
UnKnown
UnKnown

Explanation

  • NR==FNR{a[$1]++;next} : NR is the number of lines read so far across all input files, and FNR is the line number within the current file. The two are equal only while the first file is being read. Therefore, this expression saves each line number (the first field, $1, of the first file) as a key in the array a and then skips to the next line.
  • if(FNR in a){ print "UnKnown"} : if the current line number of the second file is a key in a (that is, it was listed in the first file), print "UnKnown".
  • else {print} : if not, print the current line.
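
To see the NR/FNR behaviour described above for yourself, here is a minimal sketch. It assumes two small sample files named nums and file, as in the answer, created in a temporary directory; the variable names tmpdir and nr_fnr are only for illustration.

```shell
# Hypothetical sample files, created in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' 3 5 6 > "$tmpdir/nums"
printf '%s\n' Known Known Known Known Known Known > "$tmpdir/file"

# Print the file name, NR and FNR for every input line.
# The subshell keeps the cd from affecting the calling shell.
nr_fnr=$(cd "$tmpdir" && awk '{ print FILENAME, NR, FNR }' nums file)
printf '%s\n' "$nr_fnr"

rm -rf "$tmpdir"
```

The first three lines of output (from nums) have NR equal to FNR; from the fourth line on, FNR restarts at 1 while NR keeps counting, which is why NR==FNR selects only the first file.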
terdon
2

One possibility is to filter the lines through awk. If the list of lines to change is small, pass it to awk on the command line.

awk <original.txt >modified.txt -v lines="$(cat lines-to-change.txt)" '
    BEGIN {split(lines, a); for (i in a) change[a[i]]=1}
    NR in change {$0 = "Un" $0} # or $0 = "UnKnown"
    1
'

If the number of lines to change is very small and the file to modify is very large, sed may be faster. With sed, you need to build a script containing the replacement to apply to each line.

sed "$(<lines-to-change.txt sed 's/$/s:^:Un:/')" <original.txt >modified.txt
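
To make the generated script visible, here is a small sketch that runs the inner sed on its own before applying the result. The sample files and the variable names tmpdir and generated are assumptions made for the example.

```shell
# Hypothetical sample data in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' 3 5 6 > "$tmpdir/lines-to-change.txt"
printf '%s\n' Known Known Known Known Known Known > "$tmpdir/original.txt"

# The inner sed turns each line number N into the command "Ns:^:Un:".
generated=$(<"$tmpdir/lines-to-change.txt" sed 's/$/s:^:Un:/')
printf '%s\n' "$generated"

# The outer sed then runs that script against the data file.
sed "$generated" <"$tmpdir/original.txt" >"$tmpdir/modified.txt"
cat "$tmpdir/modified.txt"
```

For the sample input the generated script is three lines long (3s:^:Un:, 5s:^:Un:, 6s:^:Un:), one substitution command per line number.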

If a significant fraction of lines need to change, the previous two approaches can run into the command line length limit, since both pass the whole list as a command-line argument. Here's a modified approach with awk that reads the two files in parallel. If lines-to-change.txt is already sorted, you can use getline n <"lines-to-change.txt" instead of "sort -n lines-to-change.txt" | getline n.

awk <original.txt >modified.txt '
    BEGIN {"sort -n lines-to-change.txt" | getline n}
    NR==n {$0 = "Un" $0; n = 0; "sort -n lines-to-change.txt" | getline n}
    1
'
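
For the already-sorted case mentioned above, a sketch of the getline-from-file variant could look like this. The sample files, the awk variable list and the helper function next_target are names I made up for the example; the getline check is wrapped in a function only to avoid repeating it.

```shell
# Hypothetical sample data in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' Known Known Known Known Known Known > "$tmpdir/original.txt"
printf '%s\n' 3 5 6 > "$tmpdir/lines-to-change.txt"   # must already be in ascending order

awk -v list="$tmpdir/lines-to-change.txt" <"$tmpdir/original.txt" >"$tmpdir/modified.txt" '
    # Fetch the next line number to change; n becomes 0 once the list is exhausted.
    function next_target() { if ((getline n < list) <= 0) n = 0 }
    BEGIN { next_target() }
    NR == n { $0 = "Un" $0; next_target() }
    1
'
cat "$tmpdir/modified.txt"
```

Because the list is read directly from the file, no sort process is started and nothing large is passed on the command line.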
2

This is a variation on Gilles' answer for the "if the number of lines to change is small" scenario. Instead of building an inline sed expression, it generates a sed script and pipes it to a second sed, which reads it from standard input with -f -. Doing so avoids any issues with a command-line length limit. You could, alternatively, save the sed script to a "temporary" file and then point sed to that instead.

The other variation I'm bringing in is sed's "c" command, which says to replace the selected line with the given text. The syntax for the "c" command is a little unusual in that it wants a backslash, newline, and then the new text.

sed 's/$/c\\\nNew String/' line-number-file | sed -f - input-file > output-file

The first sed command creates an intermediate sed script as input for the second sed by "replacing" the end of the line ($) with the "c, backslash, newline, New String" sequence:

3c\
New String
5c\
New String
6c\
New String

To change the replacement text, edit the first sed command and replace "New String" with whatever you want.

If you want to replace the text in the original input file, and your sed supports the -i flag, then you can change the command to:

sed 's/$/c\\\nNew String/' line-number-file | sed -f - -i input-file
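
The temporary-file alternative mentioned at the top of this answer might look like the sketch below. It assumes a sed that turns \n in the replacement into a newline (GNU sed does), the same assumption the pipeline version makes; the sample files and the variable names tmpdir and script are made up for the example.

```shell
# Hypothetical sample data in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' Known Known Known Known Known Known > "$tmpdir/input-file"
printf '%s\n' 3 5 6 > "$tmpdir/line-number-file"

# Write the generated "c" commands to a temporary script file.
script=$(mktemp)
sed 's/$/c\\\nUnKnown/' "$tmpdir/line-number-file" > "$script"

# Point the second sed at the script file instead of standard input.
sed -f "$script" "$tmpdir/input-file" > "$tmpdir/output-file"
rm -f "$script"
cat "$tmpdir/output-file"
```

Keeping the script in a file also makes it easy to inspect before running it on a multi-million-line input.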
Jeff Schaller
  • Thanks, @don_crissti. Always worth re-reading Why is using a shell loop to process text considered bad practice?. I replaced the while/read with sed, much as Gilles did originally. I almost deleted this answer, but thought I'd leave it here for the 'c' sed variation as well as using a sed script as stdin for sed vs a possibly-limited command-line. (1/2) – Jeff Schaller Apr 03 '16 at 21:22
  • Re: doubling the lines of the input file, I haven't done enough testing to see if that matters for sed reading from a pipeline. What did you mean by using branching? That's something I haven't used much in sed, yet. (2/2) – Jeff Schaller Apr 03 '16 at 21:23
  • Don't delete it as it's OK. Passing a sed script via stdin is what I usually do :) ... As to golfing it shorter when there are many lines... well, you could write it as e.g. (replace semicolons with newlines here): 3b u;5b u;6b u;p;d;: u;c\;new string - so that means No. of lines in the original file + 5 more lines. Now, when dealing with a sed script thousands of lines long you may want to further optimize it - we're talking speed this time: mikeserv has come up with a brilliant way to make it (sed) never backtrack its script. – don_crissti Apr 03 '16 at 21:42