This is an easy and direct way to get what you want. The drawback is that the entire large_file.txt is scanned every time, even if only a few lines are wanted. If that turns out to be too slow, there are other things to try; one is loading the file into a database keyed on the line number, which gives very fast retrieval compared to scanning the file (a rough sketch of that approach is at the end of this answer).
#!/bin/sh
awk '
    # First pass: line_numbers.txt. NR == FNR only while the first file is read.
    NR == FNR {
        # Every whitespace-separated field is a line number; referencing
        # linenums[$i] creates that key (with an empty value) in the array.
        for (i = 1; i <= NF; i++) {
            linenums[$i]
        }
    }
    # Second pass: large_file.txt.
    NR != FNR {
        # Print the line if its line number was stored as a key.
        if (FNR in linenums) {
            print
        }
    }
' line_numbers.txt large_file.txt
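For example, save the script as print_lines.sh (a made-up name) and suppose line_numbers.txt holds the numbers 3, 7 and 12. The numbers may be separated by spaces, newlines, or both, because every field of every line is collected:

$ cat line_numbers.txt
3 7
12
$ sh print_lines.sh

This prints lines 3, 7 and 12 of large_file.txt. Note that the output comes out in the order of large_file.txt, not the order the numbers are listed in, and a line number listed more than once is still only printed once.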
NR is the current record number across all input (the total Number of Records read so far), and FNR is the current record number within the current file. So while NR == FNR, awk is processing the first file argument; once NR != FNR, awk is processing the second (or a later) file.
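A quick way to see the difference between the two counters (the file names here are just placeholders):

awk '{ print FILENAME, NR, FNR }' file_a.txt file_b.txt
# With a 2-line file_a.txt and a 2-line file_b.txt this prints:
#   file_a.txt 1 1
#   file_a.txt 2 2
#   file_b.txt 3 1
#   file_b.txt 4 2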
This reads all of the line numbers from line_numbers.txt and stores them as keys of an array that has no data elements, only keys (the linenums array). Then, while the second file, large_file.txt, is being read, any line whose record number was stored as a key in linenums is printed.
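One detail worth noting: the second block tests membership with the in operator rather than a plain subscript, because merely subscripting an awk array creates the key as a side effect (which is exactly what the first block relies on when it evaluates linenums[$i]). A minimal illustration:

awk 'BEGIN {
    x = a["foo"]          # reading a["foo"] creates the key "foo"
    print ("foo" in a)    # prints 1
    print ("bar" in b)    # prints 0; the in test does not create "bar"
}'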
Looking the line numbers up in the linenums array is relatively fast because awk uses an internal hashing algorithm to look up the keys.
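As for the database alternative mentioned at the top: if the same large file has to be queried repeatedly, loading it once into a table keyed on the line number avoids rescanning it every time. A rough sketch using sqlite3 (the file names lines.csv and lines.db and the table name lines are made up; adjust to taste):

#!/bin/sh
# One-time import: store every line of large_file.txt keyed by its line number.
awk '{ gsub(/"/, "\"\""); printf "%d,\"%s\"\n", NR, $0 }' large_file.txt > lines.csv
sqlite3 lines.db <<'EOF'
CREATE TABLE IF NOT EXISTS lines (lineno INTEGER PRIMARY KEY, text TEXT);
.mode csv
.import lines.csv lines
EOF

# Afterwards, arbitrary line numbers can be fetched without scanning the file:
sqlite3 lines.db "SELECT text FROM lines WHERE lineno IN (3, 7, 12);"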