  1. I have 500 URLs in a file.
  2. I need to extract all the links that appear on those URLs.

How can I read the file with Lynx and extract the links from each URL?

Sample of file.txt below, one link per row and 500 rows in total:

https://itunes.apple.com/
https://play.google.com/

... and so on

KM.

2 Answers


Here's an improved script:

#!/bin/sh
# Read one URL per line; -r keeps backslashes in URLs intact.
while read -r url
do
    lynx -listonly -dump "$url"
done < file.txt |
awk '/^[ ]*[1-9][0-9]*\./{sub("^ *[^.]*\\.[ ]*","",$0); print;}' |
sort -u

This allows for any type of URL recognized by lynx (including ftp, for instance). The script sorts the result and eliminates duplicates (which lynx will not do by itself).
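The awk stage strips the `   1. ` numbering that `lynx -listonly -dump` prints in front of each reference, leaving bare URLs. A minimal offline sketch of that filtering, using made-up sample lines in place of real lynx output:

```shell
# Simulated lines in the numbered-reference format lynx emits,
# piped through the same awk filter as the script above.
printf '   1. https://itunes.apple.com/\n   2. https://play.google.com/\n' |
awk '/^[ ]*[1-9][0-9]*\./{sub("^ *[^.]*\\.[ ]*","",$0); print;}'
```

The pattern only matches lines that begin with a number and a dot, so lynx's `References` heading and blank lines are dropped automatically.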


Thomas Dickey

With list.txt as your list of URLs:

while read -r i
do
  lynx -accept_all_cookies -dump "$i" | grep "http" | sed -e "s/^.*http/http/"
done < list.txt

I suggest redirecting the output to a file.
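Note that the `sed` expression keeps everything from the last occurrence of `http` on a line onward (the leading `.*` is greedy). A small offline sketch on a made-up dump line:

```shell
# Hypothetical line from a lynx dump: the greedy "^.*http" match
# swallows the prefix up to (and including) "http", which the
# replacement then restores, leaving just the URL.
printf 'References   1. http://example.com/page\n' |
grep "http" | sed -e "s/^.*http/http/"
```

To capture the results for all 500 URLs, redirect the whole loop, e.g. `done < list.txt > links.txt`.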

matzeri