  1. I have 500 URLs in a file.
  2. I need to extract all the links that appear on those URLs.

How can I read the file with Lynx and extract the links from each URL?

Sample of file.txt below, one link per row and 500 rows in total:

https://itunes.apple.com/
https://play.google.com/

... and so on

KM.

2 Answers


Here's an improved script:

#!/bin/sh
# Read one URL per line; -r keeps backslashes in URLs intact.
while read -r url
do
    lynx -listonly -dump "$url"
done < file.txt |
awk '/^[ ]*[1-9][0-9]*\./{sub("^ *[^.]*\\.[ ]*","",$0); print;}' |
sort -u

This allows for any type of URL recognized by lynx (including ftp, for instance). The script sorts the result and eliminates duplicates (which lynx will not do by itself).
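The awk stage strips the `   1. ` numbering that `lynx -listonly -dump` prints in front of each reference, leaving bare URLs. A minimal offline sketch of that filtering, using made-up sample lines in place of real lynx output:

```shell
# Simulated lines in the numbered-reference format lynx emits,
# piped through the same awk filter as the script above.
printf '   1. https://itunes.apple.com/\n   2. https://play.google.com/\n' |
awk '/^[ ]*[1-9][0-9]*\./{sub("^ *[^.]*\\.[ ]*","",$0); print;}'
```

The pattern only matches lines that begin with a number and a dot, so lynx's `References` heading and blank lines are dropped automatically.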


Thomas Dickey

With list.txt as your list of URLs:

while read -r i
do
  lynx -accept_all_cookies -dump "$i" | grep "http" | sed -e "s/^.*http/http/"
done < list.txt

I suggest redirecting the output to a file.
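Note that the `sed` expression keeps everything from the last occurrence of `http` on a line onward (the leading `.*` is greedy). A small offline sketch on a made-up dump line:

```shell
# Hypothetical line from a lynx dump: the greedy "^.*http" match
# swallows the prefix up to (and including) "http", which the
# replacement then restores, leaving just the URL.
printf 'References   1. http://example.com/page\n' |
grep "http" | sed -e "s/^.*http/http/"
```

To capture the results for all 500 URLs, redirect the whole loop, e.g. `done < list.txt > links.txt`.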

matzeri