
To extract URLs from a site, it is usually enough to run:

lynx -dump -listonly https://soundcloud.com/grubstakers > urls.txt

But this only gives me the latest episodes, rather than the URLs of all of them (along with some spurious URLs).

Is it possible to do this with the lynx browser, or is JavaScript responsible for loading the rest of the links when we scroll down in a GUI browser?

1 Answer

You can use something like this:

https://api-v2.soundcloud.com/stream/users/394696287?client_id=qWUPqUOvYPTG1SDjwXJCNm9gOwM3rNeP&limit=200

That returns 146 entries, which I believe is all they have currently. For more productive artists, you'll need to use pagination. Here is an example in PHP, but you can do this with any language that supports HTTP and JSON:

<?php
// Build the stream URL for the user's uploads.
$url = 'https://api-v2.soundcloud.com/stream/users/394696287';
$query = http_build_query([
    'client_id' => 'qWUPqUOvYPTG1SDjwXJCNm9gOwM3rNeP',
    'limit' => 200
]);

// Fetch and decode the JSON response.
$json = file_get_contents($url . '?' . $query);
$data = json_decode($json);

// Print the permalink URL of each track in the stream.
foreach ($data->collection as $item) {
    echo $item->track->permalink_url, "\n";
}
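
If the artist has more tracks than one request can return, the response should also contain a next_href field pointing at the next page. I haven't verified this against the current SoundCloud API, so treat the following as a rough sketch of how the pagination loop could look:

<?php
// Pagination sketch: assumes each response carries a "next_href" field
// with the URL of the next page (not confirmed against current docs).
$clientId = 'qWUPqUOvYPTG1SDjwXJCNm9gOwM3rNeP';
$url = 'https://api-v2.soundcloud.com/stream/users/394696287?'
     . http_build_query(['client_id' => $clientId, 'limit' => 200]);

while ($url !== null) {
    // Fetch and decode the current page.
    $page = json_decode(file_get_contents($url));

    foreach ($page->collection as $item) {
        echo $item->track->permalink_url, "\n";
    }

    // Follow the next page if present; the client_id must be re-appended.
    $url = isset($page->next_href)
        ? $page->next_href . '&client_id=' . $clientId
        : null;
}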