
To extract URLs from a site, it is usually enough to run:

lynx -dump -listonly https://soundcloud.com/grubstakers > urls.txt

But this only gives me the latest episodes, rather than the URLs of all of them (along with some spurious URLs).

Is it possible to do this with the lynx browser, or is JavaScript responsible for loading the rest of the links when we scroll down in a GUI browser?

1 Answer

You can use something like this:

https://api-v2.soundcloud.com/stream/users/394696287?client_id=qWUPqUOvYPTG1SDjwXJCNm9gOwM3rNeP&limit=200

That returns 146 entries, which I believe is all they have currently. For more productive artists, you'll need to use pagination. Here is an example in PHP, but you can do this with any language that supports HTTP and JSON:

<?php
// Build the stream URL for the user's uploads.
$url = 'https://api-v2.soundcloud.com/stream/users/394696287';
$query = http_build_query([
    'client_id' => 'qWUPqUOvYPTG1SDjwXJCNm9gOwM3rNeP',
    'limit' => 200
]);

// Fetch and decode the JSON response.
$json = file_get_contents($url . '?' . $query);
$data = json_decode($json);

// Print the permalink URL of each track in the stream.
foreach ($data->collection as $item) {
    echo $item->track->permalink_url, "\n";
}
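
If the artist has more tracks than one request can return, the response should also contain a next_href field pointing at the next page. I haven't verified this against the current SoundCloud API, so treat the following as a rough sketch of how the pagination loop could look:

<?php
// Pagination sketch: assumes each response carries a "next_href" field
// with the URL of the next page (not confirmed against current docs).
$clientId = 'qWUPqUOvYPTG1SDjwXJCNm9gOwM3rNeP';
$url = 'https://api-v2.soundcloud.com/stream/users/394696287?'
     . http_build_query(['client_id' => $clientId, 'limit' => 200]);

while ($url !== null) {
    // Fetch and decode the current page.
    $page = json_decode(file_get_contents($url));

    foreach ($page->collection as $item) {
        echo $item->track->permalink_url, "\n";
    }

    // Follow the next page if present; the client_id must be re-appended.
    $url = isset($page->next_href)
        ? $page->next_href . '&client_id=' . $clientId
        : null;
}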