One major shortcoming of curl is that more and more webpages have their main piece of content painted in by a JavaScript/AJAX response that fires after the initial HTTP response. curl never picks up on this post-painted content.

So to fetch these kinds of webpages from the command line, I've been reduced to writing Ruby scripts that drive Selenium RC to fire up a Firefox instance and then return the HTML source after the AJAX calls have completed.
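
For illustration, here's roughly what one of those scripts looks like (a minimal sketch using the selenium-webdriver gem rather than the raw Selenium RC API; the URL and the CSS selector for the AJAX-loaded element are placeholders):

    # Drive a real Firefox, wait for the AJAX-rendered element, then dump the DOM.
    require 'selenium-webdriver'

    driver = Selenium::WebDriver.for :firefox
    begin
      driver.get 'http://example.com/ajax-heavy-page'

      # Block until the element the AJAX call paints actually shows up.
      wait = Selenium::WebDriver::Wait.new(timeout: 15)
      wait.until { driver.find_element(css: '#main-content').displayed? }

      # This is the post-AJAX DOM, not the initial HTTP response.
      puts driver.page_source
    ensure
      driver.quit
    end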

It would be much better to have a leaner command line solution for this type of problem. Does anyone know of any?

dan

2 Answers

Have you considered Watir?

http://watir.com/

Once you've installed the gem, you can run it as a standalone script or line by line from irb after require 'watir-webdriver'. I've found it to be more responsive than selenium-webdriver, but it lacks the test-recording GUI to help work out complex test conditions.
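
Something like this is all it takes to dump the post-AJAX DOM (a minimal sketch; the URL and the waited-for element are placeholders):

    # Fetch a page with Watir, wait for the AJAX-rendered element, print the DOM.
    require 'watir-webdriver'

    browser = Watir::Browser.new :firefox
    browser.goto 'http://example.com/ajax-heavy-page'

    # Wait until the AJAX-rendered element is present, then dump the final HTML.
    browser.div(id: 'main-content').wait_until_present
    puts browser.html

    browser.close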

Kevin

I recently started using WebDriver from Selenium 2 in Java. It includes a driver called HtmlUnitDriver that fully supports JavaScript but does not fire up an actual browser.

It is not a lightweight solution, but it does get the job done.

I've designed the code to run from the command line and save the web data to files.
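
A stripped-down version of that kind of program looks roughly like this (a sketch only; the URL and output filename are placeholders):

    // Fetch a page headlessly with JavaScript enabled and write the rendered source to a file.
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.htmlunit.HtmlUnitDriver;

    import java.io.FileWriter;
    import java.io.IOException;

    public class FetchPage {
        public static void main(String[] args) throws IOException {
            // true enables JavaScript, so AJAX-painted content gets executed.
            WebDriver driver = new HtmlUnitDriver(true);
            try {
                driver.get("http://example.com/ajax-heavy-page");

                // getPageSource() returns the DOM after scripts have run.
                try (FileWriter out = new FileWriter("page.html")) {
                    out.write(driver.getPageSource());
                }
            } finally {
                driver.quit();
            }
        }
    }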