One major shortcoming of curl is that more and more webpages have their main piece of content painted in by a JavaScript/AJAX response that fires after the initial HTTP response. curl never picks up on this post-painted content.

So to fetch these kinds of webpages from the command line, I've been reduced to writing Ruby scripts that drive Selenium RC to fire up a Firefox instance and then return the HTML source after the AJAX calls have completed.
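
For illustration, here's roughly what one of those scripts looks like (a minimal sketch using the selenium-webdriver gem rather than the raw Selenium RC API; the URL and the CSS selector for the AJAX-loaded element are placeholders):

    # Drive a real Firefox, wait for the AJAX-rendered element, then dump the DOM.
    require 'selenium-webdriver'

    driver = Selenium::WebDriver.for :firefox
    begin
      driver.get 'http://example.com/ajax-heavy-page'

      # Block until the element the AJAX call paints actually shows up.
      wait = Selenium::WebDriver::Wait.new(timeout: 15)
      wait.until { driver.find_element(css: '#main-content').displayed? }

      # This is the post-AJAX DOM, not the initial HTTP response.
      puts driver.page_source
    ensure
      driver.quit
    end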

It would be much better to have a leaner command line solution for this type of problem. Does anyone know of any?

dan

2 Answers

Have you considered Watir?

http://watir.com/

Once you've installed the gem, you can run it as a standalone script or line by line from irb after require 'watir-webdriver'. I've found it to be more responsive than selenium-webdriver, but it lacks the test-recording GUI to help work out complex test conditions.
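
Something like this is all it takes to dump the post-AJAX DOM (a minimal sketch; the URL and the waited-for element are placeholders):

    # Fetch a page with Watir, wait for the AJAX-rendered element, print the DOM.
    require 'watir-webdriver'

    browser = Watir::Browser.new :firefox
    browser.goto 'http://example.com/ajax-heavy-page'

    # Wait until the AJAX-rendered element is present, then dump the final HTML.
    browser.div(id: 'main-content').wait_until_present
    puts browser.html

    browser.close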

Kevin

I recently started using WebDriver from Selenium 2 in Java. It includes a driver called HtmlUnitDriver that fully supports JavaScript but does not fire up an actual browser.

It is not a lightweight solution, but it does get the job done.

I've designed the code to run from the command line and save the web data to files.
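
A stripped-down version of that kind of program looks roughly like this (a sketch only; the URL and output filename are placeholders):

    // Fetch a page headlessly with JavaScript enabled and write the rendered source to a file.
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.htmlunit.HtmlUnitDriver;

    import java.io.FileWriter;
    import java.io.IOException;

    public class FetchPage {
        public static void main(String[] args) throws IOException {
            // true enables JavaScript, so AJAX-painted content gets executed.
            WebDriver driver = new HtmlUnitDriver(true);
            try {
                driver.get("http://example.com/ajax-heavy-page");

                // getPageSource() returns the DOM after scripts have run.
                try (FileWriter out = new FileWriter("page.html")) {
                    out.write(driver.getPageSource());
                }
            } finally {
                driver.quit();
            }
        }
    }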