I want to automate a task which can only be done on a website (with prior login) on my debian server. There is no public API available, so I can't use one.
Is there a way to do so? I thought about a text-based browser or something similar.
Have a look at WWW::Mechanize (examples at http://search.cpan.org/dist/WWW-Mechanize/lib/WWW/Mechanize/Examples.pod). It represents the web page as an object and makes all of its elements accessible via methods.
For example:
use WWW::Mechanize;

my $m = WWW::Mechanize->new();                               # $listname and $password are assumed to be defined
$m->get("https://lists.ccs.neu.edu/bin/admindb/$listname");
$m->set_visible( $password );                                # fill in the visible form field(s) in order
$m->click;                                                   # click the form's (only) button
There are ports for (at least) Ruby and Python, too.
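For instance, the Python port (the mechanize module) follows the same idea. The sketch below is only illustrative; the login URL, form index, and field names are placeholders you would replace with the real ones for your site.

# Hypothetical sketch using the Python "mechanize" port: open a login page
# and submit a form. URL and field names are placeholders, not a real site.
import mechanize

br = mechanize.Browser()
br.set_handle_robots(False)           # do not refuse pages because of robots.txt
br.open('https://example.com/login')  # placeholder login URL

br.select_form(nr=0)                  # pick the first form on the page
br['username'] = 'myuser'             # placeholder field names
br['password'] = 'mypassword'
response = br.submit()

print(response.read())                # the page you land on after logging in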
You can use the Live HTTP Headers plugin to record a session between your browser and the website, so that you understand the full interaction path, too. What cookies do you need to store and present? What forms are called, with what variables and hidden variables, etc.? Once you have all that, it should be achievable to automate the task.
– Tim Kennedy May 24 '13 at 12:30
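As a rough sketch of that approach: once you know which cookies and form fields (including the hidden ones) the site expects, you can replay the requests with any HTTP client. The example below uses Python's requests library; every URL and field name in it is made up and would have to be taken from your recorded session.

import requests

# Replay a recorded login + action. All URLs, field names and hidden values
# below are hypothetical; copy the real ones from the recorded session.
session = requests.Session()          # a Session keeps cookies across requests

login = session.post('https://example.com/login', data={
    'username': 'myuser',
    'password': 'mypassword',
    'csrf_token': 'hidden-value-from-the-login-page',   # hidden form field, if any
})
login.raise_for_status()

# The session now carries the login cookie, so the actual task can be done.
result = session.post('https://example.com/do-task', data={'item': '42'})
print(result.status_code)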
You can run Selenium on a headless installation on your server, e.g. by programming the actions in Python using pyvirtualdisplay. pyvirtualdisplay allows you to use an xvfb, xephyr, or xvnc screen, so you can take screenshots (or take a remote peek to see what is going on).
On Ubuntu 12.04 install:
sudo apt-get install python-pip tightvncserver xtightvncviewer
sudo pip install selenium pyvirtualdisplay
and run the following (this uses the newer Selenium 2 API; the older API is still available as well):
import subprocess

from pyvirtualdisplay import Display
from selenium import webdriver


def browse_it(port=None):
    browser = webdriver.Firefox()
    browser.get('http://unix.stackexchange.com/questions')
    for question in browser.find_elements_by_class_name('question-hyperlink'):
        print question.text
    if port:
        # keep the browser open so you can connect with a VNC viewer and watch
        print '--------\nconnect using:\n vncviewer ' + \
            'localhost:{}\nand click the xmessage to quit'.format(port)
        subprocess.call(['xmessage', 'click to quit'])
    browser.quit()


def browse_it_hidden(rfbport=5904):
    # run the browser inside a virtual X display that is served over VNC
    with Display(backend='xvnc', rfbport=str(rfbport)) as disp:
        browse_it(rfbport)


if __name__ == '__main__':
    browse_it_hidden()
The xmessage prevents the browser from quitting; in testing environments you would not want this. You can also call browse_it() directly to test in the foreground.
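If you do not need to watch interactively at all, a plain xvfb backend together with Selenium's screenshot call is usually enough to see afterwards what happened; a minimal sketch along the lines of the code above:

from pyvirtualdisplay import Display
from selenium import webdriver

# Fully headless variant: no VNC, just save a screenshot to inspect later.
with Display(backend='xvfb', size=(1280, 1024)):
    browser = webdriver.Firefox()
    browser.get('http://unix.stackexchange.com/questions')
    browser.save_screenshot('/tmp/questions.png')
    browser.quit()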
The results of Selenium's find_element...() calls do not provide things like selecting the parent element of an element you just found, something you might expect from HTML parsing packages (I read somewhere this is on purpose). These limitations can be a bit of a hassle if you are scraping pages you have no control over. When testing your own site, just make sure you generate all of the elements you want to test with an id or a unique class, so they can be selected without hassle.
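One workaround worth knowing (not a full substitute for an HTML parser) is that relative XPath can still step up to a parent from an element you already found; a small sketch using the same old-style API as above:

from selenium import webdriver

# Run this inside a Display as in the example above.
browser = webdriver.Firefox()
browser.get('http://unix.stackexchange.com/questions')

# An id or a unique class makes the element easy to find directly.
question = browser.find_element_by_class_name('question-hyperlink')

# There is no parent() helper on the result, but relative XPath can step up.
parent = question.find_element_by_xpath('..')
print(parent.tag_name)

browser.quit()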
You could use either curl or wget (basically any language or tool that lets you query a networked resource would do) to interact with a website in just about any way you need. The only caveat of course would be with CAPTCHAs. – lynks May 24 '13 at 13:03