Possible Duplicate:
Does anybody here have experience in automating some tasks in web applications using curl?
Here is what I need to do. I am wondering what platform is best suited: something easy to understand and easy to code in. I may have to outsource the work, as it may be above my skill level.
Some background:
I have access to some information databases and websites through my library. The databases and websites are accessed by first loading a library webpage, entering my library card number in the dialog box, and clicking the Submit link. That opens the authenticated (by cookies or such, I presume) webpage for the service I want to obtain data from.
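The login flow described above can often be scripted directly, as long as the form simply posts the card number and the server sets a session cookie. Here is a minimal sketch using only the Python standard library; the URL and the `card` form-field name are placeholders, not the real ones, so you would need to inspect the actual login form to fill them in:

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

# Placeholder URL -- substitute the real library login page.
LOGIN_URL = "https://library.example.org/login"

def build_login_payload(card_number: str) -> bytes:
    # "card" is a guess at the form field name; check the real form's HTML.
    return urllib.parse.urlencode({"card": card_number}).encode()

def make_session() -> urllib.request.OpenerDirector:
    # An opener with a CookieJar keeps the authentication cookies
    # across requests, mimicking what the browser does after Submit.
    jar = CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# Usage (performs real network requests, so shown here as comments only):
# opener = make_session()
# opener.open(LOGIN_URL, data=build_login_payload("1234567890"))
# page = opener.open("https://database.example.org/some-protected-page").read()
```

If the site relies on JavaScript for the login step, this plain-HTTP approach will not work and a browser-driving tool such as Selenium becomes necessary.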
What I want to achieve:
I want to create a compilation of suitably named PDF files in a folder. Alternatively, and preferably, I would like to create one PDF file containing all the saved pages, with each page hyperlinked from an index page in that single PDF.
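The "one PDF with a linked index" part is straightforward once the individual PDFs exist. A sketch using the third-party pypdf library (an assumption on my part, imported lazily below): each source file is appended and given an outline entry ("bookmark"), which acts as a clickable index in any PDF viewer. The `index_offsets` helper computes where each document would start if you also prepend a one-page index:

```python
def index_offsets(page_counts: list[int]) -> list[int]:
    """Starting page of each source document in the merged file,
    assuming a one-page index sits at the front (page 0)."""
    offsets, next_page = [], 1
    for count in page_counts:
        offsets.append(next_page)
        next_page += count
    return offsets

def merge_with_outline(paths: list[str], out_path: str) -> None:
    # pypdf is imported lazily so the helper above works without it installed.
    from pypdf import PdfReader, PdfWriter

    writer = PdfWriter()
    for path in paths:
        start = len(writer.pages)  # first page of this document in the merge
        for page in PdfReader(path).pages:
            writer.add_page(page)
        # Outline entry pointing at the document's first page.
        writer.add_outline_item(path, start)
    with open(out_path, "wb") as fh:
        writer.write(fh)
```

This gives a bookmark-based index rather than a rendered index page; generating a visible index page with internal hyperlinks is also possible but takes noticeably more code.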
These pages are to be sourced from multiple websites. Access to the sites is either free, password-based, or through the library access described above (which, as far as I can tell, requires screen-based interaction).
Also, on one of the websites reached through the library access, the address in the address bar does not change as I move between pages (terrible). So the many pages I want to download for offline review do not lend themselves to a simple wget-style command. As far as I can tell, the script needs to click the right tabs on the website so that each page loads; once a page loads, it needs to be printed to a suitably named PDF file and compiled into the single PDF.
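For a site like that, where navigation happens without the URL changing, driving a real browser is the usual answer. A hedged sketch with Selenium and Chrome: the tab labels are placeholders, and the Chrome DevTools "print to PDF" command is used to save each page. Selenium is imported lazily inside the function, since it (plus Chrome and chromedriver) must be installed for this part to run:

```python
import base64
import re

def safe_name(title: str) -> str:
    """Derive a suitable PDF file name from a page title."""
    return re.sub(r"[^A-Za-z0-9]+", "_", title).strip("_") + ".pdf"

def save_tabs_as_pdfs(start_url: str, tab_labels: list[str]) -> list[str]:
    """Click each named tab and print the resulting page to its own PDF.

    tab_labels are placeholders for the real link texts on the site.
    """
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    driver.get(start_url)
    saved = []
    for label in tab_labels:
        driver.find_element(By.LINK_TEXT, label).click()
        # Chrome DevTools command; returns base64-encoded PDF bytes.
        result = driver.execute_cdp_cmd("Page.printToPDF",
                                        {"printBackground": True})
        name = safe_name(driver.title or label)
        with open(name, "wb") as fh:
            fh.write(base64.b64decode(result["data"]))
        saved.append(name)
    driver.quit()
    return saved
```

Because the navigation steps live in a plain list of labels, adding a new page later is a one-line change rather than a development task.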
What platform should I use to have this mini application / script developed?
Can somebody help me decide which platform is ideally suited for this kind of application? Ideally, I would like the solution to be organized around function calls or configuration, so that if I have to add a webpage a month after it is developed, I do not have to go running to the developer for such "configuration" changes.
The platform does not have to be Unix, although I think a Unix platform gives the maximum flexibility. I can run it on my Mac, on an online host, or on my Raspberry Pi :)
Thank you!!
Update:
I just heard from an IT-savvy friend that http://seleniumhq.org/ or http://scrapy.org/ may be good options. I will study them as well.