Using web services and XSLT to scrape RSS from HTML

After tinkering a bit last week with web services and XSLT-based scraping for generating RSS from HTML, I ripped some pieces out of a Java-based scraper I'd started last year and threw together a kit of XSLT files that does most of what I'd been trying to do.
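XslScraper's own stylesheets aren't reproduced here. As a rough illustration of the general technique, here's a minimal Java sketch that uses the JDK's built-in XSLT 1.0 processor (`javax.xml.transform`) to turn links on a page into RSS 2.0 items. It makes a few assumptions:

- The page has already been cleaned up into well-formed, namespace-free XHTML. Raw HTML usually won't parse as XML.
- The `news` div, the class name `XslScrapeSketch`, and the example.com URLs are all hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XslScrapeSketch {

    // Hypothetical page-specific stylesheet: every <a> inside <div class="news">
    // becomes an RSS 2.0 <item>. Assumes the input has no XHTML namespace;
    // with xmlns="http://www.w3.org/1999/xhtml" these XPaths would match nothing.
    static final String STYLESHEET =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" indent=\"yes\"/>"
        + "<xsl:template match=\"/\">"
        + "<rss version=\"2.0\"><channel>"
        // Channel title is copied from the page's <title>.
        + "<title><xsl:value-of select=\"//title\"/></title>"
        + "<link>http://example.com/</link>"
        + "<description>Scraped feed</description>"
        + "<xsl:for-each select=\"//div[@class='news']//a\">"
        + "<item>"
        + "<title><xsl:value-of select=\".\"/></title>"
        + "<link><xsl:value-of select=\"@href\"/></link>"
        + "</item>"
        + "</xsl:for-each>"
        + "</channel></rss>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies an XSLT stylesheet to an XHTML string and returns the result as a string.
    static String scrape(String xhtml, String xslt) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Wrap the checked exception so callers can stay simple.
            throw new IllegalArgumentException("XSLT transform failed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical, already-tidied page.
        String page = "<html><head><title>Example News</title></head><body>"
            + "<div class=\"news\">"
            + "<a href=\"http://example.com/1\">First story</a>"
            + "<a href=\"http://example.com/2\">Second story</a>"
            + "</div></body></html>";
        System.out.println(scrape(page, STYLESHEET));
    }
}
```

In a setup like this, the stylesheet is the only page-specific part. One generic driver can then be reused across sites by swapping in a different XSLT file.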

I'm calling this kit XslScraper, and there's further blurbage and download links available in the Wiki. Check it out. I've got shell scripts for running it all as a cron job, and CGI scripts for running it as a web service.

For quick gratification, check out these feeds:
