"I came across Carlos Perez's blog, manageability.org, while
Googling for some research today. Carlos had a great list of open
source web crawlers that included JSpider, a tool I have used for
error checking on web sites.
"JSpider is written entirely in Java and can be configured
extensively for spidering, error checking and downloading. It, of
course, obeys robots.txt files and additional options included in