Apache Nutch 2.0 indexes at web scale

The Apache Nutch developers have announced that version 2.0 of the network crawling and search indexing framework is now available. Built on top of other Apache projects including Solr, Tika, Hadoop and Gora, Nutch is designed to crawl “at web scale”, allowing organisations to create searchable indexes of their web-published content. Nutch adds web-specific functionality to Solr with a link-graph database, and uses Tika to parse web pages and a number of other document formats.
