Apache Nutch 2.0 indexes at web scale | Linux Today

Written By
Web Webster
Jul 11, 2012

The Apache Nutch developers have announced that version 2.0 of the network crawling and indexing search framework is now available. Built on top of other Apache projects including Solr, Tika, Hadoop and Gora, Nutch has been designed to crawl “at web scale” to allow organisations to create searchable indexes of their web-published content. Nutch adds web-specific functionality to Solr with a link-graph database and uses Tika to parse web pages and a number of other document formats.

Web Webster has more than 20 years of writing and editorial experience in the tech sector. He has written and edited news, demand generation, user-focused, and thought leadership content for business software solutions, consumer tech, and Linux Today. He edits and writes for a portfolio of tech industry news and analysis websites, including webopedia.com and DatabaseJournal.com.
