---

Content mining with Apache Tika

Wazi: Apache Tika is a content-mining library that extracts both metadata and text content from documents of many different types.
