Fight Image Spam With FuzzyOCR And SpamAssassin On Debian Lenny
Apr 30, 2010, 19:32
By Falko Timme
This tutorial describes how to scan emails for image spam with
FuzzyOCR on a Debian Lenny server. FuzzyOCR is a plugin for
SpamAssassin aimed at unsolicited bulk mail that uses images as its
main content carrier. Using different methods, it analyzes the
content and properties of images to distinguish between normal mails
(ham) and spam mails. FuzzyOCR tries to keep the system load low by
scanning only mails that SpamAssassin has not already categorized as
spam, thus avoiding unnecessary scans.
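The load-saving idea described above amounts to a simple gate in front of the expensive image analysis. The following is a minimal Python sketch of that decision, not FuzzyOCR's actual code (FuzzyOCR itself is a Perl plugin). `ScoredMessage` and `should_run_ocr` are hypothetical names; the threshold defaults to 5.0, SpamAssassin's default `required_score`; and skipping messages without images is an added assumption for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScoredMessage:
    """Hypothetical view of a message after SpamAssassin's regular rules ran."""
    score: float      # score accumulated by the non-OCR rules
    image_count: int  # number of image parts (hypothetical field)


def should_run_ocr(msg: ScoredMessage, required_score: float = 5.0) -> bool:
    """Decide whether the costly image analysis is worth running.

    Assumption: mails already at or above the spam threshold are skipped,
    because OCR could not change their verdict; mails without images are
    skipped because there is nothing for the image analysis to look at.
    """
    if msg.score >= required_score:
        return False  # already categorized as spam: avoid an unnecessary scan
    return msg.image_count > 0


if __name__ == "__main__":
    inbox = [ScoredMessage(6.2, 1), ScoredMessage(1.0, 2), ScoredMessage(0.5, 0)]
    for m in inbox:
        print(m, "-> OCR" if should_run_ocr(m) else "-> skip")
```

Only the second message in the usage example would be handed to the image analysis; the first is already spam and the third carries no images.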
I do not issue any guarantee that this will work for you!
1 Preliminary Note
In this article I will use Debian Lenny for the base system.
I assume that SpamAssassin is already installed and working,
with /etc/mail/spamassassin/ as its main configuration directory.
If your directory is different (e.g. if you have ISPConfig 2
installed, the directory is …), this is no problem; I will note
where you need to change what.
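Since only the base directory differs between setups, every path used later can be derived from that one setting. A minimal Python sketch under stated assumptions: `plugin_config_path` is a hypothetical helper, and the file name `FuzzyOcr.cf` is assumed to be the name of the plugin's configuration file; adjust both to what your FuzzyOCR package actually ships.

```python
from pathlib import Path
from typing import Optional

# Default main configuration directory assumed throughout the tutorial.
DEFAULT_SA_DIR = Path("/etc/mail/spamassassin")


def plugin_config_path(custom_dir: Optional[str] = None,
                       filename: str = "FuzzyOcr.cf") -> Path:
    """Return where a plugin config file would live.

    Hypothetical helper: uses the default directory unless a different
    SpamAssassin configuration directory (e.g. an ISPConfig 2 one) is given.
    """
    base = Path(custom_dir) if custom_dir else DEFAULT_SA_DIR
    return base / filename


if __name__ == "__main__":
    print(plugin_config_path())               # default Debian location
    print(plugin_config_path("/opt/sa/etc"))  # hypothetical custom directory
```

Keeping the directory in a single variable means that a non-standard setup only has to change one value rather than every path.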
Please make sure that your SpamAssassin version works with
FuzzyOCR. For example, the FuzzyOCR version I'm going to install
here (fuzzyocr-3.5.1-devel.tar.gz) requires SpamAssassin 3.1.4 or
later.
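Checking the version requirement comes down to comparing version numbers component by component. A minimal Python sketch, assuming the requirement means 3.1.4 or newer and assuming the first line of `spamassassin --version` output looks like `SpamAssassin version 3.2.5`; `parse_sa_version` and `is_compatible` are hypothetical names. Comparing integer tuples avoids the string-comparison trap where "3.10" would sort before "3.9".

```python
import re
from typing import Tuple

# Assumed minimum version for fuzzyocr-3.5.1-devel.
MIN_SA_VERSION: Tuple[int, ...] = (3, 1, 4)


def parse_sa_version(output: str) -> Tuple[int, ...]:
    """Extract the SpamAssassin version from `spamassassin --version` output.

    Assumes the output contains a line like "SpamAssassin version 3.2.5".
    """
    match = re.search(r"SpamAssassin version (\d+(?:\.\d+)*)", output)
    if match is None:
        raise ValueError("no SpamAssassin version found in output")
    return tuple(int(part) for part in match.group(1).split("."))


def is_compatible(output: str,
                  minimum: Tuple[int, ...] = MIN_SA_VERSION) -> bool:
    """True if the reported version is at least `minimum` (tuple comparison)."""
    return parse_sa_version(output) >= minimum


if __name__ == "__main__":
    sample = "SpamAssassin version 3.2.5\n  running on Perl version 5.10.0"
    print(parse_sa_version(sample), is_compatible(sample))
```

In practice you would feed it the text printed by `spamassassin --version` on your server; the sample string above only illustrates the assumed format.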