"The problem of so-called 'dirty data'--data that contains
duplications, omissions or other errors--has been a serious issue
in corporate IT for many a year. Analyst firm Gartner said in March
last year that three-quarters of large enterprises will make little
to no progress towards improving data quality until 2010,
potentially costing large firms millions of dollars.
"But is there an open source solution to the problem...?"