"From there, the story text goes into a full text
search engine to retrieve specific terms or phrases, gets dumped
into a database, and becomes source material for the three simple
tools currently on the site to let people start playing with the
service. Being able to throw text against Calais and get pretty
high-quality entities and terms out of it, Zuckerman says, was a
"big step forward."
"The open source and open data project runs off the Amazon
cloud. The Berkman Center tried it on its own server first, but
with terabyte file systems and hundreds of gigabytes of relational
databases, it couldn't keep up. "It's pretty exciting that by
signing up with Amazon we were able to scale massively and very
quickly," Zuckerman says. The service hopes ultimately to scale to
15,000 RSS sources.
"What's currently live -- showing the top ten most mentioned
terms for up to three media sources at a time, or the top ten most
mentioned terms for each media source that occur in stories along
with a term you specify, or a world map for each media source that
indicates which countries get more coverage -- is meant as just a
taste of what you can do with the data."
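
To make the first two tools described above concrete, here is a minimal sketch of how "top ten most mentioned terms" per media source, optionally restricted to stories that also mention a term you specify, could be computed. It is an illustration only, not the project's code: the sample stories, the stopword list, and the names `tokenize` and `top_terms` are all hypothetical, and plain word splitting stands in for the Calais entity extraction the article mentions.

```python
import re
from collections import Counter

# Hypothetical sample data: (media source, story text) pairs.
STORIES = [
    ("Source A", "Election results dominate coverage as election officials count votes"),
    ("Source A", "Markets rally after election"),
    ("Source B", "Markets slide as oil prices climb"),
]

# Tiny illustrative stopword list (an assumption, not the project's list).
STOPWORDS = {"a", "after", "and", "as", "of", "the"}


def tokenize(text):
    """Lowercase the text and split it into words, dropping stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def top_terms(stories, source, n=10, must_include=None):
    """Return the n most mentioned terms for one media source.

    If must_include is given, only stories containing that term are
    counted, and the term itself is left out of the result.
    """
    counts = Counter()
    for story_source, text in stories:
        if story_source != source:
            continue
        terms = tokenize(text)
        if must_include is not None and must_include.lower() not in terms:
            continue
        counts.update(terms)
    if must_include is not None:
        counts.pop(must_include.lower(), None)
    return counts.most_common(n)


if __name__ == "__main__":
    print(top_terms(STORIES, "Source A"))
    print(top_terms(STORIES, "Source A", must_include="markets"))
```

A real deployment would presumably run such counts as queries against the database rather than in memory, but the counting logic would be similar in spirit.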