Linux Today: Linux News On Internet Time.

Harvard's Berkman Center Seeds the MediaCloud

Mar 13, 2009, 10:33
(Other stories by Jennifer Zaino)

[ Thanks to Tom Dunlap for this link. ]

"From there, the story text goes into a full-text search engine to retrieve specific terms or phrases, gets dumped into a database, and becomes source material for the three simple tools currently on the site to let people start playing with the service. Being able to throw text against Calais and get pretty high-quality entities and terms out of it, Zuckerman says, was a "big step forward."

"The open source and open data project runs off the Amazon cloud. The Berkman Center tried it on its own server first, but with terabyte file systems and hundreds of gigabytes of relational databases, it couldn't keep up. "It's pretty exciting that by signing up with Amazon we were able to scale massively and very quickly," Zuckerman says. The service hopes ultimately to scale to 15,000 RSS sources.

"What's currently live -- showing the top ten most mentioned terms for up to three media sources at a time, or the top ten most mentioned terms for each media source that occur in stories along with a term you specify, or a world map of each media source that indicates which countries get more coverage -- is meant as just a taste of what you can do with the data."
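
To make the first two tools described above concrete, here is a minimal Python sketch of a "top ten most mentioned terms" count, with an optional filter for stories that also mention a term you specify. It is not MediaCloud's actual code: the function name `top_terms`, the `together_with` parameter, and the sample data are all hypothetical, and it assumes each story has already been reduced to a list of extracted terms (for example by an entity extraction service such as Calais).

```python
from collections import Counter


def top_terms(stories, n=10, together_with=None):
    """Return the n most mentioned terms across one source's stories.

    stories: list of term lists, one list per story (assumed input shape).
    together_with: if given, only stories that mention this term are counted,
                   and the term itself is left out of the ranking.
    """
    counts = Counter()
    for terms in stories:
        if together_with is not None and together_with not in terms:
            continue
        # Count each term at most once per story.
        counts.update(t for t in set(terms) if t != together_with)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


# Hypothetical sample data: extracted terms per story, grouped by media source.
stories_by_source = {
    "Source A": [
        ["iraq", "obama", "economy"],
        ["obama", "economy"],
        ["china", "economy"],
    ],
    "Source B": [
        ["china", "trade"],
        ["obama", "china"],
    ],
}

if __name__ == "__main__":
    for source, stories in stories_by_source.items():
        print(source, top_terms(stories, n=10, together_with="obama"))
```

Counting a term once per story is one possible design choice; counting every mention would instead favor long stories that repeat a term many times.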
