Open Open News News

PhantomJS: extracting data using the DOM
https://github.com/ariya/phantomjs/wiki/Page-Automation

Maybe use jsdom for Node.js (Max says don't use it); use Cheerio instead.
http://maxogden.com/scraping-with-node.html

Article extraction:
http://tomazkovacic.com/blog/122/evaluating-text-extraction-algorithms/
https://github.com/misja/python-boilerpipe
Here's the URL endpoint: http://boilerpipe-web.appspot.com/extract?url=

THE DATA IS HERE: http://mozilla-oonn.s3.amazonaws.com/data/

The easiest way to share localhost web servers with the rest of the world:
$ gem install localtunnel
$ localtunnel 8000
Share this URL: http://xyz.localtunnel.com
http://progrium.com/localtunnel/

GitHub:
https://github.com/csvsoundsystem/OpenOpenNewsNews
https://github.com/pudo/newshacks
https://github.com/stdbrouw/oonn

Interesting things:
http://churnalism.com/
http://www.mediacloud.org/dashboard/view/1?q1=94946

OONN Master Doc:
https://docs.google.com/document/d/1hptAkA0B7oioqypYoGVr-8mYuyV_z0eIdPwlcvBRsuw/edit?usp=sharing

Important metrics: http://www.inside-r.org/node/144312
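The boilerpipe web endpoint noted under article extraction takes the article's address in its url= parameter. The sketch below is a minimal example of calling it from Python. The article URL has to be percent-encoded first, or any query string of its own (?a=1&b=2) would be read as extra parameters of the endpoint. The helper names boilerpipe_request_url and extract_text are hypothetical. The sketch also makes two assumptions these notes don't confirm: that the appspot service still responds, and that its reply is plain text.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Endpoint copied from these notes; whether it is still live is an assumption.
BOILERPIPE_ENDPOINT = "http://boilerpipe-web.appspot.com/extract?url="


def boilerpipe_request_url(article_url: str) -> str:
    """Hypothetical helper: append the percent-encoded article URL to url=."""
    # safe="" also encodes ':' and '/', so '?', '&' and '=' inside the
    # article URL cannot leak into the endpoint's own query string.
    return BOILERPIPE_ENDPOINT + quote(article_url, safe="")


def extract_text(article_url: str, timeout: float = 30.0) -> str:
    """Hypothetical helper: fetch the endpoint's reply and decode it as text.

    The response format (plain text vs. HTML) is assumed, not documented here.
    """
    with urlopen(boilerpipe_request_url(article_url), timeout=timeout) as resp:
        # Use the charset the server declares, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

For example, boilerpipe_request_url("http://example.com/a?b=1&c=2") builds a request whose url= value is http%3A%2F%2Fexample.com%2Fa%3Fb%3D1%26c%3D2. The URL-building step is kept separate from the network call so it can be checked without the service being up.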