http://commoncrawl.org/2016/10/news-dataset-available/

We should make sure the new Common Crawl news dataset works with the current [common crawl source](https://github.com/commonsearch/cosr-back/blob/master/cosrlib/sources/commoncrawl.py).