Google News Sources Revealed
The names of Google News sources are, however, no longer a secret. Krishna Bharat, the brain behind Google News, recently released a massive list of 150,000+ news articles about Osama bin Laden that were published on news websites around the world after the Abbottabad operation.
Since almost every publisher had covered the bin Laden story around that time, it is extremely likely that Bharat’s list contains the URLs of more or less every news source that is crawled by Google News.
Thus, here’s a near-complete list of Google News sources, extracted from Bharat’s list.
You may download this data as a text file for offline parsing, and a copy is also available on Google Docs for online sharing. If you are curious, this human-readable list of sources was prepared using the following Linux command.
cat osama_google_news.txt | grep ENGLISH | awk '{print $11}' | cut -d "/" -f 3 | sort | uniq -c | sort
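If you would like to see what each stage of that pipeline does, here is a minimal sketch that runs the same steps on a few made-up rows. The sample rows, the `example.*` hostnames and the `count_hosts` helper are hypothetical. The only things taken from the command above are that the language tag appears somewhere on the line and that the article URL sits in the 11th whitespace-separated field. The last step uses `sort -n` so that the counts are ordered numerically, which is a small departure from the original.

```shell
#!/bin/sh
# Hypothetical sample rows: only the language tag and the URL in field 11
# mirror what the original command relies on; all other fields are placeholders.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
f1 f2 f3 f4 f5 f6 f7 f8 ENGLISH f10 http://news.example.com/a
f1 f2 f3 f4 f5 f6 f7 f8 ENGLISH f10 http://news.example.com/b
f1 f2 f3 f4 f5 f6 f7 f8 ENGLISH f10 http://daily.example.org/c
f1 f2 f3 f4 f5 f6 f7 f8 GERMAN f10 http://zeitung.example.de/d
EOF

# count_hosts FILE LANGUAGE (hypothetical helper wrapping the pipeline)
count_hosts() {
  grep "$2" "$1" |      # keep only rows that mention the language tag
    awk '{print $11}' | # 11th whitespace-separated field: the article URL
    cut -d "/" -f 3 |   # third "/"-delimited piece of a URL is the hostname
    sort |              # bring identical hostnames next to each other
    uniq -c |           # collapse each run into "count hostname"
    sort -n             # order by article count, smallest first
}

count_hosts "$sample" ENGLISH
```

With the sample rows above, the two ENGLISH hostnames are printed with their article counts and the GERMAN row is filtered out. Passing the real file instead of `$sample` should reproduce the list of sources, provided its columns match the assumption above.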