After attending a lecture at University of San Francisco by Jonathan Reichental (@Reichental) on the use of open data in the public sector, I started poking around some data sets available at Data.gov.
Data.gov is pretty impressive. The site was established in 2009 by Vivek Kundra, the first person with the title “Federal CIO” of the United States, appointed by Barack Obama. It is rapidly adding data sets; sixty-four thousand data sets have been added just in the last year.
Interestingly, there is an open-source version of data.gov itself, called the open government platform. It is built on Drupal and available on github. The initiative is spear-headed by the US and the Indian governments, to help promote transparency and citizen engagement by making data widely and easily available. Awesome.
- The data was easy to find. I downloaded it without declaring who I am or why I’m downloading the data, and I didn’t have to wait for any approval.
- The data was well-formatted and trivially easy to digest using python pandas.
- Ipython notebook and data source available below.
- iPynb (Python code to import data and generate plot): http://nbviewer.ipython.org/6417852
- Data from: http://wonder.cdc.gov/cancer.html
If you’re interested in this data, you should also check out http://statecancerprofiles.cancer.gov/ , which I didn’t know existed until I started writing this post. I was able to retrieve this map from there: