CKAN on Windows

CKAN installs on a range of operating systems, but no-one had tried it on Windows until I had a go for a customer last month. Of course, not everyone wants it on a *nix server, and having had so much experience deploying CKAN for OKF and a range of customers, I was keen to have a crack. Here is the successful result:

The CKAN code simply checks out of GitHub and runs on standard Python v2.5-v2.7, so no worries there. It was the libraries and external tools that made it a challenge! For each Python library I started off by simply trying to install from source. A few times this ran into problems with C compiler dependencies and options, so to make things easier for installers I hunted down pre-built binaries for the problem ones.
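
A quick way to see which libraries still need attention is to try importing each one and note the failures. The sketch below is only illustrative: the module names in `CANDIDATES` are placeholders (the problem libraries are not listed here), and the helper names `python_supported` and `find_missing` are made up for this example.

```python
import sys

# Hypothetical list of modules to probe; swap in the libraries that
# your CKAN version actually needs.
CANDIDATES = ["json", "sqlite3", "not_a_real_module_xyz"]


def python_supported(version_info=None):
    """Return True if the interpreter is in the v2.5-v2.7 range mentioned above."""
    major, minor = (version_info or sys.version_info)[:2]
    return major == 2 and 5 <= minor <= 7


def find_missing(module_names):
    """Try to import each module and return the names that fail."""
    missing = []
    for name in module_names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Python version in the 2.5-2.7 range: %s" % python_supported())
    print("Modules that failed to import: %s" % find_missing(CANDIDATES))
```

Anything reported as missing is a candidate for either a source install or, if the C compiler gets in the way, a pre-built binary.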

Overall it installed on Windows in several different ways and I wrote an install guide with the easiest path here: http://wiki.ckan.org/CKAN_install_on_Windows

Linked Open Data and CKAN

publicdata.eu

Yesterday I made an appearance for the OKF at a meeting in Vienna of ‘LOD2’, a European project bringing together government data as linked data. CKAN is the platform being used, and OKF’s contribution has included setting up the site publicdata.eu to harvest metadata from governments across Europe.

A CKAN site I set up for the Czech Republic

It was great to meet up with many people I’ve been involved with, such as the team at the Pupin Institute running the Serbian CKAN, and the open government advocates in the Czech Republic, for whom I set up the Czech CKAN. It is always inspiring to hear of great work being done in so many countries, and our data catalogue technology is making a real difference to government transparency, as well as to economies benefiting from open data.

I provided technical answers during the session led by OKF colleagues Ira Bolychevsky and Mark Wainwright (who also blogged the session). There was particular interest in the new Elastic Search functionality, and plenty of challenging requests, such as the desire for CKAN to provide a data wrangling workbench. This was a great forum full of talented minds – a great place to develop these ideas.

Train timetables opened

Mobile technology is of course at its best when travelling, so what better iPhone/Android app than one that helps you get somewhere? Train journey planners are an obvious one, but through various legal wrangles, the UK train companies have been holding onto the rights to their timetables, live departure boards and fare information for years. So it is great to hear that the National Rail timetables have finally been forced into the open – not just to read, but to redistribute, remix and make apps out of, because crucially they are licensed under ‘Creative Commons Attribution’ (CC-BY).

It started earlier this month, when the Association of Train Operating Companies (ATOC) began releasing weekly updates to the timetables, each one a massive 40Mb in size. It was quite a hunt around their website to find all the details that data wranglers need, such as the name of the licence, but I’ve added what I found to thedatahub.org: http://thedatahub.org/dataset/uk-rail-timetables

What is really pleasing is that within days of the data release, a very polished web application is up and running: opentraintimes.com – superb work Peter Hicks!

Let’s just see what it took to get there. There was a really enlightening discussion on the UK Government Data email list in June 2010:

The discussion concerned the “National Rail Enquiries” iPhone app, priced at £4.99 and developed by independent developer Agant, who had obtained a licence for the data from NRE. Unlicensed competitor apps (mostly free) were threatened with legal action within days of the launch of the expensive app.

Tom Hughes: “they should really be begging to give the data away as surely all it can do is drive more traffic to their trains if people are able to choose how they want to interact with timetable data instead of being forced into a small set of officially authorised ways to access it.”

Chris Gutteridge highlighted this official exchange in government between MPs:

Tom Watson (MP, Labour):
“Does the new, post-bureaucratic age of transparency extend to a commitment to publish bus and rail timetables in digital format for open public reuse?”

Theresa Villiers (Minister of State (Rail and Aviation), Transport; Conservative):
“We are looking at that issue at the moment. I think there are considerable benefits to be gained from a more open approach to timetabling, and I would be delighted to have a discussion”

There were also these interesting emails:

Prof Nigel Shadbolt (UK Transparency Board)
“We very much have this in view”

Scott Wilcox:
“I’ve had this discussion plenty of times with Transport Direct and they never really seem to want to open any data up or comply. They have bus, train and road data for most regions but again, getting them to open it up will be difficult.”

CKAN developers

I thought it would be fun to see which developers have been contributing to the CKAN releases and to which CKAN extensions (thanks to gitstats / git shortlog). Lots of new people have become involved recently :-)
Here is the core CKAN repo, by recent release numbers:

Name | Date | Commits | Authors
master | 2012-01-26 | 421 | David Read (95), John Glover (71), Ian Murray (61), Tom Rees (48), Ross Jones (42), amercader (24), kindly (23), Sean Hammond (18), rgrp (15), Rufus Pollock (14), rossjones (6), Adrià Mercader (2), Stefano Costa (1), zephod (1)
ckan-1.5.1 | 2012-01-04 | 208 | David Read (69), John Glover (38), zephod (28), David Raznick (24), rgrp (13), amercader (11), Adrià Mercader (7), kindly (5), Tom Rees (5), Ian Murray (5), James Gardner (2), Ross Jones (1)
ckan-1.5 | 2011-11-09 | 574 | zephod (157), David Read (137), rgrp (119), John Glover (79), Adrià Mercader (44), David Raznick (22), james.gardner@okfn.org (6), Friedrich Lindenberg (6), dread (3), Florian Marienfeld (1)
ckan-1.4.3 | 2011-09-13 | 129 | David Read (54), Adrià Mercader (27), John Glover (18), David Raznick (15), Anna PS (10), zephod (2), rgrp (2), Friedrich Lindenberg (1)
ckan-1.4.2 | 2011-08-05 | 121 | David Read (57), David Raznick (47), rgrp (5), John Glover (5), Friedrich Lindenberg (5), Adrià Mercader (2)
ckan-1.4.1 | 2011-06-27 | 108 | David Read (37), John Lawrence Aspden (26), David Raznick (26), james.gardner@okfn.org (7), Friedrich Lindenberg (6), rgrp (5), Adrià Mercader (1)

And of course there are extensions too. Here are some of those recently worked on.

Author | Commits (%) | + lines | – lines | First commit | Last commit | Age | Active days | # by commits
David Read | 162 (44.14%) | 11772 | 6649 | 2011-02-03 | 2012-01-26 | 357 days, 6:17:26 | 76 | 1
david read | 95 (25.89%) | 12043 | 3624 | 2010-08-12 | 2011-02-01 | 172 days, 17:08:17 | 47 | 2
james gardner | 35 (9.54%) | 1968 | 320 | 2010-11-15 | 2011-12-06 | 386 days, 2:23:08 | 19 | 3
Adrià Mercader | 32 (8.72%) | 713 | 362 | 2011-03-10 | 2011-11-14 | 249 days, 1:24:54 | 12 | 4
Ian Murray | 17 (4.63%) | 258 | 49 | 2011-11-03 | 2012-01-16 | 74 days, 1:03:16 | 7 | 5

Author | Commits (%) | + lines | – lines | First commit | Last commit | Age | Active days | # by commits
Adrià Mercader | 115 (78.77%) | 9610 | 7222 | 2011-03-09 | 2011-11-23 | 258 days, 16:12:21 | 44 | 1
David Read | 8 (5.48%) | 645 | 29 | 2011-03-25 | 2012-01-25 | 305 days, 18:34:33 | 6 | 2
amercader | 7 (4.79%) | 145 | 35 | 2012-01-10 | 2012-01-24 | 14 days, 2:09:35 | 2 | 3

ckanext-qa:

Author | Commits (%) | + lines | – lines | First commit | Last commit | Age | Active days | # by commits
John Glover | 147 (74.24%) | 15059 | 15066 | 2011-07-05 | 2012-01-24 | 202 days, 23:05:13 | 26 | 1
Wayne Witzel III | 29 (14.65%) | 2224 | 674 | 2011-02-08 | 2011-03-10 | 30 days, 21:20:01 | 12 | 2
james.gardner@okfn.org | 6 (3.03%) | 270 | 228 | 2011-02-21 | 2011-04-14 | 52 days, 2:39:34 | 4 | 3
Adrià Mercader | 6 (3.03%) | 165 | 37 | 2011-03-16 | 2011-03-17 | 21:57:39 | 2 | 4

ckanext-storage:

Author | Commits (%) | + lines | – lines | First commit | Last commit | Age | Active days | # by commits
rgrp | 32 (78.05%) | 1630 | 701 | 2011-04-25 | 2011-09-14 | 142 days, 3:55:32 | 15 | 1
ww | 5 (12.20%) | 218 | 19 | 2011-01-09 | 2011-01-20 | 10 days, 6:39:29 | 3 | 2
Rufus Pollock | 2 (4.88%) | 7 | 2 | 2012-01-25 | 2012-01-25 | 0:05:09 | 1 | 3

ckanext-spatial:

Author | Commits (%) | + lines | – lines | First commit | Last commit | Age | Active days | # by commits
Adrià Mercader | 22 (84.62%) | 4799 | 1272 | 2011-04-11 | 2011-11-30 | 233 days, 0:17:42 | 12 | 1
james gardner | 2 (7.69%) | 1587 | 1576 | 2011-04-20 | 2011-04-20 | 0:14:05 | 1 | 2
amercader | 2 (7.69%) | 20 | 3 | 2012-01-20 | 2012-01-20 | 1:15:33 | 1 | 3

recline:

Author | Commits (%) | + lines | – lines | First commit | Last commit | Age | Active days | # by commits
Max Ogden | 77 (43.75%) | 14671 | 6824 | 2011-06-16 | 2011-11-04 | 141 days, 8:47:00 | 22 | 1
Rufus Pollock | 44 (25.00%) | 4635 | 1331 | 2012-01-05 | 2012-01-26 | 21 days, 14:32:39 | 10 | 2
rgrp | 39 (22.16%) | 16124 | 10270 | 2011-10-24 | 2011-12-15 | 52 days, 0:13:15 | 14 | 3
maxogden | 15 (8.52%) | 1072 | 141 | 2011-03-09 | 2011-04-30 | 51 days, 21:34:24 | 6 | 4
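
Per-author tallies like these can be approximated with `git shortlog -s -n`, which prints one line per author with a commit count, a tab, and the name. Here is a minimal sketch, assuming that output format. The `ALIASES` map is a guess at which git identities belong to the same person (several people clearly appear under more than one name above), and the function names are made up for this example; it is not necessarily how the numbers in this post were produced.

```python
import subprocess
from collections import Counter

# Hypothetical alias map: the same person can appear under several
# git identities (e.g. a username and a full name).
ALIASES = {"amercader": "Adrià Mercader", "maxogden": "Max Ogden"}


def parse_shortlog(text, aliases=None):
    """Parse `git shortlog -s -n` output into a Counter of commits per author."""
    aliases = aliases or {}
    counts = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line looks like "<count>\t<author name>".
        number, _, author = line.partition("\t")
        author = author.strip()
        counts[aliases.get(author, author)] += int(number)
    return counts


def shortlog_between(repo_path, old_ref, new_ref):
    """Count commits per author reachable from new_ref but not from old_ref."""
    output = subprocess.check_output(
        ["git", "shortlog", "-s", "-n", "%s..%s" % (old_ref, new_ref)],
        cwd=repo_path,
    )
    return parse_shortlog(output.decode("utf-8"), ALIASES)
```

For example, `shortlog_between("ckan", "ckan-1.5", "ckan-1.5.1")` would tally the commits between those two release tags in a local checkout (the path `ckan` is an assumption), with aliased identities merged into one entry.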

Wrangling political donation data

Data wrangling is great fun to do when you get some interesting data. The process of taking the raw numbers and producing an analysis, graph, visualisation or article is an essential part of the Open Data process. If governments release data which no-one does anything with, then the whole exercise is pointless! Luckily datasets tend to have real political meaning or they wouldn’t be collected.

This week whilst presenting on CKAN at the Open Government Data Camp in Poland I wanted some example data to demonstrate CKAN’s upload facility. Rather than take the easy route of the thousands of links to data on thedatahub.org or data.gov.uk, I thought I’d seek out something from the news that week that struck me.

The newspapers had a great story about the UK’s Conservative Party getting huge donations from the financial sector. I found the raw data in a news item from the Bureau of Investigative Journalism. I cut and pasted the numbers into a spreadsheet table, having to make a couple of decisions about how to convert the lists into a table. But it didn’t take long and was easy to export as nicely open CSV (well, it gets 3 stars in Tim Berners-Lee’s Five Stars of Openness).

I uploaded the CSV to thedatahub.org and you can see it here. It was then easy to plot as a pie chart to highlight the donors, with Financial Services dwarfing the influence of the next biggest sectors: Industry, Engineering and Real Estate:

Conservative Party donations by sector 2010/11

Sector | Donation (£)
Financial services | 4492789
Industry | 913411
Automotive, Aviation and Transport | 689257
Real Estate | 617666
Retail | 551100
Creative Media and Arts | 271167
Energy | 252600
Construction | 224642
Leisure | 220195
Services | 205000
Technology | 143550
Lobbying and PR (including City lobbyists) | 62800
Pharmaceuticals | 58500
Health | 37250
Political | 28000
Agriculture | 27250
Education | 26000
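
The shares behind a pie chart like this are easy to recompute from the CSV. The following is a rough sketch using only the Python standard library. The figures are copied from the table above, but the column names (`Sector`, `Donation (GBP)`) and the function name `sector_shares` are assumptions for illustration, not the layout of the file I actually uploaded.

```python
import csv
import io

# Figures copied from the table above; the CSV layout and column names
# are assumptions about how such a spreadsheet might be exported.
DONATIONS_CSV = """Sector,Donation (GBP)
Financial services,4492789
Industry,913411
"Automotive, Aviation and Transport",689257
Real Estate,617666
Retail,551100
Creative Media and Arts,271167
Energy,252600
Construction,224642
Leisure,220195
Services,205000
Technology,143550
Lobbying and PR (including City lobbyists),62800
Pharmaceuticals,58500
Health,37250
Political,28000
Agriculture,27250
Education,26000
"""


def sector_shares(csv_text):
    """Return (total, [(sector, amount, percent), ...]) sorted by amount, largest first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(row["Sector"], int(row["Donation (GBP)"])) for row in reader]
    total = sum(amount for _, amount in rows)
    rows.sort(key=lambda row: row[1], reverse=True)
    return total, [(sector, amount, 100.0 * amount / total) for sector, amount in rows]


if __name__ == "__main__":
    total, shares = sector_shares(DONATIONS_CSV)
    for sector, amount, percent in shares:
        print("%-45s %9d %5.1f%%" % (sector, amount, percent))
```

Note that the sector name containing commas has to be quoted in the CSV, which the `csv` module handles on both reading and writing. On these figures, Financial services accounts for roughly half of the listed total.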

Of course, this had already been done in the newspaper. But it would be great to compare these donations with other parties and countries. I noticed on theDataHub there are details of New Zealand donations, but that dataset names individuals and companies – no sectors. So to compare them, more work is needed. I could try Open Corporates for details of the companies, but the individuals are probably untraceable.