Sir Tim Berners-Lee reckons he's glimpsed the future of journalism – and given he's the person who invented the world wide web, you might not want to bet against him.
In his view, it lies with journalists who know their CSV from their RDF, can throw together some quick MySQL queries for a PHP or Python output … and discover the story lurking in datasets released by governments, local authorities, agencies, or any combination of them – even across national borders.
That's because he thinks the future lies in analysing data. Lots of data. Speaking on Friday at the launch of the first government datasets for spending by departments of more than £25,000, he was asked who will analyse them once the geeks have moved on. What's the point? Who's really going to hold government, or anyone else, accountable?
"The responsibility needs to be with the press," Berners-Lee responded firmly. "Journalists need to be data-savvy. It used to be that you would get stories by chatting to people in bars, and it still might be that you'll do it that way some times.
"But now it's also going to be about poring over data and equipping yourself with the tools to analyse it and picking out what's interesting. And keeping it in perspective, helping people out by really seeing where it all fits together, and what's going on in the country."
If that sounds like a daunting prospect, it's worth noting that hardly any journalism courses today teach any form of data analysis – not even its simplest form, statistics.
But that might be changing. Earlier this month City University launched its MA in interactive journalism, led by Jonathan Hewett and Paul Bradshaw, which will teach "data journalism" as part of its curriculum – "sourcing, reporting and presenting stories through data-driven journalism, and visualising and presenting data (including databases, mapping and other interactive graphics)."
[. . .]
But it is probably only the beginning – and it is likely that journalists won't be the ones who dig into the data to greatest effect. Although the Guardian, Telegraph and Times all have data teams who aim to find stories in big datasets – such as the Guardian's geotagged coverage of the Wikileaks documents from Afghanistan and Iraq, or the Telegraph's analysis of the London Bike Hire scheme – "Most of the innovation is happening outside news organisations," Bradshaw says. "Sites like Openly Local, Charities Direct, Who's Lobbying?, Where Does My Money Go? and Scraperwiki. They're all hiding their light under a bushel. All doing great things."
But how long will it take for the methods of data journalism – where CSV (comma-separated values, a file format that any database or spreadsheet program can read and write), RDF (Resource Description Framework, a way of linking different datasets), MySQL (a free, open source database program able to cope with tiny or huge datasets), PHP (a programming language widely used to write web pages) and Python (another programming language popular on the web) are part of the landscape – to filter through to everyday use in journalism? As William Gibson observed of the future, it's here already, just not very evenly distributed. Bradshaw says that the Press Association is "definitely interested" and magazine publishers also want to adopt data journalism techniques.
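To give a flavour of what that toolchain looks like in practice, here is a minimal sketch in Python of the kind of query a data journalist might run against a spending release. The column names and figures are hypothetical, standing in for the real CSV files departments publish; real releases have many more columns.

```python
import csv
import io

# Hypothetical extract of a departmental spending release.
# Real files are downloaded as CSV and have many more columns.
raw = """supplier,amount
Acme Consulting,48000
Stationery Ltd,1200
BigBuild plc,250000
"""

reader = csv.DictReader(io.StringIO(raw))

# Keep only payments above the £25,000 disclosure threshold,
# sorted largest first -- the obvious place to start looking for a story.
big_payments = [
    (row["supplier"], float(row["amount"]))
    for row in reader
    if float(row["amount"]) > 25000
]
big_payments.sort(key=lambda pair: pair[1], reverse=True)

for supplier, amount in big_payments:
    print(f"{supplier}: £{amount:,.0f}")
```

The same filter-and-sort step could equally be a MySQL query with a PHP or Python front end, as the article suggests; the CSV-and-script route is just the lightest-weight way in.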
Tony Blair, it should be noted, denounced the United Kingdom's Freedom of Information Act in his memoirs, describing it as something that allowed outsiders (not the people at large) access to sensitive information which, if leaked, could harm policy-making. Britain's expenses scandal is a case in point. I wonder whether that particular denunciation, from that particular person, is justification enough for Berners-Lee's arguments.