Over the lifetime of the project, the team have, on a few occasions, been asked why we are developing a platform, APIs, a Linked Data capability and so on, rather than just ‘giving people the data’.
There are several reasons for the approach we have taken. Whilst we wouldn’t claim to have got everything right, we believe the approach is now bearing fruit as the amount of data available increases. This post gives some background on what we’ve done and why.
Data locked away within applications
Much of the data that we have, and intend to publish to Data.Parliament, exists only within a legacy application somewhere in Parliament, usually backed by a SQL Server database. It is not necessarily sitting in a spreadsheet ready to be shared directly. Rather than tackle each dataset independently, serialising it to CSV or similar, we have created an internal API through which an application can publish its data in any file-based format. These files, taken directly from the application, are made available to the public as ATOM feeds. Now that this process is established, we can relatively quickly set up a Parliamentary application to publish data to Data.Parliament on an ongoing basis, without manual intervention.
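As a rough illustration of the publishing pattern described above, the Python sketch below serialises records (as they might be read from an application's database) to a basic XML document, and builds an Atom feed whose entries link to the published files. It uses only the standard library; the function names, feed IDs, author name and URLs are hypothetical and are not taken from the actual internal API.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"
# Serialise Atom elements with a default namespace rather than an "ns0:" prefix.
ET.register_namespace("", ATOM_NS)


def serialise_records(records, root_tag="records", item_tag="record"):
    """Serialise a list of dicts (e.g. database rows) to a basic XML document.

    Assumes every dict key is a valid XML element name.
    """
    root = ET.Element(root_tag)
    for record in records:
        item = ET.SubElement(root, item_tag)
        for field, value in record.items():
            ET.SubElement(item, field).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


def build_atom_feed(feed_id, title, files, updated=None, author="Example publisher"):
    """Build an Atom feed with one entry per published file.

    `files` is a list of (entry_id, file_title, file_url) tuples; all values
    here are hypothetical placeholders.
    """
    updated = updated or datetime.now(timezone.utc).isoformat()
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    ET.SubElement(feed, f"{{{ATOM_NS}}}id").text = feed_id
    ET.SubElement(feed, f"{{{ATOM_NS}}}title").text = title
    ET.SubElement(feed, f"{{{ATOM_NS}}}updated").text = updated
    author_el = ET.SubElement(feed, f"{{{ATOM_NS}}}author")
    ET.SubElement(author_el, f"{{{ATOM_NS}}}name").text = author
    for entry_id, file_title, file_url in files:
        entry = ET.SubElement(feed, f"{{{ATOM_NS}}}entry")
        ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = entry_id
        ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = file_title
        ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated
        # Point each entry at the serialised file produced by the application.
        ET.SubElement(entry, f"{{{ATOM_NS}}}link", href=file_url, rel="enclosure")
    return ET.tostring(feed, encoding="unicode")
```

Keeping the serialisation and the feed separate mirrors the idea in the text: any application that can emit a file can be published the same way, whatever the file's format.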
Creating a data platform with API capabilities, not just a repository of data
One of the project's goals for Parliament is to create a sustainable, reusable tool for building APIs, both within a single Parliamentary dataset and across several. This will simplify the creation of data-driven applications for users both inside and outside Parliament, including the Parliamentary website, which should mean better public web services for less effort. This is the main reason we are converting data to RDF and querying it to create APIs via our ELDA instance at lda.data.parliament.uk. We were pleased to be able to provide this functionality using an open approach from the world of Linked Open Data, which may also bring benefits beyond our own internal requirements.
How we publish data
Once we have identified the next priority dataset to work on, we go through the following stages:
- Analysis: what data exists, what format is it in, what is its quality like, and does it reference other Parliamentary data resources (either explicitly or implicitly)?
- Feed publication: for all datasets, the next step is to publish the data via an ATOM feed at api.data.parliament.uk. Data from a spreadsheet is likely to be published as a CSV export of that spreadsheet; data from a bespoke application is likely to be serialised to a basic XML format.
- Linked Data conversion: where appropriate, we also put in place a conversion of the data into RDF. This enables us to query the data both within a single dataset and in combination with data from other datasets.
- Linked Data API creation: finally, for data that has been converted to RDF, we create APIs at lda.data.parliament.uk that provide access to the data in useful ways and in a variety of formats, including JSON, XML and CSV.
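To make the Linked Data conversion stage more concrete, here is a minimal Python sketch that turns CSV rows into RDF triples in N-Triples syntax. It is not the conversion actually used for Data.Parliament: the base URIs, the vocabulary, the `link_columns` parameter and the sample column names are all hypothetical. The point it illustrates is that a column referring to another resource (for example, a Member) becomes a URI rather than a plain string, which is what makes queries across datasets possible.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical URI patterns, for illustration only.
RESOURCE_BASE = "http://example.org/resource/"
VOCAB_BASE = "http://example.org/vocab/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(value):
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def csv_to_ntriples(csv_text, id_column, resource_type, link_columns=None):
    """Convert each CSV row into N-Triples.

    `link_columns` maps a column name to a base URI; values in those columns
    become links to other resources instead of string literals.
    """
    link_columns = link_columns or {}
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{RESOURCE_BASE}{quote(row[id_column], safe='')}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{VOCAB_BASE}{resource_type}> .")
        for column, value in row.items():
            if column == id_column or not value:
                continue  # the ID is already the subject; skip empty cells
            predicate = f"<{VOCAB_BASE}{quote(column, safe='')}>"
            if column in link_columns:
                obj = f"<{link_columns[column]}{quote(value, safe='')}>"
            else:
                obj = f'"{escape_literal(value)}"'
            triples.append(f"{subject} {predicate} {obj} .")
    return "\n".join(triples)
```

For example, `csv_to_ntriples("id,name,member\nq1,First question,m42\n", "id", "Question", link_columns={"member": "http://example.org/member/"})` produces a type triple, a `name` literal and a `member` link for resource `q1`.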
Accessing data in bulk, rather than via a feed or API
As the steps above show, we publish raw data to data.parliament for all of our datasets. However, we acknowledge that at present, if you want to get a large chunk of data in one go, the process is somewhat cumbersome: you either download multiple items from an ATOM feed via api.data.parliament.uk, or make an API call covering a long timeframe at lda.data.parliament.uk. We are working on a mechanism that provides quick and simple access to data for set time periods via a single download link; watch this space for progress.
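Until a single download link is available, one way to gather many items from a feed is to walk it page by page. The Python sketch below collects the link of every entry in an Atom feed, following `rel="next"` links in the style of Atom feed paging. Whether the feeds at api.data.parliament.uk are paged this way is an assumption, as are the function names; the `fetch` parameter can be swapped out (for testing, caching or rate limiting), and `max_pages` guards against an accidental loop.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin
from urllib.request import urlopen

ATOM = "{http://www.w3.org/2005/Atom}"


def fetch_url(url):
    """Download a URL and return the raw response body as bytes."""
    with urlopen(url) as response:
        return response.read()


def collect_entry_links(feed_url, fetch=fetch_url, max_pages=100):
    """Walk a paged Atom feed and return the absolute link of every entry."""
    links = []
    url = feed_url
    pages_seen = 0
    while url and pages_seen < max_pages:
        feed = ET.fromstring(fetch(url))
        for entry in feed.findall(f"{ATOM}entry"):
            # Take the first link on each entry; real feeds may need a rel filter.
            link = entry.find(f"{ATOM}link")
            if link is not None and link.get("href"):
                links.append(urljoin(url, link.get("href")))
        # Feed-level rel="next" link, if any, points at the following page.
        next_link = next(
            (el for el in feed.findall(f"{ATOM}link") if el.get("rel") == "next"),
            None,
        )
        url = urljoin(url, next_link.get("href")) if next_link is not None else None
        pages_seen += 1
    return links
```

Each returned link could then be passed to `fetch_url` to download the underlying file.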