Open, Shared, Closed — understanding how to connect data


Open Data can be used by anyone for anything for free [any-to-any]
(e.g. Creative Commons, Open Government Licence)

Shared Data is data with a preemptive licence [many-to-many]
(e.g. ‘data as a service’ that can be used with certain restrictions. For example, ‘Smart Data’ includes confidential information that can be shared with authorised parties, given clear permission)

Closed data requires, if shared, a user-specific custom licence or contract for use [some-to-some or none]
(e.g. ‘bilateral contract’ for a specific project, or not shared at all)
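The three categories above can be read as a rough decision rule over the licence attached to a dataset. The Python sketch below assumes that a licence can be reduced to four yes/no questions. The `Licence` fields, the `Spectrum` enum and `classify` are hypothetical names chosen for illustration, and real licences need legal reading rather than booleans.

```python
from dataclasses import dataclass
from enum import Enum


class Spectrum(Enum):
    OPEN = "any-to-any"
    SHARED = "many-to-many"
    CLOSED = "some-to-some or none"


@dataclass(frozen=True)
class Licence:
    # Hypothetical, simplified licence attributes (illustration only).
    anyone_may_use: bool
    any_purpose: bool
    free_of_charge: bool
    terms_published_in_advance: bool  # i.e. a preemptive licence


def classify(licence: Licence) -> Spectrum:
    """Place data on the Open/Shared/Closed spectrum based on its licence."""
    # Open: anyone, for anything, for free.
    if licence.anyone_may_use and licence.any_purpose and licence.free_of_charge:
        return Spectrum.OPEN
    # Shared: terms are set in advance, even if restricted or paid for.
    if licence.terms_published_in_advance:
        return Spectrum.SHARED
    # Closed: needs a user-specific contract, or is not shared at all.
    return Spectrum.CLOSED


examples = {
    "open government data": Licence(True, True, True, True),
    "smart data scheme": Licence(False, False, False, True),
    "bilateral contract": Licence(False, False, False, False),
}
for name, licence in examples.items():
    print(f"{name}: {classify(licence).name}")
```

This prints OPEN, SHARED and CLOSED for the three examples. The Open test comes first because an open licence is, in effect, also published in advance. Checking it first keeps Open data from being reported as Shared.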


Data: How can I use it? What is the value exchange?

Moving data around is easy, technically. The friction arrives with governance, legal terms, liabilities, intellectual property, rights, permissions, etc.

Everyone thinks their data is valuable, and it is. But how we measure and exchange that value is something we need to explore. The way we attribute value is embodied in how we license data: licensing is how we codify value.

When thinking about data-sharing, rather than saying just ‘open up all the data’, we recommend starting with the question ‘how am I allowed to use it’. This links us to the ‘so what’ question—what problems are we trying to solve?

We have spent many years, as a community and through organisations such as the Open Data Institute (of which our own Gavin Starks was CEO), the Open Knowledge Foundation (of which he was also a non-executive director) and others, trying to ensure we all shared one definition: Open Data can be used by anyone for anything for free. See https://opendefinition.org. This has also been adopted across the EU in its PSI Directive.

The World Bank (http://opendatatoolkit.worldbank.org/en/essentials.html) further defines data or content as open if anyone is free to use, re-use or redistribute it, subject at most to measures that preserve provenance and openness. There are two dimensions of data openness:

  1. The data must be legally open, which means they must be placed in the public domain or under liberal terms of use with minimal restrictions.
  2. The data must be technically open, which means they must be published in electronic formats that are machine readable and non-proprietary, so that anyone can access and use the data using common, freely available software tools. Data must also be publicly available and accessible on a public server, without password or firewall restrictions. To make Open Data easier to find, most organizations create and manage Open Data catalogs.
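The two dimensions can be expressed as two independent checks, and both must pass. This is a minimal sketch under stated assumptions. The licence labels and the set of acceptable formats are illustrative choices, not a list published by the World Bank. `Publication`, `legally_open`, `technically_open` and `is_open` are hypothetical names.

```python
from dataclasses import dataclass

# Illustrative allow-lists (assumptions, not an official World Bank list).
LIBERAL_LICENCES = {"public-domain", "CC0-1.0", "CC-BY-4.0", "OGL-UK-3.0"}
OPEN_FORMATS = {"csv", "json", "xml"}  # machine-readable, non-proprietary


@dataclass(frozen=True)
class Publication:
    licence: str          # licence label, e.g. "CC-BY-4.0"
    file_format: str      # e.g. "csv", "xlsx", "pdf"
    requires_login: bool  # True if behind a password or firewall


def legally_open(pub: Publication) -> bool:
    # Dimension 1: public domain or liberal terms with minimal restrictions.
    return pub.licence in LIBERAL_LICENCES


def technically_open(pub: Publication) -> bool:
    # Dimension 2: open, machine-readable format on a public server.
    return pub.file_format.lower() in OPEN_FORMATS and not pub.requires_login


def is_open(pub: Publication) -> bool:
    # Data counts as open only if both dimensions hold.
    return legally_open(pub) and technically_open(pub)


print(is_open(Publication("CC-BY-4.0", "csv", requires_login=False)))  # True
print(is_open(Publication("CC-BY-4.0", "csv", requires_login=True)))   # False
print(is_open(Publication("CC-BY-4.0", "pdf", requires_login=False)))  # False
```

In this sketch, a liberally licensed file still fails the test if it sits behind a login or is published only in a non-machine-readable format.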

Examples of open data include public data such as the human genome, a bus timetable or any of the 55,000 data sets here (you may be surprised how much effort it took to make bus timetables open in the UK). We also established that personal data is not open data, for many reasons, and such data is now covered by regulations such as GDPR. And we decided that certain data funded by the taxpayer should be open, because the ‘value exchange’ has already happened: as taxpayers, we’ve already paid for it.

Reciprocity = fair value exchange

Value comes in many forms. If we share data with you and you provide something back, we may choose to make the exchange without a cash payment. Value is nonetheless exchanged. From an economic perspective, we can reduce the cost of certain value exchanges to marginal cost (a cost we absorb as part of our usual business operations).

If there is reciprocity, with value flowing in both directions (even if it’s not a direct 1:1 exchange), we can see the broader benefit to the market in which we operate. That benefit can in turn help our business through operational efficiency, risk management or opportunity generation.


Open Data, Open Software, Open Standards, Open Access, Open Markets?

“Open data” is not the same as “open source”: Open Source refers to software, not data. The open licensing of software can, for example, enable anyone to use, study, change and distribute the code for free, for any purpose, including commercial ones. Some software licences mandate that any derivative software is licensed in the same manner, to ensure that future work is also accessible and reusable.

There are parallels with Open Data. With data, however, what is licensed is the information used in analysis rather than the tools used to perform the analysis.

With data, for example, the existence of the data and a description of what it contains can be published as metadata, and this metadata can be licensed as Open Data. The underlying data may not be studied, changed or distributed without an explicit contract (especially commercially sensitive or personal data), and such access is likely to be restricted and potentially charged for. This means that, in most cases, the underlying data will be Shared or Closed.

Example: the Data Spectrum* for Energy

For example, a corporate register might have an Open Data metadata description that details that it holds information like company name, company number, address, incorporation date, turnover, risk rating. Some of this underlying data may also be Open Data (e.g. company name, company number) and some may be Shared Data (e.g. turnover, risk rating) that can only be accessed under contract and is restricted in its usage.
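The corporate register example can be sketched as a schema whose field-by-field licensing is itself published as Open Data. Access to each value then depends on who is asking. The field names, the `Tier` enum and the `describe`/`view` helpers are hypothetical. The text only says that company name and number may be Open and that turnover and risk rating may be Shared, so the tiers given to the other fields are assumptions.

```python
from enum import Enum


class Tier(Enum):
    OPEN = "open"
    SHARED = "shared"  # accessible only under contract, with restricted use


# The register's metadata, itself publishable as Open Data. Field names are
# illustrative; the tiers for address and incorporation_date are assumptions.
REGISTER_SCHEMA = {
    "company_name": Tier.OPEN,
    "company_number": Tier.OPEN,
    "address": Tier.OPEN,
    "incorporation_date": Tier.OPEN,
    "turnover": Tier.SHARED,
    "risk_rating": Tier.SHARED,
}


def describe() -> dict:
    """Open metadata: which fields the register holds and how each is licensed."""
    return {field: tier.value for field, tier in REGISTER_SCHEMA.items()}


def view(record: dict, has_contract: bool) -> dict:
    """Return only the fields this requester may access.

    Open fields are visible to everyone; Shared fields need a contract.
    Fields missing from the schema are withheld (fail closed).
    """
    allowed = {Tier.OPEN, Tier.SHARED} if has_contract else {Tier.OPEN}
    return {k: v for k, v in record.items() if REGISTER_SCHEMA.get(k) in allowed}


record = {  # a made-up company, for illustration only
    "company_name": "Example Ltd",
    "company_number": "00000000",
    "turnover": 1_000_000,
    "risk_rating": "B",
}
print(view(record, has_contract=False))
# {'company_name': 'Example Ltd', 'company_number': '00000000'}
```

Anyone can call `describe()` to discover that turnover exists and how it is licensed, even though only a contracted requester can see its value.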

Finally, and often confusingly, the word ‘open’ is also used in labels such as Open Standards (frameworks that can be studied, copied and used, either free or paid). The phrase Open Access is used to describe systems that are ‘addressable’ on the web (e.g. using Open APIs that allow machines to talk to each other). Open Access can facilitate access to Open, Shared or Closed Data. Open Standards (such as Open Banking) mandate that Open Access to Open Data and Shared Data is implemented across a sector using the same principles and practices. In a digital-first economy, Open Standards help to create open markets that are accessible to all market participants.

* history of the Data Spectrum


Open Banking is an example of market-wide reciprocity

Open Banking is a perfect example of this. Firstly, it mandates that banks publish their product information as Open Data. This makes it easier to find and analyse products that might fit our needs. The value exchange (reciprocity) is that, by making it easier for users to find products that suit them, banks get a better fit of customers to products, which can increase the likelihood of having a happy customer. This is a win-win.

Open Banking also mandates that the customer can transfer personal financial data (e.g. bank statements) between banks without a financial cost. But this data is not open, and it’s not ‘free’. Firstly, it is either personal or commercially sensitive data, so it cannot be open. Secondly, it is not free, as there is a material cost to providing data management at that scale.

However, all the banks have agreed to this because (aside from it being regulated) there is a mutual benefit. The market as a whole benefits and the costs all ‘balance themselves out’: there is reciprocity. Further, while some banks used to feel that holding on to their customers’ data was paramount, that is not the basis on which they should be competing for customers. And with GDPR, the data is controlled by the end customer.

The rules that govern this data exchange are encoded into the Open Banking Standard. It covers everything from the rights surrounding the data to the liability transfer as data flows. It is a commercially focussed framework that allows data-sharing. These rules are now both common and shared across the whole market. It effectively defines the rules for sharing in advance.

If we frame this as a ‘licence’ (a set of rules) to share data, then we can define Shared Data as data that has a preemptive licence for a specific use.

After this, what’s left? Either data that you don’t want to share outside a specific group (e.g. people contracted to work for your company), or data that is only shared using bilateral contracts, where each contract needs to be unique. We define Closed Data as data that requires a user-specific custom licence or contract for use, or that isn’t shared at all. Examples include a bilateral contract for a specific project, or access to information enabled via an employment contract.


Data increases in usage & value the more it is connected

Creative Commons marked a step-change in thinking: it enabled us all to say, in advance, “it’s okay to use this image for free”. As of May 2018, there were an estimated 1.4 billion works licensed under a CC licence.

With Shared Data, if stakeholders publish their data descriptions and their licensing options per type of use (aka ‘preemptive licensing’), then other stakeholders can simply access the data in compliance with the respective licensing requirements. This can enable people to create different types of value exchange, including granular payment structures for different types of use.
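Preemptive licensing per type of use can be pictured as a published lookup table that maps each use to its terms. In this sketch the use types, the fees, and the `Terms`, `terms_for` and `cost_pence` names are all hypothetical. It only illustrates the mechanism: a use listed in advance can be priced immediately (Shared), while an unlisted use falls back to a bespoke contract (Closed).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Terms:
    fee_pence: int              # price per request for this use
    attribution_required: bool


# A hypothetical preemptive licence: published in advance, with one set of
# terms per type of use. Use types and fees are made up for illustration.
PUBLISHED_OFFER = {
    "research": Terms(fee_pence=0, attribution_required=True),
    "commercial-analytics": Terms(fee_pence=5, attribution_required=True),
}


def terms_for(use: str) -> Optional[Terms]:
    """Return the pre-agreed terms for a use, or None if it is not pre-licensed."""
    return PUBLISHED_OFFER.get(use)


def cost_pence(use: str, requests: int) -> int:
    """Price a batch of requests under the published terms."""
    terms = terms_for(use)
    if terms is None:
        # Not covered in advance: this would need a bespoke (Closed) contract.
        raise PermissionError(f"{use!r} is not pre-licensed")
    return terms.fee_pence * requests


print(cost_pence("research", 100))              # 0
print(cost_pence("commercial-analytics", 100))  # 500
print(terms_for("resale"))                      # None
```

Fees are held in integer pence to avoid floating-point rounding when granular per-request prices are multiplied out.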

We must also have clear Open Data descriptions of the Shared Data and how it might be used (how it is licensed).

Publishing Open Data that describes the Shared Data will enable search engines (and therefore you) to find it. If the licensing is clear, the friction between discovery and usage is reduced.

Doing so will increase the size of the observable dataverse and help to unlock innovation. It will also protect the interests of individuals, organisations and countries, so that data can be used for both public and private good.

An interesting example is the UK Open Banking Standard, which preemptively defines and mandates ways to share personal and business data. It is now regulated, with every UK high street bank engaged and over 300 fintech companies in its ecosystem.

You can read more about the evolution of the Data Spectrum here and more about the web of data here.