Putting the “Open” in Open Data: Creating a Global Standard

by Steven Adler • March 3, 2015

Open Data reached a new milestone last week with publication of the first working draft of Open Data Standards by the W3C (World Wide Web Consortium). Open Data is spreading across the globe and transforming the way data is collected, published, and used. But all of this is happening without well-documented standards, so data is published with inconsistent metadata, without official documentation of approval processes or corroboration of sources, and with conflicting terms of use. Often Open Data is hard to compare to other sources, even across administrative departments located in the same building. Open Data from more than one source has to be aggregated, normalized, cleansed, checked for quality, verified for authenticity, and validated for terms of use, at huge expense, before it can be analyzed.

The Data on the Web Best Practices Working Group (I am a co-chair) spent the better part of six months studying 26 Open Data use cases to understand how the lack of standards is retarding the growth of an industry that should be transforming government and fueling economic growth at much higher rates. We heard direct testimony on teleconferences and webinars from Open Data leaders in cities and nations across the world. We interviewed practitioners and compiled a dossier of Open Data challenges and issues requiring standards to move the industry forward.

In the six months following our Use Case analysis, we met in San Mateo, California, and created the outline for the Best Practices document we published last week. It provides the Open Data industry with recommendations and guidance for the following areas:

Metadata

What kind of metadata should be considered when describing data on the Web?
How can metadata be provided in a machine readable way?

Data Identification

How can unique identifiers be provided for data resources?
How should URIs be designed and managed for persistence?

Data Formats

What kind of data formats should be considered when publishing data on the Web?

Data Vocabularies

How can existing vocabularies be used to provide semantic interoperability?
How can a new vocabulary be designed if needed?

Data Licenses

How can data licenses be made machine readable?
How can license information about data published on the Web be provided/gathered?

Data Provenance

How can data provenance information about data published on the Web be provided/gathered?

Data Quality

How can data quality information about data on the Web be provided/gathered?

Sensitive Data

How can data be published without infringing a person’s right to privacy or an organization’s security?

Data Access

What kind of data access should be considered when publishing data on the Web?
What requirements should be taken into account when deciding how to make data available on the Web?

Data Versions

How can different versions of a dataset be tracked and managed?

Data Preservation

How can publishers decide when and how data on the Web should be archived?

Feedback

How can user feedback about data consumed from the Web be gathered?
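To make a few of these questions concrete, here is a minimal sketch of what machine-readable metadata for a single dataset might look like. It is an illustration, not something taken from the working draft: it assumes the publisher describes the dataset with terms from the W3C's DCAT vocabulary and Dublin Core, serialized as JSON-LD. The function names, the list of required fields, the example.org addresses, and the budget dataset itself are all hypothetical.

```python
import json

# Fields this sketch treats as mandatory. The list is an assumption made
# for illustration, not a requirement quoted from the working draft.
REQUIRED_FIELDS = ("dct:title", "dct:publisher", "dct:license", "dct:issued")


def build_dataset_metadata(title, publisher, license_url, download_url,
                           media_type, issued, version):
    """Return a JSON-LD style dictionary describing one dataset."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
            "owl": "http://www.w3.org/2002/07/owl#",
        },
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:publisher": publisher,
        # The license is a URI so that software can compare terms of use.
        "dct:license": {"@id": license_url},
        "dct:issued": issued,
        "owl:versionInfo": version,
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:downloadURL": {"@id": download_url},
                "dcat:mediaType": media_type,
            }
        ],
    }


def missing_fields(record, required=REQUIRED_FIELDS):
    """List the required metadata fields that are absent or empty."""
    return [name for name in required if not record.get(name)]


if __name__ == "__main__":
    # Hypothetical dataset: every value below is made up for the example.
    record = build_dataset_metadata(
        title="City budget 2015",
        publisher="Example City Finance Department",
        license_url="https://creativecommons.org/licenses/by/4.0/",
        download_url="https://example.org/data/budget-2015.csv",
        media_type="text/csv",
        issued="2015-03-03",
        version="1.0",
    )
    print(json.dumps(record, indent=2))
    print("Missing fields:", missing_fields(record))
```

A consumer who receives a record like this can check the license, the publication date, and the version in code before downloading anything, which is the kind of verification that today has to be done by hand.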

This might not look very sexy, but let me tell you: Open Data that is published using these Best Practice Standards WILL BE VERY SEXY. It will be easy to collect, verify, share, analyze, monetize, and reproduce.

It will give journalists new tools to verify sources and document government corruption. It will provide universities with dependable Open Data that can uncover vast new areas of social, political, organizational, and scientific research. And it will provide private companies with a vast sea of free information to power new businesses and reap impressive rewards.

It isn’t magic. It is a common-sense approach to standards. We are recommending things many already know and do, but that not everyone does all the time or equally well.

It is not comprehensive. We are a small team and we don’t know everything. So we invite the world to read what we have written and provide feedback. We want our ideas to spark a global dialog on how to structure Open Data Best Practices – what should be included, excluded, and refined. We welcome criticism, new ideas, and debate.

Our goal is simple. We want to change the world by making Open Data the dependable free resource that illuminates, enlightens, and transforms our planet with insight and knowledge without the pre-condition of publication for a purpose or select audience.

It should be free for all, and all for free. Dependable. Reliable. Available.

Please help us by reading our work and sending in your comments.

Steven Adler (@DataGov) is chief information strategist for IBM. He is an expert in data science and an innovator who has developed billion-dollar-revenue businesses in the areas of data governance, enterprise privacy architectures, and Internet insurance. He has advised governments and large NGOs on open government data, data standards, privacy, regulation, and systemic risk.

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License


Material from GIJN’s website is generally available for republication under a Creative Commons Attribution-NonCommercial 4.0 International license. Images usually are published under a different license, so we advise you to use alternatives or contact us regarding permission. Here are our full terms for republication. You must credit the author, link to the original story, and name GIJN as the first publisher. For any queries or to send us a courtesy republication note, write to hello@gijn.org.


