Putting the “Open” in Open Data: Creating a Global Standard

by Steven Adler • March 3, 2015

Open Data reached a new milestone last week with publication of the first working draft of Open Data Standards by the W3C (World Wide Web Consortium). Open Data is spreading across the globe and transforming the way data is collected, published, and used. But all of this is happening without well-documented standards, leading to data published with inconsistent metadata, without official documentation of approval processes or corroboration of sources, and with conflicting terms of use. Often Open Data is hard to compare to other sources, even across administrative departments located in the same building. Open Data from more than one source has to be aggregated, normalized, cleansed, checked for quality, verified for authenticity, and validated for terms of use, at huge expense, before it can be analyzed.

The Data on the Web Best Practices Working Group (I am a co-chair) spent the better part of six months studying 26 Open Data use cases to understand how the lack of standards is retarding the growth of an industry that should be transforming government and fueling economic growth at much higher rates. We heard direct testimony on teleconferences and webinars from Open Data leaders in cities and nations across the world. We interviewed practitioners and compiled a dossier of Open Data challenges and issues requiring standards to move the industry forward.

In the six months following our Use Case analysis, we met in San Mateo, California, and created the outline for the Best Practices document we published last week. It provides the Open Data industry with recommendations and guidance for the following areas:

Metadata

What kind of metadata should be considered when describing data on the Web?
How can metadata be provided in a machine readable way?

Data Identification

How can unique identifiers be provided for data resources?
How should URIs be designed and managed for persistence?

Data Formats

What kind of data formats should be considered when publishing data on the Web?

Data Vocabularies

How can existing vocabularies be used to provide semantic interoperability?
How can a new vocabulary be designed if needed?

Data Licenses

How can data licenses be made machine readable?
How can license information about data published on the Web be provided/gathered?

Data Provenance

How can data provenance information about data published on the Web be provided/gathered?

Data Quality

How can data quality information about data on the Web be provided/gathered?

Sensitive Data

How can data be published without infringing a person’s right to privacy or an organization’s security?

Data Access

What kind of data access should be considered when publishing data on the Web?
What requirements should be taken into account when deciding how to make data available on the Web?

Data Versions

How can different versions of a dataset be tracked and managed?

Data Preservation

How can publishers decide when and how data on the Web should be archived?

Feedback

How can user feedback about data consumed from the Web be gathered?

This might not look very sexy, but let me tell you: Open Data that is published using these Best Practice Standards WILL BE VERY SEXY. It will be easy to collect, verify, share, analyze, monetize, and reproduce.

It will give journalists new tools for verifying sources and documenting government corruption. It will provide universities with dependable Open Data that can uncover vast new areas of social, political, organizational, and scientific research. And it will provide private companies with a vast sea of free information to power new businesses and reap impressive rewards.

It isn’t magic. It is a common-sense approach to standards. We are recommending things many already know and do, but that not everyone does all the time or equally well.

It is not comprehensive. We are a small team and we don’t know everything. So we invite the world to read what we have written and provide feedback. We want our ideas to spark a global dialog on how to structure Open Data Best Practices – what should be included, excluded, and refined. We welcome criticism, new ideas, and debate.

Our goal is simple. We want to change the world by making Open Data the dependable free resource that illuminates, enlightens, and transforms our planet with insight and knowledge without the pre-condition of publication for a purpose or select audience.

It should be free for all, and all for free. Dependable. Reliable. Available.

Please help us by reading our work and sending in your comments.

Steven Adler (@DataGov) is chief information strategist for IBM. He is an expert in data science and an innovator who has developed billion-dollar-revenue businesses in the areas of data governance, enterprise privacy architectures, and Internet insurance. He has advised governments and large NGOs on open government data, data standards, privacy, regulation, and systemic risk.

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License

This article first appeared on the Global Investigative Journalism Network (https://gijn.org/stories/putting-the-open-in-open-data-creating-a-global-standard/).
