Data Biographies: Getting to Know Your Data

by Heather Krause • March 27, 2017

One of the most important takeaways from the NICAR conference — in my opinion — is the understanding that data stories can be simultaneously confusing and exciting. While I was there, I led a presentation on the importance of data biographies, and I’d like to share some of what I talked about with you.

There are many experts out there with years or decades of experience producing fascinating data stories and there are just as many (okay, probably many more) people still learning how to use data and experimenting with data journalism. When I’m introducing students to the world of data analysis and visualization, I’m often asked what the most important step in working with data is, and my answer is always the same: developing data biographies.

Too often, inexperienced data users make the mistake of taking their data at face value — assuming the story they see at first glance is the true (and only) story the data has to tell. I like to encourage people to treat data the way they would a human source. You’d never write a story without researching the person who supplied your information — why treat data any differently?

Getting to Know Your Data

For every piece of data you’re going to include in your story, you need to create a data biography: the background, or origin, of your data. Just as you’d do a background check on a human source before publishing what they told you, you need to understand your data:

Where did it come from?
Who collected it?
How was it collected?
Most importantly, why was it collected?

This task is not always as straightforward as it may look at first blush. But getting to know your data can reveal crucial gaps, bias, misinformation, or overlooked details in your story. Think about it this way: if a doctor told you that you needed more sugar in your diet, you might assume there was some medical reason for his suggestion. If a candy apple salesman told you the same thing, you’d probably perceive the information very differently. Likewise, data isn’t just about the numbers in front of you, but the story behind how those numbers got there in the first place.
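The four questions can also be kept as a structured record that travels with a dataset. What follows is a minimal sketch under our own assumptions: the `DataBiography` class and its field names are hypothetical and are not taken from the Datassist template mentioned at the end of this post. It only reports which of the four questions are still unanswered.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DataBiography:
    """One record per dataset or data point, answering the four questions.

    Hypothetical field names, chosen for illustration only.
    """
    where: Optional[str] = None  # Where did it come from? (publisher, table, URL)
    who: Optional[str] = None    # Who collected it?
    how: Optional[str] = None    # How was it collected? (survey design, sample, definitions)
    why: Optional[str] = None    # Why was it collected? (official statistics, advocacy, method testing)

    def open_questions(self) -> List[str]:
        """Return the questions that are still unanswered (missing or blank)."""
        return [f.name for f in fields(self)
                if not (getattr(self, f.name) or "").strip()]


# Usage with made-up values: the collector is known, but not the method or purpose.
bio = DataBiography(where="UN statistics download", who="National statistics office")
print(bio.open_questions())  # ['how', 'why']
```

A record with open questions is a reminder that the data is not yet ready to be quoted, in the same way an unverified human source is not.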

Real World Example: Violence Against Women Stats

A while ago, our team was working on a data story about violence against women. We spent a bit of time searching for data sources and determined that the United Nations was a good starting point. We downloaded the UN’s data on both violence against women and intimate partner violence and started our analysis.

Examining the variable for intimate partner violence over a woman’s lifetime, we made a couple of quick plots to get an idea of what the trends within various countries looked like.

Trends in some countries were surprising and indicated unusual changes in the rates of violence against women. We wondered what was happening.

Our logical first step after our quick glance at the data was to create data biographies for each of these points. We needed to know the background of the information we were looking at so that we could better understand the patterns we were seeing.

Data Biography: Where?

In this case, the first thing we noticed in our data was where the information was coming from. Some of the data reflected all women, some reflected only women of a certain age, and some only included women of a specific marital status. All this data was lumped together in the same variable — the same name, the same label, and no hint as to the differences in the data sources.
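One way to catch this kind of mixing is to record, for every data point, which population it describes, and then check whether a single country’s series switches populations. In the table described above that information was not part of the variable, so the `population` field here is an assumption: it would have to be filled in by hand from the documentation. `Observation`, `populations_by_country` and `mixed_series` are hypothetical names, and the values below are made-up toy numbers, not real statistics.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set


class Observation(NamedTuple):
    country: str
    year: int
    value: float     # percent of surveyed women reporting violence (toy values below)
    population: str  # who was surveyed, taken from the documentation


def populations_by_country(obs: List[Observation]) -> Dict[str, Set[str]]:
    """Collect the distinct surveyed populations behind each country's series."""
    result: Dict[str, Set[str]] = defaultdict(set)
    for o in obs:
        result[o.country].add(o.population)
    return dict(result)


def mixed_series(obs: List[Observation]) -> List[str]:
    """Countries whose points, under one variable name, describe more than one population."""
    return sorted(c for c, pops in populations_by_country(obs).items() if len(pops) > 1)


# Toy data: Country A switches from all women to ever-married women.
toy = [
    Observation("Country A", 2005, 30.0, "all women"),
    Observation("Country A", 2010, 45.0, "ever-married women"),
    Observation("Country B", 2005, 20.0, "women 15-49"),
    Observation("Country B", 2010, 22.0, "women 15-49"),
]
print(mixed_series(toy))  # ['Country A']
```

A country flagged this way is not necessarily wrong, but the jump between its points cannot be read as a trend until the populations are separated.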

Data Biography: Who?

Next, we looked at who collected that data. Examining the UN’s documentation to complete our data biography revealed that a wide range of people and organizations had been involved in collecting the data contained in this variable.

Data Biography: How and Why?

Some of the parties collecting the data we were using had gathered it for national statistics purposes; some were advocates making a case; some were testing out new methodology. All of our data, collected using different methods and for different reasons, was presented in the same table, with the same variable name and the same labels. Had we not taken the time to get to know our data with a data biography, we would never have realized how different all these data points were.

Once we had completed our data biography, it became clear that some of the trends we had seen that looked like significant changes in violence rates were actually variations in the data collection.
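The same check extends to the how and who questions. A minimal sketch, assuming each data point has been annotated with its collector, method and surveyed population (hypothetical fields filled in from the documentation): before reading a year-to-year change as a real change in violence rates, check whether the two points share the same collection details. `Point`, `comparable` and `classify_changes` are illustrative names, and the example series is invented.

```python
from typing import List, NamedTuple, Tuple


class Point(NamedTuple):
    year: int
    value: float     # percent (toy values below)
    collector: str   # who collected it
    method: str      # how it was collected
    population: str  # who was surveyed


def comparable(a: Point, b: Point) -> bool:
    """Treat two points as comparable only if their collection details match exactly."""
    return (a.collector, a.method, a.population) == (b.collector, b.method, b.population)


def classify_changes(series: List[Point]) -> List[Tuple[int, int, float, str]]:
    """Label each consecutive change as 'comparable' or 'collection changed'."""
    ordered = sorted(series, key=lambda p: p.year)
    out = []
    for prev, cur in zip(ordered, ordered[1:]):
        label = "comparable" if comparable(prev, cur) else "collection changed"
        out.append((prev.year, cur.year, cur.value - prev.value, label))
    return out


# Invented series: the big jump coincides with a change of collector and method.
series = [
    Point(2000, 20.0, "Statistics office", "household survey", "women 15-49"),
    Point(2005, 40.0, "Advocacy group", "online poll", "all women"),
    Point(2010, 42.0, "Advocacy group", "online poll", "all women"),
]
for row in classify_changes(series):
    print(row)
# (2000, 2005, 20.0, 'collection changed')
# (2005, 2010, 2.0, 'comparable')
```

Requiring an exact match is deliberately strict; deciding that collection was "reasonably consistent" is still a judgment the biography has to support. A "collection changed" label does not mean nothing happened, only that the two points cannot be compared directly.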

Using our data biography, we determined that data collection in Rwanda was reasonably consistent across the years. Because we were confident the trends we saw in that data were actually happening, we could move forward with investigating what caused the dramatic spike in violence against women that the data showed there.

Interestingly, during the years covered by these data, Rwanda elected a majority-female parliament and passed the country’s first-ever laws aimed at preventing violence against women. So what did that mean?

Was there a huge backlash against the government changes that drove up violent acts?

Or were more incidents of violence being reported now that women felt they had recourse?

Even with a good data biography, you’ll still have to take care in interpreting your data — we’ll talk more about that in our next post on bulletproofing your data.

Data Isn’t Always Objective

Those of you who participated in the free online data journalism course I led with Alberto Cairo recently may remember the clip from that course explaining how to create a data biography.

Remember, by taking the time to create a data biography, you can tell your story with much greater confidence that you know where your numbers come from and how far they can be trusted. Want a shortcut to creating quality data biographies? Download a free copy of our data biography template.

This post originally appeared on the website of Datassist and is cross-posted here with the author’s permission.

Heather Krause is a data scientist. She founded Datassist, an international team of data professionals, which provides data consulting to journalists, non-profits and policy makers worldwide.

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License
