4 Things Data Journalists Need to Know about Standard Deviation

by Denise-Marie Ordway • August 17, 2022

Image: Shutterstock

If you’re a journalist who reads academic research, you’ve likely seen the term “standard deviation” many times. If you’re not sure what it means or how to explain it to audiences, keep reading, because we’re going to break it down for you.

Here are four key things you need to know.

1. The standard deviation of a dataset is a number that indicates how much variation there is within the data.

When researchers analyze quantitative data such as birth rates, temperature readings, and student test scores, they typically calculate the standard deviation to gauge how tightly clustered or widely spread the data points are. A higher standard deviation means the data are more spread out. The lower the standard deviation, the more closely the data cluster around the average value.

Deborah J. Rumsey, a statistics professor at the Ohio State University, points out in her book Statistics for Dummies that the measure provides critical context.

“Without it, you’re getting only part of the story about the data,” she wrote. “Statisticians like to tell the story about the man who had one foot in a bucket of ice water and the other foot in a bucket of boiling water. He said, on average, he felt just great! But think about the variability in the two temperatures for each of his feet. Closer to home, the average house price, for example, tells you nothing about the range of house prices you may encounter when house-hunting. The average salary may not fully represent what’s really going on in your company, if the salaries are extremely spread out.”
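Rumsey’s salary point can be illustrated with a small sketch. The figures below are invented for illustration, and the choice of Python’s built-in statistics module is ours, not the article’s. Two hypothetical companies share the same average salary, yet their standard deviations differ sharply.

```python
import statistics

# Hypothetical salaries (in dollars) at two made-up companies.
company_a = [48_000, 50_000, 52_000]
company_b = [20_000, 50_000, 80_000]

# Both companies have the same average salary.
mean_a = statistics.mean(company_a)
mean_b = statistics.mean(company_b)

# Population standard deviation: how far salaries typically sit from the average.
sd_a = statistics.pstdev(company_a)
sd_b = statistics.pstdev(company_b)

print(f"Company A: average {mean_a:,.0f}, standard deviation {sd_a:,.0f}")
print(f"Company B: average {mean_b:,.0f}, standard deviation {sd_b:,.0f}")
```

Reporting only the average would make the two companies look identical; the standard deviation is what reveals that salaries at the second one are far more spread out.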

2. Scientists can use standard deviation to make predictions, investigate trends, and answer other key research questions.

The standard deviation of a dataset plays a limited role in many academic studies. Scientists might only note standard deviation values in a table or list, or mention them within the body of an academic article.

Sometimes, however, researchers rely heavily on the measure to help them answer questions central to their studies. For example:

Researchers can make predictions about the weather, voter behavior, tax revenue, healthcare usage, and a host of other things based partly on the standard deviation of data gathered over time.
Equities researchers typically use the standard deviation of stock prices to measure market volatility, with a high standard deviation indicating high volatility.
Researchers examining student test scores can use the standard deviation to determine whether most students perform at or close to the average, or whether test scores are all over the place. The measure also allows researchers to estimate the proportion of students who need more help mastering the material.
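As a rough sketch of the test-score example, the snippet below estimates the share of students scoring under a cutoff. It assumes the scores follow a bell-shaped (normal) distribution, and the average, standard deviation, and cutoff are hypothetical values chosen for illustration, not figures from any study.

```python
from statistics import NormalDist

# Hypothetical values: average score, standard deviation, and a cutoff
# below which students are assumed to need extra help.
mean_score = 70.0
sd_score = 10.0
cutoff = 60.0

# Assumption: scores are approximately normally distributed.
scores = NormalDist(mu=mean_score, sigma=sd_score)

# cdf(cutoff) is the estimated proportion of students scoring below the cutoff.
share_below_cutoff = scores.cdf(cutoff)

print(f"Estimated share below {cutoff:.0f}: {share_below_cutoff:.1%}")
```

With the same average but a smaller standard deviation, the estimated share below the cutoff would shrink, which is why the measure matters for this kind of question.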

Here’s a brief explanation of how to calculate standard deviation.
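For readers who want to see the arithmetic, here is a minimal sketch of the usual calculation, using a made-up list of eight numbers. It shows both the population version (divide by the number of values) and the sample version (divide by one less), and checks the results against Python’s built-in statistics module.

```python
import math
import statistics

# Made-up data for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Step 1: find the average.
mean = sum(values) / len(values)

# Step 2: square each value's distance from the average.
squared_deviations = [(x - mean) ** 2 for x in values]

# Step 3: average the squared distances (the variance), then take the square root.
population_sd = math.sqrt(sum(squared_deviations) / len(values))

# When the data are a sample drawn from a larger group, divide by n - 1 instead.
sample_sd = math.sqrt(sum(squared_deviations) / (len(values) - 1))

print(f"Average: {mean}")
print(f"Population standard deviation: {population_sd}")
print(f"Sample standard deviation: {sample_sd:.3f}")

# The built-in functions give the same answers.
assert math.isclose(population_sd, statistics.pstdev(values))
assert math.isclose(sample_sd, statistics.stdev(values))
```

Academic papers do not always say which version they used, so it can be worth checking the methods section when the dataset is small.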

3. In some studies, scientists report their findings in terms of standard deviations instead of a unit of measurement, such as inches or pounds.

When datasets have data points with different units, scientists often need to standardize, or rescale, the data before they can draw comparisons and look for relationships. For instance, scientists might want to examine the relationship between orange juice consumption, measured in ounces [or grams], and flu vaccination rates, measured as the number of vaccines administered each month per 100,000 residents.

The process of standardizing data includes dividing each numerical data point by the standard deviation of the dataset, typically after subtracting the dataset’s average. Doing this changes the units of measurement. Instead of being expressed in common units such as ounces, inches, and pounds (or kilograms), findings must be reported in terms of standard deviations.
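The sketch below shows one common way to standardize: subtract the average from each value, then divide by the standard deviation. The juice and vaccination numbers are invented, and the helper name standardize is ours; researchers may use other rescaling methods.

```python
import statistics

def standardize(values):
    """Rescale values so they are expressed in standard deviations from the average."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(x - mean) / sd for x in values]

# Hypothetical monthly figures for five communities.
juice_oz = [6.0, 8.0, 10.0, 12.0, 14.0]                   # ounces per person
vaccines_per_100k = [310.0, 290.0, 305.0, 280.0, 265.0]  # vaccines per 100,000 residents

juice_z = standardize(juice_oz)
vaccines_z = standardize(vaccines_per_100k)

# After rescaling, both series have an average of 0 and a standard deviation of 1,
# so they can be compared even though the original units differ.
print([round(z, 2) for z in juice_z])
print([round(z, 2) for z in vaccines_z])
```

The rescaled numbers no longer carry ounces or vaccine counts; each one says only how many standard deviations a value sits above or below its own average.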

Hypothetically, scientists looking at orange juice consumption and flu vaccination rates could conclude that a one standard deviation increase in juice consumption is associated with a one standard deviation reduction in vaccination rates.

While standardizing datasets can make them easier for researchers to work with, Brian Healy, an associate professor of neurology at Harvard Medical School, notes many people might have difficulty understanding the results. He urges journalists to read these papers closely.

“The problem is, unless you look really closely in the paper, you’ll have no idea what a one standard deviation means,” says Healy, who’s also the lead biostatistician for the Partners Multiple Sclerosis Center at Brigham and Women’s Hospital in Boston.

“Do understand the units that results are being shown in,” he adds. “If there is a number reported, you want to make sure you understand how to interpret the number, and you can’t understand how to interpret the number without knowing the units.”
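One way to follow Healy’s advice is to translate a result back into the original units, provided the paper reports the standard deviation of each variable. The sketch below uses invented standard deviations for the hypothetical juice and vaccination example; the helper name to_original_units is ours.

```python
def to_original_units(effect_in_sd, sd_of_variable):
    """Convert a change expressed in standard deviations into the variable's own units."""
    return effect_in_sd * sd_of_variable

# Hypothetical standard deviations, as a paper might report them in a summary table.
juice_sd_oz = 3.0          # ounces
vaccines_sd_per_100k = 20.0  # vaccines per 100,000 residents

juice_change = to_original_units(1.0, juice_sd_oz)
vaccine_change = to_original_units(-1.0, vaccines_sd_per_100k)

print(f"A one standard deviation increase in juice consumption is {juice_change} ounces.")
print(f"A one standard deviation reduction in vaccinations is {vaccine_change} per 100,000 residents.")
```

Under these assumed figures, the finding would read as roughly 3 more ounces of juice being associated with 20 fewer vaccinations per 100,000 residents, which is far easier to explain to an audience.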

4. Scientists can use standard deviation to help confirm whether a data point they consider an outlier actually is an outlier.

Outliers are extremely high or low values that can complicate statistical analyses and skew results. Many researchers will remove or alter outliers caused by error, such as a mistake in collecting or entering data.

When you look at a graph of all the data in a dataset, some data points appear to be outliers because they differ so much from the others. Since the standard deviation of a dataset takes into account how far away individual values are from the average, scientists often use it to gauge whether an unusual data point is an outlier. This method works well for datasets that follow the pattern of a symmetrical, bell-shaped curve in which the majority of data converge near the center of the bell, where the average value is located.

Once the standard deviation of such a dataset has been calculated, potential outliers are easier to spot. A general rule of thumb for data that follow a bell-shaped curve is that approximately 99.7% of the data will fall within three standard deviations of the average. Data points outside this boundary are usually deemed outliers.
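The three standard deviation rule of thumb can be sketched in a few lines. The readings below are invented, and the sketch uses the sample standard deviation from Python’s statistics module; it is an illustration of the rule, not a substitute for the checks researchers describe in their methods.

```python
import statistics
from statistics import NormalDist

# For a perfect bell curve, the share of data within three standard deviations
# of the average is about 99.7%.
share_within_3_sd = NormalDist().cdf(3) - NormalDist().cdf(-3)
print(f"Share within three standard deviations: {share_within_3_sd:.2%}")

# Invented readings: twenty values near 50 and one suspicious value of 95.
readings = [48, 50, 52, 49, 51, 50, 47, 53, 50, 49,
            51, 52, 48, 50, 51, 49, 50, 52, 48, 50, 95]

mean = statistics.mean(readings)
sd = statistics.stdev(readings)

# Flag any reading more than three standard deviations from the average.
outliers = [x for x in readings if abs(x - mean) > 3 * sd]
print(f"Average: {mean:.1f}, standard deviation: {sd:.1f}, flagged: {outliers}")
```

Note that the suspicious value itself pulls the average up and inflates the standard deviation, so in very small datasets a single extreme value may never cross the three standard deviation line.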

Although the standard deviation of a dataset is affected by outliers, journalists should not assume a large standard deviation indicates data quality problems. As Rumsey writes in Statistics for Dummies, “a large standard deviation isn’t necessarily a bad thing; it just reflects a large amount of variation in the group that is being studied.”

This post was originally published by The Journalist’s Resource, and is reprinted here via its Creative Commons license. The Journalist’s Resource would like to thank Troy Quast, a professor of health economics at the University of South Florida’s College of Public Health, and Brian Healy, an associate professor of neurology at Harvard Medical School, for their help creating this tipsheet.

Additional Resources

5 Things Journalists Need to Know About Statistical Significance

New Data Tools and Tips for Investigating Climate Change

GIJN Resource Center: Data Journalism

Denise-Marie Ordway is managing editor of The Journalist’s Resource, which she joined in 2015 after working for newspapers and radio stations in the US and Central America. Her work has appeared in USA TODAY, The New York Times, and The Washington Post. She was a 2014-15 Harvard Nieman Fellow.

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License
