Image: Shutterstock
Step-By-Step Guide for Journalists on the Basics of Google Sheets
Read this article in
Investigative journalists often find themselves digging through vast datasets that may be coming from a leak or freedom of information request. Knowing how to use spreadsheets is a crucial skill, as it allows you to find potential stories in large amounts of data. It also offers the dual advantages of facilitating data cleaning and visualization while ensuring a reader-friendly presentation of information. The key is to think critically about how to use the data available in order to find stories in the spreadsheets.
In this guide, we might refer to Google Sheets specifically, but please note that these steps and formulas also apply to Microsoft’s Excel. The differences are small, like where the buttons in the menu are placed or how they look. For the sake of simplicity, we are not going to indicate those differences every time.
To get started, open Google Drive and create a new spreadsheet using Google Sheets. To do so, click on the + NEW button on the top left of your Drive, and the click Google Sheets:
When the document opens, you can see an empty spreadsheet, made of rows and columns. This is where your data will live.
And here is a column.
When you start using an empty spreadsheet, it is recommended that you always use the top row for your headers. These are the names of your data categories, for instance, NAME. Shortly you will see an example of that.
For now, enter the following data in the descending fields in column A:
NAME
Andy
John
Mary
Sally
Mark
In column B, enter the salaries in descending fields as follows:
SALARY
450000
380000
410000
290000
950000
For numbers, avoid punctuation to prevent potential formatting issues.
Make sure you always include column headers to label your data for organization and sorting purposes, we will show you later how to sort and filter using headers.
Using Basic Formulas
Every formula starts with an “=” sign. Be careful when you type your formula, as Google will try to autofill what it thinks you want and might make mistakes. Always double-check your formulas.
To see the total of salaries paid, we will sum the annual salaries from column B of all employees in column A. You can see them in the picture below from B2 to B6. In a formula, this will be written as “=SUM(B2:B6)” for Google to recognize the beginning and ending cells of the dataset.
We can also calculate the average salary of the employees, by using the formula “=AVERAGE(B2:B6)”, as follows.
In our case, the average salary equals 496000. Beware of outliers! Notice that Mark in cell A6 is an outlier — his salary far exceeds the others — and because of that, the average salary of all the employees is skewed larger. (Note that all the other employees make less than the average of 496000.)
Journalists often use averages in their stories, but this calculation can unfairly misrepresent the data, because an average is calculated by totaling up individual amounts and dividing by the number of inputs.
To be more faithful to the true statistical picture, it’s often better to instead calculate the median, which involves finding the value that sits at the midpoint of all the values listed.
In this case, the median is 410000, a much more realistic representation of the typical salary for this company’s employees. (Note that two employees make more than the median, two make less, and one makes exactly that amount.) Acknowledging outliers and understanding their impact is crucial for accurate and transparent reporting.
We now make a new column C with new salaries, and want to see the new total. Instead of writing the formula again, we can simply click on B8, and we can see above the formula that populates that cell.
We can copy cell B8 and paste it onto cell C8 for the new total salary, and Google Sheets will automatically adjust the formula to use the data fields from our new column C.
Be careful, however. If you make a mistake in your first formula, copying it to other cells will only replicate the mistake. Always double-check your formulas to make sure the information you are gathering is correct, just as you would confirm any information a source gives you during an interview.
A faster way to copy formulas is to click on the cell with the formula, move your cursor to the bottom right corner where you see a blue dot, and a “+” sign will appear. Click it and drag your cursor to the right to copy the formula.
If we now make a new column with the header name DIFFERENCE in column D, we can quickly calculate how much the salaries were adjusted, by subtracting the old salary from the new one.
Once entered, Google Sheets will suggest the same calculation for the other rows, you can click the check sign to accept the suggestion, or move the cursor again to the bottom right corner of D2 and drag it down through the other rows to D6. Alternatively, you can double-click on the blue dot on the bottom right corner of D2 and it will automatically fill in the formula for all the rows underneath.
To find the salary percentage change between the two salaries, we need to take the new salary and subtract from it the old one, then divide the result by the old salary. Here is a mnemonic to help you remember that: ask yourself: “Did I love math as a child?” “Noooo!” N-O-O = (New – Old) / Old. (Credit to journalist Sarah Cohen for this excellent way to remember the formula).
We will write the formula as we would in math, using parentheses () to indicate to Google Sheets which operation takes place first, in our case C2-B2.
The percentage change results, however, are not easily usable for our stories.
To fix this, we can click on the top of column E, which will highlight all the cells in the column, then click Format in the top page menu, go to Number > Percent, and click it to format the whole column.
We will now make a new column F to calculate the percentage of the whole. To achieve this, we need to divide the part by the whole, meaning C2 (the new salary of Andy) by C8 (the total of new salaries).
As mentioned before, if we now want to do the same calculation for the rest of rows 3 to 6, we would drag cell F2 to apply the formula to the rest of column F. However, when we do that, we get some wrong or weird results.
This is because when you copy a formula and apply it to other rows, Google Sheets will always continue to move down the formula using the corresponding fields to make the new calculation. This means that, in cell F2 the formula is C2/C8, in cell F3 it’s C3/C9, F4 is C4/C10, F5 is C5/C11, and F6 is C6/C12.
We can avoid this issue by using the anchor “$” tool. An anchor is a command to tell Google Sheets to not move the formula from a specific cell, as we will always want to use a particular cell for our formula.
In this case, cell C8 (TOTAL NEW SALARY) is the cell we will always want to divide the new salaries by. To anchor C8 to our formula, we will write it as “C2/$C$8”. You can see two anchors in this formula because we are telling Google Sheets “do not leave column C, and do not leave row 8”. Always put the anchor before the applicable row and the column, not after.
After that, we can again select column F, go to the top menu, and click Number > Percent to see the correct percentages instead of the error “#DIV/0!” messages.
Creating Charts
As journalists, we want to help people understand the data by using visuals. In our case, we want to show the old salary and the new salary, so we will need to select from A1 to C6.
After selecting the data, click Insert in the top menu, and then click Chart.
Google Sheets will make a chart for you, but you can hover to the right to change it and customize it using the Chart Editor tool.
We can also make a pie chart with the percentages of the whole amount that we calculated before in column F. To do this, we will need column A with the names, and column F with the percentages. As the two columns are not next to each other, we cannot simply click on A1 and drag the cursor. In this case, we click on A1, keep pressed, drag down to A6, and then let go to complete the selection.
Then, we move the cursor to F1, press Command for Mac or Control for Windows button, keep it pressed, and then drag the cursor down to F6. This will select exactly the data we want for our pie chart.
Once again, we click on the top menu Insert > Chart, and Google Sheets will make a pie chart.
After you customize your graphics, you can save them separately for your stories. Click the three vertical dots on the top right of the chart, go to “Download chart” and select your preferred file type.
Sorting Long Lists of Data
What if we were provided with a long list of data and wanted to get a story out of it? Sorting is one of the best ways for you to see data in different ways and see if anything is interesting in the numbers.
For example, we will look at life expectancy by country in this spreadsheet. Make a copy of the document to try this yourself.
After you open the spreadsheet, click on column A, keep it pressed, and drag the cursor right to column C, so that the three columns and all the data rows below are selected. Then, in the top menu click Data > Sort range > Advanced range sorting options.
First, make sure you tick the “Data has header row” so that Google Sheets will know that the first row 1 is not a data row itself, but rather the titles/names of your columns.
If you select “Sort by: Male Life expectancy” A to Z, you will see the countries with the male life expectancy from lowest value (A) to highest (Z).
On the other hand, if you were to select Z to A, it would show you the highest-value countries first, in this case Switzerland.
But you can also see that some countries have no data. By sorting hundreds of rows, you can quickly see which countries are missing data, and you can report it in your work so that you don’t misrepresent your findings.
Freeze Rows and Columns
Finally, we will use a new set of data for our last tips. Make a copy of this document on the salary of European football players.
When you open it and scroll, you see that the top row headers disappear as you scroll down, which could be an issue if you want to keep that information always visible. To fix it, right-click on row 1, and at the bottom click View more row actions > Freeze up to row 1.
Now, when you scroll down you will see that the top row will always remain visible. You can also do the same with columns, try right-clicking on column A and freezing it.
If you change your mind, follow the same steps and you will be able to click “Unfreeze” to remove the block.
Using Filters
What if we wanted to see only a specific set of data? We would use the filter option.
To activate filters for your data, simply click anywhere on the data, then in the top menu click Data > Create a filter.
You will now see on the top row header fields that inverted triangle buttons appear next to the names of the columns.
Click on the inverted triangle next to Nation in C1, and click “Clear” to remove all the data selected, then select de GER and click OK.
You are now able to see all the best-paid German football players. You can repeat this process for multiple countries.
Be mindful, though, that the rows are not listed in consecutive number order anymore, they go from 1 to 6, 13, 19, all the way to 97. This is because Google Sheets is hiding the data that you decided you don’t want to see, but it didn’t delete it. So in this case, you cannot use the =SUM function at the bottom of the list as we did before [=SUM(G6:G97)] because it will add all the data from row 1 to row 97, including the hidden rows of non-German football players.
If you still want to use this specific data for your calculations, select the data, copy it, then at the bottom left of your document click the plus sign to create a new sheet.
Now paste the data, and you will see that the rows are numbered correctly once again. This is because now there is no more hidden data, and you can use the =SUM or other functions to calculate the data for your investigation.
You can find many more tips and tools on our website. Also, check out 10 Simple Data Errors That Can Ruin an Investigation, or browse our Resource Center’s data guides.
This guide is based on tips and techniques drawn from a session held at the 2023 Global Investigative Journalism Conference in Gothenburg, Sweden, from September 19-22, 2023. The session featured speakers Mark Horvit, professor at the University of Missouri School of Journalism; Tommy Kaas, co-founder of Kaas & Mulvad; and Andrew Lehren, senior editor for NBC News Investigations. You can watch the full session recording on our YouTube channel. The piece was edited and shortened for clarity by GIJN’s online producer Leonardo Peralta.
Mark Horvit is a professor at the University of Missouri School of Journalism, where he teaches investigative reporting, and is director of the school’s State Government Reporting Program. He is also chair of the Journalism Professions faculty. Horvit previously served as executive director of Investigative Reporters and Editors (IRE) and the National Institute for Computer-Assisted Reporting (NICAR), where he conducted training in investigative reporting and data journalism throughout the world. Horvit worked as a reporter and editor for 20 years before joining IRE.
Tommy Kaas is editor of Kaas & Mulvad (founded 2007), who specializes in finding news and patterns in complex data and presenting the results online. Tommy has many years of experience from a number of Danish media and from training Data Journalism, including as lecturer at the Journalism Programme at Roskilde University. He is also co-founder of Foreningen for Computerstøttet Journalistik 1997 and DICAR 1999.
Andrew W. Lehren is a senior editor on the NBC News Investigations team. He has reported on the pandemic and economic relief efforts, and led work on five International Consortium of Investigative Journalists collaborations, including the Implant Files, China Files, and the FinCEN Files. His work on people injured by foul balls led Major League Baseball to extend protections at all its stadiums. He previously spent almost 13 years as a reporter at The New York Times, working on a range of national, international, and investigative stories. He was one of the newspaper’s lead reporters analyzing the WikiLeaks trove of diplomatic cables, Afghanistan and Iraq war logs, and Guantanamo detainee dossiers. Those stories were compiled into a bestselling book, Open Secrets. He contributed to the Pulitzer Prize-winning series that examined substandard Chinese chemicals tainting US pharmaceuticals. He earned degrees from the University of Missouri and Lehigh University.