Many journalists have come to rely on quotation marks around phrases to target their searches in Google, having quickly found solid research connections in the past. But online search expert Henk van Ess warns that Google now often ignores precise instructions.
For instance, the following targeted search – “Saint Kitts” “Taylor Swift” – fetches around 400,000 results connecting the Caribbean island to the US singer when, in reality, there are only two online connections of any substance, both of which involve celebrity merchandise sold from that island.
In a recent GIJN masterclass webinar on Google search – attended by 642 journalists from 101 countries – van Ess explained that Google’s search algorithm has changed over time “to please the masses,” and has a “bias” favoring popular terms while often ignoring, or substituting, the precise terms you searched for. Van Ess is one of the world’s leading open source journalism trainers, and was a co-creator of the Who Posted What? tool that can date-search Facebook posts.
He explained that Google now leans on a perceived “strength of signal” to offer results – and that journalists need to force Google to ignore its built-in biases to find what they’re after.
Van Ess explained that extra, precise search terms and Google “dorks” (search hacking formulas) no longer narrow searches as they did in the past. Instead, they can increase the number of useful results that reporters can explore.
He also demonstrated how researchers can burrow through the mass of anticipated results by using tricks like the names of colors in Google Images, and phrases that the Google algorithm itself uses.
In his many public presentations and GIJN webinars, van Ess always emphasizes that journalists need to think differently when making online searches. Think both literally and laterally, he advises – but not as you would when questioning another person.
A simple example: If you’re searching for a map, don’t use the term “map,” but, rather, a word more likely to appear on any map – like “scale.”
One of his favorite techniques is the “excluding trick,” where you can attach the minus sign to terms you don’t want Google to include in results. So if you encounter, say, a video on YouTube, but need to see where else, or where first, it was posted, you can try searching the video URL along with this formula: -site:youtube.com. Similarly, you can use the minus sign to exclude a famous name you might not need, and which might otherwise overwhelm a search, like: -Putin.
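The excluding trick can be sketched in a few lines of Python. This is a minimal illustration, not a tool van Ess presented: the helper names `exclude_query` and `google_url` are hypothetical, and `VIDEO_ID` is a placeholder for a real video link. It only assembles the query text and a standard Google search link that you can paste into a browser.

```python
from urllib.parse import urlencode


def exclude_query(base_terms, excluded_sites=(), excluded_words=()):
    """Append -site: and -word exclusions to a search string."""
    parts = [base_terms]
    parts += [f"-site:{site}" for site in excluded_sites]
    parts += [f"-{word}" for word in excluded_words]
    return " ".join(parts)


def google_url(query):
    """Turn a query string into a Google search link (URL-encoded)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# Placeholder video link: swap in the URL you are tracing.
video_url = "https://www.youtube.com/watch?v=VIDEO_ID"

# Where else was this video posted, apart from YouTube itself?
query = exclude_query(video_url, excluded_sites=["youtube.com"])
print(query)
print(google_url(query))

# Drop a famous name that would otherwise swamp the results.
print(exclude_query("gas supply", excluded_words=["Putin"]))
```

Keeping the exclusions in separate lists makes it easy to add or remove one term at a time and compare what Google returns.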
He also encourages reporters without coding skills to get past their fear of the strings of characters that appear in URLs, and use them to their advantage. For instance, for social media search in the Who Posted What? tool, he said journalists can look at long Facebook links that typically begin like this – https://www.facebook.com/search/posts/?… – and replace the word “posts” with other words like “photos” or “videos” to find links to other forms of evidence for the same search.
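For reporters who want to see the swap spelled out, here is a small Python sketch. The function name `swap_search_type` is hypothetical, and the `?q=placeholder` query string is made up purely for illustration; a real link from the tool will carry its own, longer parameters, which the function leaves untouched.

```python
from urllib.parse import urlsplit, urlunsplit


def swap_search_type(url, new_type, old_type="posts"):
    """Replace the content-type segment of the URL path (e.g. posts -> photos)."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # Only whole path segments are swapped, so the query string stays intact.
    segments = [new_type if seg == old_type else seg for seg in segments]
    return urlunsplit(parts._replace(path="/".join(segments)))


# Made-up example link; paste in the real one from your own search.
link = "https://www.facebook.com/search/posts/?q=placeholder"

for kind in ("photos", "videos"):
    print(swap_search_type(link, kind))
```

Editing only the path, rather than doing a blanket find-and-replace, avoids accidentally changing the word “posts” if it also appears inside the search parameters.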
Van Ess said that keyword searches in Google were “mostly executed” a few years ago, but that these are now often overridden “if a similar word is more popular.”
He stresses that he is, in general, a big fan of Google search. However, to illustrate the search algorithm’s signal problem for webinar attendees, he used the name of the Caribbean island nation of Saint Kitts and Nevis in combination with several search terms.
Researching the energy impacts of Russian sanctions on that small Caribbean nation, van Ess showed that even placing the island’s name at the beginning of a search, within quotation marks – “Saint Kitts” Putin gas supply – does not stop Google from ignoring the island. (A search for the word “Kitts” inside the links shows no matches.) “Why doesn’t Google take into account what you just typed?” asked van Ess. “Because it looks at signals. If it’s a weak signal, it tries to give you a stronger signal. Google thinks the strongest signals are ‘gas,’ ‘Putin,’ ‘supply,’ – but Saint Kitts? – [No]. There is no connection in the news, and Google wants to please us.”
Some Google Search Workarounds
“Nowadays, the more keywords you use, the more results you get – which might sound counter-intuitive,” explained van Ess. “Because we are more precise, Google will also be more precise, and show us more than the generic searches. The more nifty formulas we type into Google, the more results we get.”
How to put this advice into practice:
- Use the power of search operators and Google dorks. Van Ess showed that a general search combination – supreme court new york gun law – gets more useful results when you add a specific operator, such as filetype:pdf, like this: supreme court new york gun law filetype:pdf.
- Use quotation marks around single words. Van Ess showed that quotes around single words help prevent the search engine from ignoring less popular single terms.
- Force Google to find connections within short passages. A powerful “trick” that van Ess recommends when searching for links between two things is to use the word AROUND (in capitals), along with the maximum number of words you think might separate those two things in sentences or titles. That number should appear in parentheses after AROUND, without a space. So a strong search might look like this: “grace mugabe” AROUND(7) “dubai” or the example he used to find any connection between a Washington Post investigative reporter and van Ess’s home country: “Beth Reinhard” AROUND(12) “Netherlands”. “You have to shout at Google, otherwise it will search for the word ‘around’ itself,” he explained.
- Think like a document. While humans might assume that a mission statement would include the phrase “mission statement,” van Ess points out that Google might “think” of that kind of description by word strings common to those texts – like, simply, “is to be” – and words like “ambition.” His advice? Look at examples of the document type you’d expect to find – like mission statement filetype:pdf – and experiment with the phrases that pop up, placed in quotation marks, together with the organization you’re interested in – like: site:amazon.com.
- Try keywords with the “inurl” dork to find Facebook groups. Van Ess showed that reporters can try this formula – “SEARCHTERM” inurl:groups site:facebook.com – to find elusive social media groups. For example, if you were investigating groups that might have been part of the January 6, 2021 riot at the US Capitol, you might search: “stop the steal” inurl:groups site:facebook.com.
- Add color names in Google Image search. If you struggle to find a particular photo – but know some of the prominent colors likely to appear in the background (such as the brand colors of a restaurant chain) – try typing the names of those colors (such as: green white) after the keyword. To find whether one particular Serbian politician had visited politicians in Saint Kitts, van Ess used the major colors of its flag and the country’s top-level domain to find a photo: Nikola Selaković green red site:kn.
- As an alternative search, try using word strings that algorithms use – such as “Image may contain: KEYWORD,” or: “May be an image of KEYWORDS” site:facebook.com.
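The formulas in the list above can be assembled programmatically, which helps when you want to try many variations. The following is a rough Python sketch rather than anything shown in the webinar: every function name is hypothetical, and it assumes Google accepts plain straight quotation marks (") in place of the typeset curly ones printed in this article.

```python
def quote(term):
    """Wrap a word or phrase in straight double quotes."""
    return f'"{term}"'


def quote_words(words):
    """Quote each single word so less popular terms are harder to ignore."""
    return " ".join(quote(word) for word in words)


def around(left, right, max_words):
    """Proximity search: AROUND must be in capitals, with no space before '('."""
    return f"{quote(left)} AROUND({max_words}) {quote(right)}"


def with_filetype(terms, extension):
    """Restrict results to one document type, e.g. pdf."""
    return f"{terms} filetype:{extension}"


def facebook_groups(term):
    """Look for Facebook groups whose pages mention the quoted term."""
    return f"{quote(term)} inurl:groups site:facebook.com"


def image_by_colors(keyword, colors, site=None):
    """Add color names (and optionally a site: filter) to an image search."""
    parts = [keyword, *colors]
    if site:
        parts.append(f"site:{site}")
    return " ".join(parts)


print(with_filetype("supreme court new york gun law", "pdf"))
print(quote_words(["Kitts", "Putin"]))
print(around("grace mugabe", "dubai", 7))
print(around("Beth Reinhard", "Netherlands", 12))
print(facebook_groups("stop the steal"))
print(image_by_colors("Nikola Selaković", ["green", "red"], site="kn"))
```

Each function returns plain text to paste into the Google search box; none of them contacts Google, so the results still need to be checked by hand.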
“Quotation marks (around phrases) were the best methods of avoiding information overload, but not anymore,” said van Ess. “Think like a document. Use the tricks. These are especially important for investigative reporters, because you already have a hunch that there is a connection out there.”
Rowan Philp is a reporter for GIJN. He was formerly chief reporter for South Africa’s Sunday Times. As a foreign correspondent, he has reported on news, politics, corruption, and conflict from more than two dozen countries around the world.