Image: Shutterstock
Tips for Using ChatGPT as a Quick Search Tool in Investigations
With the new availability of generative AI tools, investigative journalists are grappling with, and often confused by, AI’s many ethical pitfalls, risks of error, and advanced options.
The early use cases for journalists range from data code production to subject briefings to chart-making. And new free or paid-for AI tools relevant to journalism are released every week. However, there are numerous examples of why generative AI tools do not provide usable evidence or corroboration of anything, and are prone to bias and error. (For more on the ethical considerations and advanced applications of these platforms, see GIJN’s recent story New AI and Large Language Model Tools for Journalists.)
At the 2024 Investigative Reporters and Editors conference, one workshop suggested that reporters can avoid much of this peril and confusion — while learning this growing technology along the way — by simply using the original AI chatbot, ChatGPT, as a much more narrowly focused search tool.
Jeremy Jojola, an investigative reporter at KUSA-TV in Denver, Colorado, who won IRE’s Don Bolles Medal in 2020 for his reporting on white supremacist extremist groups, said contact searches and quick document analysis using this large language model can not only save you time at the start of an investigation, but also prompt further outreach to critical human sources that might not otherwise be included.
The key, he said, is to use LLMs like ChatGPT only at the beginning of a reporting project, and only to help orient you, the journalist — never to inform your audience.
“It’s not a reference source, it’s a starting point — it’s very important for our ethical and legal standards that nothing it spits out goes into our reporting,” Jojola cautioned. “I never ever use what ChatGPT spits out for content — not for copy; not for scripting. You get the treasure of your story with good reporting, but this can give you a map that points you in the direction of that treasure. It’s an amazing tool, even if just used like an advanced form of Google search.”
ChatGPT + Google Search = Better, Quicker Results
As a large language model, ChatGPT fundamentally uses vast troves of training data, rather than web page indexing, to generate responses, so search engines like Google are still more accurate, comprehensive, and up-to-date for general queries — especially when used with Boolean search tricks.
But reporters like Jojola argue that Google results, and the ranking impact of search engine optimized sites, can be so overwhelming that they can dissuade journalists from making initial calls to unfamiliar sources or identifying clear trends, and that they lack quick analysis features for documents.
So he uses ChatGPT, and especially the paid GPT-4 version, which can also search the internet, to find quick contacts, leads, and takeaways that might be worth pursuing. The free GPT-3.5 version from OpenAI is not directly connected to the internet; instead, it is trained to give text answers based on “vast amounts of data from the internet written by humans, including conversations,” which helps explain both its errors and its human-like dialogue. Its training data is not up-to-date. The GPT-4 model ($20 per month), which can also take images and screenshots as input prompts, is trained on far more data, can browse the live internet, and can therefore include up-to-date information.
Despite errors, Jojola said OpenAI’s various chatbots are useful at organizing data, and that their rapid, at-a-glance search response format and concise summaries can actually lead to more, not less, reporter contact with new, human sources.
As an example for the workshop, he entered the following into the GPT-4 chat box: “I’m a reporter. I need to find a reputable expert who could offer insight into the reintroduction of wolves in Colorado. Give me a list of a few names, the organizations they’re affiliated with, and their contacts.” In addition to officials within agencies and nonprofit groups, the tool searched the internet and its databases to find and describe a retired US government biologist who led wolf recovery attempts in the Rocky Mountains, as well as the founder of the “Carnivore Coexistence Lab.”
“How do you not call such interesting people right away?” Jojola said, chuckling. “The reintroduction of wolves is actually a big story in Colorado — a big-issue conflict between urban and rural areas. If you’re on a day-turn story, ChatGPT gives you a list of go-to experts and their numbers in five seconds, instead of having to spend 40 minutes trawling through Google.”
He noted that benefits of the paid GPT-4 version include more document uploads, more up-to-date data, and contextual “discussions” — where the chat interface gives answers in the context of prior questions — but that the free version remains effective for many quick searches.
Since time-saving and quick filtering is his goal, Jojola doesn’t spend time worrying about the exact phrasing of his prompts. Instead, he concentrates on the what-where-when of the search he needs, and simply phrases prompts “the way I’d talk to a person, just more bossy.”
Some practical quick-search uses for ChatGPT shared at the workshop:
- Quick searches of “unflagged” public documents. In addition to those few documents flagged as newsworthy by human sources, reporters receive a constant tide of reports, audits, and copies of public contracts that may or may not contain indications of mismanagement, systemic error, corruption, or exploitation. Jojola recommends uploading lengthy documents to ChatGPT along with a simple prompt like this: “Give me a summary of this government contract and how much the entity will be paid for services. Give me names of people in this contract, and the page numbers where they appear.” (The example he used — from a government contract for license plate readers — spat out the correct public costs and terms, as well as several names by page number, within seconds.) However, he suggested that reporters avoid uploading sensitive or private documents. There are more sophisticated document parsing tools available, which also have optical character recognition — such as Google Pinpoint — but Jojola said ChatGPT offers a useful, rapid initial filter.
- Summarizing public concerns. Annual reports and public comment records on government regulations often contain dozens of dense pages on public concerns that few journalists have the time to read. At the workshop, Jojola uploaded a 40-page annual report and asked ChatGPT to only list and summarize the concerns raised. “It’s amazing how fast it can scan,” he remarked. Again: the tool may miss nuances and make errors, but can give reporters an almost immediate snapshot of the nature and volume of problems raised at public meetings, which could trigger deeper digging and a potentially unexpected story.
- “Big picture” contacts at a glance. ChatGPT can help prompt reporters to pick up the phone to contact new sources on “maybe stories,” Jojola argued, simply by spitting out a half-dozen prominent expert names, titles, and phone numbers on a single, uncluttered page in a few seconds. Reporters can then pick out a “big picture” source — perhaps a person listed at a university — and ask the suggested expert about the issue they’re focused on, without having to scroll through, and click into, dozens of website links of varied relevance thrown up by keywords. For instance: a simple prompt of “Give me contact details, including phone numbers, of organizations that help victims of domestic violence in South Africa” receives a much clearer, single-page contacts list in ChatGPT than a Google keyword search for “domestic violence support ‘South Africa’ contact site:za,” which offers pages of local and international websites. Using the LLM, an initial list of “first-call” sources becomes obvious. (The AI tool also included a mix of contacts concerned with “gender-based violence,” which has a different focus in South Africa, and may be more relevant to the story.)
- Organizing official contacts. Jojola showed how ChatGPT automatically sources and alphabetizes large sets of public contact details in moments, with a prompt like: “Give me the office phone numbers and government email addresses for members of Colorado’s state legislature, specifically House Democrats.” “This makes it so much easier to collect this data and get the emails, rather than go into the sites of each lawmaker,” he explained.
- Layperson translations of technical findings, such as autopsy reports. The reality for many smaller newsrooms is that, unless a human source flags something as suspicious, red flags can easily be missed in technical or jargon-heavy reports, because there is often insufficient time to have them analyzed, or no specialist colleague to ask. Jojola said autopsy reports were a good example of the kind of document that can trigger new avenues of inquiry, after a quick upload and translation request in a generally reliable AI search tool. Of course, ChatGPT’s explanation of a medical finding must then be double-checked with forensic sources. “A lot of us in the news business aren’t very good at understanding autopsy reports — they have a lot of medical phrases and toxicology terms that are hard to understand, and 12-letter chemicals,” he noted. Jojola then shared how the following prompt for a real autopsy report of the victim of a police shooting revealed undisclosed details about the case: “Give me a synopsis of this autopsy report. Give me, in layperson’s terms, what substances the deceased had in his system.”
- Quick apples-to-apples comparisons. It’s sometimes difficult for reporters to immediately grasp whether a data point they see in a press release or annual report is unusually high or low, or “a big deal.” In addition to time comparisons, AI tools can give you immediate geographic targets for comparison research, with simple prompts like “Give me cities in Africa that have similar populations to Kigali, Rwanda.” (ChatGPT immediately noted that Blantyre, Malawi; Freetown, Sierra Leone; and Mombasa, Kenya all have comparable populations of roughly 1.2 million.) “City reporters like to compare their communities to other cities — things like crime, population growth, transportation issues,” said Jojola. “Again, those numbers from AI won’t appear [in stories] until I’ve triple-checked them, but it gives you a quick sense of things out of whack.”
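For reporters comfortable with a little scripting, the same quick-summary prompts can also be sent through OpenAI’s API rather than the chat window — useful when the same request must be run against many documents. This is a minimal sketch only; the helper function, file name, and model choice are illustrative assumptions, not part of Jojola’s workflow:

```python
# Minimal sketch: wrapping a document in a summary prompt modeled on the
# workshop examples, for OpenAI's chat API. The helper name, file name,
# and model are illustrative assumptions.

def build_summary_messages(document_text: str) -> list[dict]:
    """Build a chat message list using a workshop-style summary prompt."""
    prompt = (
        "Give me a summary of this government contract and how much the "
        "entity will be paid for services. Give me names of people in this "
        "contract, and the page numbers where they appear.\n\n"
        + document_text
    )
    return [{"role": "user", "content": prompt}]

if __name__ == "__main__":
    # Requires the `openai` package and an OPENAI_API_KEY in the environment.
    from openai import OpenAI

    client = OpenAI()
    with open("contract.txt") as f:  # hypothetical local document
        messages = build_summary_messages(f.read())
    reply = client.chat.completions.create(model="gpt-4o", messages=messages)
    print(reply.choices[0].message.content)
```

As with the chat interface, anything the API returns is a lead to verify, never copy — and, per Jojola’s caution, sensitive or private documents should not be uploaded this way either.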
Still, Jojola acknowledges that his ChatGPT searches do sometimes go off-course. “I asked for a scientific study on road rage the other day, and it spit out this seemingly great study, and I said give me the source, and it went to a personal injury law firm — which is not great,” he recalled.
But since these results, like any Google-style search results, must be checked before use anyway, misdirections like this, he said, shouldn’t really matter. “It also gave me a road rage study by the National Institutes of Health, which is a more credible source, so I started with that instead,” he noted.
“We can’t be afraid of technology, because it’s coming,” he concluded. “You just need to stick to your standards and your process. You have to cite the right source: you’re never going to cite Google search as a source, so not ChatGPT either.”
Rowan Philp is a senior reporter for GIJN. He was formerly chief reporter for South Africa’s Sunday Times. As a foreign correspondent, he has reported on news, politics, corruption, and conflict from more than two dozen countries around the world.