Image: Shutterstock

Updated Test of 24 LLMs for Geolocation

by Foeke Postma, Bellingcat • September 3, 2025

In June, Bellingcat ran 500 geolocation tests, comparing LLMs from various companies against each other, as well as Google Lens — a staple tool for finding the location of photos.

At the time, ChatGPT o4-mini-high emerged as the clear winner, with Google Lens outperforming most other models. Just two months later, with new versions of these AI tools available, we re-ran the trial, this time adding Google “AI Mode,” GPT-5, GPT-5 Thinking, and Grok 4 to the mix.

These five photos were excluded from our most recent trial as they were published in our previous article. Images: Bellingcat

The original test used 25 of Bellingcat’s own holiday photos. From cities to remote countryside, the images included scenes both with and without recognizable features — such as roads, signage, mountains, or architecture. Images were sourced from every continent.

For the updated trial, five test photos were excluded because they had appeared in a previous article, which could have compromised the integrity of the results.

All 24 models’ responses were scored on a scale from 0 to 10, with 10 indicating an accurate and specific identification (such as a neighborhood, trail, or landmark) and 0 indicating no attempt to identify the location at all.
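A 0-to-10 scale like this lends itself to a simple per-model average. The following is a minimal sketch of that aggregation, not Bellingcat's actual tooling: the function name `rank_models`, the model labels, and every score in the example are hypothetical placeholders chosen only to illustrate the arithmetic.

```python
from statistics import mean


def rank_models(scores_by_model):
    """Return (model, mean score) pairs sorted from best to worst.

    scores_by_model maps a model label to its list of per-photo scores,
    each expected to lie on the article's 0-10 scale.
    """
    averages = {}
    for model, scores in scores_by_model.items():
        if not scores:
            raise ValueError(f"no scores recorded for {model}")
        for score in scores:
            if not 0 <= score <= 10:
                raise ValueError(f"score {score} for {model} is outside 0-10")
        averages[model] = mean(scores)
    # Highest average first; sorted() is stable, so ties keep input order.
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical placeholder data, NOT results from the Bellingcat trial.
example_scores = {
    "model_a": [10, 7, 0, 8],
    "model_b": [6, 6, 5, 7],
}

for model, average in rank_models(example_scores):
    print(f"{model}: {average:.2f}")
```

An average hides the spread of individual answers: in the placeholder data, `model_a` has both the best and the worst single score yet still ranks first. That pattern is worth keeping in mind when comparing models whose answers vary widely from photo to photo.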

Google AI Mode was shown to be the most capable geolocation tool overall.

Grok 4 gave both better and worse answers compared to Grok 3 but, on average, scored marginally higher. However, it was still less accurate than older versions of Gemini and GPT.

GPT-5, even in ‘Thinking’ and ‘Pro’ modes, was a considerable downgrade when compared with the capabilities demonstrated by GPT o4-mini-high. In one example, of a city street with skyscrapers in the background, o4-mini-high correctly identified the street, while GPT-5 in Thinking mode pointed to the wrong country.

Despite delivering faster answers, GPT-5 appeared to sacrifice accuracy. A surprising number of errors and a general sense of disappointment in the new model have also been reported by other users.

Bellingcat tested GPT-5 and its ‘Thinking’ mode via the Plus subscription, which costs roughly the same as access to o4-mini-high did prior to its retirement. Five of the most difficult test images were also run through GPT-5 Pro. But even Pro, with a premium price tag of €200 per month, failed to geolocate the photos any more accurately than GPT o4-mini-high.

A Beach, a Hotel, and a Ferris Wheel

The disparity between Google and the GPT models became even more apparent in Test 25 — a photo of a shoreline hotel in Noordwijk, the Netherlands, with a Ferris wheel rising just beyond the dunes.

Test 25: A photo of Noordwijk beach in the Netherlands. Image: Bellingcat

In the previous trial, most older models — including those from GPT, Claude, Gemini, and Grok — accurately identified the country as the Netherlands but failed to locate the town. Many latched onto the Ferris wheel but pointed instead to the seaside town of Scheveningen, which also has a Ferris wheel, though situated on a pier, not among the sand dunes.

However, the most recent models, GPT-5 Pro and Thinking, were even less accurate, identifying a beach in France — an entirely different country.

Unfortunately for open source researchers, following the release of GPT-5, OpenAI removed the option to select older models such as o4-mini-high. After a wave of negative feedback, OpenAI reinstated GPT-4o as the default model for paid subscribers. However, the most capable geolocation models identified in Bellingcat’s testing remain inaccessible.

Google AI Mode, on the other hand, was the first, and so far the only, model to correctly identify Noordwijk as the location in Test 25.

Though AI Mode is powered by a version of Gemini 2.5, it outperformed Gemini 2.5 Pro Deep Research in these tests. Described by Google as its “most powerful AI search, with more advanced reasoning and multimodality,” AI Mode geolocated test images with greater accuracy than any GPT models, including our previous winner, o4-mini-high.

Image: Screenshot, Google

AI Mode is currently only available in India, the United Kingdom, and the United States.

The majority of models, at some point, returned a hallucination. Users should not rely solely on the answers provided by LLMs. Even the best options, including Google AI Mode, still, at times, confidently point to the wrong location.

The difference in models’ capabilities compared with just two months ago shows how quickly this field is evolving. However, OpenAI’s recent changes also suggest that progress is not guaranteed, and that AI’s ability to geolocate may plateau or even worsen over time. As new models emerge, Bellingcat will continue to test them.

Thanks to Nathan Patin for contributing to the original benchmark tests.

Editor’s Note: This story was originally published by Bellingcat and is reposted here with permission.

Foeke Postma works as a researcher and trainer at Bellingcat. He has a background in conflict analysis and resolution, and is particularly interested in military, environmental, and LGBT+ issues.

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License

New Technology and Online Resources to Conduct Investigations With Old Photos

by Foeke Postma • August 30, 2022

Bellingcat’s Foeke Postma offers tips and tools for using new technology and online resources to investigate old photographs.

Case Studies

Can AI Chatbots Be Used for Geolocation?

by Dennis Kovtun, Bellingcat • July 26, 2023

Bellingcat fellow Dennis Kovtun tests the online geolocation capabilities of two popular AI chatbots — Microsoft’s Bing AI and Google’s Bard — and finds both have some serious drawbacks.

A page from the new Open Source Munitions Portal (OSMP) filtered for spent munitions in Ukraine. Image: Screenshot, OSMP

Databases Reporting Tools & Tips

New Open Source Tools and Tips to Investigate Bombing of Civilians

by Rowan Philp • October 14, 2024

An innovative new database, the Open Source Munitions Portal (OSMP), identifies and shows remnants of explosive devices in conflict zones.

Data Journalism

Using the Sun and Shadows for Geolocating Photos and Videos

by Youri van der Weide • January 4, 2021

In this quick how-to article, open source researcher Youri van der Weide walks readers through how to use the free tool SunCalc to verify where a photo or video posted online was taken.

Related Resources

Struggling to Find the Right Open Source Tool? Try Bellingcat’s New Online Investigations Toolkit

10 Tips for Using Geolocation and Open Source Data to Fuel Investigations

Tipsheet for Investigative Journalists on War Crimes and Open Source Research

10 Lessons from Bellingcat’s Logan Williams on Digital Forensic Techniques
