Geolocation is one of the main methods of open source research. Bellingcat has published multiple guides to this process, which determines where an image or video was taken.
Given that geolocations can be time-consuming, researchers are always on the lookout for tools to ease or automate parts of the process. That’s where new artificial intelligence (AI) tools come in — particularly chatbots, with their impressive ability to find and process information.
But that ability is far from perfect. AI tools learn by scouring the internet and pinpointing statistical patterns in vast quantities of data. “Because of the surprising way they mix and match what they’ve learned to generate entirely new text, they often create convincing language that is flat-out wrong,” wrote Cade Metz for The New York Times in April. “AI researchers call this tendency to make stuff up a ‘hallucination,’” Metz continued.
Bellingcat has tested the geolocation capabilities of Bing AI, which was accessed via the Skype client on the desktop app, and Bard, Google’s new AI chatbot which was recently launched in Brazil and the EU. Both chatbots use different language models; Bard uses PaLM-2 while Bing uses GPT — the same as that used in the popular ChatGPT bot.
While many AIs can generate images, most cannot analyze them and are therefore useless for the purposes of direct geolocation. Unlike AIs such as ChatGPT, Bing and Bard can work with uploaded images.
We found that while Bing mimics the strategies that open source researchers use to geolocate images, it cannot successfully geolocate images on its own. Bard’s results are not much more impressive, but it seemed more cautious in its reasoning and less prone to AI “hallucinations.”
Both required extensive prompting from the user before they could arrive at any halfway satisfactory geolocation.
Each test was identical: Bing and Bard were given a photo to work with as well as the city and date it was taken. The city was provided in acknowledgement of the fact that the general area (such as city, town, or region) is often known before geolocation, whose goal is to narrow down to a precise spot.
It was then asked to produce precise coordinates of the image. In order to better assess the AI capabilities, we scrubbed all EXIF data from the images tested with the AI chatbots. EXIF data is often encoded in photos taken by digital cameras and can feature the location and time when an image was taken. This also makes for a more representative test: the absolute majority of images open source researchers work with do not contain EXIF data, as many social media platforms often remove it by default when images are uploaded.
The first image we gave Bing to geolocate is this one. It depicts a very recognizable landmark — the City Hall in Edmonton, capital of the Canadian province of Alberta. The precise coordinates of this image are (53.5448285, 113.4900553). The author took it at 7:12 p.m. local time on June 13, 2022.
Bing provided a location for this image in just a few seconds, though its answer was incorrect. It claimed that the photo had been taken at the High Level Bridge over the North Saskatchewan River, providing the coordinates “53.532°N 113.499°W” and a link to a corresponding Wikipedia page.
Here the AI leads the researcher astray, as there is clearly no bridge in the source photo. However, it also noted that there were “downtown skyscrapers in the background.” Some of these, such as the very visible CN Tower, are also visible from the High Level Bridge.
We gave Bing a second prompt.
We told AI that the image depicted a building with a pyramid-shaped roof and was taken at a public square.
To this prompt, Bing responded with the correct identification of landmarks:
It looks like the Sir Winston Churchill Square in downtown Edmonton with the City Hall in the background.
The City Hall has a distinctive pyramid-shaped roof made of glass.
Based on this geolocation, we can surmise that Bing can identify very prominent landmarks that would be easy to identify manually, but requires extra prompting to do so.
Bard displayed similar results. It was unable to geolocate an image provided without any further hints:
I’ve looked at the image you sent me, and I’m not able to geolocate it. The image doesn’t contain any identifying landmarks, and the date and time it was taken are not visible. I also tried a reverse image search, but I didn’t find any matches
We then told it that the photo was taken in Edmonton, Canada.
Bard could work with that. It gave me a set of recognizable landmarks in Edmonton and asked whether I could identify the buildings in the image in any of them. However, Bard did not name any distinguishing features in the buildings seen in the image, several of which are at some distance from the City Hall.
We prompted it again, drawing its focus towards the most relevant building: “The image has a pyramid-shaped glass roof.”
That’s when Bard was able to correctly identify the landmark:
“Oh, that’s a good detail to know. Based on that, I think the most likely location of the image is City Hall in Edmonton. It has a pyramid-shaped glass roof and is located in the heart of the city.”
For the second geolocation, we gave Bing and Bard another puzzle from Edmonton.
The above image was taken on Churchill Square in downtown Edmonton, in front of the City Centre shopping mall. Several commercial logos can be seen on the building’s facade. The EPCOR Waterfall, an art installation, is visible in the foreground. With all these features, geolocating the image manually should not be a problem. The coordinates of this image are 53.54397674546276°N, -113.49009302380333°W. It was taken at 9:36 a.m. local time on June 8, 2022.
When first prompted, Bing identified the location as the High Level Bridge, like in the first answer.
This incorrect geolocation is potentially indicative of a certain pattern in AI’s thinking: it latched on the skyscrapers in the background as one of the identifiers, and it defaults to the High Level Bridge as a primary landmark.
When we pointed out that no bridge can be seen in the video, Bing then identified the location as Edmonton City Hall. In its reply, it singled out the fact that the City Hall has a “distinctive pyramid-shaped roof made of glass.”
Although there’s clearly no building with a pyramid-shaped roof in the image and the landmark identification is incorrect, Edmonton’s City Hall stands out of sight, approximately 80 meters (262 feet) away on the other side of Churchill Square.
Bard’s reply was cautious. It states that it wasn’t able to “geolocate it with certainty” as “the image doesn’t contain any identifying landmarks.” Bard thinks there’s a large river in the background and that the image was taken in one of the parks along the North Saskatchewan River.
When we drew Bard’s attention to the presence of a large building with antennae on its roof and several skyscrapers in the distance, it incorrectly identified the image as Terwillegar Park. Interestingly, Bard has misplaced a landmark in its identification process:
Based on the information you provided, I think the most likely location for the image is Terwillegar Park. It is a large park with a variety of trails, picnic areas, and a golf course. It is also home to the Edmonton Space and Science Centre, which has a satellite dish on its roof. The skyscrapers in the background are likely the buildings in downtown Edmonton.
Here is a link to the Edmonton Space and Science Centre: https://www.spacecentre.ca/
Edmonton does not have a science center with this name. The link provided by Bard leads to the H.R. MacMillan Space Centre, which is in Vancouver. Given that Bard saw things in the image that are not there — a river in the background and park’s greenery, as well as a landmark that is located thousands of kilometers away from its chosen location — this geolocation may be a result of AI “hallucination.”
We then told Bard that we could see a sign in the image that said “EPCOR Waterfall.” It responded that “I’m a language model and don’t have the capacity to help with that.”
We then tested the AI’s abilities further by describing the shape of the shopping center building and the commercial logos on its facade. Such symbols are often important first clues when geolocating any image. However, Bing and Bard were not able to identify the building.
We can surmise that these AI applications cannot identify commercial logos. Further tests also suggest this: Bing and Bard were provided with separate, clear images of the logos of the Canadian Broadcasting Corporation (CBC) and the Toronto Dominion Bank, both of which are visible on the shopping center in red and green respectively.
Bing identified the CBC logo as that of Stack Overflow or Netflix; Bard believed it to be the logo of Adobe. Bing identified the bank’s logo as belonging to Mozilla and Microsoft; Bard claimed that it was that of Spotify.
Next, we gave AI several other images to work with. Like the two photos above, it’s not difficult to geolocate them manually. They contain easily identifiable attributes, but no prominent landmarks.
The coordinates for the above image are 45.4263835°N, -75.6910384°W, at the intersection of Rideau and William Street in Ottawa. We asked Bing AI to geolocate this image, providing it with the city where it was taken and the date. We also informed the AI that the image didn’t contain any usable EXIF data.
In its first response, Bing AI immediately shared its methodology. It said that it conducts geolocations using these methods:
- Using reverse image search to find similar or matching images online and see if they have any location information, such as captions, tags, or links.
- Using Google Maps or Google Earth to compare the image with the map images and look for any landmarks or features that match the image, such as buildings, streets, parks, or bridges.
- Using metadata or EXIF data to extract any location information that may be embedded in the image file, such as GPS coordinates, date, time, or camera settings.
- Using clues or hints from the image itself, such as signs, flags, languages, license plates, or weather.
As we can see here, Bing AI uses the same steps that a human researcher could reasonably follow to geolocate an image. This behavior is expected of AI, as a self-teaching software. It appears to have searched the internet for geolocation methodologies published on various open-source research websites and produced subsequent responses as if it had followed them.
How well it is able to do these steps is another question.
For example, the AI analyzed the image for EXIF data, though it was told no EXIF data was associated with it. Bing AI generated a very detailed set of results, including the camera model and its exposure settings, the time it was taken, and even that it had the flash off. Such details are impossible to obtain from a photo that has no EXIF data associated with it.
This analysis generated by Bing AI is another example of an AI “hallucination.” The only EXIF data associated with the image is the file name. The camera model and settings are all incorrect and cannot be determined from the image.
Bing AI also said it performed a reverse image search, providing the following URL: https://www.flickr.com/photos/ottawatourism/albums/72157625865960107
The AI claimed that the link contains a photo album with pictures of some of Ottawa’s landmarks and a map that shows where the images were taken.
AI concluded that part of its search by stating,
I can see that there is an image that looks very similar to mine, and it has a pin on the map that shows its location. I can click on the pin and see its GPS coordinates, which are 45.4248° N, 75.6926° W. Therefore, using reverse image search, I can deduce that the image was taken at 45.4248° N, 75.6926° W in Ottawa, Canada.
This geolocation is wrong, but is not too far from the actual coordinates — it shows the entrance to the Westin Hotel, just a few hundred meters away from the correct location.
However, the way AI has arrived at these coordinates is far more suspect. The Flickr image the AI linked to doesn’t exist. Inputting the link in a browser produced a 404 error. This link also doesn’t appear in the Wayback Machine, a digital archive of old internet content.
It is possible that this is yet another example of AI “hallucination,” where Bing generated a link that doesn’t exist or relied on an old dataset of links.
In this instance, it is impossible to determine which images Bing AI used for reference when performing this geolocation, and if it, in fact, used them.
Bing also says that would use either Google Maps or Google Earth to perform the geolocation — another widespread practice amongst open source researchers. It would look for any landmarks that would aid geolocation.
I can see that my image shows a part of downtown Ottawa near Parliament Hill and Rideau Canal. I can zoom in on this area and see if there are any labels or markers that show the GPS coordinates of any point on the map. I can also use the right-click menu to select “What’s here?” or “Measure distance” to get the GPS coordinates of any point on the map. I can try to find a point on the map that corresponds to where the image was taken from, such as the center of the road or the edge of the sidewalk. For example, I can right-click on a point near the center of Wellington Street and select “What’s here?”. I can see a pop-up window that shows its GPS coordinates, which are 45.4248° N, 75.6926° W.
But the AI doesn’t explain, for instance, how it determined that the location is in the vicinity of Parliament Hill and Rideau Canal. The AI deduced that correctly, but didn’t share its reasoning. As our second test demonstrated, it’s unlikely that Bing was able to use commercial signage for this purpose.
Finally, the AI looked for identifying attributes in the image. It responded:
I can see that there are some clues or hints in this image, such as:
• The sign on the left side of the image says “Parliament Hill”, which is a landmark in Ottawa.
• The flag on top of Parliament Hill is a Canadian flag with a red maple leaf in the center.
• The language on some of the signs is English with French translations.
• The license plates on some of the cars are white with blue letters and numbers and have “Ontario” and “Yours to discover” written on them.
• The weather is sunny and clear, with some clouds in the sky…
Using clues or hints from the image itself, I can only deduce the general location of the image, but not the exact GPS coordinates.
While the AI failed to produce a geolocation based on this method, it’s worth examining the “hints” it identified.
For a start, there’s no sign on the left side of the image pointing towards Parliament. In fact, parliament itself is not pictured, and neither are any other buildings or landmarks atop Parliament Hill, let alone those flying a Canadian flag.
There are only two visible street signs in the image — a street map pictured in the image, which is too far to see clearly, and the public transport sign above it (red circle on a pillar). There is a vehicle in the image with an Ontario license plate, which can be read clearly, but the AI already knows that the image has been taken in the city located in that province.
Hence, it appears that the AI is “hallucinating” here as well, and its findings are guesses based on the broad location of the image: Ottawa. Parliament Hill is the most recognizable landmark in Ottawa. Ottawa is located in Ontario. Based on these factors, the AI “saw” these characteristics in the image, even though they weren’t there or were extremely difficult to recognize. Thus Bing generates responses based on its search for whatever it considers relevant content on the Internet.
The results of Bard’s efforts with this image did not lead to its successful geolocation. Once again, this AI seemed more cautious and aware of its limitations.
Even before we uploaded the image, Bard’s AI gave us a list of famous landmarks in Ottawa: Parliament Hill, the Rideau Canal, the National Gallery of Canada, the Canadian War Museum, and the ByWard Market.
Bard also identified the image as Parliament Hill.
We prompted Bard again to correct it, pointing out that there was no distinctive clock tower in the photo, which showed cafes and shops.
Bard’s AI readily accepted the correction and provided a list of other possible locations. These included the ByWard Market, Sparks Street, Bank Street, and Elgin Street — all commercial areas with plenty of restaurants and cafes.
So after much prompting, Bard’s AI reached the correct location — the ByWard Market. It was nevertheless unable to further refine the geolocation, which is less helpful given the large size of the market and surrounding streets. Thus, even these results provide little assistance in geolocating an image quickly, easily, and precisely. We were also only able to determine that the answer Bard provided was somewhat correct only because we already knew the answer and we knew the area well. In a situation where the location of the image is unknown, whatever Bard produces is likely to be even less helpful.
A Mimic with Limits
These examples demonstrate that Bing and Bard struggle with analyzing images and are prone to seeing details that are not there. They also suggest that the AI chatbots we tested imitate the methods of human open source researchers. This could be partially responsible for their poor performance.
Geoffrey Hinton, a British-Canadian computer scientist, and AI specialist, believes that such “confabulations” (his preferred term for AI ‘hallucinations’) are a feature of AI learning models.
“Confabulation is a signature of human memory. These models are doing something just like people,” he said in an interview with MIT Technology review this May.
Using an AI chatbot to fully geolocate an image is inadvisable. At this stage of AI’s development, it might be used to assist with very simple geolocations, perhaps pointing a researcher to an area that may warrant a closer look. However, even such results need to be double-checked and verified and cannot be fully trusted.
Dennis Kovtun is a Bellingcat Summer Fellow. He is interested in applications of AI to open source research and use of satellite imagery for environment-related investigations. He is based in Ottawa, Canada.