TL;DR
We discuss the intriguing progress of OpenAI’s o3 on GeoGuessing (the task of guessing the geographic location where an image was taken) and ask whether this constitutes progress towards non-trivial reasoning or just a case of training-data leakage.
GeoGuesser Progress
There was an interesting blog post this week testing OpenAI’s o3 model on GeoGuessing: guessing the geographic location shown in a given photo. This is quite a difficult task for a human, and non-experts struggle considerably at it. For example, take a look at the picture below and try to guess the location. (The answer is in the linked blog post above.)
There is a competitive GeoGuessr league where players attempt to guess locations and are scored on how far their guesses fall from ground truth. There have been reports for some time that reasoning models perform quite well at this task, but there has been reasonable pushback that the models may simply be reading positions from metadata embedded in the images. A Master I class GeoGuessr player put this to the test by falsifying the EXIF metadata in the images and playing a round against o3. OpenAI’s model was allowed to use search to narrow down locations while the human player was not (although this seems to have made a difference on only one of the tested photos). The o3 model narrowly beat the Master I player (Master I is the second-highest ranking, below Champion).
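The metadata concern is concrete: images routinely carry GPS coordinates in their EXIF tags, stored as degree/minute/second rational pairs, so a model with access to raw file bytes could read the answer directly unless the tags are falsified or stripped. As a minimal stdlib-only sketch (the function name and example coordinates are illustrative, not from the experiment), here is how those EXIF rationals decode into the decimal coordinates a guesser would need:

```python
def dms_to_decimal(dms, ref):
    """Convert EXIF-style GPS rationals to decimal degrees.

    dms: ((deg_num, deg_den), (min_num, min_den), (sec_num, sec_den))
    ref: hemisphere reference, one of 'N', 'S', 'E', 'W'
    """
    degrees = dms[0][0] / dms[0][1]
    minutes = dms[1][0] / dms[1][1]
    seconds = dms[2][0] / dms[2][1]
    decimal = degrees + minutes / 60 + seconds / 3600
    # South and West hemispheres are negative in decimal notation.
    return -decimal if ref in ("S", "W") else decimal

# Example: rationals encoding roughly 40° 44' 30.6" N, 73° 59' 9.0" W
lat = dms_to_decimal(((40, 1), (44, 1), (3060, 100)), "N")
lon = dms_to_decimal(((73, 1), (59, 1), (900, 100)), "W")
print(round(lat, 4), round(lon, 4))  # 40.7418 -73.9858
```

Falsifying the metadata, as the player did, amounts to rewriting or deleting these GPS tags before handing the file to the model, which is why a correct guess afterwards must rest on visual evidence alone.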
The question, as always with proprietary models, is how impressive this is. GeoGuessr variants have been tested informally on OpenAI models for some time. There are also datasets like GeoLocation that provide large training sets of geolocated images. It is entirely possible that OpenAI’s team included GeoLocation or similar data in their training process; they may well have trained o3 on the task of GeoGuessing explicitly. If that were the case, the result would still be an impressive feat for o3, but much less impressive than if it turned out that o3 was never formally trained on geolocation data. A cynic would suggest, given the market capitalizations at stake, that it is more likely that o3 has seen this task explicitly before.
How could we disprove the cynic and prove the AI optimist’s case that o3 is reasoning? We would need a fully open LLM that performs similarly and that we can verify was not trained on the task of GeoGuessing. A fully open LLM is one in which the training data and post-training procedures are fully documented, allowing for reasonable statistical estimates of how far the GeoGuessr task sits from the training distribution. Note that releases like Llama or DeepSeek are not fully open, as only the weights, and not the training data or training procedures, are open-sourced. I hope that funding agencies recognize the importance of fully open LLMs for rigorous scientific analysis and fund such work in the near future.
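With documented training data, "how far is this task from the training distribution" becomes measurable rather than speculative. One simple family of estimates is nearest-neighbor similarity: embed the evaluation images, embed the training corpus, and check whether any training example sits suspiciously close to a test item. The sketch below is a toy version with hypothetical function names and tiny stand-in embedding vectors, not a reference to any published contamination-checking tool:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def max_train_similarity(task_emb, train_embs):
    """Nearest-neighbor cosine similarity of one evaluation example
    against a documented training set: high values flag possible
    contamination, low values suggest the task is out-of-distribution."""
    return max(cosine(task_emb, t) for t in train_embs)

# Toy 3-d embeddings standing in for real image embeddings.
train = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
near = (0.9, 0.1, 0.0)   # resembles something in the training set
far = (0.0, 0.0, 1.0)    # orthogonal to everything in the training set
print(max_train_similarity(near, train) > max_train_similarity(far, train))  # True
```

This kind of audit is exactly what closed releases preclude: without the training corpus, neither the cynic's nor the optimist's reading of the GeoGuessr result can be checked.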
Interesting Links from Around the Web
https://www.nextplatform.com/2025/04/25/no-quick-fixes-as-intel-losses-and-restructurings-continue/: Continuing pain at Intel
https://www.nature.com/articles/d41586-025-01394-4: A richer understanding of rose geometry
https://www.nature.com/articles/d41586-025-01135-7: An exciting new organometallic compound
https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.134.177301: A statistical theory of transfer learning in fully connected networks
About
Deep Into the Forest is a newsletter by Deep Forest Sciences, Inc. Get in touch with us at partnerships@deepforestsci.com.
Credits
Author: Bharath Ramsundar, Ph.D.
Editor: Sandya Subramanian, Ph.D.