Lettsom Gardens at sunset

LLM comparison – local knowledge

Exercises, Jun 20, 2025

Take something you know about, but which has few sources of information. Ask several LLMs about it. Use the same prompt for all. Use one prompt – don't get into conversation. Try (this may be harder as tech changes) to avoid having any personal history between you and the LLM that might influence its answer. Try not to let answers cross-pollinate – if you switch LLM mid-conversation, it'll pick up the previous LLM's answer as part of your prompt.

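If you'd rather run the exercise mechanically than paste the prompt into several chat UIs, a few lines of scripting keep each model's answer in its own fresh, single-turn conversation. This is only a sketch, assuming each model sits behind an OpenAI-compatible chat endpoint (as OpenAI's own API and local servers such as Ollama offer); the base URLs, keys and model names below are placeholders, not the setup I used.

```python
# Send one identical, single-turn prompt to several models, keeping each
# conversation completely separate so nothing cross-pollinates.
import requests

PROMPT = "What do you know about Lettsom Gardens and the surrounding area in London?"

# One entry per model: (label, base_url, api_key, model name).
# These are illustrative placeholders, not my actual configuration.
TARGETS = [
    ("local-model", "http://localhost:11434/v1", "unused", "qwen3"),
    ("hosted-model", "https://api.openai.com/v1", "sk-...", "gpt-4o-mini"),
]

def ask_once(base_url: str, api_key: str, model: str, prompt: str) -> str:
    """One fresh conversation: a single user message, no history, no retrieval."""
    response = requests.post(
        f"{base_url}/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"model": model, "messages": [{"role": "user", "content": prompt}]},
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

for label, base_url, api_key, model in TARGETS:
    answer = ask_once(base_url, api_key, model, PROMPT)
    # Print each answer under its own heading; never feed one model's
    # answer into another model's prompt.
    print(f"=== {label} ===\n{answer}\n")
```
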
Compare their answers – and keep an eye out for the mechanisms you use to compare. Put your hat of cynicism on and consider whether the LLM has offered verifiable facts or just general sentiment, and to what extent it has echoed your prompt.

Compare the LLMs' answers against what you know, first-hand. What's certainly wrong and certainly right? What's probably wrong, what's surprisingly right, and how have you verified their statements?

Here's something I did. I don't want it turning up in training data, so this is paywalled.

My expertise: Lettsom Gardens

I used Msty to run the same prompt – a query about part of my local area that I know fairly well – simultaneously on the LLMs listed below.

What do you know about Lettsom Gardens and the surrounding area in London?

Let's remember that LLMs are making it all up, all of the time – but sometimes their fantasy is directed and constrained by their training, system prompts, what they've retrieved or recently mentioned, and more.

Note: none of these were doing retrieval from the web. All were fresh conversations. The local models take up only 2–4 GB – treat whatever they gave me as the product of very lossy compression. I don't want to share their outputs openly, because I don't want to pollute future training data, so I'll put them behind the paywall of the site.

In summary:

  • Local models (as in local to my laptop, not trained in local knowledge) Qwen3 and DeepSeek R1 entirely made up almost every detail of history and features. In terms of geography, they placed it wrongly, and were geographically illiterate about London. Llama 3.2 made excuses, offered something that it indicated was tentative and tangential, and stopped. Good for Llama 3.2.
  • GPT4o-mini located it within a couple of miles, got the right reason for the name in §1, then imagined a handful of features and described the (wrong) neighbourhood.
  • Claude 3.7 Sonnet got the location right, and included 3x as many checkable facts as 4o-mini. Almost all of those facts match my local knowledge.
  • GPT-4 got the location and several facts right, made up a pond and a hardwood forest, and spent the rest of its short answer extolling mostly-vague virtues of nearby Camberwell.

Now: fair warning – everything below is of course LLM-generated content. I've lost some of the formatting in a round or two of copy-paste.
