Everyone knows the shortest distance between two points is a straight line.
This is not that. Name two things — any two — and we’ll chart the scenic route between them through the world’s knowledge.
Under the hood — how the scenic route is found
This is a live bidirectional stochastic beam search over the Wikidata knowledge graph — more than 100 million entities, queried in real time from your browser. Both endpoints grow a frontier outward along entity-to-entity statements, and the two constellations race to meet in the middle.
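To make the "two frontiers race to meet in the middle" idea concrete, here is a minimal sketch. It assumes a tiny in-memory graph treated as undirected, and it swaps the stochastic beam for a plain breadth-first expansion, so it illustrates only the meeting logic, not the search this page actually runs. The names `bidirectional_meet` and `_join` are hypothetical.

```python
from collections import deque


def _join(parents, meet):
    """Stitch the two search trees together at the meeting node."""
    left, node = [], meet
    while node is not None:  # walk back to the start endpoint
        left.append(node)
        node = parents[0][node]
    left.reverse()
    right, node = [], parents[1][meet]
    while node is not None:  # walk on to the goal endpoint
        right.append(node)
        node = parents[1][node]
    return left + right


def bidirectional_meet(graph, start, goal):
    """Grow a frontier from each endpoint in turn; return a joined path or None."""
    if start == goal:
        return [start]
    # parents[side] maps each visited node to its predecessor on that side
    parents = [{start: None}, {goal: None}]
    frontiers = [deque([start]), deque([goal])]
    side = 0
    while frontiers[0] and frontiers[1]:
        # expand one full layer of the current side, then switch sides
        for _ in range(len(frontiers[side])):
            node = frontiers[side].popleft()
            for nxt in graph.get(node, ()):
                if nxt in parents[side]:
                    continue
                parents[side][nxt] = node
                if nxt in parents[1 - side]:  # the frontiers touched
                    return _join(parents, nxt)
                frontiers[side].append(nxt)
        side = 1 - side
    return None  # one side ran out of nodes: the trail went cold
```

Alternating sides keeps both frontiers small: each explores roughly half the depth of a one-sided search before they meet.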
The trick is the objective. Shortest paths through a knowledge graph are almost always dull — everything is an instance of something, in some country, speaking some language. So instead of distance, we optimize interestingness:
- Property informativeness. Each of the graph’s relation types carries a hand-tuned weight, playing the role of inverse property frequency: taxonomic glue (instance of, country, gender) is penalized or pruned outright, while high-surprise relations (named after, inspired by, discoverer or inventor, present in work) are rewarded.
- Hub damping. Mega-hubs (the United States, “human”, the English language) would otherwise vacuum every path through themselves, so a degree-based penalty keeps the route off the interstate. The intuition is the one behind degree-weighted path relevance, and a loose cousin of PageRank’s damping factor.
- Diversity pressure. Repeating the same relation twice in a row is penalized, so the chain keeps changing key — no ten-link taxonomy ladders.
- Concreteness prior. Stops with a face — a photograph, a birthdate, a place on the map — make satisfying stories; bare abstractions do not. Nodes that are classes of things tax every route through them, and the very small set of ultra-generic concepts (“human”, “organization”, “education”) is barred from serving as a through-station at all: “X is a human, and so is Y” explains nothing.
- Launch and land on specifics. Generic glue relations are penalized extra hard on the first hop out of either endpoint, so journeys start and finish with their most particular facts.
- Endpoints look both ways. The search walks outgoing statements (which keeps mega-hubs tame), but an abstract destination has only abstractions downstream — its liveliest neighbors are the people, works and events whose own pages point at it. So both endpoints also sample their incoming statements with a live SPARQL query, giving concepts concrete first steps.
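- Gumbel-perturbed beam. Each edge score gets independent Gumbel noise before the beam keeps its top-k (the “Gumbel top-k” trick). In its standard form this draws k candidates without replacement from a softmax over the scores, so every run is a fresh sample that favors high-scoring routes without locking onto a single best one: ask twice, get two different scenic routes.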
- No pointless detours. A route should be an induced path: if an early stop links straight to a later one, the stop in between was padding. Such chords are pruned during the walk and rejected again at the finish — no more “label → album → band” when the label already knew the band.
- Hubs only through the front door. A mega-hub like the United States may be entered only by a relation that carries a story (named after, discovered by) — never by bureaucratic glue like country of citizenship.
- Meet-in-the-middle, then rerank. When the frontiers touch, joined paths are re-scored with a length-shaping term that prefers seven-ish-hop routes. The finalists then face one last judge: with their labels and claims fetched, it counts reader-visible defects (duplicate names, chords, edition items, unlabelled stops) and keeps the cleanest. This retrieve-then-rerank pass is why the route you read is legible even when the search that found it was messy. The dull direct shortcut, if one exists, is confessed to separately.
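A few of the ingredients above fit in a short sketch: the per-edge score (property weight, hub damping, diversity pressure), the Gumbel-perturbed top-k, and the chord check. Everything here is an assumption made for illustration: the weights in `PROPERTY_WEIGHT`, the log-degree form of the hub penalty, the parameter defaults and the function names are invented, not the values or code this page uses.

```python
import math
import random

# Hypothetical weights: positive for high-surprise relations, negative for glue.
PROPERTY_WEIGHT = {
    "named after": 2.0,
    "inspired by": 1.8,
    "country": -1.5,
    "instance of": -2.0,
}


def edge_score(relation, prev_relation, target_degree,
               hub_strength=0.5, repeat_penalty=1.0):
    """Property weight, minus a log-degree hub penalty, minus a repeat penalty."""
    score = PROPERTY_WEIGHT.get(relation, 0.0)
    score -= hub_strength * math.log1p(target_degree)  # hub damping (assumed form)
    if relation == prev_relation:                      # diversity pressure
        score -= repeat_penalty
    return score


def gumbel_top_k(scored, k, rng=random):
    """Keep the k (candidate, score) pairs with the highest score + Gumbel(0, 1) noise."""
    def perturbed(item):
        _, score = item
        u = max(rng.random(), 1e-12)          # avoid log(0)
        return score - math.log(-math.log(u))  # Gumbel(0, 1) sample added to score
    return sorted(scored, key=perturbed, reverse=True)[:k]


def has_chord(path, neighbors):
    """True if any stop links directly to a later, non-adjacent stop."""
    for i, node in enumerate(path):
        for later in path[i + 2:]:
            if later in neighbors.get(node, ()):
                return True
    return False
```

With these invented weights, `edge_score("country", "country", 0)` comes out at -2.5 (the glue weight plus the repeat penalty), and `has_chord(["label", "album", "band"], ...)` is true whenever the label's neighbor set already contains the band.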
Endpoints themselves are resolved by notability, not keyword-guessing: among all the entities matching the name you typed, we pick the one described in the most Wikipedias. So “Cleopatra” is the queen, not the freshwater snail, and “The Beatles” is the band, not the 1968 double LP.
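The notability rule reduces to a single `max` once the candidates are in hand. The sketch below assumes each candidate is a dictionary with a `sitelinks` count (the number of Wikipedias describing it). The field names, the function name and any values passed in are hypothetical, and fetching the candidates is left out.

```python
def resolve_endpoint(candidates):
    """Pick the candidate described in the most Wikipedias; None if there are none."""
    if not candidates:
        return None
    # ties go to whichever candidate appears first in the list
    return max(candidates, key=lambda c: c["sitelinks"])
```

Given a queen with many sitelinks and a snail with a handful, the queen wins regardless of the order in which the search returned them.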
And the graph is yours to poke: click any dot on the map to inspect it, make it an endpoint, ban it, force the route through it, or wander its real connections one node at a time — a live knowledge-graph explorer. Every relation chip on a stop card links to the underlying statement on Wikidata. The narration is assembled relation-by-relation from direction-aware templates, and every fact in the chain is a real, sourced claim — the artwork is generated (decorative only); the connections are not.
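One way to picture direction-aware templates: each relation carries two phrasings, one for walking a statement from subject to object and one for walking it backwards. The templates, the fallback wording and the function names below are all invented for illustration and are not taken from this page's code.

```python
# Hypothetical templates: (walking with the statement, walking against it).
TEMPLATES = {
    "named after": ("{here} is named after {there}",
                    "{here} gave its name to {there}"),
    "present in work": ("{here} appears in {there}",
                        "{here} features {there}"),
}
FALLBACK = ("{here} has the relation '{relation}' to {there}",
            "{there} has the relation '{relation}' to {here}")


def narrate_step(here, relation, there, forward):
    """Phrase one hop from `here` to `there`; `forward` means `here` is the subject."""
    fwd, rev = TEMPLATES.get(relation, FALLBACK)
    template = fwd if forward else rev
    return template.format(here=here, there=there, relation=relation)


def narrate(steps):
    """Join the phrased hops into one short narration."""
    if not steps:
        return ""
    return ". ".join(narrate_step(*step) for step in steps) + "."
```

Walking the same statement in the two directions gives "X is named after Y" and "Y gave its name to X", so the sentence always starts from the stop the reader is standing on.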