Trust the AI? An exploration vs. interpolation trade-off
AI knowledge is jagged. An AI model can be brilliant on one question and confidently wrong on an almost
identical one. Karpathy called this “jagged intelligence,” and Dell’Acqua et al. (2026) the
“jagged frontier.” The reason is simple: a model reliably knows the answers it has actually seen, and
for every other question it interpolates, filling in a guess between the points it knows.
Near a known point it is accurate; far from one, in the gaps, its answer is a confident-looking guess that can be
badly off. The map below is a conceptual model of this, and of how a user searches such a
model for correct answers.
What the map shows
x‑axis: the question, x ∈ [0, 100]. A one‑dimensional map of every question you could ask
(numbered 1–100). Questions that sit close together are similar, so their answers are related.
y‑axis: the answer, y(x) ∈ [0, 1]. The correct answer to that question.
Red curve: the truth. The correct
answer to every question. It is rough (neighbouring questions differ a little) and is hidden from you
while you play.
Blue dots: what the AI actually knows.
The questions it has learned exactly: its training data, or “knowledge points.”
Blue line: the AI’s answer everywhere. It simply connects its dots and reports that
value. Notice it looks equally confident across the whole space, yet it only hugs the truth near its dots and
drifts away in the wide gaps.
[Plot legend: blue = the AI’s known points and interpolated answer · red = the true answer (hidden from you)]
x = the space of questions (similar questions sit near each other) · y = the correct answer (0–1)
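The blue line can be sketched as piecewise-linear interpolation through the blue dots. This is a minimal illustration, not the page's actual code: the function name `ai_answer`, the sample points, and the choice to hold the nearest value flat beyond the outermost dots are all assumptions.

```python
from bisect import bisect_right

def ai_answer(x, known):
    """Piecewise-linear interpolation through the AI's known (x, y) points.

    `known` is a list of (x, y) pairs sorted by x, with distinct x values.
    Beyond the outermost dots the nearest known value is held flat
    (an assumption; the text does not say how the ends are handled).
    """
    xs = [p[0] for p in known]
    ys = [p[1] for p in known]
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)            # xs[i - 1] <= x < xs[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    t = (x - x0) / (x1 - x0)           # position within the gap, 0..1
    return y0 + t * (y1 - y0)

# Hypothetical knowledge points, for illustration only.
known = [(10, 0.20), (40, 0.80), (90, 0.30)]
print(ai_answer(40, known))            # at a dot: the known value itself
print(round(ai_answer(25, known), 3))  # mid-gap: halfway between 0.20 and 0.80
```

The function returns a number for every question with no sign of how far the nearest dot is, which is why the blue line looks equally confident everywhere.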
Your task
You are a user querying this AI for correct answers. You face 15 questions. They can land anywhere on
the map, so you will often fall in the AI’s blind gaps. Each time you decide whether the AI’s
confident answer can be trusted here, or whether to pay to check it. That is the trade-off between
interpolating (relying on what the model already “knows”) and exploring (spending to
find out).
How points are scored
For each question, you pick one of two actions:
Trust the AI (free): you submit the AI’s answer as your own and are paid for accuracy.
You get +100 if it is exactly right, and lose 2 points for every 0.01 the AI is off.
(Formula: 100 − 200×error, where error is the distance between the AI’s answer
and the truth on the 0–1 scale. It can go negative in a bad gap, down to −40.)
Verify (−20): you pay 20 to look up the true answer and submit that. You are certain to be right, for a guaranteed +80 (100 − 20).
Because verifying locks in +80, trusting is the better bet only when you expect the AI to be off by less
than 0.10 (that is where 100 − 200×error = 80). In-game, a green “trust zone”
of ±0.10 is drawn around the AI’s answer: if the hidden truth turns out to lie inside it, trusting
won; outside it, you should have verified.
Example. The AI answers 0.60.
If the truth is 0.55 (off by 0.05) → Trust scores 100 − 200×0.05 = +90, which beats +80, so trust.
If the truth is 0.80 (off by 0.20) → Trust scores 100 − 200×0.20 = +60, below +80, so you should have verified.
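The scoring rules and the worked example can be written as a short sketch. The function and constant names are illustrative assumptions, and the −40 floor mentioned above is not modelled.

```python
FULL_MARKS = 100               # score for an exactly right answer
PENALTY_PER_UNIT_ERROR = 200   # 2 points lost per 0.01 of error
VERIFY_COST = 20               # price of looking up the truth

def trust_score(ai_value, truth):
    """Score for submitting the AI's answer: 100 - 200 * error."""
    error = abs(ai_value - truth)
    return FULL_MARKS - PENALTY_PER_UNIT_ERROR * error

def verify_score():
    """Score for paying to check: always right, minus the fee."""
    return FULL_MARKS - VERIFY_COST

def better_action(ai_value, truth):
    """Which action wins in hindsight, once the truth is known."""
    return "trust" if trust_score(ai_value, truth) > verify_score() else "verify"

# Error at which the two actions score the same.
BREAK_EVEN_ERROR = VERIFY_COST / PENALTY_PER_UNIT_ERROR

print(round(trust_score(0.60, 0.55)), better_action(0.60, 0.55))
print(round(trust_score(0.60, 0.80)), better_action(0.60, 0.80))
print(BREAK_EVEN_ERROR)
```

`better_action` needs the truth, so it only judges a choice after the fact; during play you have to estimate the error instead.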
Experimental conditions
The three switches below are the levers of the model. Together they frame a defining question of the AI era:
what turns raw model capability into value people can actually use? A model can score well on average yet stay
jagged and opaque, so whether it truly helps depends on three things: how much it has learned
(coverage, i.e. scaling), whether you can tell where it is reliable (visibility, i.e. calibration
and discoverability), and whether using it makes it better (shared learning, i.e. a data flywheel). Flip
the switches to reshape your playing environment. The plot updates to show what you would face, so you can
study when scaling, transparency, or user-generated data is what closes the gap between a high benchmark score
and real usefulness on the task in front of you.
How much has the AI learned?
More data = denser dots = safer to trust (the effect of AI “scaling”).
What can you see about reliability?
Blind = only the AI’s answer, no map (a truly blind user). Band = you see the AI’s known points, the line it interpolates, and a shaded band for where it is unsure (a calibrated user). The band is roughly a 95% range, so the true answer can occasionally fall outside it.
Shared learning: your checks train the AI
When on, every point you verify is added to the AI’s knowledge: its line bends to pass through that point, and its blind gaps shrink where you searched. This is a data flywheel, where user data (not just bigger models) improves the AI. The improvement is endogenous and targeted at the spots users probe. It lets you study three things: whether verifying becomes an investment that pays off on later nearby questions rather than a pure cost; whether usage fills the widest gaps first and reduces jaggedness faster than uniform scaling; and the externality that your checks improve the model for everyone (a public-good problem in a multi-user version).
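As a rough sketch of the shared-learning switch, a verified answer can be treated as one more knowledge point, after which the widest blind gap is recomputed. The helper names and sample points are hypothetical; the actual page may store and update its points differently.

```python
from bisect import insort

def add_verified_point(known, x, truth):
    """Return a new sorted list of (x, y) points that includes (x, truth).

    Any existing point at the same x is replaced by the verified value.
    """
    updated = [p for p in known if p[0] != x]
    insort(updated, (x, truth))
    return updated

def widest_gap(known):
    """Largest distance between neighbouring known questions."""
    xs = [p[0] for p in known]
    return max(b - a for a, b in zip(xs, xs[1:]))

# Hypothetical knowledge points, for illustration only.
known = [(10, 0.20), (40, 0.80), (90, 0.30)]
print(widest_gap(known))                      # gap between x=40 and x=90

known = add_verified_point(known, 65, 0.10)   # user verifies question 65
print(widest_gap(known))                      # the 40..90 gap is now split
```

Because the new point lands wherever a user chose to check, the gaps shrink where people actually ask questions, which is the targeted improvement described above.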
Your playing environment under these conditions
[Interactive plot and scoreboard: correct answer +100 · verify costs −20 · your score · question 1 of 15. Legend: the AI’s known points · the AI’s answer · the AI’s uncertainty · trust zone (±0.10) · true answer (revealed)]
What the AI actually didn’t know
Blue = the AI’s answer everywhere. Red = the true curve. The AI is accurate at its
dots and drifts away from the truth in the gaps, while looking just as confident.
How it works
The AI has learned the true answer at the blue dots. Everywhere else it draws the straight line
between neighbouring dots (it interpolates) and reports that value, confidently.
Trust the AI (free): submit the AI’s answer. Score = 100 − 200×(how far the AI is off): +100 if spot-on, less as the error grows, and negative in a bad gap.
Verify (−20): pay to reveal the true answer and submit it, for a guaranteed +80.
Setting the two equal, trusting beats verifying exactly when the AI’s error is below 0.10. The green
“trust zone” drawn around the AI’s answer marks that ±0.10 window: if the hidden truth lands
inside it, trusting was the better call; if outside, you should have verified. You must judge that before
seeing the truth, from how close the question sits to the AI’s known dots. This is the tension between
interpolation and exploration.
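In Band mode, the shaded band offers one way to make this call before the truth is revealed. The sketch below is a rule of thumb, not the game's logic. It rests on assumptions not stated in the text: the AI's error is zero-mean and normally distributed, the band is the ±1.96σ range, you care only about your expected score, and the −40 floor is ignored. All names are hypothetical.

```python
import math

FULL_MARKS = 100
PENALTY_PER_UNIT_ERROR = 200
VERIFY_COST = 20
Z_95 = 1.96   # assumed: band half-width = 1.96 standard deviations

def expected_trust_score(band_half_width):
    """Expected score from trusting, under the normal-error assumption."""
    sigma = band_half_width / Z_95
    # Mean absolute value of a zero-mean normal variable: sigma * sqrt(2 / pi).
    expected_abs_error = sigma * math.sqrt(2 / math.pi)
    return FULL_MARKS - PENALTY_PER_UNIT_ERROR * expected_abs_error

def should_trust(band_half_width):
    """True if trusting beats the guaranteed verify score on average."""
    return expected_trust_score(band_half_width) > FULL_MARKS - VERIFY_COST

def break_even_half_width():
    """Band half-width at which trusting and verifying tie on average."""
    break_even_error = VERIFY_COST / PENALTY_PER_UNIT_ERROR
    return break_even_error * math.sqrt(math.pi / 2) * Z_95

print(should_trust(0.20), should_trust(0.30))
print(round(break_even_half_width(), 3))
```

Under these assumptions the break-even band half-width comes out near 0.25, wider than the ±0.10 trust zone. The penalty is linear, so many small errors can outweigh the occasional large one: trusting can pay off on average even when the truth sometimes lands outside the trust zone.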