Trust the AI? An exploration vs. interpolation trade-off

AI knowledge is jagged. An AI model can be brilliant on one question and confidently wrong on an almost identical one. Karpathy called this “jagged intelligence,” and Dell’Acqua et al. (2026) the “jagged frontier.” The intuition is simple: a model reliably knows the answers it has actually seen, and for every other question it interpolates, filling in between what it knows. Near its knowledge it is accurate; in the gaps far from it, its answer is a confident-looking guess that can be badly off. The map below is a conceptual model of this, and of how a user searches such a model for correct answers.

What the map shows

x‑axis: the question, x ∈ [0, 100]. A one‑dimensional map of every question you could ask (numbered 1–100). Questions that sit close together are similar, so their answers are related.
y‑axis: the answer, y(x) ∈ [0, 1]. The correct answer to that question.
Red curve: the truth. The correct answer to every question. It is rough (neighbouring questions differ a little) and is hidden from you while you play.
Blue dots: what the AI actually knows. The questions it has learned exactly: its training data, or “knowledge points.”
Blue line: the AI’s answer everywhere. It simply connects its dots and, for any question, reports the value of that line. Notice it looks equally confident across the whole space, yet it only hugs the truth near its dots and drifts away in the wide gaps.

Legend: blue = the AI’s known points and interpolated answer · red = the true answer (hidden from you)

x = the space of questions (similar questions sit near each other) · y = the correct answer (0–1)
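The blue line can be sketched in a few lines of code. This is a minimal illustration that assumes the AI connects its dots with straight segments (the page only says it "connects its dots") and holds the nearest known value outside the known range. The function name `ai_answer` and the knowledge points are hypothetical, chosen only for the example.

```python
from bisect import bisect_right

def ai_answer(x, known):
    """Return the AI's answer at question x by linear interpolation.

    `known` is a list of (question, answer) pairs sorted by question.
    Outside the known range the nearest known answer is returned
    (an assumption; the page does not describe this case).
    """
    xs = [q for q, _ in known]
    i = bisect_right(xs, x)
    if i == 0:
        return known[0][1]
    if i == len(known):
        return known[-1][1]
    (x0, y0), (x1, y1) = known[i - 1], known[i]
    t = (x - x0) / (x1 - x0)  # position of x between the two dots, 0..1
    return y0 + t * (y1 - y0)

# Hypothetical knowledge points (illustrative values, not from the page).
known = [(0, 0.50), (40, 0.30), (50, 0.70), (100, 0.60)]

print(ai_answer(40, known))  # at a dot: the learned answer
print(ai_answer(45, known))  # between two nearby dots
print(ai_answer(75, known))  # deep in a wide gap: still a confident-looking number
```

The function returns a number in the same way everywhere, which is the point of the map: nothing in the answer itself tells you whether the question sat on a dot or in the middle of a 50-wide gap.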

Your task

You are a user querying this AI for correct answers. You face 15 questions. They can land anywhere on the map, so you will often fall in the AI’s blind gaps. Each time you decide whether the AI’s confident answer can be trusted here, or whether to pay to check it. That is the trade-off between interpolating (relying on what the model already “knows”) and exploring (spending to find out).

How points are scored

For each question, you pick one of two actions:

Trust the AI (free): you submit the AI’s answer as your own and are paid for accuracy. You get +100 if it is exactly right, and lose 2 points for every 0.01 the AI is off. (Formula: 100 − 200×error, where error is the distance between the AI’s answer and the truth on the 0–1 scale. It can go negative in a bad gap, down to −40.)
Verify (−20): you pay 20 to look up the true answer and submit it, so you are always right and score a guaranteed +80 (100 − 20).

Because verifying locks in +80, trusting is the better bet only when you expect the AI to be off by less than 0.10. At an error of exactly 0.10, 100 − 200×error = 80 and the two actions tie. In-game, a green “trust zone” of ±0.10 is drawn around the AI’s answer: if the hidden truth turns out to lie inside it, trusting won; outside it, you should have verified.

Example. The AI answers 0.60. If the truth is 0.55 (off by 0.05) → Trust scores 100 − 200×0.05 = +90, which beats +80, so trust. If the truth is 0.80 (off by 0.20) → Trust scores 100 − 200×0.20 = +60, below +80, so you should have verified.
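The scoring rule and the worked example can be checked with a short sketch. The function and constant names are hypothetical; only the numbers (100, 200 per unit of error, verify cost 20) come from the rules above. The −40 floor mentioned in the rules is not modeled here, since the page does not say whether it is a clamp or a property of the map.

```python
MAX_SCORE = 100         # score for an exactly right answer
PENALTY_PER_UNIT = 200  # 2 points lost per 0.01 of error
VERIFY_COST = 20        # points paid to look up the truth

def trust_score(answer, truth):
    """Score for submitting the AI's answer: 100 - 200 * error."""
    error = abs(answer - truth)
    return MAX_SCORE - PENALTY_PER_UNIT * error

def verify_score():
    """Score for paying to check: always right, minus the cost."""
    return MAX_SCORE - VERIFY_COST

def break_even_error():
    """Error at which trusting and verifying score the same."""
    return VERIFY_COST / PENALTY_PER_UNIT

def better_action(answer, truth):
    """With hindsight (truth revealed), which action scored more?"""
    return "trust" if trust_score(answer, truth) > verify_score() else "verify"

print(round(trust_score(0.60, 0.55)))  # 90
print(round(trust_score(0.60, 0.80)))  # 60
print(verify_score())                  # 80
print(break_even_error())              # 0.1
```

Note that `better_action` needs the truth, which you only see after choosing. During play the decision has to rest on how large you expect the error to be, not on the error itself.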

Experimental conditions

These three switches are the levers of the model, and together they frame a defining question of the AI era: what turns raw model capability into value people can actually use? A model can score well on average yet stay jagged and opaque, so whether it truly helps depends on three things: how much it has learned (coverage, i.e. scaling), whether you can tell where it is reliable (visibility, i.e. calibration and discoverability), and whether using it makes it better (shared learning, i.e. a data flywheel). Flip the switches below to reshape your playing environment. The plot updates to show what you would face, so you can study when scaling, transparency, or user-generated data is what closes the gap between a high benchmark score and real usefulness on the task in front of you.

How much has the AI learned?

More data = denser dots = safer to trust (the effect of AI “scaling”).
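A rough way to see the coverage effect is to measure the AI's average error for different numbers of dots. The sketch below makes several assumptions that are not from the page: the hidden truth is replaced by a fixed wavy stand-in function, the dots are evenly spaced, and the AI interpolates linearly. All names are hypothetical.

```python
import math

def truth(x):
    # Stand-in for the hidden red curve (an assumption, not the page's curve).
    return 0.5 + 0.3 * math.sin(x / 7) + 0.15 * math.sin(x / 2.3)

def interpolate(x, xs, ys):
    # Linear interpolation between the two known points that bracket x.
    for i in range(1, len(xs)):
        if x <= xs[i]:
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])
    return ys[-1]

def mean_error(n_points):
    # Evenly spaced knowledge points on [0, 100] (an assumed layout).
    xs = [100 * i / (n_points - 1) for i in range(n_points)]
    ys = [truth(x) for x in xs]
    questions = [q / 10 for q in range(1001)]  # 0.0, 0.1, ..., 100.0
    total = sum(abs(interpolate(q, xs, ys) - truth(q)) for q in questions)
    return total / len(questions)

for n in (5, 11, 41):
    print(n, round(mean_error(n), 3))
```

Under these assumptions the densest setting has a far smaller average error than the sparsest one, so trusting is safe at many more questions. How quickly the error falls depends on how rough the real curve is.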

What can you see about reliability?

Blind = only the AI’s answer, no map (a truly blind user). Band = you see the AI’s known points, the line it interpolates, and a shaded band for where it is unsure (a calibrated user). The band is roughly a 95% range, so the true answer can occasionally fall outside it.
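The page does not say how the band is computed. One way such a band could be built, offered purely as an assumption, is to treat the truth between two known points as a random walk pinned at both ends, so uncertainty is zero at the dots and largest mid-gap. The `roughness` value and all function names below are hypothetical.

```python
import math

Z_95 = 1.96  # two-sided 95% quantile of the standard normal

def band_sd(x, known_xs, roughness=0.05):
    """Assumed standard deviation of the truth around the AI's answer at x.

    Model (an assumption): a random walk pinned at the neighbouring known
    points, with standard deviation `roughness` per unit step in x.
    """
    left = max((k for k in known_xs if k <= x), default=None)
    right = min((k for k in known_xs if k >= x), default=None)
    if left is None or right is None:
        raise ValueError("x must lie between the first and last known question")
    if left == right:
        return 0.0  # exactly at a known point: no uncertainty
    variance = roughness ** 2 * (x - left) * (right - x) / (right - left)
    return math.sqrt(variance)

def band_half_width(x, known_xs, roughness=0.05):
    # Half-width of a roughly 95% band under the normal assumption.
    return Z_95 * band_sd(x, known_xs, roughness)

def should_trust(x, known_xs, break_even=0.10, roughness=0.05):
    # For a zero-mean normal error, the expected absolute error is sd * sqrt(2/pi).
    expected_error = band_sd(x, known_xs, roughness) * math.sqrt(2 / math.pi)
    return expected_error < break_even

known_xs = [0, 40, 50, 100]  # hypothetical knowledge points
for x in (40, 41, 45, 75):
    print(x, round(band_half_width(x, known_xs), 3), should_trust(x, known_xs))
```

The decision rule compares the expected error with the 0.10 break-even from the scoring rules, so a calibrated user can trust near the dots and verify in the wide gaps. A blind user has no such signal and must guess.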

Shared learning: your checks train the AI

When on, every answer you verify is added to the AI’s model as a new knowledge point: its line bends to pass through that point, and its blind gaps shrink where you searched. This is a data flywheel, where user data (not just bigger models) improves the AI. The improvement is endogenous and targeted at the spots users probe. It lets you study three things: whether verifying becomes an investment that pays off on later nearby questions rather than a pure cost; whether usage fills the widest gaps first and reduces jaggedness faster than uniform scaling; and the externality that your checks improve the model for everyone (a public-good problem in a multi-user version).
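The flywheel can be sketched as a verify step that writes the looked-up answer back into the AI's knowledge points. As before, the truth curve is a fixed stand-in, the interpolation is assumed to be linear, and every name is hypothetical.

```python
import math
from bisect import insort

def truth(x):
    # Stand-in for the hidden curve (an assumption for illustration only).
    return 0.5 + 0.3 * math.sin(x / 7) + 0.15 * math.sin(x / 2.3)

def ai_answer(x, known):
    # Linear interpolation between the known points that bracket x.
    for (x0, y0), (x1, y1) in zip(known, known[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    raise ValueError("x outside the known range")

def verify(x, known):
    """Look up the truth at x and, with shared learning on, store it."""
    y = truth(x)
    if (x, y) not in known:
        insort(known, (x, y))  # keeps the list sorted by question
    return y

# The AI starts with five evenly spaced dots (an assumed layout).
known = [(x, truth(x)) for x in (0, 25, 50, 75, 100)]

before = abs(ai_answer(13, known) - truth(13))
verify(12, known)  # pay once to check question 12
after = abs(ai_answer(13, known) - truth(13))

print(round(before, 2), round(after, 2))
```

In this sketch the check at question 12 also shrinks the AI's error at the neighbouring question 13, which is the sense in which verifying can act as an investment rather than a pure cost.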

Your playing environment under these conditions

Correct answer: +100 · Verify costs: −20 · Your score: 0 · Question: 1 of 15

Legend: AI knows · AI’s answer · trust zone (±0.10) · true answer (revealed)

What the AI actually didn’t know

Blue = the AI’s answer everywhere. Red = the true curve. The AI is accurate at its dots and drifts away from the truth in the gaps, while looking just as confident.