Lessons from the AI Frontier

Daniel Rose

You are a human.

  1. You have experience
  2. You have good ideas*

*or at least some of the time

LLMs are interns.

  1. eagerly follow instructions + repeat at scale
  2. will push ahead, even on a bad idea
  3. have alien brains*

*occasionally brilliant

Curate your context

Curate your context

An average LLM context holds 200,000 tokens.

That means ~250 tokens for each of your 800 endpoints.

If each tool description were about as long as the text on this slide, you'd have used a quarter of that memory...

just by saying hello.
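The budget math above, sketched out (the numbers are the slide's illustration, not any specific model's limits; the ~60-token description length is an assumption):

```typescript
// Illustrative token-budget arithmetic from the slide.
const contextWindow = 200_000; // tokens in an "average" LLM context
const endpointCount = 800;     // tools exposed by a large API

// Budget per tool if every endpoint gets equal space
const tokensPerTool = contextWindow / endpointCount; // 250

// Assume each description runs ~60 tokens (roughly a slide's worth of text):
const tokensPerDescription = 60;
const fractionUsed = (endpointCount * tokensPerDescription) / contextWindow;
// ~0.24 of the window gone before the first user message
```

Nothing here is model-specific; the point is that tool descriptions alone can eat a quarter of the context before the conversation starts.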

— Jeremiah Lowin
Curate your context

It's not about less or more.

It's about the right context, carefully chosen.

"We expanded prompts from ~100 words to 2,000+ — but every word was curated.
Explicit column definitions. Precise output formats. Concrete decision rules."
Curate your context

Think in checkpoints, not threads

You don't re-read chapters 1 to 9 before starting chapter 10.

/compact and /clear are your friends.

Summaries and decisions  >  raw conversation history

Curate your context

Training data has a cutoff

Models don't know about your latest library version
or last week's blog post.

The model will confidently use a deprecated API
if it doesn't know the new one exists.

Tell it to web search.

Curate your tools

Curate your tools

Tools ≠ utils

We abstract code into small, composable functions.
That's good engineering. It's bad tool design.

How we write code

getColumns()
filterBy(field, op, value)
sortBy(field, direction)
setVisibility(field, bool)
applyChanges()

How agents need tools

buildView(intent)
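A minimal sketch of the contrast: the small composable functions stay as internals, and the agent-facing tool wraps them behind a single intent. All types and names here are illustrative, not the talk's actual API:

```typescript
// Hypothetical grid state -- stands in for the real grid API.
type ViewIntent = {
  filter?: { field: string; op: string; value: unknown };
  sort?: { field: string; direction: "asc" | "desc" };
  hide?: string[];
};

type ViewState = {
  filters: { field: string; op: string; value: unknown }[];
  sort?: { field: string; direction: "asc" | "desc" };
  hidden: Set<string>;
};

// The small composable functions: good engineering, bad tool surface.
function filterBy(state: ViewState, field: string, op: string, value: unknown): ViewState {
  return { ...state, filters: [...state.filters, { field, op, value }] };
}
function sortBy(state: ViewState, field: string, direction: "asc" | "desc"): ViewState {
  return { ...state, sort: { field, direction } };
}
function setVisibility(state: ViewState, field: string, visible: boolean): ViewState {
  const hidden = new Set(state.hidden);
  if (visible) hidden.delete(field);
  else hidden.add(field);
  return { ...state, hidden };
}

// The one agent-facing tool: the server composes the steps, the agent states intent.
function buildView(intent: ViewIntent): ViewState {
  let state: ViewState = { filters: [], hidden: new Set() };
  if (intent.filter) state = filterBy(state, intent.filter.field, intent.filter.op, intent.filter.value);
  if (intent.sort) state = sortBy(state, intent.sort.field, intent.sort.direction);
  for (const field of intent.hide ?? []) state = setVisibility(state, field, false);
  return state;
}
```

One call like `buildView({ filter: { field: "status", op: "eq", value: "ready" }, hide: ["internalNotes"] })` replaces a fragile four-step tool chain.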

Curate your tools

Code

  1. Deterministic
    • same input => output
  2. Extremely fast
    • Bazillion ops/second
  3. Exact transformations
    • x + y = z

LLMs

  1. Probabilistic
    • same input => output ..?
  2. Slow
    • Couple ops/second
  3. Fuzzy transformations
    • "make x more like y"
"Agents find needles in haystacks — by examining every piece of hay."
— Jeremiah Lowin
Curate your tools

Outcomes, not operations

The Trap

get_columns(), set_filter(),
set_sort(), apply_changes()

  • Agent-as-glue
  • Fragile multi-step chains
  • Every step can fail

The Fix

build_view(intent)

  • One tool = one agent story
  • Server handles composition
  • Name tools for agents (verb + intent)
Curate your tools

Write agent stories

User stories, but the persona is the agent.

User story

As a nursery worker,
I want to filter inventory by status,
so I can see what's ready to ship.

Agent story

As an AI assistant,
I want to build a complete view from intent,
so I can configure the grid in one call.

Curate your output

Curate your output

AI output doesn't have to match the shape the product needs.

Design a translation layer. Include unit tests.

AI shape → code bridge → product shape
Curate your output

The problem: MUI needs an exact shape

// MUI data grid needs BOTH of these, precisely correct:

columnVisibilityModel: {
  name: true, sku: true, status: true,
  qty: true, internalNotes: false, ...
}

orderedFields: [
  "__check__",        // ← system column (checkbox)
  "name", "sku",      // ← user columns in exact order
  "status", "qty",
  "actions"           // ← system column (buttons)
]

AI was unreliable at producing this — forgot system columns,
wrong ordering, flipped visibility booleans.

Curate your output

The fix: let each do what it's good at

AI returns categories

// Semantic — natural for an LLM
{
  identifyingColumns: ["name", "sku"],
  relevantColumns:    ["status", "qty"],
  insightColumns:     ["updatedAt"],
  hideColumns:        ["internalNotes"]
}

Code translates

// Deterministic — unit tested
mergeAIColumnVisibility(categories)
  → { name: true, sku: true, ... }

anchorSystemColumns(aiOrder)
  → [__check__, ...ai..., actions]
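A sketch of what those two bridge functions might look like. The talk doesn't show their implementations, so everything below is an illustrative reconstruction; in particular, defaulting unmentioned columns to hidden is an assumption:

```typescript
type AIColumnCategories = {
  identifyingColumns: string[];
  relevantColumns: string[];
  insightColumns: string[];
  hideColumns: string[];
};

// Categories -> MUI columnVisibilityModel. Deterministic, unit-testable.
// `allColumns` covers every grid column, including ones the AI never mentioned.
function mergeAIColumnVisibility(
  categories: AIColumnCategories,
  allColumns: string[],
): Record<string, boolean> {
  const visible = new Set([
    ...categories.identifyingColumns,
    ...categories.relevantColumns,
    ...categories.insightColumns,
  ]);
  const model: Record<string, boolean> = {};
  for (const col of allColumns) {
    // Assumed policy: shown only if the AI placed it in a visible category.
    model[col] = visible.has(col);
  }
  return model;
}

// Pin system columns at the edges; drop them from the AI's order if it
// included them, so they can't appear twice or out of place.
function anchorSystemColumns(aiOrder: string[]): string[] {
  const userColumns = aiOrder.filter((c) => c !== "__check__" && c !== "actions");
  return ["__check__", ...userColumns, "actions"];
}
```

Because both functions are pure, the failure modes the previous slide lists (forgotten system columns, flipped booleans) become unit tests instead of runtime surprises.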

Case study: AI Data Grid Assistant

Natural language filtering for non-technical nursery staff

Case study

Ideas were human-led

  1. The feature itself
    • Relatively simple to build, plenty of sizzle
    • Foundation for future AI features
  2. The output shape
    • Categories, not booleans — after watching AI fail at exact shapes
  3. Edge cases caught by human testing
    • Location tree hierarchies, column visibility merge bugs
Case study

Implementation was AI-led

  1. Plan mode — 15+ times
    • Human sets direction, AI proposes implementation
  2. Fed it Vercel's useChat examples
    • Training data cutoff — needed current docs and patterns
  3. Fantastic at bulk repetitive work
    • Adding isSearchable to 50+ column definitions across 12 grids
  4. Translation layer: pure functions, unit tested
    • mergeAIColumnVisibility() and anchorSystemColumns()

Curate ruthlessly

Curate your tools  — 5–15, not 800

Curate your context  — focused > bloated

Curate your workflow  — checkpoints > threads

Curate your output  — translation layers > direct mapping

"The most important word for MCP developers is curate." — Jeremiah Lowin

Summary

  1. Humans have ideas and experience
  2. LLMs are interns — eager, tireless, occasionally brilliant
  3. right context > more (or less) context — web search
  4. tools ≠ utils — design for outcomes with agent stories
  5. AI output ≠ product shape — build a bridge with unit tests
  6. Curate. Be bold.
References: Jeremiah Lowin — "Your MCP Server is Bad" · Inngest — "How We Built Insights AI"