Lessons from the AI Frontier

Daniel Rose

You are a human.

  1. You have experience
  2. You have good ideas*

*or at least some of the time

LLMs are interns.

  1. eagerly follow instructions + repeat at scale
  2. will push ahead, even on a bad idea
  3. have alien brains*

*occasionally brilliant

Curate your context

Curate your context

An average LLM context holds 200,000 tokens.

That means ~250 tokens for each of your 800 endpoints.

If each tool description were about as long as the text on this slide, you'd have used a quarter of that memory...

just by saying hello.
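The budget math above, sketched out (the numbers are the slide's illustration, not any specific model's limits; the ~60-token description length is an assumption):

```typescript
// Illustrative token-budget arithmetic from the slide.
const contextWindow = 200_000; // tokens in an "average" LLM context
const endpointCount = 800;     // tools exposed by a large API

// Budget per tool if every endpoint gets equal space
const tokensPerTool = contextWindow / endpointCount; // 250

// Assume each description runs ~60 tokens (roughly a slide's worth of text):
const tokensPerDescription = 60;
const fractionUsed = (endpointCount * tokensPerDescription) / contextWindow;
// ~0.24 of the window gone before the first user message
```

Nothing here is model-specific; the point is that tool descriptions alone can eat a quarter of the context before the conversation starts.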

— Jeremiah Lowin
Curate your context

It's not about less or more.

It's about the right context, carefully chosen.

"We expanded prompts from ~100 words to 2,000+ — but every word was curated.
Explicit column definitions. Precise output formats. Concrete decision rules."
Curate your context

Think in checkpoints, not threads

You don't re-read chapters 1 to 9 before starting chapter 10.

/compact and /clear are your friends.

Summaries and decisions  >  raw conversation history

Curate your context

Training data has a cutoff

Models don't know about your latest library version
or last week's blog post.

The model will confidently use a deprecated API
if it doesn't know the new one exists.

Tell it to web search.

Curate your tools

Curate your tools

Tools ≠ utils

We abstract code into small, composable functions.
That's good engineering. It's bad tool design.

How we write code

getColumns()
filterBy(field, op, value)
sortBy(field, direction)
setVisibility(field, bool)
applyChanges()

How agents need tools

buildView(intent)
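A minimal sketch of the contrast: the small composable functions stay as internals, and the agent-facing tool wraps them behind a single intent. All types and names here are illustrative, not the talk's actual API:

```typescript
// Hypothetical grid state -- stands in for the real grid API.
type ViewIntent = {
  filter?: { field: string; op: string; value: unknown };
  sort?: { field: string; direction: "asc" | "desc" };
  hide?: string[];
};

type ViewState = {
  filters: { field: string; op: string; value: unknown }[];
  sort?: { field: string; direction: "asc" | "desc" };
  hidden: Set<string>;
};

// The small composable functions: good engineering, bad tool surface.
function filterBy(state: ViewState, field: string, op: string, value: unknown): ViewState {
  return { ...state, filters: [...state.filters, { field, op, value }] };
}
function sortBy(state: ViewState, field: string, direction: "asc" | "desc"): ViewState {
  return { ...state, sort: { field, direction } };
}
function setVisibility(state: ViewState, field: string, visible: boolean): ViewState {
  const hidden = new Set(state.hidden);
  if (visible) hidden.delete(field);
  else hidden.add(field);
  return { ...state, hidden };
}

// The one agent-facing tool: the server composes the steps, the agent states intent.
function buildView(intent: ViewIntent): ViewState {
  let state: ViewState = { filters: [], hidden: new Set() };
  if (intent.filter) state = filterBy(state, intent.filter.field, intent.filter.op, intent.filter.value);
  if (intent.sort) state = sortBy(state, intent.sort.field, intent.sort.direction);
  for (const field of intent.hide ?? []) state = setVisibility(state, field, false);
  return state;
}
```

One call like `buildView({ filter: { field: "status", op: "eq", value: "ready" }, hide: ["internalNotes"] })` replaces a fragile four-step tool chain.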

Curate your tools

Code

  1. Deterministic
    • same input => output
  2. Extremely fast
    • Bazillion ops/second
  3. Exact transformations
    • x + y = z

LLMs

  1. Probabilistic
    • same input => output ..?
  2. Slow
    • Couple ops/second
  3. Fuzzy transformations
    • "make x more like y"
"Agents find needles in haystacks — by examining every piece of hay."
— Jeremiah Lowin
Curate your tools

Outcomes, not operations

The Trap

get_columns(), set_filter(),
set_sort(), apply_changes()

  • Agent-as-glue
  • Fragile multi-step chains
  • Every step can fail

The Fix

build_view(intent)

  • One tool = one agent story
  • Server handles composition
  • Name tools for agents (verb + intent)
Curate your tools

Write agent stories

User stories, but the persona is the agent.

User story

As a nursery worker,
I want to filter inventory by status,
so I can see what's ready to ship.

Agent story

As an AI assistant,
I want to build a complete view from intent,
so I can configure the grid in one call.

Curate your output

Curate your output

AI output doesn't have to match the shape the product needs.

Design a translation layer. Include unit tests.

AI shape → code bridge → product shape
Curate your output

The problem: MUI needs an exact shape

// MUI data grid needs BOTH of these, precisely correct:

columnVisibilityModel: {
  name: true, sku: true, status: true,
  qty: true, internalNotes: false, ...
}

orderedFields: [
  "__check__",        // ← system column (checkbox)
  "name", "sku",      // ← user columns in exact order
  "status", "qty",
  "actions"           // ← system column (buttons)
]

AI was unreliable at producing this — forgot system columns,
wrong ordering, flipped visibility booleans.

Curate your output

The fix: let each do what it's good at

AI returns categories

// Semantic — natural for an LLM
{
  identifyingColumns: ["name", "sku"],
  relevantColumns:    ["status", "qty"],
  insightColumns:     ["updatedAt"],
  hideColumns:        ["internalNotes"]
}

Code translates

// Deterministic — unit tested
mergeAIColumnVisibility(categories)
  → { name: true, sku: true, ... }

anchorSystemColumns(aiOrder)
  → [__check__, ...ai..., actions]
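A sketch of what those two bridge functions might look like. The talk doesn't show their implementations, so everything below is an illustrative reconstruction; in particular, defaulting unmentioned columns to hidden is an assumption:

```typescript
type AIColumnCategories = {
  identifyingColumns: string[];
  relevantColumns: string[];
  insightColumns: string[];
  hideColumns: string[];
};

// Categories -> MUI columnVisibilityModel. Deterministic, unit-testable.
// `allColumns` covers every grid column, including ones the AI never mentioned.
function mergeAIColumnVisibility(
  categories: AIColumnCategories,
  allColumns: string[],
): Record<string, boolean> {
  const visible = new Set([
    ...categories.identifyingColumns,
    ...categories.relevantColumns,
    ...categories.insightColumns,
  ]);
  const model: Record<string, boolean> = {};
  for (const col of allColumns) {
    // Assumed policy: shown only if the AI placed it in a visible category.
    model[col] = visible.has(col);
  }
  return model;
}

// Pin system columns at the edges; drop them from the AI's order if it
// included them, so they can't appear twice or out of place.
function anchorSystemColumns(aiOrder: string[]): string[] {
  const userColumns = aiOrder.filter((c) => c !== "__check__" && c !== "actions");
  return ["__check__", ...userColumns, "actions"];
}
```

Because both functions are pure, the failure modes the previous slide lists (forgotten system columns, flipped booleans) become unit tests instead of runtime surprises.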

Case study: AI Data Grid Assistant

Natural language filtering for non-technical nursery staff

Case study

Ideas were human-led

  1. The feature itself
    • Relatively simple to build, plenty of sizzle
    • Foundation for future AI features
  2. The output shape
    • Categories, not booleans — after watching AI fail at exact shapes
  3. Edge cases caught by human testing
    • Location tree hierarchies, column visibility merge bugs
Case study

Implementation was AI-led

  1. Plan mode — 15+ times
    • Human sets direction, AI proposes implementation
  2. Fed it Vercel's useChat examples
    • Training data cutoff — needed current docs and patterns
  3. Fantastic at bulk repetitive work
    • Adding isSearchable to 50+ column definitions across 12 grids
  4. Translation layer: pure functions, unit tested
    • mergeAIColumnVisibility() and anchorSystemColumns()

Curate ruthlessly

Curate your tools  — 5–15, not 800

Curate your context  — focused > bloated

Curate your workflow  — checkpoints > threads

Curate your output  — translation layers > direct mapping

"The most important word for MCP developers is curate." — Jeremiah Lowin

Summary

  1. Humans have ideas and experience
  2. LLMs are interns — eager, tireless, occasionally brilliant
  3. right context > more (or less) context — web search
  4. tools ≠ utils — design for outcomes with agent stories
  5. AI output ≠ product shape — build a bridge with unit tests
  6. Curate. Be bold.
References: Jeremiah Lowin — "Your MCP Server is Bad" · Inngest — "How We Built Insights AI"