
2026-05-08 · 6 min read

From URL to FAQ in 30 seconds

How the intake demo works: scrape, ground, generate. And what we learned shipping it as a marketing surface.

The intake demo on the homepage does one thing: you paste a URL, it scrapes the page, generates a FAQ grounded in the page content, and shows you the result. No sign-up. Median time-to-result on a real-world test against stripe.com: 25 seconds.

It’s the cold-start moment for thefaqapp — most visitors see it before they read a single line of marketing copy. We thought hard about every second.

The pipeline

Three steps, two of which are network-bound.

1. Scrape the URL. A Worker-side fetch with our own user-agent, a 5MB cap, 10s timeout. We resolve redirects, strip script/style tags, and convert the remaining HTML to a clean text representation. ~2-4 seconds depending on the source site.

2. Ground the content. We pass the cleaned text plus the URL to an LLM with a tight system prompt: “extract up to 8 FAQ-shape questions and answers grounded in the source. Cite the section of the source for each. No hallucinated facts.” Returns structured JSON. ~15-20 seconds with Gemini 2.5 Flash.

3. Sanitize and render. Each answer goes through DOMPurify on the way out. We return them as a list with anchors back to the source page. The frontend renders them as cards.
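Step 1 can be sketched in a few lines. This sketch assumes a runtime with the standard `fetch`, `AbortSignal.timeout`, and streams APIs, which Workers and recent Node both provide. The names `fetchCapped` and `htmlToText` are illustrative, as is the choice to reject oversized pages rather than truncate them. The regex cleanup is a deliberate simplification. In a Worker, the built-in `HTMLRewriter` is a sturdier way to drop script and style elements.

```typescript
// Illustrative limits mirroring step 1: 5MB body cap, 10s timeout.
const MAX_BYTES = 5 * 1024 * 1024;
const TIMEOUT_MS = 10_000;

// Fetch a page (redirects followed) and stop reading once the cap is exceeded.
// Assumption: oversized pages are rejected, not truncated.
async function fetchCapped(url: string, userAgent: string): Promise<string> {
  const res = await fetch(url, {
    redirect: "follow",
    headers: { "user-agent": userAgent },
    signal: AbortSignal.timeout(TIMEOUT_MS),
  });
  if (!res.ok || !res.body) throw new Error(`fetch failed: ${res.status}`);

  const reader = res.body.getReader();
  const chunks: Uint8Array[] = [];
  let total = 0;
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    total += value.byteLength;
    if (total > MAX_BYTES) {
      await reader.cancel();
      throw new Error("page exceeds 5MB cap");
    }
    chunks.push(value);
  }

  // Join chunks and decode as UTF-8 (simplification: ignores declared charsets).
  const buf = new Uint8Array(total);
  let offset = 0;
  for (const c of chunks) {
    buf.set(c, offset);
    offset += c.byteLength;
  }
  return new TextDecoder().decode(buf);
}

// Drop <script>/<style> blocks, then all remaining tags, decode two common
// entities, and collapse whitespace into single spaces.
function htmlToText(html: string): string {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/&nbsp;/g, " ")
    .replace(/&amp;/g, "&")
    .replace(/\s+/g, " ")
    .trim();
}
```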

The whole pipeline is one Worker request to /api/v1/intake with the URL as a query param. Streaming would be nicer; for v1, we return the result on completion.
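Because everything hangs off one query parameter, input handling stays small. Below is a hypothetical guard for it. `parseIntakeTarget` is an illustrative name, and accepting only http(s) targets is an assumption about what the endpoint should allow.

```typescript
// Hypothetical guard: pull the target page out of ?url= on /api/v1/intake
// and reject anything that is not an absolute http(s) URL before scraping.
function parseIntakeTarget(requestUrl: string): URL {
  const { pathname, searchParams } = new URL(requestUrl);
  if (pathname !== "/api/v1/intake") throw new Error("not the intake route");

  const raw = searchParams.get("url"); // already percent-decoded
  if (!raw) throw new Error("missing ?url= parameter");

  let target: URL;
  try {
    target = new URL(raw);
  } catch {
    throw new Error("url parameter is not a valid absolute URL");
  }
  if (target.protocol !== "https:" && target.protocol !== "http:") {
    throw new Error(`unsupported scheme: ${target.protocol}`);
  }
  return target;
}
```

With this guard, a request for `/api/v1/intake?url=https%3A%2F%2Fstripe.com%2Fdocs` yields a `URL` whose `href` is `https://stripe.com/docs`. Schemes like `javascript:` or `ftp:` are refused before any fetch happens.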

The AI fallback chain

Production is configured with Gemini only: it's the fastest model at the latency/cost point we wanted, and its quality is good enough for the cold-start demo. Behind the scenes, the same lib/ai.ts client supports OpenAI and Anthropic as fallbacks, each enabled by setting its API key in env. When a fallback is configured and Gemini is rate-limited or returns malformed JSON, the next provider kicks in.

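// apps/api/src/lib/ai.ts (simplified)
import type { ZodSchema } from "zod";

// Provider clients, isRetryable, and AiUnavailableError are defined elsewhere in this module.
async function generateStructured<T>(prompt: string, schema: ZodSchema<T>): Promise<T> {
  // Gemini first (the production default); OpenAI and Anthropic join the
  // chain only when their API keys are set in env.
  const providers = [google, openai, anthropic].filter(p => p.apiKey);
  for (const p of providers) {
    try {
      return await p.generate(prompt, schema);
    } catch (err) {
      // Rate limits and malformed JSON fall through to the next provider;
      // any other error surfaces immediately.
      if (isRetryable(err)) continue;
      throw err;
    }
  }
  throw new AiUnavailableError("No AI provider available");
}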

The fallback chain matters less for the marketing demo than for paid features like AI translate, where a single-provider outage would mean degraded service.
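The post doesn't show `isRetryable`. One way to make "rate-limited or malformed JSON" concrete is to validate the model's output against the FAQ shape from step 2 and classify errors by type. This sketch assumes hypothetical error classes and a hypothetical `sourceSection` field for the citation.

```typescript
// Hypothetical: output that doesn't parse, or parses into the wrong shape.
class MalformedOutputError extends Error {}

// Hypothetical: an HTTP failure from a provider, carrying its status code.
class ProviderHttpError extends Error {
  readonly status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

interface FaqItem {
  question: string;
  answer: string;
  sourceSection: string; // assumed name for the cited section of the source
}

// Parse the model's JSON: an array of at most 8 items, each with a non-empty
// question, answer, and cited section.
function parseFaqJson(raw: string): FaqItem[] {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    throw new MalformedOutputError("model returned invalid JSON");
  }
  if (!Array.isArray(data) || data.length > 8) {
    throw new MalformedOutputError("expected an array of up to 8 items");
  }
  return data.map((item, i) => {
    const { question, answer, sourceSection } = (item ?? {}) as Record<string, unknown>;
    if (
      typeof question !== "string" || !question ||
      typeof answer !== "string" || !answer ||
      typeof sourceSection !== "string" || !sourceSection
    ) {
      throw new MalformedOutputError(`item ${i} is missing a field`);
    }
    return { question, answer, sourceSection };
  });
}

// Rate limits (429), provider-side failures (5xx), and malformed output move on
// to the next provider; anything else (e.g. a 400) is rethrown by the caller.
function isRetryable(err: unknown): boolean {
  if (err instanceof MalformedOutputError) return true;
  if (err instanceof ProviderHttpError) return err.status === 429 || err.status >= 500;
  return false;
}
```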

What we didn’t do

No retrieval-augmented anything. The page content is the context. There's no vector store, no embedding step, no semantic retrieval. For a single-page scrape, the whole page fits in context; retrieval would add latency without improving quality.

No multi-turn refinement. The demo returns the result and stops. We considered “now ask follow-up questions” as a feature; we cut it because the demo’s job is to show the cold-start experience, not the full product.

No save-to-org for the demo. The demo result is ephemeral. To save it, you sign up and import. The friction is deliberate — we want the demo to feel like a try-before-you-buy, not the actual product.

What it cost

Gemini 2.5 Flash, at the input/output sizes we see, runs roughly $0.001 per intake request, so ~10,000 demo runs a month come to about $10, a level we can run at without thinking about it. Abuse protection is Turnstile plus a per-IP counter in KV, capped at 10 intake calls per IP per day.
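A sketch of the per-IP daily counter follows. It assumes a KV binding with the usual `get` and `put` methods, where `expirationTtl` is given in seconds. The key format and the `allowIntake` name are hypothetical. In a Worker, the client IP would typically come from the `CF-Connecting-IP` request header.

```typescript
// Minimal subset of the Workers KV interface this sketch relies on.
interface KvLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, opts?: { expirationTtl?: number }): Promise<void>;
}

const DAILY_LIMIT = 10;

// Returns true (and records the call) if this IP may run another intake today.
// KV has no atomic increment and is eventually consistent, so this is a soft
// cap: concurrent requests from one IP can overshoot by a few calls.
async function allowIntake(kv: KvLike, ip: string, now: Date = new Date()): Promise<boolean> {
  const day = now.toISOString().slice(0, 10); // UTC date, e.g. "2026-05-08"
  const key = `intake:${ip}:${day}`;
  const used = Number((await kv.get(key)) ?? "0");
  if (used >= DAILY_LIMIT) return false;
  // Let the counter expire after a day so stale keys clean themselves up.
  await kv.put(key, String(used + 1), { expirationTtl: 60 * 60 * 24 });
  return true;
}
```

Keying on the UTC date resets every IP's budget at midnight UTC rather than on a rolling 24-hour window. For a demo cap that is simpler to reason about and costs one read and one write per call.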

What we learned

Latency is the demo. Going from 60s to 25s by switching from “GPT-4 + careful prompt” to “Gemini Flash + tight prompt” mattered more than any single quality improvement. Cold-start visitors don’t wait.

Specifics in the result are the trust signal. Returning "How do I integrate Stripe?" → "Stripe provides several SDKs…" is generic-AI output. Returning "Which Stripe SDK should I pick?" → "If you're on Node.js, use stripe-node; for browser-only checkout, use Stripe.js…" with anchors back to specific sections of stripe.com is the demo working. The system prompt pushes hard for that kind of specificity.
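The post doesn't say how the anchors back to specific sections are built. One plausible approach is the URL Text Fragments syntax (`#:~:text=`), which asks supporting browsers to scroll to and highlight a quoted passage. Browsers without support simply open the page. This is an assumption rather than the shipped method, and `textFragmentLink` is a hypothetical helper.

```typescript
// Build a link that scrolls to a quoted passage via #:~:text=<encoded text>.
// Inside the directive, "-", "," and "&" are syntax characters and must be
// percent-encoded; encodeURIComponent handles "," and "&" but not "-".
function textFragmentLink(pageUrl: string, quote: string): string {
  const url = new URL(pageUrl);
  url.hash = ""; // drop any existing fragment before appending the directive
  const encoded = encodeURIComponent(quote.trim()).replace(/-/g, "%2D");
  return `${url.href}#:~:text=${encoded}`;
}
```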

Bot protection is required. Without Turnstile, the intake endpoint would be a free page-to-LLM pipeline for anyone with a scraping use case. Turnstile catches headless browsers; a per-IP cap catches everyone else.
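On the server, a Turnstile token is checked by POSTing it, together with the site's secret, to Cloudflare's `siteverify` endpoint. Below is a minimal sketch with the HTTP call injected so it can be exercised offline. `verifyTurnstile` and `PostForm` are illustrative names, not part of any SDK.

```typescript
const SITEVERIFY = "https://challenges.cloudflare.com/turnstile/v0/siteverify";

// Minimal fetch-like signature; in a Worker this would wrap the global fetch:
//   (url, body) => fetch(url, { method: "POST", body })
type PostForm = (url: string, body: URLSearchParams) => Promise<{ json(): Promise<unknown> }>;

// Verify the token the Turnstile widget attached to the intake request.
async function verifyTurnstile(post: PostForm, secret: string, token: string, ip?: string): Promise<boolean> {
  if (!token) return false; // no token: fail fast without a network call
  const body = new URLSearchParams({ secret, response: token });
  if (ip) body.set("remoteip", ip); // optional, lets Cloudflare cross-check the client
  const result = (await (await post(SITEVERIFY, body)).json()) as { success?: unknown };
  return result.success === true;
}
```

Only an explicit `success: true` passes. A missing field, an error payload, or an empty token all count as failure, which is the safe default for a public endpoint.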

The intake demo is the highest-stakes 30 seconds in the marketing funnel. If it doesn’t feel magical, the rest of the marketing copy fights uphill. So far, on the cold-start traffic we’ve measured, it does the job.

Try the thing this post is about.

Free plan: read API, one production key, fifty questions. No card. Five minutes to first call.