Jurij Tokarski

An Empty AI Response Corrupted Chat History

Gemini returned HTTP 200 with zero content. I saved it. The conversation never recovered.

The spinner ran. The stream closed. The chat bubble stayed empty. No error anywhere.

I was building a conversational discovery tool for founders — a multi-step Gemini-powered flow that walked people through product decisions, collected answers, and built a structured brief. Complex setup: long system prompt, tool definitions, large user messages. Genkit's generateStream handling each turn.

Intermittently, a user would send a message and get nothing back. No timeout, no catch block firing, no non-2xx status. Just a clean stream completion with zero content inside.

What the Logs Said When I Added Them

Standard error handling gives you no signal here:

try {
  const { stream, response } = ai.generateStream({ ... });
  for await (const chunk of stream) {
    // exits immediately — no chunks arrive
  }
  // (await response).text is '' — no exception thrown
} catch (err) {
  // never reached
}

Adding chunk-level logging made it visible. The stream was completing, but the one chunk that arrived looked like this:

Chunk #1 has no content.
Keys: [ 'index', 'role', 'content', 'custom', 'previousChunks', 'parser' ]
role: model
content.length: 0

The content property existed. It wasn't null. It was an empty array. The keys custom, previousChunks, and parser are Genkit's internal markers for a thinking chunk. The model had spent the entire response budget on internal reasoning and had nothing left to output. HTTP 200. Genkit reported success.
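The reliable signal, then, is the length of the content array, not the presence of keys or a non-null value. A minimal check, reconstructed from the chunk shape in the log above (the chunk objects here are illustrative, not the full Genkit type):

```javascript
// A thinking-only chunk looks structurally valid: role is set, content
// exists. The only thing that distinguishes it is content.length === 0.
function hasRenderableContent(chunk) {
  return Array.isArray(chunk.content) && chunk.content.length > 0;
}

// Assumed chunk shapes, reconstructed from the logged keys:
const thinkingChunk = { index: 0, role: 'model', content: [] };
const textChunk = { index: 0, role: 'model', content: [{ text: 'Hi' }] };

console.log(hasRenderableContent(thinkingChunk)); // false
console.log(hasRenderableContent(textChunk));     // true
```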

Two Ways to Get Nothing

Gemini 2.5 Flash ships with thinking mode enabled by default. Under normal inputs that's fine. Under heavy inputs — long system prompt plus tool definitions plus a long user message — it can exhaust the entire token budget on reasoning before producing a single output token.

There's a second cause that produces the same result: silent rate limiting. Rather than returning a 4xx, Gemini returns a valid, complete, empty stream. The observable symptom is identical. The detection is identical: assert that at least one content chunk arrived after the stream closes.
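That assertion can be sketched as a wrapper around the stream loop. The names here are illustrative, not Genkit API; the point is that an empty stream becomes an exception instead of a silent empty string:

```javascript
// Consume the stream, count content-bearing chunks, and treat zero
// as a failure rather than a success.
async function readStreamOrThrow(stream) {
  let contentChunks = 0;
  let text = '';
  for await (const chunk of stream) {
    if (Array.isArray(chunk.content) && chunk.content.length > 0) {
      contentChunks++;
      text += chunk.content.map((part) => part.text ?? '').join('');
    }
  }
  if (contentChunks === 0) {
    // Covers both causes: thinking-budget exhaustion and silent rate
    // limiting produce the same empty-but-successful stream.
    throw new Error('Stream completed with zero content chunks');
  }
  return text;
}
```

Whether the caller retries or surfaces an error to the user, the empty stream now has a signal attached to it.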

For the thinking mode case, the fix is one line in the Genkit config:

const { stream, response } = ai.generateStream({
  model: MODEL,
  system: systemPrompt,
  messages,
  tools,
  config: {
    thinkingConfig: { thinkingBudget: 0 },
  },
});

thinkingBudget: 0 disables extended thinking. For a conversational flow where latency matters more than deep reasoning, there's no reason to let the model spend the budget on internal traces.

Fix deployed. I moved on.

The Save That Made It Permanent

What I hadn't checked: the database. Every one of those empty responses had already been saved to Firestore. An empty string is a valid string. The save ran. Nothing flagged it.

The stream handler read finalResult.text after generateStream resolved and wrote it as the AI's message. When thinking mode ate the budget, finalResult.text was "". Firestore now held a record of every affected conversation — each one storing a legitimate-looking AI turn with no content.

History as Poison

When those users came back and sent new messages, getChatHistory pulled their messages from Firestore and formatted them for Gemini:

return messages.map((msg) => ({
  role: msg.role === "ai" ? "model" : "user",
  content: [{ text: msg.content }],
}));

When msg.content is "", that produces { role: "model", content: [{ text: "" }] }. A valid-looking empty model turn in the middle of a real conversation. Gemini received it, interpreted it as unfinished context, entered thinking mode to reason about it, exhausted the budget, returned nothing — which got saved as another empty message, which poisoned the next turn.

The conversation was permanently, silently broken. No exception at any layer. No signal the user could act on. Just a chat that would never respond again.

The Fix That Requires Two Places

Fixing only the stream detection isn't enough — the database is already corrupted. Fixing only the history filter isn't enough — new empty responses can still arrive and be saved. Both defenses are required.

Never write an empty AI message:

const finalText = accumulatedText || finalResult.text || "";
if (finalText) {
  await saveAIMessage(chatId, finalText);
} else {
  console.warn("[StreamHandler] Skipping empty AI message save");
}

And filter empty turns before sending history to the model:

return messages
  .filter((msg) => msg.content)
  .map((msg) => ({
    role: msg.role === "ai" ? "model" : "user",
    content: [{ text: msg.content }],
  }));

Miss either one and the loop can restart. The stream guard stops new corruption. The history filter handles the records already in the database.
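The history filter neutralizes the corrupted rows but leaves them in Firestore. If you want to actually purge them, a small helper can identify the empty AI turns first. A sketch, assuming messages carry `id`, `role`, and `content` fields as in the snippets above:

```javascript
// Hypothetical cleanup helper: given one chat's stored messages,
// return the ids of empty AI turns so they can be batch-deleted.
function findEmptyAiTurns(messages) {
  return messages
    .filter((msg) => msg.role === 'ai' && !msg.content)
    .map((msg) => msg.id);
}
```

Run once per affected conversation, feeding the returned ids into a batched delete.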

The Retry That Made It Worse

The first instinct after detecting an empty stream was to retry. The naive retry called the same send function — which re-inserted the user's message into the messages array. The model received the question twice. On an already-stressed conversation with heavy context, this accelerated the problem rather than resolving it.

The fix is an isRetry flag that skips message insertion on retry calls:

async function streamMessage(content, sessionId, token, { isRetry = false } = {}) {
  // userMsgId / aiMsgId are stable for the turn, so a retry can swap
  // out the empty assistant placeholder instead of appending a new one.
  if (!isRetry) {
    setChatMessages(prev => [
      ...prev,
      { id: userMsgId, role: 'user', content },
      { id: aiMsgId,   role: 'assistant', content: '' },
    ]);
  } else {
    setChatMessages(prev => [
      ...prev.filter(m => m.id !== aiMsgId),
      { id: aiMsgId, role: 'assistant', content: '' },
    ]);
  }

  await streamAIResponse(sessionId, token);
}

The user message stays in history exactly once. Without this, retry logic breaks an already-broken conversation faster.
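Wiring detection to the flag, with the retry bounded to a single attempt (an unbounded loop against a possibly rate-limited API makes things worse), can be sketched with an injected send function. `sendFn` stands in for `streamMessage` above and is an assumption, not the real signature:

```javascript
// sendFn performs one streaming turn and resolves with the final text
// ('' when the stream came back empty). Retry at most once, passing
// isRetry so the user message is not inserted a second time.
async function sendWithSingleRetry(sendFn, content) {
  const first = await sendFn(content, { isRetry: false });
  if (first !== '') return first;
  return sendFn(content, { isRetry: true });
}
```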

Why Every Layer Said "Success"

What made this hard to debug: every layer reported success. HTTP 200, no caught exceptions, valid Firestore writes, clean history formatting. The failure was in the semantics, not the mechanics. An empty model turn is not a successful model turn — and asserting that distinction at each boundary is the only thing that stops the loop.

About Jurij Tokarski

Hey 👋 I'm Jurij. I run Varstatt and create software. Usually, I'm deep in the work shipping for clients or building for myself. Sometimes, I share bits I don't want to forget: mostly about software, products and self-employment.
