Jurij Tokarski

200 OK, Data Wrong

Imagen rewrites your prompt, Lambda corrupts your binary buffer, GSC returns empty rows, and structured output truncates without error.

The Production Bugs That Never Threw an Error was about systems that reported success while running the wrong thing — stale tokens, cached artifacts, stripped paths. These five are different. Every one is an API that accepted valid input, returned a clean 200, and delivered the wrong output. The call worked. The result didn't.

The Image That Wasn't What I Asked For

Imagen has a prompt rewriter enabled by default — an LLM that rewrites your prompt before generation to "add more detail and deliver higher quality images." The rewritten version is only returned in the API response if your original prompt is under 30 words. Above that threshold, you get an image generated from a prompt you never see.

The image I got back was valid, well-composed, and completely wrong. The main subject was replaced by something adjacent. The response was 200. No flag, no warning, no indication that the input was rewritten. I assumed the safety filter had intervened — but the documentation says safety filters either block with an error or omit images entirely. They don't silently substitute. The prompt rewriter does.

Setting enhancePrompt: false in the request disables it. After that, the images matched the prompt.
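As a sketch of what that looks like in practice: the Vertex AI Imagen predict endpoint takes the flag inside `parameters`. The request shape below follows the Imagen docs at the time of writing; the model ID and parameter names are assumptions to adapt for your version.

```typescript
// Sketch of an Imagen predict request body with the rewriter disabled.
// Model ID and exact parameter names are assumptions from the Vertex AI
// Imagen docs; verify them against the version you call.
interface ImagenRequest {
  instances: { prompt: string }[];
  parameters: { sampleCount: number; enhancePrompt: boolean };
}

function buildImagenRequest(prompt: string): ImagenRequest {
  return {
    instances: [{ prompt }],
    parameters: {
      sampleCount: 1,
      enhancePrompt: false, // opt out of the silent prompt rewrite
    },
  };
}
```

The flag lives per-request, so you can leave it on for exploratory generation and turn it off wherever the prompt wording actually matters.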

The File Search That Found Nothing

For retrieval I was uploading source documents via the Files API with file search enabled. One batch worked correctly. Another batch would upload without error but return no results in search.

The difference was the filename. The batch that failed was uploaded with a generic name — something like upload_1 — with no extension. File search uses the filename to infer content type before indexing. A file without a recognized extension gets indexed as an unknown type, and the failure is silent: empty search results that look like a search quality issue rather than an upload problem.

Adding .pdf, .txt, or the correct extension to every filename at upload time fixed retrieval immediately.
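A minimal upload-side guard makes that fix systematic rather than one-off. The MIME-to-extension map below is a hypothetical sketch covering a few types; extend it for whatever your corpus contains.

```typescript
// Hypothetical guard: ensure every filename carries a recognized
// extension before it reaches the Files API. The MIME map is a minimal
// assumption; add entries for your document types.
const EXT_BY_MIME: Record<string, string> = {
  'application/pdf': '.pdf',
  'text/plain': '.txt',
  'text/markdown': '.md',
};

function ensureExtension(filename: string, mimeType: string): string {
  if (/\.[A-Za-z0-9]+$/.test(filename)) return filename; // already has one
  const ext = EXT_BY_MIME[mimeType];
  if (!ext) throw new Error(`No known extension for MIME type ${mimeType}`);
  return filename + ext;
}
```

Throwing on an unknown MIME type is deliberate: better to fail the upload loudly than to index a document that will never be retrievable.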

The Transcription That Came Back as Garbage

A voice dictation feature recorded audio in the browser and sent the blob to a Lambda Function URL behind CloudFront. The Lambda passed it to Whisper. The transcription came back — but it was nonsense. No error, no rejection, just wrong text.

Lambda Function URLs base64-encode binary request bodies at the HTTP interface layer. The event includes an isBase64Encoded flag, but if you treat the body as raw bytes in all cases, the buffer is silently corrupted. Whisper doesn't throw on bad audio — it produces garbage.

const audioBuffer = event.isBase64Encoded
  ? Buffer.from(event.body, 'base64')
  : Buffer.from(event.body);

Any Lambda that accepts binary payloads — audio, images, PDFs — needs to check that flag before consuming the body. The cost of missing it is not an error. It's wrong output that looks like a model quality issue.
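One cheap way to catch this class of bug is to assert on the decoded buffer itself before handing it to the model. The sketch below checks for two common audio container signatures — WebM's EBML header and WAV's RIFF header — which are standard magic bytes; add whichever formats your recorder actually produces.

```typescript
// Sanity check: does the decoded request body actually look like audio?
// Checks two well-known container signatures; extend for your formats.
function looksLikeAudio(buf: Buffer): boolean {
  if (buf.length < 4) return false;
  // WebM/Matroska files start with the EBML magic bytes 1A 45 DF A3.
  const ebml =
    buf[0] === 0x1a && buf[1] === 0x45 && buf[2] === 0xdf && buf[3] === 0xa3;
  // WAV files start with the ASCII tag 'RIFF'.
  const riff = buf.subarray(0, 4).toString('ascii') === 'RIFF';
  return ebml || riff;
}
```

A base64 string misread as raw bytes fails this check immediately, turning "mysterious garbage transcription" into a clear error at the boundary where the corruption happened.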

The Search Console That Had No Traffic

I wired up Google Search Console data fetching for a site with real traffic — I could see it in the GSC web UI. The API call went through, no errors, no 403. It returned zero rows.

The site was registered as a domain property. Domain properties require sc-domain:example.com as the siteUrl, not https://example.com. The API doesn't say "wrong format" or "property not found." It returns empty data as if the site has zero search traffic.

// Returns empty data, no error
const empty = await webmasters.searchanalytics.query({
  siteUrl: 'https://example.com',
  requestBody: { startDate, endDate, dimensions: ['query', 'page'] },
});

// Returns actual data
const rows = await webmasters.searchanalytics.query({
  siteUrl: 'sc-domain:example.com',
  requestBody: { startDate, endDate, dimensions: ['query', 'page'] },
});

Calling sites.list() shows the exact format the API expects. I spent time checking date ranges and service account permissions before running that call and seeing sc-domain: staring back at me.
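The durable fix is to resolve the siteUrl from sites.list() instead of hand-writing it. The siteEntry shape below matches the Search Console API's sites.list response; the lookup logic and `resolveSiteUrl` name are my own sketch.

```typescript
// Resolve the exact siteUrl the API expects from the sites.list()
// response, rather than guessing the property format by hand.
interface SiteEntry {
  siteUrl: string; // e.g. 'sc-domain:example.com' or 'https://example.com/'
  permissionLevel: string;
}

function resolveSiteUrl(entries: SiteEntry[], host: string): string {
  const match = entries.find(
    (e) => e.siteUrl === `sc-domain:${host}` || e.siteUrl === `https://${host}/`
  );
  if (!match) throw new Error(`No Search Console property found for ${host}`);
  return match.siteUrl;
}
```

If the property is registered as a domain property, this hands back the sc-domain: form; if it was registered as a URL-prefix property, you get that instead — and a site that isn't registered at all fails loudly instead of returning zero rows.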

Silent Truncation in Structured Outputs

With json_schema and strict: true, OpenAI guarantees valid JSON — except when the response hits max_output_tokens. When that happens, the stream ends with truncated JSON and response.status set to 'incomplete'. This is not surfaced as an error. response.completed still fires normally.

if (event.type === 'response.completed' && 'response' in event) {
  if (event.response.status === 'incomplete') {
    const reason = event.response.incomplete_details?.reason || 'unknown';
    log.error('Response truncated', { reason });
    await params.onChunk('internal.error', 'The AI response was too long and got cut off.');
    await params.onChunk('internal.finished', '');
    return;
  }
  captureUsageStats(event.response.usage);
}

This one had a bonus bug that made it harder to find. Before OpenAI supported structured output streaming, I used XML-like tags in the prompt to get parseable responses — <next_action>, <message>, that kind of thing. When structured outputs shipped, I switched to json_schema but left the XML parser in the catch branch:

try {
  return JSON.parse(input);
} catch (jsonError) {
  return parseXMLResponse(input); // left in "just in case"
}

When a truncated JSON response hit this code, JSON.parse failed, the catch branch fired, and the XML parser found no tags. It returned nextAction: null with the entire raw JSON string stuffed into the message field. The failure surfaced as a null-check bug three layers downstream — not as a parser problem. Dead code from a previous architecture, silently eating every truncation error.
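The repaired parser, sketched: parse strictly and let a truncated payload fail at the boundary where it occurred, with no legacy fallback to swallow it. `parseStructuredResponse` is a hypothetical name for illustration.

```typescript
// Parse structured output strictly. A truncated or malformed payload
// throws here, at the boundary, instead of being fed to a fallback
// parser left over from a previous architecture.
function parseStructuredResponse(input: string): unknown {
  try {
    return JSON.parse(input);
  } catch (err) {
    throw new Error(
      `Structured output was not valid JSON: ${(err as Error).message}`
    );
  }
}
```

The point isn't the try/catch — it's what's absent. Deleting the XML fallback is the fix: the only thing a "just in case" branch caught in practice was the truncation error that should have surfaced.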

What These Five Have in Common

Every failure surfaced downstream as something that didn't look like an API problem. Corrupted audio looked like a model quality issue. Empty GSC results looked like a permissions problem. Truncation looked like a null-check bug three layers away. The API boundary said success, and the real problem hid behind that signal.

The only reliable defense is asserting on the output, not the status code — the kind of thing a code audit catches systematically. Check that the image matches the prompt. Check that the buffer is actually binary. Check that the response has rows. Check that the JSON is complete. If you only verify that the call succeeded, you'll find the failure when your users do.

About Jurij Tokarski

I run Varstatt and create software. Usually, I'm deep in work shipping for clients or building for myself. Sometimes, I share bits I don't want to forget.
