Jurij Tokarski

The Production Bugs That Never Threw an Error

Six bugs across OAuth, Next.js, launchd, n8n, browser APIs, and OpenAI. Every log said success. Every result was wrong.

Six production failures. Every log said success. The API returned 200. The job exited clean. Each one cost real time — not because the bug was hard to find once I knew where to look, but because the system accepted the input, confirmed receipt, and executed something different from what I intended. No exception. No warning. Just a quietly wrong outcome at the other end.

The OAuth Token That Baked In the Past

A content automation script returned a 403 from the Twitter v2 API. The message: "Your client app is not configured with the appropriate oauth1 app permissions for this endpoint."

I'd already upgraded the app from Read to Read+Write in the Developer Portal. The settings showed the correct value. The error said otherwise.

OAuth 1.0a tokens carry the permission scope active at the moment they were generated. Changing the app's permissions afterward does nothing to existing tokens — they permanently hold the scope they were issued with. The Developer Portal shows you a clean green state with no indication that your tokens are now stale relative to your updated settings. The 403 message says "app configuration," which points you at the thing you already fixed.

Lesson: After any permission change on Twitter, regenerate the Access Token and Access Token Secret immediately. Don't test anything first.

BTW: Since X moved to pay-per-use pricing in early 2026, the same 403 with the same "oauth1 app permissions" message can also mean your account has no active billing. The Free tier officially supports POST /2/tweets, but developers report it returning 403s intermittently with no configuration changes on their end (1, 2, 3). If you've regenerated tokens and the error persists, check whether your account needs a paid plan or a billing top-up before debugging further.
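Whatever the cause turns out to be, the first debugging step is the same: log the full error body, not just the message. A minimal sketch, assuming the error shape common Twitter v2 client libraries expose (the `code` and `data` field names here are assumptions, not guaranteed by any particular client):

```javascript
// Hypothetical helper: pull the useful detail out of a Twitter v2 API error.
// The `code`/`status`/`data.detail` field names are assumptions.
function describeApiError(err) {
  const status = err?.code ?? err?.status ?? "unknown status";
  const detail = err?.data?.detail ?? err?.message ?? "no detail";
  return `${status}: ${detail}`;
}

// What the 403 above might look like once unwrapped:
console.log(
  describeApiError({
    code: 403,
    message: "Request failed with code 403",
    data: {
      detail:
        "Your client app is not configured with the appropriate oauth1 app permissions for this endpoint.",
    },
  })
);
```

The point is the fallback chain: `err.message` alone would have printed "Request failed with code 403" and hidden the one phrase that identifies the problem.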

The Cache That Survived the Uninstall

Removed @sentry/nextjs from a project. Pulled it from package.json, ran install, cleaned up the config. Next dev run threw this:

Error: Cannot find module '@sentry/nextjs'
Require stack:
- .next/server/instrumentation.js

The package wasn't in node_modules. Wasn't in package.json. Nowhere. But Next.js kept looking for it.

Inside .next/server/ was a compiled instrumentation.js from a previous build — one Sentry had hooked into during installation. The incremental build never touched that file because I hadn't changed the instrumentation source, only the package. It just sat there, referencing something that no longer existed.

rm -rf .next

Then yarn dev. No errors. Thirty seconds of actual work after ten minutes of confusion.

Lesson: Removing a plugin that hooks into Next.js instrumentation means deleting .next as part of the removal. Not after the next error — as part of the removal.
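A defensive version of that removal, as a sketch: scan the compiled output for references to the package you just removed, and delete `.next` if any survive. The first two lines only recreate the stale-build state for illustration:

```shell
# Illustration only: simulate a stale compiled file from a previous build.
mkdir -p .next/server
echo "require('@sentry/nextjs')" > .next/server/instrumentation.js

# The actual check: if the removed package is still referenced anywhere in
# the build output, the whole .next directory has to go before the next run.
if grep -rq "@sentry/nextjs" .next/server 2>/dev/null; then
  echo "stale build references found, deleting .next"
  rm -rf .next
fi
```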

The Job That Reported Success While Running Nothing

A scheduled launchd job on macOS. The plist was configured, the wrapper script pointed at the right Node script, everything looked right. launchd reported "completed successfully" on every run. Nothing was being posted.

I added logging, ran it manually. The log showed the Node process starting, then silence. Ran it directly from the terminal — it worked fine.

With verbose output piped to a log file, the job finally showed something:

Error: spawn claude ENOENT

The claude binary lives at ~/.local/bin/claude. My terminal knows that because my shell config adds that path. launchd doesn't. It starts processes with a stripped-down environment — no user shell, no ~/.local/bin, nothing accumulated over years of machine setup. The Node script was swallowing the subprocess error and exiting 0 regardless. launchd saw a clean exit and called it a success.

Fix: One line in the wrapper script:

export PATH="/Users/jurijtokarski/.local/bin:/opt/homebrew/bin:$PATH"

Lesson: Use absolute paths in launchd plists. Test jobs with the same stripped environment launchd uses — not from your terminal.
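You can approximate that stripped environment from a normal terminal with `env -i`, which discards everything your shell config accumulated and keeps only what you list explicitly. A sketch:

```shell
# Run a probe with a minimal, launchd-like environment.
# env -i clears the inherited environment; only HOME and PATH survive here.
env -i HOME="$HOME" PATH="/usr/bin:/bin:/usr/sbin:/sbin" \
  sh -c 'command -v claude || echo "claude not on PATH here, as launchd would see it"'
```

launchd's real environment differs in details, but `env -i` catches the most common failure class: anything resolved through paths your shell config adds, like `~/.local/bin`.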

The Node That Activated Fine, Then Didn't

Added an IF node to an n8n workflow to branch between two processing paths. Saved cleanly. Validated cleanly. The editor showed no warnings. Activated the workflow and got:

Cannot read properties of undefined (reading 'execute')

No node name. No stack trace. Nothing pointing anywhere useful.

I checked the Code nodes. Checked the Merge node. Checked the connections. The IF node wasn't even on my radar — it had saved without complaint. Eventually I pulled the raw workflow JSON:

{
  "type": "n8n-nodes-base.if",
  "typeVersion": 2.3,
  "parameters": { ... }
}

The n8n instance didn't have typeVersion: 2.3 of the IF node. The editor accepted it — it doesn't validate typeVersion against what's installed on the runtime. The execution engine hit an undefined handler and threw.

Downgrading to 2.2 fixed it immediately.

Lesson: The n8n editor and the n8n runtime have different views of what's valid. When an activation error is opaque and traceless, check typeVersion before anything else.
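A preflight like the following would have caught it. This is a hypothetical check, not an n8n feature: compare each node's `typeVersion` in the raw workflow JSON against what the target instance supports, here expressed as a hand-maintained map:

```javascript
// Hypothetical preflight: flag nodes whose typeVersion is newer than what
// the target instance knows. `supported` is a hand-maintained map.
function findUnsupportedNodes(workflow, supported) {
  return workflow.nodes.filter((node) => {
    const max = supported[node.type];
    return max !== undefined && node.typeVersion > max;
  });
}

const workflow = {
  nodes: [
    { name: "IF", type: "n8n-nodes-base.if", typeVersion: 2.3 },
    { name: "Merge", type: "n8n-nodes-base.merge", typeVersion: 3 },
  ],
};
const supported = { "n8n-nodes-base.if": 2.2, "n8n-nodes-base.merge": 3 };

console.log(findUnsupportedNodes(workflow, supported).map((n) => n.name)); // [ 'IF' ]
```

Exporting the workflow JSON and running a check like this against a staging instance turns the traceless activation error into a named node.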

The Stream That Delivered No Audio

Building a screen and audio capture feature. Every API call succeeded. Production recordings came back with only the microphone — no shared app audio. No error in the console. getDisplayMedia had resolved cleanly, the stream object was there, the video track was present.

I spent a while assuming the AudioContext mixing was wrong before checking something obvious:

const stream = await navigator.mediaDevices.getDisplayMedia({
  video: true,
  audio: true
});

console.log(stream.getAudioTracks().length); // 0

Zero audio tracks. The user had gone through the picker and selected a tab without checking the "Share audio" checkbox. The browser doesn't reject the promise in that case. No warning, no error, no indication the audio side of the request was skipped. The spec gives you an empty array and moves on.

Lesson: Check audioTracks.length immediately after resolution. If it's zero, surface an explicit re-prompt before proceeding. A resolved getDisplayMedia call is not a guarantee that you got what you asked for.
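As a sketch, with the check factored out so it can guard any capture path (the helper names are mine, not a browser API):

```javascript
// Hypothetical helper: a resolved stream still has to be checked for audio.
function hasSharedAudio(stream) {
  return stream.getAudioTracks().length > 0;
}

// Usage sketch in a browser context (not runnable outside one):
async function captureScreenWithAudio() {
  const stream = await navigator.mediaDevices.getDisplayMedia({
    video: true,
    audio: true,
  });
  if (!hasSharedAudio(stream)) {
    // Release the tracks and re-prompt: the user skipped "Share audio".
    stream.getTracks().forEach((track) => track.stop());
    throw new Error('No audio track. Tick "Share audio" in the picker and retry.');
  }
  return stream;
}
```

Stopping the tracks before re-prompting matters: otherwise the browser's "sharing your screen" indicator stays on while you ask the user to try again.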

The Sub-Agent Searching the Wrong Store

A multi-step analysis pipeline: an orchestrator that reads documents, spawns specialist agents to evaluate content, streams structured results back to the UI. The orchestrator chains turns via previous_response_id. Sub-agents were supposed to be isolated, stateless calls.

At one step, agent responses were coherent but consistently wrong. Clean outputs, plausible reasoning, wrong knowledge base.

What previous_response_id carries isn't just conversation history — it inherits the full tool configuration of the parent response, including attached file_search vector stores. The orchestrator had a tender documents store bound to it. Every chained orchestrator call accumulated that binding. When the orchestrator's final response ID was passed to a specialist agent — one explicitly configured with a completely different store — the API silently merged the orchestrator's tool configuration in. The agent queried the wrong store. No error. No warning.

Fix: make agent calls fully stateless. A sub-agent has no legitimate reason to continue a conversation chain.

// Before
const resp = await this.model.responses.create({
  model: DEPLOYMENT_NAME,
  instructions,
  input,
  tools,
  previous_response_id: previousResponseId,
  stream: false,
});

// After — agents never inherit the orchestrator's chain
const resp = await this.model.responses.create({
  model: DEPLOYMENT_NAME,
  instructions,
  input,
  tools,
  stream: false,
});

Lesson: Any sub-agent that needs isolated tools must be a fresh, stateless request with no chain ID. Explicitly configuring different tools does not override what the chain carries in.
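One way to enforce that structurally, as a sketch: build agent request params through a function whose shape simply has no slot for a chain ID, so an orchestrator's `previous_response_id` can't leak in through a spread or a copied call site (the helper name and values are mine):

```javascript
// Hypothetical guard: whatever the caller passes in, only these fields
// survive, so previous_response_id can never reach a sub-agent request.
function agentRequestParams({ model, instructions, input, tools }) {
  return { model, instructions, input, tools, stream: false };
}

// Even a sloppy call site that forwards the orchestrator's state stays clean:
const params = agentRequestParams({
  model: "my-deployment",           // placeholder name
  instructions: "Evaluate the document.",
  input: "…",
  tools: [{ type: "file_search" }], // the agent's own store config (elided)
  previous_response_id: "resp_123", // silently dropped by the destructuring
});

console.log("previous_response_id" in params); // false
```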

What These Six Have in Common

None of them failed at the point of input. The token was accepted. The build succeeded. The job exited. The editor saved the node. The stream resolved. The agent returned a clean response. Every failure happened at the output — in the actual result, not the API boundary.

The gap between "I accepted your input" and "the right thing occurred" is where these live. The fix isn't adding more logging to the call sites. It's verifying at the output layer: check audioTracks.length after resolution, not before. Pull the raw JSON of a node that failed at activation. Log err.data on a 403, not just err.message. Check what the agent actually searched, not what you told it to search.

Success at the API boundary tells you the system is running. It tells you nothing about what the system is doing.
