
Jurij Tokarski
Three Bugs That Were Actually My Prompts
Three debugging sessions where I spent hours chasing AI misbehavior — and each time discovered the model was executing my instructions exactly as written. The bug was me.
Three debugging sessions. Three different features. Every investigation eventually landed in the same place: my own prompt files.
The AI wasn't broken. I was a contradictory author.
The STRICT Rule That Was Overriding Itself
I built a structured interview tool — the kind that walks a founder through their idea one question at a time. The system prompt had this near the top:
```
STRICT: Ask only ONE question per message. Never bundle questions.
```
Users kept getting messages like "How will you make money? What are the major costs to build and run this?" I read the prompt again. Rule was right there. Added emphasis. Still happened. Moved it higher. Still happened.
Then I read the interview flow section — the part describing what topics to cover across the session. Step 4 read:
```
**Revenue Streams + Cost Structure** — How will you make money?
What are the major costs to build and run this?
```
The model wasn't defying the STRICT rule. It was following the flow description, which listed two topics as a single step and framed them as two inline questions. That structure implicitly granted permission to bundle. The more specific instruction — a concrete flow item with actual question text — overrode the more abstract one.
The fix was two things. Unbundle every flow item into separate steps. And add a concrete bad example directly inside the STRICT rule — not just the prohibition:
```
STRICT: Ask only ONE question per message. Never bundle questions.
Example of what NOT to do: "How will you make money? What are your costs?"
is TWO questions — send one, wait for the answer, then ask the next.
```
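The flow section needed the same treatment. Roughly — the exact wording of my steps is illustrative here — the bundled step became two, each carrying exactly one question:

```markdown
**Revenue Streams** — How will you make money?

**Cost Structure** — What are the major costs to build and run this?
```

With one question per flow item, the structure and the STRICT rule finally said the same thing.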
Abstract rules lose to specific structural descriptions. The model resolves contradictions by specificity, not by which rule came first or which one you emphasized. If your flow section describes two questions in the same bullet, that description is an instruction — regardless of what you wrote elsewhere.
The Tool That Read the Prohibition
After a discovery session, users could request a full report by email. The tool was registered. The backend handler existed. Users clicked the button. The AI said it couldn't send emails.
I checked tool registration — correct. Checked the API call — correct. Checked the backend handler — correct. Everything looked wired up properly at every technical layer.
The issue was in a place I hadn't thought to look. I grepped the prompt files for "report":
```shell
grep -r "report" mod/discovery/steps/*/prompt.md
```
Every single step prompt had lines like:
```
Do NOT offer to send a report.
Do NOT mention sending a report.
```
I'd written those prohibitions months earlier during a different phase of the project. The tool didn't exist yet when I wrote them. By the time it did, I'd forgotten those lines were there.
The model wasn't defective. It was obedient to instructions I'd authored and then lost track of. Ten minutes of grepping would have found this immediately. Instead I spent days checking tool registration and API calls.
Before investigating code when an AI-powered feature does nothing, grep your prompt files for explicit prohibitions against the behavior you're expecting. Search your entire prompt corpus for "do not" and "don't" alongside the name of the relevant action. It takes ten seconds, and it would have saved me days on this one.
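Here's the kind of sweep I mean. The setup lines just recreate a tiny stand-in corpus so the command has something to match — in a real project you'd point the grep at your actual prompt directory:

```shell
# Stand-in corpus: one prompt file containing a forgotten prohibition
mkdir -p /tmp/prompt-audit/steps/intro
printf 'Do NOT offer to send a report.\nCover one topic per message.\n' \
  > /tmp/prompt-audit/steps/intro/prompt.md

# Case-insensitive sweep for explicit prohibitions, with line numbers
grep -rinE "do not|don'?t" /tmp/prompt-audit/
```

The `-i` matters: prohibitions written in different moods ("Do NOT", "don't", "do not") should all surface in one pass.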
The Precondition That Lived Only in Prose
After fixing the prohibitions, a new problem surfaced. The model was supposed to ask for the user's email before calling send_report. The prompt said:
```
ALWAYS ask for the user's email before calling send_report.
Never call send_report without confirmed contact details.
```
In testing, the tool got called with founder@example.com, a placeholder the model had generated rather than asking for a real address. The instruction was clear. The model treated it as a suggestion.
I made the prompt stronger. Same result — it would comply sometimes, skip the step other times, depending on how the conversation had flowed. Prompt-only enforcement of a precondition is probabilistic.
The fix was to move validation into the tool handler itself:
```javascript
const PLACEHOLDER_DOMAINS = ['example.com', 'test.com', 'placeholder.com'];

function validateEmail(email) {
  // Optional chaining guards the missing-email case: calling .split()
  // on undefined would crash before the check below ever runs.
  const domain = email?.split('@')[1]?.toLowerCase();
  if (!email || !domain) {
    return {
      error: 'No email provided. Ask the user for their email address before calling this tool.'
    };
  }
  if (PLACEHOLDER_DOMAINS.some(d => domain.includes(d))) {
    return {
      error: `"${email}" looks like a placeholder. Ask the user for their real email address.`
    };
  }
  return null; // valid — the tool call can proceed
}
```
Two things to notice. First, the validation returns errors instead of throwing them. A thrown exception terminates the tool call with a runtime error the model can't act on. A returned error lands back in the model's context as a tool result — the model reads it, understands what went wrong, and retries:
```javascript
// This crashes. The model gets a runtime error and no useful signal.
if (!user_email) throw new Error('user_email is required');

// This works. The model reads the error and asks for the real address.
if (!user_email) {
  return {
    error: 'user_email is required. Ask the user for their email address, then call this tool again.'
  };
}
```
Second, `required` in a tool schema is a hint to the model, not a runtime guarantee. Models will omit required fields — sometimes because the value wasn't extracted yet, sometimes for reasons that aren't obvious from the logs. Treat every parameter as potentially absent at the handler boundary.
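Putting both points together, the handler boundary ends up looking something like this sketch (`handleSendReport` and the argument shape are my stand-ins, not a real framework API):

```javascript
const PLACEHOLDER_DOMAINS = ['example.com', 'test.com', 'placeholder.com'];

// Returns an error object the model can read, or null when the email is usable.
function validateEmail(email) {
  const domain = email?.split('@')[1]?.toLowerCase();
  if (!email || !domain) {
    return { error: 'No email provided. Ask the user for their email address before calling this tool.' };
  }
  if (PLACEHOLDER_DOMAINS.some(d => domain.includes(d))) {
    return { error: `"${email}" looks like a placeholder. Ask the user for their real email address.` };
  }
  return null;
}

// Defaulting args to {} means a missing payload can't crash the handler;
// every failure is returned as a tool result instead of thrown.
function handleSendReport(args = {}) {
  const invalid = validateEmail(args.user_email);
  if (invalid) return invalid;
  // ...real work: render the report and hand it to the mailer...
  return { status: 'sent', to: args.user_email };
}
```

Now a missing field, a null payload, and a placeholder address all take the same path: a readable error back into the model's context, not a stack trace in the server logs.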
ALWAYS X in a prompt is a suggestion. Enforcing X belongs in code.
The Prompt Is the Program
All three bugs came from the same misread of what a system prompt is. I was treating it as documentation — a description of intended behavior that the real system (the code) would enforce. For an LLM-powered feature, that's backwards.
The system prompt isn't documentation. It's source code executed by a natural-language interpreter. Contradictions in it don't fail to compile — they resolve according to specificity and proximity rules you never wrote down. Prohibitions execute. Structure is semantics. A flow description with two inline questions is an instruction to ask two questions, regardless of the STRICT rule above it.
The debugging instinct to check the API, the tool registration, the network logs — all of that is valid. But it should come after you've read your own prompts as a hostile reader looking for contradictions, prohibitions, and preconditions that only exist in prose.
The model is rarely the bug. Read your prompts first.