
Jurij Tokarski
Filling Forms No Tool Can Template
Every tender form is different; templating tools need placeholders you can't insert; markdown round-trips destroy the document; and only some models can do XML surgery on the original file.
It Works, But You Can't Ship It covered the compliance wall — code execution sandboxes can fill DOCX forms, but they only exist in regions that don't match every customer's data residency policy. This post covers what I learned building the feature before that discovery.
Every Form Is Different
Tender response forms have nothing in common with each other. One agency sends a form with merged table cells and numbered question blocks. The next sends checkboxes inside conditional formatting with section breaks in unexpected places. There is no shared structure, no recurring field names, no predictable layout.
Every DOCX templating tool I evaluated — docx-templates, easy-template-x, docxtemplater — works the same way: you prepare a template with {variable_name} placeholders, pass in data, get a rendered document. That assumes you control the template. Tender forms come from government agencies. You don't control anything. You can't insert placeholders into a form you receive the day the tender opens.
Filling these forms requires understanding an arbitrary document's structure, finding the insertion points, and knowing what content goes where. That's not a deterministic templating problem. It's a comprehension problem.
DOCX Is a ZIP of XML
My first PoC tried the next obvious thing: extract the DOCX to markdown, send it to the model with draft content, get back filled markdown, convert to a new DOCX. Clean pipeline, completely useless output. The regenerated document lost every merged cell, every checkbox, every conditional format. The output was a different document that happened to contain similar text.
A DOCX file is a ZIP archive of XML. word/document.xml holds the content in OOXML format. The correct approach is to give the model the original binary, let it read the XML, find the insertion points, write modifications back, and save the modified ZIP. XML surgery on the original file — not regeneration.
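The "ZIP of XML" claim is easy to verify before attempting any surgery: every ZIP archive, and therefore every DOCX, starts with the local-file-header signature PK\x03\x04. A minimal sketch (the function name is mine, not from the post):

```typescript
// Sanity check: a DOCX is a ZIP container whose word/document.xml entry
// holds the OOXML body. Every ZIP starts with the bytes "PK\x03\x04".
function looksLikeZip(bytes: Uint8Array): boolean {
  return (
    bytes.length >= 4 &&
    bytes[0] === 0x50 && // 'P'
    bytes[1] === 0x4b && // 'K'
    bytes[2] === 0x03 &&
    bytes[3] === 0x04
  );
}
```

In practice you'd feed it the first bytes of the uploaded file, e.g. `looksLikeZip(readFileSync("form.docx"))`, before handing the document to the sandbox.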
The only way to do this through an API, without deploying a separate Python service, is a code execution sandbox. Both OpenAI's code_interpreter and Anthropic's code execution tool provide a sandboxed Python environment where python-docx is available and the model can operate on the file directly.
Once that architecture clicked, the API quirks started.
OpenAI: The File Goes in the Container
My first attempt passed the uploaded DOCX as an input_file content block in the user message — the pattern you'd use for images or PDFs.
Expected context stuffing file type to be a supported format... but got .docx
Context stuffing only supports PDFs, images, and plain text. The file has to go into the code_interpreter container instead:
tools: [
  {
    type: 'code_interpreter',
    container: { type: 'auto', file_ids: [uploadedFile.id] }
  }
]
The message itself is plain text — you tell the model the filename so it knows what to look for in the sandbox. No file reference in the content block at all.
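Putting the two pieces together, the request shape looks roughly like this. The builder function and its arguments are illustrative, not from the post; only the payload shape reflects what worked:

```typescript
// Hypothetical request builder for the Responses API call. The file id comes
// from a prior upload; only the payload shape mirrors the working setup.
function buildFillRequest(fileId: string, filename: string, draft: string) {
  return {
    model: "gpt-5.1", // model name as used in the post
    // The message is plain text: the model is told the filename so it knows
    // what to look for inside the code_interpreter sandbox.
    input: `Fill out ${filename} in your sandbox using this draft:\n\n${draft}`,
    tools: [
      {
        type: "code_interpreter",
        // The file goes into the container, not into the message content.
        container: { type: "auto", file_ids: [fileId] },
      },
    ],
  };
}
// Passed to the SDK as: await client.responses.create(buildFillRequest(...))
```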
Getting the filled file back had its own problem. The SDK exposes client.containers.files.content, which looks callable. It isn't — it's a resource object. The working call is client.containers.files.content.retrieve(containerId, fileId). Neither the types nor the error message make this obvious. I found it by running Object.getOwnPropertyNames on the object at runtime.
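The shape that tripped me up can be mocked in a few lines. This is an illustrative stand-in, not the real SDK object:

```typescript
// Illustrative mock of the SDK's containers.files.content shape.
const mockClient = {
  containers: {
    files: {
      // `content` is a resource object, not a function: only its
      // retrieve() method actually fetches the file bytes.
      content: {
        retrieve: (containerId: string, fileId: string) =>
          `<bytes of ${fileId} from ${containerId}>`,
      },
    },
  },
};

// Inspecting the object at runtime is what surfaces retrieve():
const names = Object.getOwnPropertyNames(mockClient.containers.files.content);
```

Calling `mockClient.containers.files.content("cntr_1", "file_1")` directly would throw "not a function", which mirrors the real failure mode.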
Anthropic: Five Things at Once
Claude has the same capability, but it requires five specific pieces in a single request. Miss any one and you get a cryptic failure.
The file upload needs an explicit MIME type — not inferred from the extension. The API call needs two beta flags active simultaneously: files-api-2025-04-14 and code-execution-2025-08-25. The file must be referenced as container_upload in the content block — not document, not file. The tool declaration needs the full versioned type string code_execution_20250825. And the download call needs the same beta flags passed again.
const response = await client.beta.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 16384,
  betas: ['files-api-2025-04-14', 'code-execution-2025-08-25'],
  messages: [{
    role: 'user',
    content: [
      { type: 'text', text: userMessage },
      { type: 'container_upload', file_id: uploaded.id },
    ],
  }],
  tools: [{ type: 'code_execution_20250825', name: 'code_execution' }],
});
No single documentation page covers all five requirements together. Each piece is documented somewhere. The combination isn't.
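The fifth piece, the download, starts with finding the file id the sandbox produced in the response. A sketch of that scan; the type and field names here are my reading of the code execution beta's response shape and may not match the SDK types exactly:

```typescript
// Trimmed, hypothetical response shapes -- enough for this helper only.
type SandboxContentBlock =
  | { type: "text"; text: string }
  | {
      type: "code_execution_tool_result";
      content: Array<{ type: string; file_id?: string }>;
    };

// Collect ids of files the sandbox wrote, so each can then be fetched
// through the Files API -- with the same two beta flags passed again.
function collectOutputFileIds(blocks: SandboxContentBlock[]): string[] {
  const ids: string[] = [];
  for (const block of blocks) {
    if (block.type === "code_execution_tool_result") {
      for (const item of block.content) {
        if (item.file_id) ids.push(item.file_id);
      }
    }
  }
  return ids;
}
```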
GPT-5.1 Broke the Document
I tested the form-filling flow across GPT-5.1, 5.2, and 5.4, on both Azure OpenAI and the public API.
GPT-5.1 on Azure — our production deployment — wrote code that opened the DOCX but ignored formatting preservation entirely. Merged cells collapsed, checkboxes vanished, section breaks shifted. The output was a broken document. Same result on the public API — not an infrastructure issue, a model capability issue. GPT-5.2 was inconsistent: partially filled on one test, failed on the next. GPT-5.4 was the first in the lineup that reliably understood the OOXML structure, applied targeted modifications with python-docx, and returned a valid binary with all formatting preserved.
Every Claude Model Could Do It
After the GPT results I tested Claude 4.6 — Opus, Sonnet, and Haiku — through Anthropic's code execution sandbox. Opus and Sonnet completed the task cleanly. The OOXML structure stayed intact, insertions landed in the right cells, formatting survived the round trip. Haiku was inconsistent — similar to GPT-5.2, partially filling on some runs and failing on others.
The gap was stark: GPT-5.1 couldn't preserve the structure at all, while Claude Opus and Sonnet preserved it reliably. For this task, the model version matters more than the provider. But which model you can actually deploy depends on where your customer's data is allowed to live. In my case, as the previous post covered, I couldn't deploy any of them.