
Jurij Tokarski
Null Bytes, Dead Streams, Last Chunk
SSE adds overhead for mixed event types, silent streams hang without error, and the last audio chunk vanishes on page close — three workarounds for LLM streaming.
Streaming LLM output to a browser means wiring together SSE, TCP, fetch, and browser lifecycle APIs that weren't designed for this combination. Each one has constraints that only surface when you integrate them.
The Parser That Choked
Server-Sent Events is the natural choice for streaming. SSE supports multiple event types via the event: field and handles multiline JSON by splitting across data: lines. But when every chunk needs an event: line, one or more data: lines, and a blank line delimiter — and you're sending hundreds of small text fragments interleaved with structured tool call events — the framing adds up and the parser becomes more complex than the problem requires.
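To make the overhead concrete, here is a sketch of what SSE framing costs per chunk. The event name and payload shape are illustrative, not from any particular API:

```javascript
// SSE framing per chunk: an event: line, one data: line per payload
// line, and a blank-line terminator.
function sseFrame(eventType, payload) {
  const data = JSON.stringify(payload)
    .split('\n')                    // a multiline payload would need
    .map((line) => `data: ${line}`) // one data: line per line
    .join('\n');
  return `event: ${eventType}\n${data}\n\n`;
}

// For a three-character text fragment, the framing outweighs the payload:
// sseFrame('text_chunk', { text: 'the' })
// → 'event: text_chunk\ndata: {"text":"the"}\n\n'
```

Multiply that per-chunk cost by hundreds of small fragments per response and the framing dominates the actual content.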
A null byte as the delimiter is simpler. \0 is rare enough in practice — it can appear as \u0000 in JSON but almost never does in LLM output or natural language — that it works as a reliable record separator without escaping.
// Server: wrap each event
function sendEvent(stream, event) {
  stream.write(JSON.stringify(event) + '\0');
}

// Client: split and route (inside an async function; response is from fetch)
const decoder = new TextDecoder();
const reader = response.body.getReader();
let buffer = '';
while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  const parts = buffer.split('\0');
  buffer = parts.pop(); // keep the incomplete trailing segment
  for (const part of parts) {
    if (part) handleEvent(JSON.parse(part));
  }
}
Each event is a JSON object with a type field — text_chunk, tool_call, tool_result, done. The client splits on null bytes, parses each segment, routes by type. Text chunks accumulate in the UI. Tool events trigger loading states or commit structured data.
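A minimal router on the client side might look like this sketch — the four UI hooks (appendText, showToolSpinner, commitToolResult, finalizeMessage) are hypothetical names, not part of the article's code:

```javascript
// Route parsed events by type; the UI hooks are hypothetical.
function handleEvent(event) {
  switch (event.type) {
    case 'text_chunk':
      appendText(event.text);       // accumulate into the message UI
      break;
    case 'tool_call':
      showToolSpinner(event.name);  // loading state while the tool runs
      break;
    case 'tool_result':
      commitToolResult(event.data); // commit structured data
      break;
    case 'done':
      finalizeMessage();
      break;
    default:
      console.warn('unknown event type:', event.type);
  }
}
```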
The Stream That Stopped Talking
TCP keepalive keeps a connection open. It doesn't tell you the connection has gone silent at the application level. Occasionally — maybe once every few hundred sessions — a stream stops mid-sentence. No error event. No close event. The connection is alive, the response is still "streaming," and the user is staring at a half-finished message with a spinner that will never resolve.
The LLM API hasn't errored — it just stopped sending chunks.
An idle timer catches this. Reset it on every incoming chunk. Fire it if silence crosses a threshold.
let idleTimer;

function resetIdleTimer(controller) {
  clearTimeout(idleTimer);
  idleTimer = setTimeout(() => {
    controller.abort();
  }, 30_000);
}

stream.on('data', (chunk) => {
  resetIdleTimer(controller);
  processChunk(chunk);
});

stream.on('end', () => {
  clearTimeout(idleTimer);
});
Thirty seconds is generous for interactive chat — users notice after five. The threshold isn't the important part. The pattern is: connection-level timeouts don't catch application-level silence. You need to track it yourself.
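Wired into a fetch-based reader loop, the whole pattern looks something like this sketch — streamWithIdleTimeout, onChunk, and onStall are placeholder names, not the article's API:

```javascript
// Idle-timeout wrapper around a streaming fetch. Aborts the request
// if no chunk arrives within timeoutMs.
async function streamWithIdleTimeout(streamUrl, onChunk, onStall, timeoutMs = 30_000) {
  const controller = new AbortController();
  let idleTimer;
  const resetIdleTimer = () => {
    clearTimeout(idleTimer);
    idleTimer = setTimeout(() => controller.abort(), timeoutMs);
  };
  resetIdleTimer(); // arm before the first byte, too
  try {
    const response = await fetch(streamUrl, { signal: controller.signal });
    const reader = response.body.getReader();
    while (true) {
      const { value, done } = await reader.read();
      if (done) break;
      resetIdleTimer();
      onChunk(value);
    }
  } catch (err) {
    if (err.name !== 'AbortError') throw err;
    onStall(); // application-level silence, not a connection error
  } finally {
    clearTimeout(idleTimer);
  }
}
```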
The Chunk That Vanished
Browsers kill in-flight fetch() calls during page unload. If you stream audio in chunks via POST, the final chunk — whatever is still buffered when the user stops recording or closes the tab — lives in memory until the next flush. That flush never happens. The final segment of every session is silently dropped.
// Killed on page close:
await fetch('/v3/audio/stream_chunk', { method: 'POST', body: chunk });

// Survives:
fetch('/v3/audio/stream_chunk', {
  method: 'POST',
  body: chunk,
  headers: { Authorization: `Bearer ${token}` },
  keepalive: true,
});
No await. No .then(). You can't await a response during unload — any result is swallowed. Fire and forget. The browser queues the request and completes it even after the page is gone, as long as the total payload is under ~64KB.
navigator.sendBeacon() survives unload too, but it doesn't support custom headers. If your backend expects an auth header, fetch({ keepalive: true }) gives you the full request API.
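Putting the pieces together, one way to wire the flush — pagehide is the most reliable near-unload event; registerFinalFlush and drainAudioBuffer are hypothetical names, a sketch rather than the article's implementation:

```javascript
// Fire the final flush on pagehide. drainAudioBuffer is a hypothetical
// hook returning whatever audio is still buffered as a Uint8Array.
function registerFinalFlush(drainAudioBuffer, token) {
  window.addEventListener('pagehide', () => {
    const chunk = drainAudioBuffer();
    if (chunk.byteLength === 0) return;
    // keepalive requests share a ~64KB cap on total in-flight payload
    fetch('/v3/audio/stream_chunk', {
      method: 'POST',
      body: chunk,
      headers: { Authorization: `Bearer ${token}` },
      keepalive: true, // no await: fire and forget
    });
  });
}
```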
The Gaps Between Protocols
Every integration has these. You wire together two or three tools that work fine on their own, but nobody tested them together — and no documentation covers the seams. The workarounds aren't published as best practices. They accumulate as know-how, one project at a time. These are three I've accumulated for LLM streaming.