There’s a moment when “vibe coding” feels like a cheat code. You describe what you want, the code appears, and suddenly you’ve got a working prototype. For a lot of work, that’s genuinely valuable. It’s faster scaffolding, faster iteration, faster confidence.
And then you try to ship it.
This is a real story from a backend refactor that looked trivial on paper: take an image generation service running in Docker (Node.js + skia-canvas) and make it run in parallel. Same inputs, same outputs, just more throughput. If you’ve ever built something like this, you already know the ending: the “parallel” part is where reality shows up.
We had a backend service that generated images on demand. It accepted a payload, rendered an image using Skia via skia-canvas, and returned the result. In low volume, it behaved nicely. Latency was acceptable. The code was understandable. The Docker container was stable.
Then usage changed. We needed to handle multiple requests in parallel — not “slightly faster,” but “handle more simultaneous work without falling apart.” This is the exact kind of problem where vibe coding can lull you into thinking it’s just a refactor.
The request sounded simple: “Let’s run multiple renders concurrently.”
The first set of changes looked straightforward:
- `Promise.all()` or a concurrency limiter.
- `worker_threads` for “parallelism.”

On a developer laptop, with a few manual requests, it worked. The outputs looked correct. The service responded. We told ourselves we were close.
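The naive version looked roughly like this (a sketch; `handleBatch` and `renderImage` are illustrative names, with `renderImage` standing in for the real skia-canvas call):

```javascript
// Naive "parallel" refactor: fire every render at once and wait.
// renderImage is a stand-in for the real skia-canvas render call.
async function handleBatch(payloads, renderImage) {
  // No limit, no queue, no backpressure: every request renders "now".
  return Promise.all(payloads.map((p) => renderImage(p)));
}
```

With a handful of manual requests this behaves perfectly, which is exactly why it survives the demo.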
Then we put it under load.
The first symptom wasn’t “it’s slower.” It was uglier: requests started to hang. Some eventually timed out. Others would complete unpredictably late, like the system was swallowing work and coughing it back up when it felt like it.
This is the kind of issue that doesn’t show up in a happy-path demo. Under load, the backend wasn’t just “busy.” It was stuck. Event loop latency spiked. In-flight requests piled up. Connections stayed open. Retries from callers made things worse. The classic death spiral.
Here’s the key lesson: in a CPU-heavy service, “more concurrency” without backpressure is not throughput. It’s a queue you didn’t mean to create.
When your service is already at capacity, accepting more work doesn’t make it faster. It just increases the amount of work waiting to be done while everything else gets slower. If the box can finish four renders a second and callers send six, the backlog grows by two every second, and every request behind it waits longer than the one before.
The second symptom was that CPU usage shot to 100%… and throughput barely improved.
This is where a lot of vibe-coded advice gets slippery, because the word “parallel” is overloaded, especially in Node.js:

- `Promise.all()` gives you concurrency: tasks interleave on one event loop, but synchronous CPU work still runs one piece at a time and blocks everything else while it does.
- `worker_threads` gives you actual parallelism, but workers cost memory and startup time, and CPU-bound jobs still compete for the same physical cores.
- Neither makes a CPU-heavy render cheaper. They only change where the waiting happens.
We saw exactly that. The service looked “active” (busy logs, busy CPU), but it wasn’t producing finished images at the expected rate. Meanwhile, latencies climbed until the caller experience was effectively broken.
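You can see the trap in isolation. In this sketch (`fakeRender` is a stand-in for a synchronous, CPU-bound render), two tasks run under `Promise.all()` but still execute one after the other, because both block the single event loop thread:

```javascript
// A synchronous, CPU-bound stand-in for a render call.
function fakeRender(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {} // busy-wait: blocks the event loop
  return ms;
}

async function main() {
  const start = Date.now();
  // Looks parallel, isn't: both "renders" run on the same thread.
  await Promise.all([
    Promise.resolve().then(() => fakeRender(100)),
    Promise.resolve().then(() => fakeRender(100)),
  ]);
  console.log(`elapsed ~${Date.now() - start}ms`); // ~200ms, not ~100ms
}

main();
```

The CPU is pegged the whole time, logs look busy, and throughput is exactly what it was before the “refactor.”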
In other words: we didn’t build a parallel renderer. We built a parallel way to fall over.
The third symptom was the one that kills confidence fastest: corrupted output. Images that were blank. Images that were partially rendered. Images that looked like the wrong state was leaking between requests.
This is where the conversation leaves “JavaScript patterns” and enters “native library reality.” skia-canvas isn’t just JS. It’s a native binding to a rendering engine. Under concurrency, you start to discover properties that aren’t obvious from a quick skim of docs: how much memory lives outside the V8 heap where your usual Node metrics can’t see it, which objects are actually safe to touch from more than one job at a time, and what happens to half-finished native state when a job dies mid-render.
Vibe coding is great at producing “a plausible refactor.” It’s not great at predicting which parts of a native graphics stack will behave badly when you scale concurrency inside one process.
And if you’re building this in Docker, the constraints get sharper. Memory pressure rises faster. CPU throttling becomes visible. A small spike becomes a restart loop. The service becomes unpredictable.
When you do backend image rendering, you’re not just building an API endpoint. You’re running a small graphics workload inside a server. That comes with non-negotiable operational questions: how much memory does one render cost in the worst case? What happens when every core is saturated? What do the container’s CPU and memory limits do to you when you hit them?
In the hype world, you add concurrency and everything scales. In the real world, you add concurrency and then spend a week learning what your system was quietly relying on.
The fix wasn’t a clever snippet. The fix was turning the system into something that can be operated safely.
Instead of “let every request render immediately,” we moved to an explicit design:
If the renderer can handle N concurrent renders, we accept that fact and encode it. Excess requests are queued deliberately, with visibility, limits, and predictable behaviour. If the queue is full, we fail fast with a clear error (or a 429/503 depending on the caller contract), rather than silently hanging until timeouts.
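A minimal sketch of that shape, assuming nothing beyond core JavaScript (`RenderGate` is our name, not a library API): a fixed number of render slots plus a bounded waiting queue, and an immediate, explicit error once the queue is full:

```javascript
// Bounded concurrency with explicit backpressure.
// maxActive: renders allowed at once; maxQueued: the deliberate, visible queue.
class RenderGate {
  constructor(maxActive, maxQueued) {
    this.maxActive = maxActive;
    this.maxQueued = maxQueued;
    this.active = 0;
    this.queue = [];
  }

  async run(job) {
    if (this.active < this.maxActive) {
      this.active++;
    } else if (this.queue.length < this.maxQueued) {
      // Wait for a slot; the releasing job hands its slot to us directly.
      await new Promise((resolve) => this.queue.push(resolve));
    } else {
      // Fail fast instead of hanging until a timeout.
      throw new Error('render queue full');
    }
    try {
      return await job();
    } finally {
      const next = this.queue.shift();
      if (next) next(); // pass the slot straight to the next waiter
      else this.active--;
    }
  }
}
```

The HTTP layer would catch the `render queue full` error and map it to a 429 or 503 per the caller contract, so overload is a fast, visible failure instead of a silent hang.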
For CPU-heavy rendering, a worker pool can help, but only if it’s sized to the actual resources available. The “right” number is rarely “as many as possible.” It’s usually tied to CPU cores and memory-per-job, and it often requires measurement rather than guessing.
Crucially, you avoid sharing unsafe state between jobs. Each job should have its own canvas/context and avoid accidental reuse of buffers or global singletons.
A hung render is worse than a failed render. We added strict time budgets and ensured the API returned a consistent response when a job exceeded its allowed runtime. This prevented a handful of “stuck” requests from consuming all capacity indefinitely.
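The simplest form of a time budget is racing the render against a timer (a sketch; note that `Promise.race` abandons the slow promise but does not stop the underlying work, so a truly stuck native render still needs its worker recycled):

```javascript
// Enforce a hard time budget on a render promise.
function withTimeout(promise, ms, label = 'render') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} exceeded ${ms}ms budget`)), ms);
  });
  // Clear the timer either way so it doesn't keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Callers get a consistent, fast error instead of an open connection, which is what stops a few stuck jobs from consuming all capacity.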
Once you’re dealing with timeouts, pegged CPU, and corrupted output, you need visibility: event loop lag, queue depth, active renders, per-job duration, memory per job, and how often jobs time out or fail.
This is where a lot of teams discover the uncomfortable truth: the system wasn’t “fine,” it was just never observed under real conditions.
At some point, the simplest form of “parallel” is multiple container replicas behind a load balancer — as long as you still have per-instance concurrency limits and backpressure. That gives you isolation. If one instance gets wedged, you don’t take out the whole service.
It also makes capacity planning easier: you scale by adding known-good units rather than turning one container into a high-stakes science experiment.
AI tools helped us move quickly at the start. They were good at generating scaffolding and suggesting patterns: worker pools, queues, concurrency limiters, and general architecture ideas.
But the hard part wasn’t writing code. The hard part was dealing with the consequences: requests that hung under load, an event loop starved by CPU-bound work, a native library behaving unpredictably under concurrency, container limits that turned spikes into restart loops, and corrupted images that no generated snippet had warned us about.
That’s the wall. Not because AI is useless — it’s not. But because production systems require ownership, not just output. Someone has to be accountable for what happens when the service is under load, inside constraints, with real users waiting.
If your system does heavy work (rendering, OCR, video, ML inference, complex transforms), treat “make it parallel” as a design change, not a refactor. Start with the constraints: what does one job cost in CPU and memory, how many can this instance genuinely run at once, what happens to job N+1, and how does the system fail when it’s full?
Vibe coding can help you get started. It can’t replace the uncomfortable work of making something reliable.
Ready to pressure-test a backend system before it hits production? Get in touch with us and we’ll help you design for throughput, stability, and predictable behaviour under real load.