March 01, 2026

Vibe Coding Hits a Wall


There’s a moment when “vibe coding” feels like a cheat code. You describe what you want, the code appears, and suddenly you’ve got a working prototype. For a lot of work, that’s genuinely valuable. It’s faster scaffolding, faster iteration, faster confidence.

And then you try to ship it.

This is a real story from a backend refactor that looked trivial on paper: take an image generation service running in Docker (Node.js + skia-canvas) and make it run in parallel. Same inputs, same outputs, just more throughput. If you’ve ever built something like this, you already know the ending: the “parallel” part is where reality shows up.

The starting point: one container, one render at a time

We had a backend service that generated images on demand. It accepted a payload, rendered an image using Skia via skia-canvas, and returned the result. In low volume, it behaved nicely. Latency was acceptable. The code was understandable. The Docker container was stable.

Then usage changed. We needed to handle multiple requests in parallel — not “slightly faster,” but “handle more simultaneous work without falling apart.” This is the exact kind of problem where vibe coding can lull you into thinking it’s just a refactor.

The request sounded simple: “Let’s run multiple renders concurrently.”

The vibe-coded refactor: more concurrency, more problems

The first set of changes looked straightforward:

  • Make the handler asynchronous and run multiple jobs concurrently.
  • Use Promise.all() or a concurrency limiter.
  • Optional: move rendering into worker_threads for “parallelism.”
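
A minimal sketch of what that first refactor looked like. Note that `renderImage` here is a hypothetical stand-in for the real skia-canvas call, not the actual service code:

```javascript
// The vibe-coded version: start every job at once and hope.
async function renderImage(payload) {
  // stand-in for the real skia-canvas render
  return { id: payload.id, bytes: Buffer.from(`image-${payload.id}`) };
}

async function handleBatch(payloads) {
  // Promise.all launches everything immediately: no concurrency
  // cap, no backpressure, no time budget.
  return Promise.all(payloads.map((p) => renderImage(p)));
}
```

It reads cleanly and passes a manual test, which is exactly why it is dangerous.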

On a developer laptop, with a few manual requests, it worked. The outputs looked correct. The service responded. We told ourselves we were close.

Then we put it under load.

Wall #1: timeouts and hanging requests

The first symptom wasn’t “it’s slower.” It was uglier: requests started to hang. Some eventually timed out. Others would complete unpredictably late, like the system was swallowing work and coughing it back up when it felt like it.

This is the kind of issue that doesn’t show up in a happy-path demo. Under load, the backend wasn’t just “busy.” It was stuck. Event loop latency spiked. In-flight requests piled up. Connections stayed open. Retries from callers made things worse. The classic death spiral.

Here’s the key lesson: in a CPU-heavy service, “more concurrency” without backpressure is not throughput. It’s a queue you didn’t mean to create.

When your service is already at capacity, accepting more work doesn’t make it faster. It just increases the amount of work waiting to be done while everything else gets slower.

Wall #2: CPU pegged, with no throughput gain

The second symptom was that CPU usage shot to 100%… and throughput barely improved.

This is where a lot of vibe-coded advice gets slippery, because the word “parallel” is overloaded — especially in Node.js:

  • Async concurrency (Promises) is good for I/O (network, disk), but it doesn’t magically parallelise CPU-heavy work.
  • CPU-bound work competes for actual CPU cycles. If you schedule ten renders at once on a limited CPU budget, you don’t get ten times faster. You just get contention.
  • Node can appear to “do more at once,” while actually doing less useful work per second, because the overhead of context switching, memory pressure, and GC goes up.
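
The difference is easy to demonstrate in a few lines. In this sketch each "job" is a synchronous busy loop; wrapping it in a Promise does not make two jobs overlap, because a single thread can only run one loop at a time:

```javascript
// Demonstration: Promises do not parallelise CPU-bound work.
function busyWork(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {} // pegs the CPU for ~ms milliseconds
  return ms;
}

async function runConcurrently(jobs) {
  const start = Date.now();
  // "Concurrent" in shape only: the busy loops still run
  // back to back on the single JS thread.
  await Promise.all(jobs.map(async (ms) => busyWork(ms)));
  return Date.now() - start;
}

// Two 100ms CPU jobs "in parallel" still take ~200ms in total.
```

A real skia-canvas render behaves the same way when it executes on the main thread: the async wrapper changes the shape of the code, not the amount of CPU available.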

We saw exactly that. The service looked “active” (busy logs, busy CPU), but it wasn’t producing finished images at the expected rate. Meanwhile, latencies climbed until the caller experience was effectively broken.

In other words: we didn’t build a parallel renderer. We built a parallel way to fall over.

Wall #3: corrupted or blank images

The third symptom was the one that kills confidence fastest: corrupted output. Images that were blank. Images that were partially rendered. Images that looked like the wrong state was leaking between requests.

This is where the conversation leaves “JavaScript patterns” and enters “native library reality.” skia-canvas isn’t just JS. It’s a native binding to a rendering engine. Under concurrency, you start to discover properties that aren’t obvious from a quick skim of docs:

  • Are the objects you’re using actually safe to share between jobs?
  • Does the native module have global state you didn’t account for?
  • Are you inadvertently reusing buffers, canvases, fonts, or image resources across requests?
  • What happens when two renders race for the same underlying resource?

Vibe coding is great at producing “a plausible refactor.” It’s not great at predicting which parts of a native graphics stack will behave badly when you scale concurrency inside one process.

And if you’re building this in Docker, the constraints get sharper. Memory pressure rises faster. CPU throttling becomes visible. A small spike becomes a restart loop. The service becomes unpredictable.

Why this happens: production constraints are not optional

When you do backend image rendering, you’re not just building an API endpoint. You’re running a small graphics workload inside a server. That comes with non-negotiable operational questions:

  • Backpressure: what happens when 50 requests arrive and you can only safely render 4 at a time?
  • Time limits: when do you stop a render that’s stuck, and what do you return to the caller?
  • Resource limits: how do you prevent one busy period from starving everything else (including health checks)?
  • Isolation: what do you do when a single job triggers a weird native failure?

In the hype world, you add concurrency and everything scales. In the real world, you add concurrency and then spend a week learning what your system was quietly relying on.

The turning point: we stopped “refactoring” and started designing

The fix wasn’t a clever snippet. The fix was turning the system into something that can be operated safely.

Instead of “let every request render immediately,” we moved to an explicit design:

1) A bounded queue (intentional waiting, not accidental waiting)

If the renderer can handle N concurrent renders, we accept that fact and encode it. Excess requests are queued deliberately, with visibility, limits, and predictable behaviour. If the queue is full, we fail fast with a clear error (or a 429/503 depending on the caller contract), rather than silently hanging until timeouts.
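
A minimal sketch of that idea. The limits here (4 concurrent, 20 queued) are illustrative defaults; the real numbers come from measurement:

```javascript
// Bounded queue with fail-fast rejection when full.
class BoundedQueue {
  constructor({ concurrency = 4, maxQueued = 20 } = {}) {
    this.concurrency = concurrency;
    this.maxQueued = maxQueued;
    this.active = 0;
    this.waiting = [];
  }

  submit(job) {
    if (this.active >= this.concurrency && this.waiting.length >= this.maxQueued) {
      // Fail fast: the caller gets an immediate, explicit error
      // (map this to 429/503 at the HTTP layer).
      return Promise.reject(new Error('queue_full'));
    }
    return new Promise((resolve, reject) => {
      this.waiting.push({ job, resolve, reject });
      this._drain();
    });
  }

  _drain() {
    while (this.active < this.concurrency && this.waiting.length > 0) {
      const { job, resolve, reject } = this.waiting.shift();
      this.active += 1;
      Promise.resolve()
        .then(job)
        .then(resolve, reject)
        .finally(() => {
          this.active -= 1;
          this._drain();
        });
    }
  }
}
```

The key property is that waiting is now a first-class, observable state with a hard ceiling, instead of an invisible pile-up of pending Promises.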

2) A worker pool sized to CPU and memory

For CPU-heavy rendering, a worker pool can help, but only if it’s sized to the actual resources available. The “right” number is rarely “as many as possible.” It’s usually tied to CPU cores and memory-per-job, and it often requires measurement rather than guessing.

Crucially, you avoid sharing unsafe state between jobs. Each job should have its own canvas/context and avoid accidental reuse of buffers or global singletons.

3) Timeouts, cancellation, and predictable failure

A hung render is worse than a failed render. We added strict time budgets and ensured the API returned a consistent response when a job exceeded its allowed runtime. This prevented a handful of “stuck” requests from consuming all capacity indefinitely.
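
A minimal time-budget wrapper, with `renderFn` standing in for the real render call:

```javascript
// Wrap each render in a strict time budget so a hung job fails
// predictably instead of holding capacity forever.
function withTimeout(renderFn, budgetMs) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error('render_timeout')),
      budgetMs
    );
    Promise.resolve()
      .then(renderFn)
      .then(
        (result) => { clearTimeout(timer); resolve(result); },
        (err) => { clearTimeout(timer); reject(err); }
      );
  });
}
```

One caveat worth hedging on: this rejects the Promise, but it cannot interrupt native code already running inside the render. True cancellation of a wedged native job usually means doing the work in a worker you can terminate.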

4) Instrumentation that answers “what is happening?”

Once you’re dealing with timeouts, pegged CPU, and corrupted output, you need visibility:

  • queue depth over time
  • render duration (p50/p95/p99)
  • worker utilisation
  • memory usage per job (and overall)
  • error rates by failure type
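
Even a tiny in-process recorder answers more questions than no recorder. A sketch for the duration percentiles above (in production you would export samples to a real metrics system rather than compute them in-process):

```javascript
// Record render durations and read back nearest-rank percentiles.
class DurationStats {
  constructor() { this.samples = []; }

  record(ms) { this.samples.push(ms); }

  percentile(p) {
    if (this.samples.length === 0) return 0;
    const sorted = [...this.samples].sort((a, b) => a - b);
    const idx = Math.min(
      sorted.length - 1,
      Math.ceil((p / 100) * sorted.length) - 1
    );
    return sorted[Math.max(0, idx)];
  }
}
```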

This is where a lot of teams discover the uncomfortable truth: the system wasn’t “fine,” it was just never observed under real conditions.

5) Scaling the right way: replicas, not chaos

At some point, the simplest form of “parallel” is multiple container replicas behind a load balancer — as long as you still have per-instance concurrency limits and backpressure. That gives you isolation. If one instance gets wedged, you don’t take out the whole service.

It also makes capacity planning easier: you scale by adding known-good units rather than turning one container into a high-stakes science experiment.

What this taught me about AI coding tools

AI tools helped us move quickly at the start. They were good at generating scaffolding and suggesting patterns: worker pools, queues, concurrency limiters, and general architecture ideas.

But the hard part wasn’t writing code. The hard part was dealing with the consequences:

  • the difference between async concurrency and CPU parallelism
  • native module behaviour under concurrency
  • backpressure and load shedding
  • instrumentation and operations
  • making failures predictable, not mysterious

That’s the wall. Not because AI is useless — it’s not. But because production systems require ownership, not just output. Someone has to be accountable for what happens when the service is under load, inside constraints, with real users waiting.

If you’re vibe coding a backend service, here’s the practical takeaway

If your system does heavy work (rendering, OCR, video, ML inference, complex transforms), treat “make it parallel” as a design change, not a refactor. Start with the constraints:

  • How many renders can one instance do safely?
  • What’s the maximum queue depth you can tolerate?
  • What’s the timeout budget?
  • What does failure look like to the caller?
  • What metrics will tell you it’s healthy?

Vibe coding can help you get started. It can’t replace the uncomfortable work of making something reliable.

Ready to pressure-test a backend system before it hits production? Get in touch with us and we’ll help you design for throughput, stability, and predictable behaviour under real load.
