A few weeks ago we were discussing a Java component that starts a Spark cluster. Its job is mostly coordination. It starts the machinery, passes configuration around, waits for the right signals, and gets out of the way.

My first reaction was simple: it should need one CPU and maybe 2 GB of memory if we are being generous. It is a launcher. Even saying 2 GB felt strange, because this was production, not a toy running on my laptop. But that reaction is exactly the problem. Somewhere along the way, we started treating small numbers as unserious just because the system is important. Production should make us more careful with resources, not less.

I think we all know how that happens. If you’ve been in this business for a while, you have done versions of it yourself. A pipeline fails once, and someone bumps the memory. A service gets squeezed a bit, and someone raises the number of CPUs. A rollout goes poorly, and the next engineer adds a massive buffer because nobody wants to be paged again for trying to be clever with resources. The bigger number works, the system feels resilient, and we move on.

Six months later, our original cause has vanished from everyone’s memory. It becomes the de facto setting for the service. Nobody thinks of it as a temporary fix anymore. We read it as a hard requirement. The patch has hardened into fact. And nobody wants to challenge that because it works.

Here is the paradox. The JVM has absorbed decades of optimization, garbage collectors have gotten dramatically better, CPUs are way faster, and cloud computing is trivial to provision. On paper we should be swimming in the gains. Instead we keep spending them, proving Niklaus Wirth's 1995 law: "software is getting slower more rapidly than hardware becomes faster."

Just In Case EngineeringJust In Case Engineering

Where the Progress Went

The gains did not disappear. We spent them on larger runtimes, deeper dependency graphs, heavier containers, more telemetry, wider safety margins, and platform defaults that make every service look the same. When a higher-level abstraction makes coding more efficient, we do not write less code or build simpler systems. We spend the efficiency on more complexity, more integration, more moving parts.

When an engineer inflates a container's memory limit just in case, the JVM's ergonomics read that headroom as permission. Default heap sizing grows to a fraction of whatever it sees, garbage collection gets lazier because there is slack to burn, and the runtime settles comfortably into the larger footprint it was handed. The software does not get hungrier because the work grew. It expands to fill the runway it was given.

Now, some of this weight is legitimate. I want to avoid the trap of technical nostalgia. We have to separate unavoidable complexity from avoidable waste. Modern software runs in a hostile, globally networked environment. Some weight is real: security, accessibility, distributed systems, compliance, observability, and global scale. A modern system carries work old software never had to carry. Our mistake is using that true statement to defend every bad default that came along for the ride.

When you look at what’s avoidable, you can see it in bloated dependency trees, where large portions of imported dependencies are rarely, barely, or never exercised at runtime. Every layer of the modern stack has an appetite. Logging, tracing, the platform SDK, and the base image want their own share. None of them looks outrageous alone, but together they make a small thing stop feeling small. Bloat arrives as a long sequence of rational and local decisions.

The Machine Used to Say No

Old software was not fast because engineers were morally superior. It was fast because the machine said no. In the 2000s, it was normal to run a web service on a machine with a few gigabytes of memory and one or two CPUs. The box was small, so the work had to be shaped properly. You had to prioritize, tune, and understand what the service was actually doing. Constraint made people care.

Modern infrastructure is far more forgiving. We happily add buffers, retries, and standard platform layers, each one just in case. Forgiveness lets us build larger systems, but it also lets waste survive long enough to look normal. A bigger instance type can hide confusion. A bigger heap can hide a leak. A standard platform can hide the fact that a tiny coordinator inherits the appetite of a much larger service because 2 GB looks like a joke.

On the flip side, a service utilizing a microscopic 10% of its allocated 64 GB shows up as a beautiful, calming green on a monitoring chart. It signals health to the reliability engineering team, effectively masking a potential optimization failure as a triumph of operational safety. We have built exhaustive alerts for system saturation, but we rarely build alerts for structural emptiness.

The distance between the engineer and the machine collapsed. When adding capacity meant ordering hardware, installing RAM, waiting for a server, or asking someone to approve a real purchase, you had to think twice. There was friction. It was useful, and slowed things down. Now the same decision is a config change.

We did not become wasteful because engineers stopped caring. We became wasteful because waste stopped having a moment where we had to explain why we are doing what we are doing.

Hardware Became the Buffer

Throwing hardware at a problem might be easy, and a lot of the time, it is the correct economic answer. Human labor is far more expensive than computing. If a few dollars of extra compute saves an hour of expensive engineering time, then hand-optimizing the code is the bad investment, not the good one.

The problem is though letting hardware scaling become the only move. Once it is, every bottleneck turns into a provisioning issue. If we see a node is under pressure, we move to a larger one. The system becomes stable, and that is where it gets tricky, because stability can mean two very different things. Sometimes a stable system is elegantly shaped. Other times, we throw enough money at a problem so that it doesn’t become a midnight issue.

The second kind is easy to mistake for success, because production goes uneventfully. Uneventful feels like winning. It also lets us forget what we paid to get there

Bloat Survives Through Handoffs

Bloat survives because the cost moves. The person making the decision is rarely the person paying the full cost. The person who adds a dependency gets the development speed today; the future maintainer inherits the upgrade pain. The platform team adds a standard sidecar once; every service pays the latency tax forever. The product team adds tracking to monetize attention; the user pays in battery drain and cognitive load.

So bloat survives even when everyone dislikes it, because the feedback loops are broken. When a service goes down, it triggers an incident, a postmortem, and a sense of urgency. When an enterprise tool is merely sluggish, it produces nothing but daily drag. Anyone who has used bad enterprise software knows this kind of slowness never becomes an incident. It becomes the background tax of doing the job.

Bring Back Budgets

The answer is resource budgets. Simple, tedious, explicit budgets. A launcher gets a memory budget, a service gets a startup budget, a container gets a size budget. When one of those limits is crossed, someone has to explain what changed and what the extra cost actually buys.

This does not mean hand-optimizing every byte. Sometimes, hardware is cheaper than human coordination. Some abstractions buy enough safety to earn their heavy footprint. The goal isn't poverty; the goal is intention. Let’s make the trade-off explicit before an inflated guess hardens into a baseline.

If a simple Spark launcher genuinely needs thirty gigabytes, fine. But show me the receipts. What gets loaded at startup? What stays resident? How much of that memory is buying structural reliability? How much is just paying interest on an old decision?

If nobody can answer those questions, that allocation isn't engineering. It's a superstition. So the next time a component asks for an enterprise-sized footprint, don’t validate it by checking what everything else is using. Look at the shape of the work. Ask what the machine is actually doing, and what it would take for us to stop being so terrified of a small number.