Production and Reliability

What software looks like under production pressure: debugging, overload, incidents, quality, and reliability.

April 12, 2013 · 5 min read

Buggy Code on Production, Survived

Areca is the name of the billing engine I am working on for Turk Telekom. Funnily enough, it is also the name of the flowers we bought to freshen the office. We wanted the office...

October 15, 2015 · 5 min read

Local vs Production Debugging

Lately, I have been debugging a data workflow tool we built in-house. It has an Angular UI and a Java backend, and it moves data between different systems like Postgres to Hiv...

March 20, 2022 · 6 min read

Update Statements on Production

Executing update statements on a production database is always a big challenge. It’s one of those tasks that looks deceptively simple until something breaks in ways you didn’t i...

December 29, 2021 · 14 min read

Service Overload Strategies

Service overload happens a lot. If you haven't seen one, count yourself lucky. The first time I watched it take a system down, I realized how serious it is to get the basics righ...

October 6, 2024 · 9 min read

Balancing Act of Reliability

Software development involves both creating and maintaining systems. Once you put anything into production, reliability becomes critical. When your systems are not reliable, you...

November 2, 2023 · 6 min read

Silent Guardians of Quality

In the realm of software development, testers are the silent guardians. Their role is often misunderstood and underappreciated, especially when they do their job so well that no...

October 2, 2023 · 4 min read

Why Metrics Don’t Equal Quality

In 1902, Hanoi was drowning in rats. The government was getting nervous about plague. Hence, the city put a bounty per rat tail. Suddenly, the system had a scoreboard, something...

November 26, 2021 · 17 min read

Promoting Learnings in Incidents

The term incident describes the negative consequences of an action. An incident comes from an action that fails to produce the expected outcome. For instance, deploying code to...

June 5, 2024 · 6 min read

Operational Skills Needed

Over the years, I've interviewed many candidates. One crucial skill that often gets overlooked is operational reflexes during on-call shifts. Surprisingly, few companies test for this,...