Articles tagged with Incident Management

Posts on production incidents, response, coordination, and learning after failure.

3 min read

Managers Have Been Vibe Coding All Along

Everyone’s been talking about vibe coding lately. I’ve been doing it myself on two projects. It’s the kind of work where you don’t analyze, architect, or overthink. You star...

11 min read

Balancing Act of Reliability

Once something is in production, you are no longer just building software. You are also keeping it alive. That sounds obvious, but teams forget it all the time. We get excited a...

6 min read

Operational Skills Needed

Over the years, I've interviewed many candidates. One crucial skill that often gets overlooked is operational reflexes during on-call shifts. Surprisingly, few companies test for this,...

6 min read

Update Statements on Production

Executing update statements on a production database is always a big challenge. It’s one of those tasks that looks deceptively simple until something breaks in ways you didn’t i...

8 min read

Engineering Roles and Responsibilities

Engineering roles exist whether you define them or not. In some teams, ownership is explicit. People know who drives incident management, who keeps an eye on risk, who pushes on...

6 min read

Essential Engineering Principles

Engineering principles give teams a practical foundation for how to build and operate software. They guide decisions, shape behaviours, and help groups stay aligned even as syst...

6 min read

Addressing Technical Debt

Tech debt occurs when we solve a software problem with our limited understanding of the business at the time. We start building a solution to get feedback as early as possible....

14 min read

Service Overload Strategies

Service overload happens a lot. If you haven't seen one, count yourself lucky. The first time I watched it take a system down, I realized how serious it is to get the basics righ...

17 min read

Promoting Learnings in Incidents

The term incident is used for the negative consequences of an action. An incident comes from an action that fails to produce the expected outcome. For instance, deploying code to...

5 min read

Buggy Code on Production, Survived

Areca is the name of the billing engine I am working on for Turk Telekom. Funnily enough, it is also the name of the flowers we bought to freshen the office. We wanted the office...