I come from the old world, from a time when everything was planned, written down, standardized. The world of Service Level Agreements. Of protocols, of liability, of blame, of tickets. And even though everyone always tried to account for every eventuality in advance, there was always one constant:
Sh… Stuff happens.
I’m in the business of helping customers deliver new solutions and, more generally, helping them set up or improve their agility and DevOps-ness. A couple of weeks ago, in a meeting at a customer, we were discussing some incidents that had happened: how they occurred and how to deal with them in the future. One pervasive issue that kept popping up was where an incident was supposed to end up, and who had to solve it.
And before I even knew it, this was my response to the discussion:
Try as we might, we could not find someone to blame. The reason: no one was to blame. Everyone tries their hardest to keep the system working properly. What actually needs to be addressed is that the customer has an issue, and that issue needs to be solved. Rather than looking at where you should direct the problem, ask yourself this question:
How can I help?
To me this is the essence of DevOps: everything works.
- If something doesn’t work, fix it.
- If something doesn’t work well enough, improve it.
- If something is missing, build it.
- If you don’t know how, learn it.
In general, if there is a problem, do not try to find the ‘correct’ location to drop it. Instead, cooperate to improve.
- If that open source library has a bug, fix it and open a pull request.
- If that component is a single point of failure, make it redundant.
- If your data center has one internet provider, go to the cloud.
- If your cloud provider doesn’t offer sufficient availability, implement multi-cloud redundancy.
- If the planet might get destroyed by a solar flare, launch a backup system to another planet.
You should not care whose availability, whose bug, or whose dependency it is.
Treat every incident as an opportunity to improve, and do it together.