How OpsGenie helps me, and why I returned to the long PDF incident report

 Hi!


First of all, I would like to say a big thank you to the OpsGenie team for every integration (Slack, call routing, apps, Jira, etc.), because those integrations help me reduce the reaction time to an incident.
So, let's go step by step:
We have been using OpsGenie since the end of 2016. At that time OpsGenie offered a smooth way to deliver SMS notifications from our Nagios for critical problems. The escalation workflow was an additional advantage. We built our setup on the flow described in the official docs.

[Figure: OpsGenie alert notification flow]

Source: https://docs.opsgenie.com/docs/alert-notifications-flow

Now our scheme is a little more complicated and more integrated with other parts of the infrastructure.

One of my goals was to autogenerate the incident report that our company requires. Why did I want it autogenerated? Because we had a PDF template that includes around 50 questions.


As a result, I hated filling in that report, and at every post-mortem I thought about what I needed to do to improve availability. For example, if I reduce the number of incidents, I do not have to fill in that report: no incident, no report.


Once everything was automated, I just clicked, pushed everything into Jira, and forgot about the follow-up incident report and any additional investigation.
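
To make the idea concrete, here is a minimal sketch of what such an autogenerated report could look like. It is not our real integration: the alert dictionary, its field names, and the question texts are all hypothetical, and no OpsGenie or Jira API is called. It only shows prefilling the factual answers from alert data while leaving the improvement questions for a human.

```python
from datetime import datetime

# Hypothetical mapping from report questions to alert fields.
# The field names are assumptions, not the OpsGenie alert schema.
QUESTION_TO_FIELD = {
    "What happened?": "message",
    "Which system raised the alert?": "source",
    "When was the alert created?": "created_at",
    "When was the alert closed?": "closed_at",
    "Who acknowledged the alert?": "acknowledged_by",
}

# Questions that automation cannot answer and a human should think about.
MANUAL_QUESTIONS = [
    "What was the root cause?",
    "What will we change to prevent a repeat?",
]


def build_incident_report(alert):
    """Return a plain-text report prefilled from a dict of alert data."""
    lines = ["Incident report: " + alert.get("message", "(no title)")]

    # Prefill every question that maps directly to an alert field.
    for question, field in QUESTION_TO_FIELD.items():
        lines.append(f"{question} {alert.get(field, 'unknown')}")

    # Derive the duration when both ISO 8601 timestamps are present.
    created = alert.get("created_at")
    closed = alert.get("closed_at")
    if created and closed:
        duration = datetime.fromisoformat(closed) - datetime.fromisoformat(created)
        lines.append(f"How long did the incident last? {duration}")

    # Leave the improvement questions open on purpose.
    for question in MANUAL_QUESTIONS:
        lines.append(f"{question} TODO: answer manually")

    return "\n".join(lines)


if __name__ == "__main__":
    sample_alert = {
        "message": "Nagios: disk space critical",
        "source": "Nagios",
        "created_at": "2019-11-30T22:10:00",
        "closed_at": "2019-11-30T22:55:00",
        "acknowledged_by": "on-call engineer",
    }
    print(build_incident_report(sample_alert))
```

Keeping the last two questions as manual TODO items is a deliberate choice in this sketch: the facts can be filled in by a machine, but the thinking cannot.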

For example, after the last incident I did not think about improvements at all, which is bad. Now I deliberately open the old template with its many questions and fill it in just for fun. As a result, while answering the questions for the incident report I hate it again and start to think about what I need to improve to sleep better :)

Conclusion: 

Sometimes we need to balance automated and manual work, and in my use case I returned to the manual process. In my view, it is worth having a specific manual report that motivates you to fix the root cause and improve the process. :)

What do you think? If you have another opinion, please feel free to write it here.

As a bonus, here is an interesting book that covers many aspects of incident and accident reporting: http://www.dcs.gla.ac.uk/~johnson/book/C_Johnson_Accident_Book.pdf

https://www.opsgenie.com/webinars/best-practices-for-incident-management

 

 

Cheers,

Gonchik Tsymzhitov
