AIOps for Service Management


The term AIOps comes up frequently in the IT Operations world these days. Gartner defines it as that which “… combines big data and machine learning to automate IT operations processes, including event correlation, anomaly detection and causality determination.” In simple words, AIOps is the application of AI and Automation to IT processes for faster resolution of issues.

Very often, we get excited about the prospects of using AI and ML on IT event data. However, it’s important to remember that Automation is an equal contributor to any AIOps implementation. The role of AI & ML is to generate the necessary signal or insight about a problem; it can also classify and categorize events and detect anomalies in them. The role of Automation is to remediate, that is, to take the actions needed to resolve an incident or problem.
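The split between the two roles can be illustrated with a minimal sketch. It assumes a simple z-score check as a stand-in for the AI/ML signal and a hypothetical `restart_service` function as the automated remediation; none of these names come from a specific product, and a real implementation would use a trained model and an actual runbook.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, threshold=3.0):
    """Signal side: flag `latest` if it lies more than `threshold`
    standard deviations away from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to judge
    sigma = stdev(history)
    if sigma == 0:
        return latest != history[0]
    return abs(latest - mean(history)) / sigma > threshold


def restart_service(name):
    """Action side: hypothetical remediation step (placeholder only)."""
    return f"restarted {name}"


def handle_metric(service, history, latest):
    """Generate the insight first, then act on it only if needed."""
    if is_anomalous(history, latest):
        return restart_service(service)
    return "no action"
```

For example, `handle_metric("payments-api", [50, 52, 49, 51, 50], 95)` would trigger the remediation, while a latest reading of 51 would not.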

The diagram below shows a conceptual view of how the various components work together.

It has the following components:

  1. Alerts model: This is an AI/ML model, pre-trained on past Events, Metrics, Traces and Logs from the various systems of IT Operations (database, storage, servers, networks, etc.). The model is used to correlate similar events, check for false positives and classify events as actionable or non-actionable.
  2. Alert BOT: Although we use the word “BOT”, it could just as well be a program or script. The BOT takes live events from the monitoring systems and acts as the interface to the Alerts model. It also forwards actionable events as incidents/problems to the ITSM solution.
  3. ITSM: The core system of record for all user tickets and actionable alerts. It could be ServiceNow, Remedy, Jira or any similar application. It also hosts the KB articles that the resolver groups refer to.
  4. Tickets model: This is another AI/ML model, pre-trained on past tickets from the ITSM solution.
  5. Tickets BOT: Another BOT, program or script that acts as the interface between the ITSM solution and the Tickets model. The data from every new ticket is sent to the Tickets model, which classifies the Assignment Group and Priority and checks for existing KB articles. The assignment groups can also include existing automation scripts/BOTs that can be triggered automatically.
  6. BOT factory: The repository of existing Automation BOTs and scripts, which are either triggered automatically for remediation or triggered by human resolvers wherever required.
  7. Resolution Team: If no automation is available to resolve an incident, the Tickets BOT assigns the ticket by default to the human resolution team. In such cases, the BOT also provides the relevant KB article (where applicable), so that the resolver can close the ticket faster.
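To make the flow between these components concrete, here is a minimal sketch. It is an illustration under stated assumptions, not a reference implementation: the two models are replaced by simple keyword rules, correlation is reduced to dropping exact duplicates, and every name (`Event`, `Ticket`, `alert_bot`, `tickets_bot`, `BOT_FACTORY`, the KB identifiers) is hypothetical. No ITSM product API is used.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Event:
    source: str
    message: str


@dataclass
class Ticket:
    summary: str
    assignment_group: str = "Resolution Team"  # default: human resolvers
    priority: str = "P3"
    kb_article: Optional[str] = None
    resolution: Optional[str] = None


def alerts_model(event: Event) -> bool:
    """Stand-in for the Alerts model: True if the event is actionable."""
    noise = ("heartbeat", "test alert")
    return not any(word in event.message.lower() for word in noise)


def alert_bot(events: list[Event]) -> list[Ticket]:
    """Alert BOT: drop duplicates and noise, forward the rest as tickets."""
    seen = set()
    tickets = []
    for event in events:
        key = (event.source, event.message)
        if key in seen or not alerts_model(event):
            continue
        seen.add(key)
        tickets.append(Ticket(summary=f"{event.source}: {event.message}"))
    return tickets


def tickets_model(ticket: Ticket) -> Ticket:
    """Stand-in for the Tickets model: set group, priority and KB article."""
    text = ticket.summary.lower()
    if "disk" in text:
        ticket.assignment_group = "disk_cleanup_bot"
        ticket.priority = "P2"
        ticket.kb_article = "KB-disk-space"
    elif "database" in text:
        ticket.assignment_group = "Database Team"
        ticket.priority = "P1"
        ticket.kb_article = "KB-db-connections"
    return ticket


# BOT factory: automation keyed by assignment group (placeholder action).
BOT_FACTORY: dict[str, Callable[[Ticket], str]] = {
    "disk_cleanup_bot": lambda ticket: "old log files purged",
}


def tickets_bot(ticket: Ticket) -> Ticket:
    """Tickets BOT: classify, then auto-remediate if a BOT exists."""
    ticket = tickets_model(ticket)
    bot = BOT_FACTORY.get(ticket.assignment_group)
    if bot is not None:
        ticket.resolution = bot(ticket)
    return ticket
```

In this sketch, a disk-usage event ends up resolved by the automation in the BOT factory, while a database event has no matching BOT and is left for a human group with its KB article attached.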

This concept is especially useful for clients who are fairly mature in their Automation & AI journey, since it requires access to IT event data from their monitoring tools as well as pre-built BOTs/scripts for automation.

If the above statement does not apply to you, then do reach out to us and we can help you get started.

Roji Philip

About

Roji Philip has over 23 years of diverse experience in the IT industry covering Application Development, Maintenance, Operations, Innovation and Business Development. In his current role, he works as a Solutions Architect for Automation and Artificial Intelligence technologies. He holds a master’s degree in Computer Applications and a Post Graduate Diploma in Machine Learning and Artificial Intelligence. He is also a certified Project Management Professional (PMP) and a certified Scrum Product Owner (CSPO).
