One of the questions I get asked a lot by customers, prospects, and partners is, “Will AIOps make me irrelevant?”
To them, AIOps is often equivalent to automated remediation: an AIOps system automatically detects an incident and kicks off a remediation process in response, knowing exactly which process will solve the problem. IT is out of the loop, data centers and NOCs just keep humming along unattended, and end users are none the wiser.
Unfortunately, the above scenario is more science fiction than reality. While AIOps can and does automate routine IT processes, fully automated remediation without human intervention is still a long way off, if it ever does arrive.
The real benefits of AIOps are a bit more prosaic, but no less strategic. A more common scenario is eliminating the bulk of redundant events and alerts by applying machine learning to alerts, correlating multiple alerts to the same underlying event, and removing duplicates. Now, what if you could anticipate which alerts repeat regularly and become performance-impacting incidents, identify seasonal alert patterns that your customers can get in front of, and forecast repetitive alerts that can be ignored, reducing incident volumes? That’s a bit closer to the state of the art today.
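To make the correlation idea concrete, here is a minimal sketch in Python. It is not how any particular AIOps product works, and it uses a simple rule (same host and check, arriving within a time window) rather than machine learning. The `Alert` fields, the `correlate` function, and the 300-second window are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    # Hypothetical alert shape; real monitoring tools carry many more fields.
    host: str
    check: str
    timestamp: float  # seconds since some reference point


def correlate(alerts, window_seconds=300):
    """Group alerts that share (host, check) and arrive close together.

    An alert joins the open group for its key if it arrives within
    window_seconds of the previous alert in that group; otherwise it
    starts a new group. Each group stands in for one incident.
    """
    incidents = []
    latest_group = {}  # (host, check) -> most recent group for that key
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.host, alert.check)
        group = latest_group.get(key)
        if group is not None and alert.timestamp - group[-1].timestamp <= window_seconds:
            group.append(alert)
        else:
            group = [alert]
            incidents.append(group)
            latest_group[key] = group
    return incidents


# Illustrative data: five raw alerts.
sample = [
    Alert("db1", "disk_full", 0),
    Alert("db1", "disk_full", 60),
    Alert("web1", "cpu_high", 30),
    Alert("db1", "disk_full", 120),
    Alert("db1", "disk_full", 1000),
]
for group in correlate(sample):
    first = group[0]
    print(f"{first.host}/{first.check}: {len(group)} alert(s) starting at t={first.timestamp}")
```

In this toy data set, the three closely spaced `db1` disk alerts collapse into a single incident, while the much later one opens a new incident. A production system would typically learn the grouping keys and time windows from historical data rather than hard-coding them.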
The Value of Automated Remediation
Automated remediation typically complements rather than replaces the work of a human operator. Suppose a database supporting a key application fails with a file-system full alert. You could use auto-remediation to provision new database resources for an application, but you also would want to direct a human DBA to fix the database that crashed, either by provisioning more disk space or deleting unnecessary files. In a typical scenario, your AIOps system should offer suggested remediations rather than acting without human intervention.
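The split between automated and human work in the database example can be sketched as a small playbook lookup. This is a hypothetical illustration, not a real product API: the `Suggestion` class, the `PLAYBOOK` entries, and the `filesystem_full` alert name are all assumed. Note that the code only suggests actions and never executes them.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    action: str
    needs_human: bool  # True when a person should carry out the step


# Hypothetical playbook keyed by alert type; entries mirror the example above.
PLAYBOOK = {
    "filesystem_full": [
        Suggestion("Provision new database resources for the application", needs_human=False),
        Suggestion("DBA: add disk space or delete unnecessary files on the failed database", needs_human=True),
    ],
}


def suggest(alert_type):
    """Return suggested remediations; unknown alerts default to human triage."""
    default = [Suggestion("Escalate to the on-call operator for triage", needs_human=True)]
    return PLAYBOOK.get(alert_type, default)


def split_by_owner(suggestions):
    """Separate steps that could be automated from steps routed to a person."""
    automated = [s.action for s in suggestions if not s.needs_human]
    human = [s.action for s in suggestions if s.needs_human]
    return automated, human


automated, human = split_by_owner(suggest("filesystem_full"))
print("Candidate for automation:", automated)
print("Route to a human:", human)
```

Defaulting unknown alert types to human triage reflects the point above: the system proposes, and the operator stays in the loop.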
Another common example where auto-remediation works hand in hand with human remediation is when the incident management system grants secure remote access to, say, a cloud instance to a human support team.
In these ways, AIOps doesn’t make the ITOps pro irrelevant. It can save them from repetitive tasks, help them get to the root cause of a problem faster, and make them a lot more effective in their jobs. Successful pilots allow you to expand the number of systems you’re managing in this way, improving your overall reliability and user experiences.
Barriers to AIOps Success
AIOps doesn’t happen in a vacuum. You don’t buy an AIOps platform and voilà, you have AIOps. It starts with discovering your IT infrastructure and mapping your business services to it. You want to reduce the number of tools you’re using to monitor that IT infrastructure so you’re not duplicating efforts, metrics, and alerts. You need to be able to make sense of all the data you’re collecting.
If there are barriers to AIOps success, they revolve around people not getting the initial steps right. They don’t know where their infrastructure is, who’s responsible for it, or what business services it supports. They have siloed IT management with multiple teams (server, network, storage, applications, database, web, etc.) all using their own tools, with no correlation of that data. AIOps won’t help these folks; they’ll be drowning in uncorrelated alerts regardless.
Another barrier is skill sets. Data scientists are still in high demand in IT organizations, as the ability to manage and analyze large datasets is paramount for AIOps. Some experience with system design and architecture, familiarity with security (a big cause of IT events), and a commitment to cross-functional teamwork are also good traits for an AIOps pro to have.
Future of AIOps
So where do we go from here? To sum up, today’s AIOps is mostly about correlating alerts, reducing IT incidents, resolving IT incidents faster, and directing the right resources (human and system) for remediation.
In the future, I would look for AIOps to play a greater role in keeping systems, networks, and even ITOps platforms themselves more available and optimized. I think it will also play more of a role in finding the right infrastructure, cloud or on-premises, to run a workload. We talk a lot about reducing tool sprawl, but I think ultimately AIOps will help to reduce infrastructure sprawl.
What else does the future hold? We’re going to hear a lot about cloud repatriation in the months and years ahead as organizations move workloads back from the cloud or from one cloud to another. We’re going to hear more about sustainability and IT’s role there: retiring unnecessary power-chugging server and storage resources, for example, and tracking power metrics alongside performance metrics. AIOps will play a role in all these areas.
At the end of the day, AIOps is about getting the right data and knowing what to do with it. ITOps teams will still need to bring domain expertise and analytical skills to make those systems work. And AIOps, in turn, won’t replace those ITOps teams but will make them better at their jobs.
About the author:
Sheen Khoury, Chief Revenue Officer, OpsRamp