How to achieve incident management best practice

By Jere Jutila, Director of Business Development at Miradore.

Businesses face the risk of disruption every day, and as IT environments become larger and more complex, these risks only increase.

A minor oversight or flaw in the IT environment can cause data loss or a loss of service for the end user, which can affect progress, reputation and the bottom line. The risk is greater still in the event of an external cyber-attack, and such attacks are becoming increasingly common regardless of a business’s industry or size.

In the event of an incident, companies should have established plans in place to efficiently and quickly assess and overcome the challenge to return to business as usual. This is called ‘Incident Management’.

Here, I explore the concept of incident management, the key steps and stages of incident management and critical incident management, and best practices to employ in the face of a problem.

What is incident management?

Most often utilised by DevOps teams, incident management aims to mitigate, manage, and improve company responses to a technical problem or unplanned event that may disrupt operations and services.

This process prepares businesses for such an event and ensures effective strategies are in place to restore operations as quickly and with as few consequences as possible.

Examples of incidents include network outages, hardware malfunctions, and malware attacks, the last of which are on the rise.

In 2022, an estimated 236.1 million ransomware attacks occurred globally. Despite this, just 45 percent of companies have an incident response plan in place to combat a problem like this.

Incident management measures are essential to minimise the impact of an unanticipated event, which, if left unaddressed, can affect the finances, resources, and reputation of an organisation.

Not only do incident strategies outline the risks posed by unplanned events and enact a swift response, they also help to identify weaknesses in business operations, services, and existing plans that can be strengthened ahead of future disruption.

Although problem management and incident management seem similar, the key difference is that incident management aims to resolve an issue as it occurs, while problem management focuses on understanding how a problem arose after the incident and enacting measures to prevent it from occurring again.

Both, however, are important to ensure the smooth running of operations.

The five stages of incident management

Improving incident management strategies is a priority for many organisations, as 55 percent of surveyed organisations stated they would like to improve their incident response and containment time.

An effective management process should be structured around five key stages to maximise its effectiveness.

The first stage is identification. The incident is logged with a unique reference, the date it occurred, a concise description of the problem, and the name of the employee(s) tasked with addressing it, i.e., the ‘incident manager’.

Below is an example of a simple incident management ticket:

· Incident ID: DDoS #4432-B

· Description: A DDoS attack brought down our stock website for thirty minutes.

· Date of Occurrence: 13/02/2023

· Incident Manager: John Smith

Identification ensures all essential information is collated, which makes the incident more traceable and confirms employee responsibility from the outset.
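As an illustration only, the example ticket above could be captured as a small record in code. This is a minimal sketch: the `IncidentTicket` class, its field names, and the `summary` helper are hypothetical and are not tied to any particular ticketing tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class IncidentTicket:
    """Hypothetical record holding the four fields from the example ticket."""
    incident_id: str
    description: str
    occurred_on: date
    incident_manager: str

    def summary(self) -> str:
        # Format the date as DD/MM/YYYY to match the example ticket.
        return (
            f"{self.incident_id} | {self.occurred_on.strftime('%d/%m/%Y')} | "
            f"{self.incident_manager} | {self.description}"
        )


ticket = IncidentTicket(
    incident_id="DDoS #4432-B",
    description="A DDoS attack brought down our stock website for thirty minutes.",
    occurred_on=date(2023, 2, 13),
    incident_manager="John Smith",
)
print(ticket.summary())
```

Making every field mandatory means a ticket cannot be created without an owner or a date, which reflects the point that responsibility is confirmed from the outset.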

The second step is categorisation. The identified incident must be grouped with others of a similar timeframe or nature so that employees can find related incidents quickly. This increases the speed of resolution and ultimately improves end-user satisfaction.
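A rough sketch of categorisation is shown below, assuming each incident has already been given a category label. The category names, the two extra incident IDs, and the `group_by_category` helper are invented for illustration.

```python
from collections import defaultdict

# Hypothetical incidents as (incident_id, category) pairs.
# Only "DDoS #4432-B" comes from the example ticket; the rest are made up.
incidents = [
    ("DDoS #4432-B", "cyber-attack"),
    ("NET #1021-A", "network outage"),
    ("DDoS #4440-C", "cyber-attack"),
]


def group_by_category(items):
    """Return a dict mapping each category to the incident IDs filed under it."""
    groups = defaultdict(list)
    for incident_id, category in items:
        groups[category].append(incident_id)
    return dict(groups)


by_category = group_by_category(incidents)
print(by_category["cyber-attack"])
```

With incidents grouped this way, a team looking into a new attack can pull up earlier incidents of the same nature in a single lookup.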

Prioritisation is the next step. Businesses must consider team workloads and other problems that need addressing before agreeing on an order of urgency. Incidents with the largest and most immediate impact should be addressed first, reducing the time essential services are non-operational.
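One possible way to express this ordering in code is to score each incident on impact and urgency and sort on both. The 1 to 5 scales, the `QueuedIncident` class, and the sample incidents are assumptions made for this sketch, not a prescribed scoring model.

```python
from dataclasses import dataclass


@dataclass
class QueuedIncident:
    incident_id: str
    impact: int   # assumed scale: 1 (minor) to 5 (business-wide)
    urgency: int  # assumed scale: 1 (can wait) to 5 (immediate)


def prioritise(queue):
    """Order incidents so the largest, most immediate impact comes first."""
    return sorted(queue, key=lambda i: (i.impact, i.urgency), reverse=True)


# Hypothetical queue of open incidents.
queue = [
    QueuedIncident("printer-offline", impact=1, urgency=2),
    QueuedIncident("website-down", impact=5, urgency=5),
    QueuedIncident("slow-vpn", impact=3, urgency=4),
]
ordered = [i.incident_id for i in prioritise(queue)]
print(ordered)
```

Sorting on impact first and urgency second is one design choice among several; a team could equally weight the two scores or factor in current workload.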

Once these steps have been completed, the incident can receive a response. This should be handled by the team most equipped based on the information gathered in step one.

The problem must be resolved quickly, so the team must have plenty of workload capacity as well as training and an adequate number of team members assigned.

Building an effective incident response is crucial for swiftly rectifying problems and learning from them. Delegation, regular reporting, and communication are key.

Finally, once an incident has been resolved, the report can be closed and archived for future reference.

A post-incident meeting can be scheduled to discuss the event, how it was handled, improvements that could be made ahead of another problem, and a plan of action to avoid future incidents.

How does critical incident management differ?

The label of a ‘critical incident’ is reserved for occurrences that may risk the safety of shareholders, clients, or the work processes of the entire business.

An incident that stops employees from performing their responsibilities or inhibits a user’s ability to access a service also qualifies as a critical incident.

Responding to a critical incident follows similar steps to standard incident management. The key differences are that the problem must be explicitly labelled as critical so that it is treated as high priority, and that all stakeholders must be informed of the incident.
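The two differences can be sketched as a simple rule: a critical label raises the priority and triggers a notice to every stakeholder. The `handle_incident` function, the priority labels, and the stakeholder names below are hypothetical.

```python
def handle_incident(incident_id, is_critical, stakeholders):
    """Return (priority, notifications) for an incident.

    Assumption for this sketch: only incidents labelled critical are
    treated as high priority and announced to all stakeholders.
    """
    if not is_critical:
        return "normal", []
    notifications = [
        f"Notify {name}: critical incident {incident_id}" for name in stakeholders
    ]
    return "high", notifications


priority, messages = handle_incident(
    "DDoS #4432-B", True, ["shareholders", "clients"]
)
print(priority)
for message in messages:
    print(message)
```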

Incident management best practice

When it comes to incident management, there are four important considerations to achieve best practice.

Firstly, planning and rehearsals are essential for laying the basic foundations for incident management. This training prepares teams for the most common incidents and organises a set response that can be followed when faced with a real-time incident – leaving no room for panic.

A company’s ability to resolve an incident is dependent on the ability of its employees. Team members must be able to work both collaboratively and independently to manage a problem and implement solutions.

This rapport and skill base can be established through team-building exercises and upskilling programmes. These identify and close skill gaps in the workforce, equipping employees to manage an incident effectively and confidently.

Clear and open communication between individuals and teams is pivotal to responding to an incident efficiently and effectively. It ensures important information reaches those who need it quickly, which helps inform accurate decisions.

Communication channels should be established and made clear before an incident occurs.

Perhaps most importantly, once an incident has been resolved, it must be learned from. Constructive criticism and identifying a set of positive actions that can be taken to close vulnerabilities and weaknesses in any processes are crucial to protect future business continuity.

Obsolete actions and plans will slow the incident management process and complicate matters for teams attempting to fix issues. So, understanding what precisely went wrong and amending best practices and training accordingly will ensure that management strategies are up to date and can rectify a problem before damage is caused.
