The summer brings holidays and sunshine, but for those of us working in technology, the correlation between hot weather and the risk of outages becomes very real. Not only does internet activity continue at its normal blistering pace consuming server power all over the world, but data centre operators are also challenged by the increased wear on overworked cooling equipment which ensures services remain up and running.
The impact of global warming means that this is now becoming an annual problem in the UK. While most enterprise data centres run at less than full utilisation, leaving reserve cooling capacity to help them in extreme heat, the story is very different for cloud providers in hyperscale data centres, whose utilities are driven harder. This explains the outages suffered by Google and Oracle last July, when temperatures hit 40°C.
Companies can’t afford critical server outages, either financially or reputationally. According to a recent survey conducted by Enterprise Management Associates (EMA), facilitated by BigPanda, the average monetary cost of unplanned downtime is $12,900 per minute, a number much higher than any typically cited across the industry. And with user expectations increasing by the day, the inability to access business-critical applications, carry out video calls or connect to digital healthcare services spells danger for brands whose reputations rely on high-performing, always-on services.
Multi-cloud delivers resiliency
Enterprises have learned the hard way that if they’re building and delivering applications, they must invest in infrastructure and a distributed footprint that doesn’t put all its eggs in one data centre — or one cloud — basket. Taking a multi-cloud approach has allowed them to build resiliency and availability, while delivering performance at scale.
There’s no doubt that distributing the digital load across multiple providers makes sense. If one goes down, another can take over so that applications remain available. This removes the single points of failure that take systems and services offline. There are other benefits too, including not being locked into one cloud provider, cost and performance improvements, and the ability to create operational diversity.
But while the use of multiple clouds has proven to be a success for many, it has also required enterprises to make strategic decisions about how they manage their increasingly complex, distributed infrastructure. The use of multiple clouds that operate independently means companies are forced to think more carefully about how they steer their traffic, how they observe it, and what they need to do to control their cross-cloud workloads.
Traffic steering and observability
Workload distribution across multiple clouds cannot be traced with standard tooling, or even tooling provided by hyperscale data centres. Delivering consistency of service is best approached using solutions that are dedicated to traffic steering. These provide real-time information that enterprise network teams can respond to as conditions change. In the event of an outage suffered by one cloud provider, for example, intelligent traffic steering will use a variety of criteria to determine the issue and quickly reroute traffic away from the problem to ensure a seamless experience for users. Beyond this, intelligent, automated traffic steering solutions make life easier for organisations, letting them balance application delivery performance, capacity and cost using telemetry that provides full visibility into internet and cloud conditions.
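As a rough illustration of the kind of decision a traffic steering layer makes, the Python sketch below picks a provider by first excluding any that fail a health check and then weighing latency against cost. It is a minimal sketch, not a description of any vendor's product: the `ProviderStatus` fields, the `choose_provider` function, the provider names and the weights are all hypothetical, and a real solution would draw on far richer telemetry.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProviderStatus:
    name: str           # hypothetical provider label
    healthy: bool       # result of the most recent health check
    latency_ms: float   # recent latency as observed by end users
    cost_per_gb: float  # illustrative traffic cost


def choose_provider(
    statuses: List[ProviderStatus],
    latency_weight: float = 1.0,   # assumed weighting, tune per business priority
    cost_weight: float = 100.0,    # assumed weighting, tune per business priority
) -> Optional[ProviderStatus]:
    """Return the healthy provider with the lowest weighted score, or None."""
    candidates = [s for s in statuses if s.healthy]
    if not candidates:
        return None  # nothing healthy: the caller must decide how to degrade
    return min(
        candidates,
        key=lambda s: latency_weight * s.latency_ms + cost_weight * s.cost_per_gb,
    )


statuses = [
    ProviderStatus("cloud-a", healthy=True, latency_ms=40.0, cost_per_gb=0.09),
    ProviderStatus("cloud-b", healthy=True, latency_ms=55.0, cost_per_gb=0.05),
    ProviderStatus("cloud-c", healthy=False, latency_ms=20.0, cost_per_gb=0.02),
]
best = choose_provider(statuses)
```

With these made-up numbers, the unhealthy provider is ignored even though it is the fastest and cheapest, and traffic moves to the next-best candidate as soon as the current choice fails its health check.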
To ensure traffic steering is finely tuned, companies need to have clear visibility into their traffic flows. This is not just to help identify an outage, but to observe and extract useful business insights, debug problems in the network and identify security weaknesses. Network observability tools tap into network data streams and analyse them in real time. They have proven invaluable in helping companies better understand the masses of data they are generating, so they can act on the insights and improve performance.
Companies using a multi-cloud approach with the aim of building resiliency and availability need both traffic steering and observability so they can pivot as and when cloud service providers fail to perform as they should. If they put in the work upfront to calibrate the tools based on real-time conditions experienced by end users, and create policies to fully utilise these tools, they will be able to shift workloads automatically to available resources if they are faced with an outage.
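One simple form such a policy could take is a rolling window over recent end-user probes, shifting workloads only once the failure rate in that window crosses a threshold. The sketch below assumes this approach purely for illustration: the `OutageDetector` class, the window size and the threshold are hypothetical and would need calibrating against real measurements.

```python
from collections import deque


class OutageDetector:
    """Tracks recent probe results for one provider (illustrative only)."""

    def __init__(self, window: int = 5, max_failure_rate: float = 0.5):
        # deque(maxlen=...) drops the oldest result once the window is full
        self.results = deque(maxlen=window)
        self.max_failure_rate = max_failure_rate

    def record(self, success: bool) -> None:
        self.results.append(success)

    def should_fail_over(self) -> bool:
        # Wait for a full window so one failed probe cannot trigger a shift
        if len(self.results) < self.results.maxlen:
            return False
        failures = self.results.count(False)
        return failures / len(self.results) >= self.max_failure_rate
```

Requiring a full window before acting trades a little detection speed for stability, so that a single dropped probe does not bounce traffic between providers.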
Whether the summer heat, an extraordinary demand on Black Friday or a malicious cyber-attack causes the next outage, it will undoubtedly be neither the first nor the last, and companies need to act now to shore up their multi-cloud policies. Building resiliency will help them to address future problems, bringing redundant infrastructure, the right configuration, observability tools and dynamic traffic steering together to keep services, applications and users online.