Cost-Effective AI: Prioritizing the full tech stack over raw compute

By Narek Tatevosyan, Product Director at Nebius AI.

The Generative AI boom is rapidly advancing technology and reshaping industries, but it is also driving an insatiable demand for computing power. Many AI start-ups are falling into the “compute trap”: chasing access to the latest, most powerful hardware regardless of cost, rather than optimising their existing infrastructure or exploring more efficient ways of building GenAI applications.

While GPU power is undeniably critical to training large AI models and other machine learning applications, it is far from the only factor involved. Without the latest CPUs, high-speed network interface cards such as NVIDIA's 400Gb/s NDR InfiniBand adapters, DDR5 memory, and a motherboard and server rack that can tie it all together, top-tier GPUs like the NVIDIA H100 cannot perform at their full potential. Taking a broader view of compute, combined with a holistic approach to AI development – focusing on efficient data preparation, optimised training runs, and scalable inference infrastructure – allows for more sustainable growth of AI applications.
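To see why the whole stack matters, consider a toy model of a training pipeline as a chain of stages in which the slowest stage sets the pace. Every throughput figure below is an illustrative assumption, not a benchmark of any particular hardware:

```python
# Toy bottleneck model: a training pipeline runs no faster than its slowest
# stage. All throughput numbers are illustrative assumptions.

pipeline_gbps = {
    "storage_read": 80,     # assumed object-storage read throughput
    "network_fabric": 400,  # e.g. NDR InfiniBand line rate
    "host_memory": 350,     # assumed effective share of DDR5 bandwidth
    "gpu_ingest": 500,      # assumed rate at which the GPUs could consume data
}

bottleneck = min(pipeline_gbps, key=pipeline_gbps.get)
effective = pipeline_gbps[bottleneck]
utilisation = effective / pipeline_gbps["gpu_ingest"]

print(f"Bottleneck: {bottleneck} at {effective} Gb/s")
print(f"GPUs fed at roughly {utilisation:.0%} of what they could consume")
```

Under these made-up numbers, the GPUs sit at around 16% of their potential ingest rate: upgrading them would change nothing until the storage stage is fixed.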

The limits of compute: A costly pursuit for AI start-ups

In theory, more compute and larger datasets result in more powerful AI models. Take Meta’s Llama 3.1 8B and 405B LLMs: both were trained on the same 15-trillion-token dataset using NVIDIA H100s, but the 8B version took 1.46 million GPU hours while the significantly more powerful 405B version took 30.84 million.
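To put those GPU-hour figures in budget terms, here is a minimal back-of-envelope sketch. The per-H100-hour price is a hypothetical cloud rate, not a figure from Meta or any provider:

```python
# Illustrative only: converting Meta's published GPU-hour figures for
# Llama 3.1 into rough dollar costs at an assumed cloud price.

H100_PRICE_PER_HOUR = 2.50  # assumed on-demand price in USD, hypothetical

models = {
    "Llama 3.1 8B": 1.46e6,     # GPU hours (Meta's reported figure)
    "Llama 3.1 405B": 30.84e6,  # GPU hours (Meta's reported figure)
}

for name, gpu_hours in models.items():
    cost = gpu_hours * H100_PRICE_PER_HOUR
    print(f"{name}: {gpu_hours:,.0f} GPU hours ≈ ${cost:,.0f}")

# Llama 3.1 8B: 1,460,000 GPU hours ≈ $3,650,000
# Llama 3.1 405B: 30,840,000 GPU hours ≈ $77,100,000
```

Even at a modest assumed rate, the gap between the two training budgets spans tens of millions of dollars – the scale of spend most start-ups cannot, and should not, try to match.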

In the real world, of course, there are also practical concerns, and very few AI companies can afford to compete directly with tech giants like Meta. Instead of falling into the compute trap and trying to match the compute spend of some of the richest companies in the world, many companies would benefit from focusing on the entire tech stack driving their ML development. 

While Llama 8B isn’t as powerful as Llama 405B, it still outperforms many older, larger models – thanks to innovations that go beyond simply deploying more compute.

The power of integration: full-stack AI development

Managing the entire AI development lifecycle – from data preparation and labelling to model training, fine-tuning, and inference – on a single platform offers significant advantages. 

Developing and deploying an AI application with a single full-stack provider means your team only has to learn one set of tools rather than multiple platforms. Keeping data on a single platform also eliminates the inefficiencies of multi-cloud environments. Perhaps most usefully, if you run into any issues, you are dealing with a single support team that understands the whole stack.

There are potential financial benefits too: Using a single infrastructure for data handling, training, and inference can often lead to better pricing, lowering the overall cost of your AI operations.

Exploring alternatives: tailored platforms for AI start-ups

While major cloud providers like AWS, Microsoft Azure, and Google Cloud might seem like obvious choices, they’re not always the best fit for AI start-ups and scale-ups.

Of all the cloud computing platforms available, the Big Three are the most expensive. If you have plenty of venture funding or are a massive tech company, this might not be an issue – but for most AI companies, the bigger cloud providers don’t offer a good ROI. Furthermore, they aren’t optimised for AI-specific operations, so you are likely to pay significant premiums for features you don’t need.

Dedicated full-stack AI platforms like Nebius offer a much more attractive alternative. These platforms are designed specifically for AI development, providing affordable compute and hardware setups tailored for both training and inference. You can focus on developing, training and optimising your AI models, confident that they’re running on the right hardware for the job – not navigating a sprawling server backend or wondering why your expensive GPUs aren’t getting the data throughput they should.

While leveraging a full-stack approach to ML development requires thought and planning, doing it at the start will minimise your ongoing infrastructure costs. A better-optimised application not only reduces the cost of training runs but should also reduce the cost of inference. And these savings can compound over multiple generations of AI models: a slightly more efficient prototype can lead to a much more efficient production model, and so on into the future. The choices made now could be what gives your company the runway it needs to reach an IPO.
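As a rough illustration of that compounding, here is a minimal sketch in which a modest efficiency gain is applied at each model generation. The 15% gain and the baseline spend are hypothetical numbers chosen purely for the example:

```python
# Illustrative compounding of efficiency gains across model generations.
# The gain and baseline cost are hypothetical, not measured figures.

gain_per_generation = 0.15          # assumed efficiency gain per generation
baseline_cost = 1_000_000           # assumed training + inference spend, USD

cost = baseline_cost
for generation in range(1, 5):
    cost *= (1 - gain_per_generation)
    print(f"Generation {generation}: ${cost:,.0f} "
          f"({cost / baseline_cost:.0%} of baseline)")

# Generation 1: $850,000 (85% of baseline)
# ...
# Generation 4: $522,006 (52% of baseline)
```

Four generations in, the same assumed gain has nearly halved the spend – which is the runway argument in miniature.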
