Today, we are witnessing the price of progress. As generative AI evolves swiftly amid booming adoption, the marvels of artificial intelligence come with astounding costs and challenges. The VC community and tech giants, who have invested billions of dollars in startups specializing in generative AI, have not reckoned with the high underlying costs that threaten the current boom.
As of June 2023, ChatGPT receives 60 million visits and 10 million queries per day. In April 2023, running ChatGPT was estimated to cost $70,000 per day, at an average of $0.36 per question. In June, however, “Tom Goldstein, an AI ML Professor at Maryland University, has estimated the daily cost of running ChatGPT to be approximately $100,000 and the monthly cost to be USD$3 million.”
A recent article profiled one startup, Latitude, which found itself grappling with exorbitant bills as its AI-powered games, such as AI Dungeon, gained popularity. Latitude’s text-based role-playing game used OpenAI’s GPT language technology, so its costs rose in proportion to the game’s usage. Content marketers’ unexpected use of AI Dungeon to generate promotional copy further strained the startup’s finances.
One of the primary reasons for the high cost of generative AI is the substantial computing power required for “training and inference.” Training large language models (LLMs) demands billions of calculations and specialized hardware such as graphics processing units (GPUs). Nvidia, a leading GPU manufacturer, sells data center workhorse chips that can cost up to $10,000 each. Estimates suggest that training a model like OpenAI’s GPT-3 could exceed $4 million, while more advanced models may cost in the high single-digit millions to train.
“For instance, Meta’s latest LLaMA model required a staggering 2,048 Nvidia A100 GPUs and over 1 million GPU hours, incurring costs over $2.4 million.” Costs at this scale weigh further on industry players like Microsoft, which is deploying the technology and needs infrastructure spending in the billions of dollars to meet user demand.
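As a rough check on the quoted LLaMA figures, the Python sketch below derives two numbers the article does not state: the approximate wall-clock time if all 2,048 GPUs ran in parallel for the entire job, and the implied average price per GPU-hour. It treats the “over” figures as exact and assumes the quoted cost covers GPU time only; both are simplifying assumptions, not details reported by Meta.

```python
# Figures quoted above for Meta's LLaMA training run.
NUM_GPUS = 2_048
GPU_HOURS = 1_000_000       # "over 1 million GPU hours", treated as exact here
TOTAL_COST_USD = 2_400_000  # "over $2.4 million", treated as exact here


def wall_clock_days(gpu_hours: float, num_gpus: int) -> float:
    """Elapsed days, assuming every GPU is busy for the whole run (an assumption)."""
    return gpu_hours / num_gpus / 24


def implied_rate(total_cost: float, gpu_hours: float) -> float:
    """Average cost per GPU-hour implied by the two quoted totals."""
    return total_cost / gpu_hours


if __name__ == "__main__":
    print(f"~{wall_clock_days(GPU_HOURS, NUM_GPUS):.1f} days of wall-clock time")
    print(f"~${implied_rate(TOTAL_COST_USD, GPU_HOURS):.2f} per GPU-hour")
```

Under these assumptions the run takes roughly three weeks at a little over two dollars per GPU-hour, which shows how quickly thousands of GPUs turn modest hourly rates into multimillion-dollar bills.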
I met with Laurent Gil, former lead of Oracle’s Internet Intelligence Group and current co-founder of CAST AI, an ML-powered cloud optimization platform that analyzes millions of data points to find the best balance of high performance and low cost. CAST AI determines how much a customer can save, then reallocates their cloud resources in real time to hit that target without affecting performance.
We discussed the true cost of embracing more advanced AI models.
Gil revealed that compute makes up a considerable portion of customers’ bills from cloud providers such as AWS, Azure, and Google. Compute, which includes CPUs and memory, accounts for about 90% of costs, while the remainder covers services such as storage and databases. He acknowledged that his answer would have been different three months ago.
“For an AI company, they are squeezing more towards compute and less over the rest, because most of the costs of running this model are on compute GPUs… We have many customers on the cloud, and we are currently managing and optimizing millions of CPUs every day.”
Recent observations show a surge in AI companies investing substantial sums in training specialized AI models. These training runs vary enormously in compute usage, from minimal CPU time to tens of thousands of CPUs and GPUs running for hours. The distinction matters: these high compute costs stem specifically from training AI models, not from inference, that is, from using the trained models in practice.
Gil explains that there are two types of AI engines: generic and specialized models. Generic models require extensive compute resources and are used by large companies dealing with vast amounts of data; because of the high costs, there may be fewer players in this category. He is more excited about the second type, specialized models. These focus on solving specific problems exceptionally well and do not require the extended compute time that generic models do. He sees specialization as the future of the industry, with companies offering unique and powerful solutions built on their own specialized data, leading to a new economy in the AI field.
CAST AI helps customers substantially cut the cloud costs of their AI operations. Its real-time optimization has shown an average cost reduction of approximately 80%: a customer who would typically spend $100 on AI model training spends about $20 after optimization. Because CAST AI builds an AI engine into its platform, it can gauge a customer’s compute needs accurately and allocate exactly what is required, without excess. When training finishes or compute demand drops, the system quickly and automatically shuts down unnecessary machines, adding to the savings.
For startups venturing into more sophisticated model development, Gil’s message is:
“We help a lot these young startups that need to train an engine that is very expensive. We tell them, ‘Look, it’s very expensive, but it’s going to be 5 times less because our engine knows exactly what you need and we supply it in real time.’ This is universal in every company that comes to us.”
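CAST AI’s actual optimization algorithm is not described here, so the Python sketch below only illustrates the general principle Gil describes: compare a cluster sized for peak demand and left running with one that follows demand hour by hour and shuts idle nodes down. The hourly demand profile and the $3 per node-hour price are made-up assumptions, not CAST AI data or real cloud pricing. The last two helpers also show that the article’s two figures agree: an 80% cut ($100 to $20) is the same as paying 5 times less.

```python
# Illustrative only: the demand profile and price are assumptions,
# not CAST AI data or real cloud pricing.
HOURLY_DEMAND = [2, 8, 8, 8, 3, 0, 0, 0]  # GPU nodes needed in each hour
PRICE_PER_NODE_HOUR = 3.0                  # assumed price in USD


def static_cost(demand: list[int], price: float) -> float:
    """Provision for peak demand and keep every node running the whole window."""
    return max(demand) * len(demand) * price


def elastic_cost(demand: list[int], price: float) -> float:
    """Run only as many nodes as each hour needs; idle nodes are shut down."""
    return sum(demand) * price


def savings_fraction(before: float, after: float) -> float:
    """Share of the original bill that is saved, e.g. 0.8 for $100 -> $20."""
    return 1 - after / before


def times_cheaper(before: float, after: float) -> float:
    """How many times smaller the new bill is, e.g. 5.0 for $100 -> $20."""
    return before / after


if __name__ == "__main__":
    before = static_cost(HOURLY_DEMAND, PRICE_PER_NODE_HOUR)
    after = elastic_cost(HOURLY_DEMAND, PRICE_PER_NODE_HOUR)
    print(f"static ${before:.0f}, elastic ${after:.0f}, "
          f"saved {savings_fraction(before, after):.0%}")
    # The article's own figures: an 80% cut means paying 5x less.
    print(savings_fraction(100, 20), times_cheaper(100, 20))
```

How much elasticity saves depends on how spiky the workload is: a job that runs at peak for the entire window gains little, while one that bursts and then idles, like the profile above, gains a lot.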
Bitcoin’s decentralized ledger has long been criticized for its computing demands and their downstream environmental impact. As more advanced artificial intelligence is adopted as a cure-all across industries, it seems to replicate the same problem, but with far wider market adoption. Gil acknowledged the significant environmental impact as the technology moves mainstream, but pointed to a critical aspect: the energy consumption of these GPUs. The cost optimization CAST AI achieves saves dollars, and it also directly reduces energy consumption: optimizing CPU usage significantly decreases the energy required for computation. This energy efficiency is a fascinating byproduct, and it has led the company to explore how to measure CO2 savings as well.
Gil explains that CPUs consume minimal energy when they are not in use, and he emphasizes two key impacts of the optimization process: reduced energy consumption and greater availability of machines for others. He elaborates:
“The energy consumption is much lower because you don’t need this machine for that long. But also, these machines become available for someone else, and that’s a great byproduct. If you use them for two hours instead of four hours, it means for the other two hours, the cloud provider can actually resell them somewhere else, so they don’t need to build more because they have capacity.”
The result is a win-win for both clients and cloud providers. By using compute resources efficiently, CAST AI prevents waste, enabling cloud providers to accommodate more clients without building new data centers.
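Gil’s “two hours instead of four” example can be put into numbers with a short Python sketch. The fleet size (100 machines) and the 0.5 kW average draw per busy machine are hypothetical values chosen for illustration, and idle draw is treated as negligible, following Gil’s remark that unused machines consume minimal energy; that is a simplification.

```python
# Hypothetical fleet and power figures, chosen only to illustrate Gil's example.
MACHINES = 100
POWER_KW_PER_MACHINE = 0.5  # assumed average draw of a busy machine


def energy_kwh(machines: int, busy_hours: float, power_kw: float) -> float:
    """Energy used while machines are busy; idle draw is treated as ~0."""
    return machines * busy_hours * power_kw


def jobs_per_day(hours_per_job: float, hours_in_day: float = 24.0) -> int:
    """Back-to-back jobs the same machines can host in one day."""
    return int(hours_in_day // hours_per_job)


if __name__ == "__main__":
    for hours in (4, 2):
        kwh = energy_kwh(MACHINES, hours, POWER_KW_PER_MACHINE)
        print(f"{hours}h job: {kwh:.0f} kWh, "
              f"{jobs_per_day(hours)} jobs/day on the same machines")
```

Halving the busy time halves the energy attributed to one job. If the freed hours are resold, someone else uses that energy, so the larger gain is the one Gil describes: the same fleet hosts twice as many jobs per day, and the provider does not need to build more capacity.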
Gil explains the value of the cloud’s elasticity: “Cloud was invented for a great reason that we love: it’s elastic, which means you buy when you need it, you return it when you don’t need it. And if you have tools that follow the utilization I mentioned, then you can save in energy and cost because you only pay for what you use.”
He contrasts this with the old approach, in which uncertainty led users to add excessive resources just in case they needed them. With CAST AI’s technology, Gil says, “Just pay for what you need when you need it. We are one of many providers in the same field, and it’s great because we help solve the shortage of machines. We help solve the energy consumption problem. We help the data center better utilize their investment because now they can resell a few times more of what they own, and in the end, it’s better for the economy and the industry.”
What’s the next milestone for CAST AI? Gil said the company will add 100 people after closing a recent $20M investment round led by Creandum. The current boom is allowing them to “surf the waves.” As CAST AI sees it, the more GPUs Nvidia sells, the better off the world will be: “That’s how we see it, and the better the consumption and utilization of these resources will be. It’s really the beginning of it. The next phase is using them, and that’s where the economy will grow, and the industry will explode.”
The demand for compute resources, particularly GPUs, is surging, leading to potential strain on the energy grid and infrastructure. Despite the challenges, there are promising solutions like CAST AI to help reduce costs and energy consumption while making better use of available resources. This not only benefits individual businesses but also contributes to the overall efficiency of the industry.
As the demand for AI-powered applications continues to rise, the need for accessible and affordable compute resources will remain a critical concern. The key lies in finding a balance between innovation and sustainability, ensuring that as AI evolves, it can flourish without compromising the environment and economic resources.