Prices Plummet 70%: How Did the AI Computing Power Rental Bubble Burst?

链捕手

Source: Eugene Cheah Substack Account

Author: Eugene Cheah

Translation: J1N, Techub News

The decline in AI computing power costs will spark a wave of innovation among startups utilizing low-cost resources.

Last year, with AI computing power in short supply, H100 rental prices reached as high as $8 per hour; now, with the market oversupplied, they have fallen below $2 per hour. Several forces drove the reversal. Companies that signed compute rental contracts early began reselling their reserved capacity rather than let it sit idle, and the market largely settled on open-source models, reducing demand for training new ones. H100 supply now far exceeds demand, renting an H100 is more cost-effective than buying one, and investing in new H100s has become unprofitable.

A Brief History of the AI Race

GPU rental prices skyrocketed: the H100 started at around $4.70 per hour and peaked above $8, driven by founders racing to train their AI models quickly enough to persuade investors and secure their next funding round.

ChatGPT was launched in November 2022, using the A100 series of GPUs. By March 2023, NVIDIA released the new H100 series GPUs, claiming that the performance of H100 was three times that of A100, while the price was only twice as high.

This was a huge attraction for AI startups, as the performance of GPUs directly determines the speed and scale of the AI models they can develop. The powerful performance of H100 meant that these companies could develop AI models that were faster, larger, and more efficient than before, potentially catching up to or surpassing industry leaders like OpenAI. Of course, this all depended on them having enough capital to purchase or rent a large number of H100.

With the significant performance boost of H100 and fierce competition in the AI field, many startups invested heavily to acquire H100 to accelerate their model training. This surge in demand caused the rental price of H100 to skyrocket, initially at $4.70 per hour, later exceeding $8.

These startups were willing to pay high rental prices because they were eager to train their models quickly to attract investor attention in the next funding round, aiming to secure hundreds of millions of dollars to continue expanding their businesses.

For computing centers (farms) with a large number of H100 GPUs, the demand for renting GPUs was extremely high, akin to "money on the table." This was because these AI startups were eager to rent H100 to train their models, even willing to prepay rental fees. This meant that GPU farms could rent out their GPUs at a long-term rate of $4.70 per hour (or higher).

According to calculations, if they could continue renting GPUs at this price, the return period on their investment in purchasing H100 (i.e., the time to recoup the purchase cost) would be less than 1.5 years. After the return period, each GPU could generate over $100,000 in net cash flow income annually.
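The payback arithmetic above can be checked in a few lines. The $50,000 purchase cost and $4.70/hour rate are the article's figures; the 90% utilization is an illustrative assumption, and electricity and operating costs are ignored.

```python
# Sketch of the payback calculation above. Purchase cost and hourly rate
# are the article's figures; utilization is an assumption.
PURCHASE_COST = 50_000   # USD per H100 (article's figure)
RATE = 4.70              # USD per GPU-hour (article's figure)
UTILIZATION = 0.90       # assumed fraction of hours actually rented

hourly_income = RATE * UTILIZATION
hours_to_breakeven = PURCHASE_COST / hourly_income
years_to_breakeven = hours_to_breakeven / (24 * 365)
print(f"Payback: {years_to_breakeven:.2f} years")
```

Even at 90% utilization this lands comfortably under the 1.5-year payback window the article describes.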

Due to the sustained high demand for H100 and other high-performance GPUs, investors in GPU farms saw significant profit potential, leading them not only to agree to this business model but also to make larger investments to purchase more GPUs for greater profits.

"Tulip Mania": in history's first recorded speculative bubble, tulip prices soared from 1634 and crashed in February 1637

With the growth of artificial intelligence and big-data workloads, enterprise demand for high-performance GPUs (especially NVIDIA's H100) surged. To support these compute-intensive tasks, enterprises worldwide invested an estimated $600 billion in hardware and infrastructure: buying GPUs, building data centers, and so on. However, because of supply-chain delays, H100 rental prices stayed high for most of 2023, above $4.70 per hour unless buyers were willing to pay large upfront deposits. By early 2024, as more suppliers entered the market, the H100 rental price dropped to about $2.85 per hour, and I began receiving a stream of sales emails reflecting the increased competition once supply expanded.

Although H100 rental prices initially ranged from $8 to $16 per hour, by August 2024 auction-style rental prices had fallen to between $1 and $2 per hour. Market prices are on track to decline by 40% or more annually, far faster than NVIDIA's projection of a sustained $4 per hour over the next four years. This rapid decline poses financial risks for anyone who recently bought new GPUs at high prices, as they may never recoup the cost through rentals.

What Is the Return on a $50,000 H100 Investment?

Setting aside electricity and cooling costs, an H100 costs about $50,000 to purchase, with an expected lifespan of 5 years. There are broadly two rental models: short-term on-demand rental and long-term reservation. Short-term rental is more expensive but more flexible; long-term reservation is cheaper but locks both parties in. The analysis below works through both models to determine whether an investor can recoup the cost and turn a profit within 5 years.

Short-term On-Demand Rentals

Rental prices and the corresponding returns:

  • Above $2.85/hour: IRR exceeds stock-market returns; profitable.
  • $1.65–$2.85/hour: returns lower than stock-market returns.
  • Below $1.65/hour: expected investment loss.

Under a "mixed price" model, rental prices are projected to fall to 50% of the current level over the next 5 years. If rental prices held at $4.50 per hour, the internal rate of return (IRR) would exceed 20%, which is profitable; at $2.85 per hour, the IRR is only about 10%, a sharply reduced return. Below $2.85, the return may fall short of stock-market returns, and below $1.65 investors face serious risk of loss, especially those who purchased H100 servers recently.

Note: The "mixed price" is a hypothesis that assumes the rental price of H100 gradually declines to half of the current price over the next 5 years. This estimate is considered optimistic, as current market prices are declining by over 40% annually, so considering a price drop is reasonable.
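The "mixed price" reasoning can be sketched as a small cash-flow model. Apart from the article's figures (the $50,000 purchase cost, the price points compared, and the decline to 50% over 5 years), everything here is an illustrative assumption, notably the 85% utilization; electricity and cooling are ignored, as in the article, so the printed IRRs are optimistic and will not match the article's exact numbers.

```python
# A minimal sketch of the "mixed price" return model, assuming a linear
# price decline to 50% of the starting rate over 5 years. Utilization
# and the absence of operating costs are simplifying assumptions.
def irr_for_start_price(start_price, years=5, decline_to=0.5,
                        purchase_cost=50_000, utilization=0.85):
    """Annualized IRR for one H100 whose hourly rate declines linearly
    to decline_to * start_price over `years`."""
    months = years * 12
    hours_per_month = 730
    cashflows = [-purchase_cost]
    for m in range(months):
        frac = m / (months - 1)
        price = start_price * (1 - (1 - decline_to) * frac)
        cashflows.append(price * hours_per_month * utilization)

    # Solve NPV(rate) = 0 by bisection on the monthly rate.
    def npv(monthly_rate):
        return sum(cf / (1 + monthly_rate) ** t
                   for t, cf in enumerate(cashflows))

    lo, hi = -0.05, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if npv(mid) > 0:
            lo = mid   # IRR is higher than mid
        else:
            hi = mid
    monthly = (lo + hi) / 2
    return (1 + monthly) ** 12 - 1

for p in (4.50, 2.85, 1.65):
    print(f"${p:.2f}/hr -> IRR ~ {irr_for_start_price(p):+.1%}")
```

The ordering matters more than the absolute values: at $1.65 the lifetime revenue no longer covers the purchase price, so the IRR turns negative.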

Long-term Booking Leases (3 years or more)

During the AI boom, many established infrastructure providers had lived through the GPU rental boom-and-bust cycles of the early Ethereum proof-of-work era. Drawing on that experience, in 2023 they introduced high-priced 3-to-5-year prepaid rental contracts to lock in profits. These contracts typically required customers to pay above $4 per hour and to prepay 50% to 100% of the fees. With AI demand surging, foundation-model companies (especially in image generation) signed these expensive contracts anyway: to seize the market window and be first onto the latest GPU clusters, they had little choice if they wanted to finish their target models quickly and stay competitive.

Once training was complete, however, these companies no longer needed the GPUs, but the contracts locked them in. To cut their losses, they began reselling the rented GPU capacity to recover some of the cost. That flood of resold capacity increased market supply and pushed down rental prices.

Current H100 Value Chain

Note: The value chain, also known as value chain analysis or the value chain model, was proposed by Michael Porter in his 1985 book "Competitive Advantage." Porter argued that for a company to develop a unique competitive advantage, it must create higher added value in its goods and services. A firm's operations can be decomposed into a series of value-adding activities, and that series is the "value chain."

The H100 value chain spans from hardware to AI inference models, and the participants can be roughly divided into the following categories:

  • Hardware suppliers collaborating with Nvidia
  • Data center infrastructure providers and partners
  • Venture capital funds, large companies, and startups: planning to build foundational models (or have already completed model building)
  • Capacity distributors: Runpod, SFCompute, Together.ai, Vast.ai, GPUlist.ai, etc.

The current H100 value chain spans multiple links, from hardware suppliers to data center providers, AI model developers, capacity distributors, and AI inference service providers. The main pressure on the market comes from capacity distributors continuously reselling or renting out unused H100 resources, combined with the widespread use of "good enough" open-source models (such as Llama 3), which reduces demand for H100s. Together these two factors have produced an oversupply of H100s, putting downward pressure on market prices.

Market Trends: The Rise of Open-Source Weight Models

Open-source weight models are models whose weights have been published for free; despite lacking formal open-source licenses, they are widely used in commercial applications.

The demand for these models is primarily driven by two factors: the emergence of large open-source models (such as LLaMA3 and DeepSeek-v2) comparable in scale to GPT-4, and the maturity and widespread adoption of small (8 billion parameters) and medium-sized (70 billion parameters) fine-tuned models.

As these open-source models become increasingly mature, companies can easily access and utilize them to meet the needs of most AI applications, particularly in inference and fine-tuning. Although these models may slightly underperform compared to proprietary models in certain benchmark tests, their performance is sufficient to handle most commercial use cases. Therefore, with the proliferation of open-source weight models, the market demand for inference and fine-tuning is rapidly growing.

Open-source Weight Models Have Three Key Advantages:

First, open-source models offer high flexibility, allowing users to fine-tune the models based on specific domains or tasks, thus better adapting to different application scenarios. Second, open-source models provide reliability, as the model weights do not get updated without notice like some proprietary models, avoiding development issues caused by updates and increasing user trust in the models. Finally, they ensure security and privacy, as companies can ensure that their prompts and customer data are not leaked through third-party API endpoints, reducing data privacy risks. These advantages are driving the continued growth and widespread adoption of open-source models, especially in inference and fine-tuning.

Shift in Demand for Small and Medium-Sized Model Creators

Small and medium-sized model creators refer to businesses or startups that lack the capability or plans to train large foundational models (such as 70B parameter models) from scratch. With the rise of open-source models, many companies have realized that fine-tuning existing open-source models is more cost-effective than training a new model from scratch. As a result, an increasing number of companies are opting for fine-tuning rather than training models themselves. This significantly reduces the demand for computing resources like H100.

Fine-tuning is much cheaper than training from scratch. The computing resources required for fine-tuning existing models are far less than those needed to train a foundational model from scratch. Training large foundational models typically requires 16 or more H100 nodes, while fine-tuning usually only requires 1 to 4 nodes. This shift in the industry has reduced the demand for large clusters among small and medium-sized companies, directly decreasing reliance on H100 computing power.
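The cost gap driving this shift can be sketched roughly. The node counts (16 or more for pretraining, 1 to 4 for fine-tuning) come from the article; the 8 GPUs per node, the $2/hour rate, and the job durations are illustrative assumptions.

```python
# Rough cost comparison: pretraining a foundation model vs. fine-tuning
# an open-weight one. Node counts are the article's; GPUs per node,
# hourly rate, and durations are assumptions for illustration.
GPUS_PER_NODE = 8   # typical H100 node (assumption)
RATE = 2.00         # USD per GPU-hour (assumption)

def job_cost(nodes, days):
    return nodes * GPUS_PER_NODE * RATE * 24 * days

pretrain = job_cost(nodes=16, days=30)  # train from scratch
finetune = job_cost(nodes=2, days=2)    # fine-tune an existing model

print(f"pretraining:  ${pretrain:,.0f}")
print(f"fine-tuning:  ${finetune:,.0f}")
print(f"cost ratio:   {pretrain / finetune:.0f}x")
```

Even under these conservative assumptions the gap is two orders of magnitude, which is why investor money and GPU demand followed the fine-tuning route.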

Additionally, investment in foundational model creation has decreased. In 2023, many small and medium-sized companies attempted to create new foundational models, but now, unless they can bring innovation (such as better architecture or support for hundreds of languages), there are unlikely to be new foundational model creation projects. This is because there are already sufficiently powerful open-source models, like Llama 3, making it difficult for small companies to justify the rationale for creating new models. Investor interest and funding have also shifted towards fine-tuning rather than training from scratch, further reducing the demand for H100 resources.

Finally, the surplus capacity of reserved nodes is also an issue. Many companies reserved H100 resources long-term during the peak of 2023, but due to the shift towards fine-tuning, they found that these reserved nodes were no longer needed, and some hardware was outdated by the time it arrived. These unused H100 nodes are now being resold or rented out, further increasing market supply and leading to an oversupply of H100 resources.

Overall, with the proliferation of model fine-tuning, the reduction in the creation of small and medium-sized foundational models, and the surplus of reserved nodes, the demand for H100 in the market has significantly declined, exacerbating the oversupply situation.

Other Factors Leading to Increased GPU Supply and Decreased Demand

Large Model Creators Moving Away from Public Cloud Platforms

Large AI model creators like Facebook, X.AI, and OpenAI are gradually shifting from public cloud platforms to building their own private computing clusters. First, existing public cloud resources (such as clusters with 1,000 nodes) can no longer meet their needs for training larger models. Second, from a financial perspective, building their own clusters is more advantageous, as purchasing data centers, servers, and other assets can increase company valuation, while renting public cloud resources is merely an expense that does not enhance assets. Additionally, these companies have sufficient resources and professional teams, and they can even acquire small data center companies to help them build and manage these systems. Therefore, they no longer rely on public clouds. As these companies move away from public cloud platforms, the demand for computing resources in the market decreases, potentially leading to unused resources re-entering the market, increasing supply.

Vast.ai essentially operates as a free market system where suppliers from around the world compete with each other

Idle and Delayed H100s Hitting the Market Simultaneously

Idle and delayed H100s arriving on the market at the same time have increased supply and pushed prices down. Platforms like Vast.ai run a free-market model in which global suppliers compete on price. In 2023, H100 shipment delays kept many resources off the market; those delayed H100s are now coming online alongside new H200 and B200 hardware and idle capacity from startups and enterprises. Owners of small and medium clusters, typically 8 to 64 nodes, face low utilization and depleted funds, so their goal is to recoup costs quickly by renting out capacity cheaply. They compete for customers through fixed rates, auctions, or free-market pricing; in the auction and free-market models especially, suppliers undercut one another to keep their hardware rented, driving overall market prices down sharply.

Cheaper GPU Alternatives

Another major factor is that once computing costs exceed budgets, there are many alternative options for AI inference infrastructure, especially if you are running smaller models. There is no need to pay extra for using Infiniband with H100.

Nvidia Market Segmentation

The emergence of cheaper alternatives for AI inference tasks using H100 GPUs directly affects the market demand for H100. First, while H100 excels in training and fine-tuning AI models, many cheaper GPUs can meet the needs in inference (i.e., running models), especially for smaller models. Inference tasks do not require the high-end features of H100 (such as Infiniband networks), allowing users to choose more economical alternatives to save costs.

Nvidia itself also offers alternative products in the inference market, such as the L40S, a GPU specifically designed for inference, which has about one-third the performance of H100 but costs only one-fifth. Although L40S is not as effective as H100 in multi-node training, it is powerful enough for single-node inference and fine-tuning of small clusters, providing users with a more cost-effective option.
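The L40S trade-off above can be put in numbers. The "one-third performance, one-fifth price" figures are from the article; the absolute H100 price is the article's earlier $50,000 figure, used here only to scale the output.

```python
# Performance-per-dollar comparison for single-node inference, using the
# article's "1/3 the performance at 1/5 the price" figures for the L40S.
h100_price = 50_000      # USD (article's earlier figure)
h100_perf = 1.0          # normalized

l40s_price = h100_price / 5
l40s_perf = h100_perf / 3

h100_perf_per_dollar = h100_perf / h100_price
l40s_perf_per_dollar = l40s_perf / l40s_price

ratio = l40s_perf_per_dollar / h100_perf_per_dollar
print(f"L40S perf-per-dollar advantage: {ratio:.2f}x")
```

For workloads that fit on a single node, roughly 1.67x better performance per dollar is the economic pull away from the H100.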

H100 Infiniband Cluster Performance Configuration Table (August 2024)

AMD and Intel Alternative Suppliers

Additionally, AMD and Intel have also launched lower-priced GPUs, such as AMD's MI300X and Intel's Gaudi 3. These GPUs perform well in inference and single-node tasks, cost less than the H100, and offer more memory and computing power. Although they have not yet been fully validated in large multi-node cluster training, they are mature enough for inference tasks, making them strong alternatives to the H100.

These cheaper GPUs have proven capable of handling most inference tasks, especially inference and fine-tuning tasks on common model architectures (such as LLaMA 3). Therefore, after resolving compatibility issues, users can opt for these alternative GPUs to reduce costs. In summary, these alternatives in the inference domain are gradually replacing H100, particularly in small-scale inference and fine-tuning tasks, further reducing the demand for H100.

Decline in GPU Usage in the Web3 Sector

Due to fluctuations in the cryptocurrency market, the usage of GPUs in crypto mining has decreased, leading to a large influx of GPUs into the cloud market. Although these GPUs are not capable of handling complex AI training tasks due to hardware limitations, they perform well in simpler AI inference tasks, especially for budget-conscious users handling smaller models (below 10B parameters), making these GPUs a cost-effective choice. With optimization, these GPUs can even run larger models at a lower cost than using H100 nodes.

What is the Current Market Like After the GPU Computing Rental Bubble?

The challenges facing new entrants: New public cloud H100 clusters entering the market late may struggle to be profitable, and some investors may incur significant losses.

New public cloud H100 clusters face a pricing squeeze. Set rental prices too low (below about $2.25 per hour) and they cannot cover operating costs, producing losses; set them too high ($3 or above) and customers go elsewhere, leaving capacity idle. Clusters that arrived late also missed the early $4-per-hour era, making it hard to ever recoup their costs. This makes new cluster investment very challenging, and some investors may take significant losses.
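The squeeze described above can be sketched as a toy model. The $2.25 cost floor and roughly $3 demand ceiling are the article's figures; the utilization curve linking price to demand is purely an illustrative assumption.

```python
# Toy model of the late-entrant pricing squeeze: profit per GPU per month
# as a function of rental price. The cost floor and demand ceiling are
# the article's figures; the demand curve is an assumption.
OPEX_PER_GPU_HOUR = 2.25   # operating cost floor (article's figure)

def assumed_utilization(price):
    """Assumed demand response: fully booked at low prices, nearly
    empty above ~$3, linear in between."""
    if price <= 2.25:
        return 0.95
    if price >= 3.00:
        return 0.20
    return 0.95 - (price - 2.25) * 1.0

def monthly_profit_per_gpu(price, hours=730):
    return (price - OPEX_PER_GPU_HOUR) * hours * assumed_utilization(price)

for p in (2.00, 2.25, 2.60, 3.00):
    print(f"${p:.2f}/hr -> {monthly_profit_per_gpu(p):+,.0f} USD/month")
```

Whatever the exact demand curve, the viable band between "below cost" and "priced out of the market" is narrow, which is the core of the late entrants' problem.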

Profitability situation for early entrants: Medium or large model creators who signed long-term rental contracts early have recouped costs and achieved profitability.

Medium and large model creators who rented H100 capacity long-term have already extracted its value: the costs were covered during financing, and the clusters served current and future model training. Even where capacity goes unused, it can be resold or sublet for additional income; that pushes market prices down, softens their own losses, and on the whole benefits the ecosystem.

After the Bubble Burst: The Low-Cost H100 Can Accelerate the Wave of Open-Source AI Adoption

The emergence of low-cost H100 GPUs will drive the development of open-source AI. As H100 prices decline, AI developers and hobbyists can run and fine-tune open-source weight models more affordably, leading to broader adoption of these models. If future closed-source models (like GPT5++) do not achieve significant technological breakthroughs, the gap between open-source and closed-source models will narrow, promoting the development of AI applications. As the costs of AI inference and fine-tuning decrease, it may trigger a new wave of AI applications, accelerating overall market progress.

Conclusion: Do Not Purchase Brand New H100s

Investing in brand new H100 GPUs now is likely to result in losses. However, it may be reasonable to invest only under special circumstances, such as if a project can purchase discounted H100s, has access to cheap electricity, or if its AI product has sufficient competitiveness in the market. If you are considering investing, it is advisable to allocate funds to other areas or the stock market for better returns.

Disclaimer: This article represents only the personal views of the author and does not represent the position or views of this platform. It is provided for information sharing only and does not constitute investment advice to anyone. For any dispute between users and the author, this platform bears no responsibility. If any article or image on this page involves infringement, please send proof of rights and identity to support@aicoin.com, and the platform's staff will verify it.
