Author: Lin Zhijia
Source: Titanium Media
With NVIDIA leading the market for chips that power large AI models and its market value surpassing $1 trillion, Intel, AMD, and Chinese GPU makers are quietly competing for a share of the AI computing chip market.
On September 19, at the Intel On Technology Innovation Conference in San Jose, 62-year-old Intel CEO Pat Gelsinger "fired up" the audience by opening his keynote with push-ups.
At the conference, Gelsinger unveiled a series of new products at once: the Intel Core Ultra processor, codenamed "Meteor Lake" and built on the Intel 4 (5nm) process; a preview of the fifth-generation Xeon server chip and the Xeon roadmap beyond it; and details of Gaudi 3, a 5nm AI chip.
(Photo: Intel CEO Pat Gelsinger doing push-ups on stage)
Unlike in previous years, Gelsinger, taking on the role of "Leather Jacket Huang" (NVIDIA CEO Jensen Huang), spent nearly two hours on the economic impact of AI computing power. By the Titanium Media App's count, he mentioned "artificial intelligence," "deep learning," and related terms about 200 times in his speech.
Almost at the same time, AMD, a competitor of both Intel and NVIDIA, released its latest EPYC 8004 CPU (central processing unit) and expects to ship its MI300 series AI chips by the end of the year to compete with NVIDIA. In China, AI chip makers such as Huawei and Tiansu Zhixin are actively rolling out products for large-model training, inference, and AI computing.
"We are in fierce competition with NVIDIA, the leader in the AI computing chip market. But with both Gaudi 2 and Gaudi 3, we have taken a big step forward against them. We are gaining momentum, and the market is starting to realize that there is room for another leader in the AI chip industry," Gelsinger told CNBC on September 20.
As competition intensifies, "Old Huang" will find it hard to monopolize the trillion-dollar AI computing power market
Since the start of 2023, the boom in large AI models led by ChatGPT has swept the world, pushing AI toward more general-purpose use.
At the same time, the scarcity and high cost of computing power have become the core constraints on AI development. Computing power has also become a cornerstone of society's digital and intelligent transformation, driving a surge in demand for AI computing power.
According to AMD CEO Lisa Su, the total addressable market for data center AI accelerators worldwide will be about $30 billion in 2023 and is expected to exceed $150 billion (about 1.095 trillion yuan) by 2027, a compound annual growth rate of over 50%.
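As a rough check of these figures, the growth rate implied by going from about $30 billion in 2023 to $150 billion in 2027 can be computed directly. This is a minimal sketch assuming four years of compounding and the roughly 7.3 yuan-per-dollar rate implied by the article's own conversion; the function and constant names are illustrative.

```python
# Rough check of the market figures attributed to AMD above.
# Assumptions: growth compounds over the four years 2023 -> 2027, and
# 7.3 yuan per US dollar (the rate implied by the article's conversion).

def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.5 means 50%)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1

START_USD = 30e9    # ~2023 market size
END_USD = 150e9     # 2027 forecast
CNY_PER_USD = 7.3   # assumed exchange rate

rate = cagr(START_USD, END_USD, 2027 - 2023)
end_cny_trillions = END_USD * CNY_PER_USD / 1e12

print(f"Implied CAGR: {rate:.1%}")                                  # 49.5%
print(f"2027 market: about {end_cny_trillions:.3f} trillion yuan")  # 1.095
```

An endpoint of exactly $150 billion gives just under 50% a year, so the "over 50%" figure fits a 2027 market that exceeds $150 billion.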
Manuvir Das, NVIDIA's Vice President of Enterprise Computing, offered another set of figures: the total addressable market (TAM) for AI will grow to $600 billion, with $300 billion in chips and systems, $150 billion in generative AI software, and another $150 billion from NVIDIA's enterprise software.
Clearly, the AI computing chip market is a very large pie.
Currently, however, NVIDIA holds 82% of the global data center AI accelerator market and 95% of the global AI training market, making it the biggest winner of this round of AI competition. Jensen Huang (Huang Renxun) and NVIDIA have made a fortune, with the company's market value exceeding $1 trillion.
Meanwhile, the surge in demand for computing power has made NVIDIA GPUs (graphics processing units) "hard to find," and the number of NVIDIA A100 cards a company owns has become the yardstick for its computing power.
In practice, a company that wants to build a general-purpose large model must watch two things at the computing power level: the number of graphics cards and their price.
On the number of cards, OpenAI used 10,000 to 30,000 NVIDIA GPUs to train GPT-3.5. According to the latest report from TrendForce (Jiabang Consulting), running ChatGPT may require up to 30,000 NVIDIA GPUs, measured by the processing power of the NVIDIA A100. Among open-source models, Llama was trained on 2,048 80GB A100s, with total training compute approaching 2,000P.
On price, the H800 now sells in China for 200,000 yuan per card, while the A100 and A800 have risen to around 150,000 and 100,000 yuan per card, respectively. For a 2,000P compute requirement, for example, H800 cards would cost an estimated 200 million yuan in total (about 1,000 cards), while the A800, at roughly 0.625P per card, would require 3,200 cards for a total of up to 320 million yuan.
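The card counts above follow from dividing the required compute by each card's compute and multiplying by the unit price. The sketch below reproduces that arithmetic. The A800's 0.625P and both prices come from the article; the H800's roughly 2P per card is an assumption back-derived from the 200-million-yuan total at 200,000 yuan per card, and the helper name is hypothetical.

```python
import math

# Rough reproduction of the 2,000P card-count arithmetic above.
# Per-card compute is in the article's "P" units; prices are yuan per card.
# ASSUMPTION: the H800's ~2P per card is back-derived, not quoted.
TARGET_P = 2000

CARDS = {
    "H800": (2.0, 200_000),    # (P per card, yuan per card)
    "A800": (0.625, 100_000),
}

def cards_needed(target_p: float, per_card_p: float) -> int:
    """Whole cards needed to reach target_p, rounded up."""
    return math.ceil(target_p / per_card_p)

totals = {}
for name, (per_card_p, price) in CARDS.items():
    n = cards_needed(TARGET_P, per_card_p)
    totals[name] = (n, n * price)
    print(f"{name}: {n} cards, about {n * price / 1e6:.0f} million yuan")
```

This counts only the cards themselves; servers, interconnect, power, and facilities come on top.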
Beyond the GPU cards themselves, a server build has to account for the overall configuration, including CPUs, storage, and NVLink interconnects, along with power consumption, facility rental, and operation and maintenance costs.
Today, A800 and H800 servers mostly come in 8-card configurations. Meeting the 2,000P requirement would take 125 eight-card H800 servers or 400 eight-card A800 servers, priced at about 300 million and 560 million yuan, respectively. Moreover, H800 servers support PCIe 5.0 and need newer-generation CPUs and memory to deliver their full computing performance, which raises their price.
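Because each server holds eight cards, the server counts above translate directly into card counts, and dividing the quoted totals by the server counts gives an implied average price per server. A minimal sketch of that division; the per-server figures are derived here, not quoted prices.

```python
# Card counts and implied per-server prices for the 8-card builds above.
# Server counts and totals come from the article; per-server prices are
# simple division, not quoted figures.
CARDS_PER_SERVER = 8

# name: (8-card servers, total cost in yuan)
BUILDS = {
    "H800": (125, 300_000_000),
    "A800": (400, 560_000_000),
}

summary = {}
for name, (servers, total_cost) in BUILDS.items():
    cards = servers * CARDS_PER_SERVER   # total GPU cards in the build
    per_server = total_cost / servers    # implied average server price
    summary[name] = (cards, per_server)
    print(f"{name}: {servers} servers = {cards} cards, "
          f"~{per_server / 1e6:.2f} million yuan per server")
```

The H800 servers cost more each (about 2.4 million versus 1.4 million yuan), but far fewer are needed, which is why the H800 build comes out cheaper overall.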
So for large-model training, the total cost of an H800 build is lower than that of an A800 build, with better price-performance, and also lower than a CPU-based build. This is what NVIDIA CEO Jensen Huang has been saying lately: "The more you buy, the more you save."
For those who cannot afford to buy, NVIDIA has also launched DGX, a supercomputing AI system that enterprises can lease online. Each node is equipped with 8 H100 or A100 GPUs and 640GB of memory and rents for $37,000 a month, so companies do not need to build their own data centers or buy large numbers of GPU cards. This kind of leasing carries high gross margins: according to a report on Microsoft's cloud computing leasing service, the business has a gross margin as high as 42%, making it Microsoft's new "cash cow."
In China, companies such as Inbo Technology and SenseTime AIDC, along with more than 11 intelligent computing centers and cloud vendors, offer similar services. For large models, the overall price is more than 20% lower than building in-house.
Training time is another issue. NVIDIA's latest L40S GPU trains more efficiently than the A800/H800. A 7-billion-parameter model takes 17 hours to train on an HGX A800, while the L40S is 1.3 times faster and finishes in about half a day; even a 175-billion-parameter model can be trained over a weekend on L40S.
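Reading "1.3 times faster" as 1.3 times the training throughput (an assumption), the L40S time is simply the A800 time divided by 1.3, about 13 hours, which the article rounds to half a day. A minimal sketch; the helper name is illustrative.

```python
# Training-time estimate for the 7-billion-parameter example above.
# ASSUMPTION: "1.3 times faster" means 1.3x throughput, so time scales by 1/1.3.
A800_HOURS = 17.0   # quoted HGX A800 training time
SPEEDUP = 1.3       # quoted L40S speedup

def scaled_hours(base_hours: float, speedup: float) -> float:
    """Wall-clock hours after applying a throughput speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return base_hours / speedup

l40s_hours = scaled_hours(A800_HOURS, SPEEDUP)
print(f"L40S estimate: {l40s_hours:.1f} hours")  # 13.1 hours
```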
Overall, for a company to develop a large model, it needs to invest hundreds of millions of yuan in computing power costs, and this is just the "entry ticket."
Earlier reports said that Baidu, ByteDance, Tencent, and Alibaba had placed $5 billion worth of chip orders with NVIDIA. Counting the graphics cards they had already stockpiled, the total value of NVIDIA GPU cards in China runs into the hundreds of billions of yuan. Market research firm Counterpoint reported that despite the cyclical downturn in the semiconductor industry, Chinese companies such as Tencent and Baidu are still buying NVIDIA A800 chips in large volumes.
In such an important trillion-dollar market, neither chip companies nor downstream customers want to see NVIDIA monopolize the market. AMD, Intel, and Chinese GPU makers are therefore all trying to challenge NVIDIA's dominant position in AI chips.
AMD strikes first.
In AI chips, at the Consumer Electronics Show (CES) in January 2023, AMD Chair and CEO Lisa Su officially announced the Instinct MI300, a next-generation data center APU (accelerated processing unit). It uses a chiplet design that combines TSMC's 5nm and 6nm processes and integrates CPU and GPU, with 13 chiplets and 146 billion transistors. Its AI performance and performance per watt are 8 times and 5 times those of the previous-generation MI250 (measured with sparse FP8 benchmarks), and it was expected to enter volume production and supply in the second half of 2023.
In June, AMD announced further AI accelerators, the Instinct MI300X and Instinct MI300A. The MI300X, designed specifically for generative AI, has 153 billion transistors along with greater memory capacity and interconnect bandwidth: its transistor count is about twice that of the H100, and its HBM3 high-bandwidth memory capacity is 2.4 times that of the H100. A single chip can run large models with up to 80 billion parameters, and it is expected to ship before the end of this year.
The lineup not only showcases AMD's data center AI capabilities following its acquisition of Xilinx but also challenges NVIDIA's dominant position in AI computing chips.
Of course, AMD's strengths go beyond GPUs and AI chips: its core expertise is CPUs, and data centers depend on the general-purpose computing that CPUs provide. In November last year, AMD released the fourth-generation EPYC 9004 series for data centers, built on the Zen 4 architecture and codenamed "Genoa." Beyond the architectural upgrade, it pushes the configuration to the extreme: TSMC's 5nm process, 96 cores, 192 threads, 384MB of L3 cache, and PCIe 5.0 support.
Compared to Intel's eight-core processors, AMD's data center and edge computing CPU series have seen significant improvements in energy consumption and performance, including a 40% reduction in the area of the Genoa chip and a 48% increase in energy efficiency.
In September this year, AMD launched the fourth-generation EPYC 8004 series, bringing its "Zen 4c" cores to purpose-built CPUs for deployments ranging from the intelligent edge (such as retail, manufacturing, and telecommunications) to data centers and the cloud.
Amazon Web Services (AWS), for example, has launched Genoa-based M7a general-purpose instances, which deliver 50% more performance than the previous generation and 1.7 to 1.9 times the performance of Intel's fourth-generation Xeon Platinum 8490H in multiple application scenarios. Overall energy efficiency has improved 1.8-fold, and the instances are widely used in high-performance computing fields such as financial modeling, weather simulation, and drug development. In IoT edge gateway workloads, servers powered by the new eight-core EPYC 8024P deliver about 1.8 times the total throughput per 8kW rack.
Overall, across CPUs, GPUs, FPGAs, DPUs, and other data center processors, as well as its ROCm software stack, AMD is ready and "sharpening its knives" to take on NVIDIA.
Intel, a chip giant founded more than 50 years ago, does not want to cede the market either.
On July 11 this year, Intel launched its Habana Gaudi2 AI chip for the Chinese market. Built on a 7nm process, it can run large language models and accelerate AI training and inference. Its performance per watt on ResNet-50 is roughly twice that of the NVIDIA A100, and its cost-effectiveness on the AWS cloud is 40% higher than that of NVIDIA-based solutions. Intel also expected it to surpass NVIDIA's latest H100 in September this year.
Sandra Rivera, Intel Executive Vice President, told Titanium Media App in July that no single company can dominate the AI chip market: the market needs diversity, and customers want to see more chip companies play leading roles in AI.
(Photo: Intel CEO Pat Gelsinger)
In September, at the Intel On Technology Innovation Conference in San Jose, Gelsinger announced that Gaudi 3, a 5nm AI chip, will launch next year with twice the computing power of Gaudi 2 and 1.5 times its network bandwidth and HBM capacity.
Gelsinger also previewed the fifth-generation Intel Xeon Scalable server processor and said the next-generation Xeon will have 288 cores, which is expected to increase rack density 2.5-fold and performance per watt 2.4-fold. Intel also presented Sierra Forest and Granite Rapids, whose AI performance is expected to be 2 to 3 times that of the fourth-generation Xeon.
Zhou Jingren, Chief Technology Officer of Alibaba Cloud, said Alibaba will use fourth-generation Intel Xeon Scalable processors for its generative AI and large language models, such as the Alibaba Cloud Tongyi Qianwen large model, and that Intel's technology has significantly cut model response times, with an average speedup of up to 3 times.
The software ecosystem also matters for large-model training. Intel announced a collaboration with Arm to deploy its Xeon products on Arm CPUs, and launched OpenVINO, a runtime toolkit for AI inference and deployment. OpenVINO supports pre-trained models, lets developers write code once and deploy it across supported platforms, and already supports Meta's Llama 2 model.
Meanwhile, the Linux Foundation announced this week the formation of the Unified Acceleration (UXL) Foundation, which provides open standards for accelerator programming models to simplify the development of high-performance, cross-platform applications. At its core is an evolution of Intel's oneAPI initiative, and its founding members include Arm, Google Cloud, Intel, Qualcomm, Samsung, and other companies. NVIDIA is not among them.
Wang Rui, Intel Senior Vice President and Chairman of Intel China, told Titanium Media App and other outlets that Intel will release a 288-core processor in the future. Data centers will keep growing, and Intel will launch products such as Gaudi 3 and Falcon Shores, forming a product matrix that maps out the future of its accelerators and AI computing.
"We have embedded AI capabilities into the chip. Depending on the need, the embedded AI capabilities will draw on different computing power and architectures for support," Wang Rui said. From the data center to the client, the edge, and the cloud, AI has permeated a wide range of scenarios; from training large language models to training smaller, general-purpose language models, AI's influence is everywhere.
At the end of August, Gelsinger said he believes Intel is on track to meet its ambitious turnaround goals and regain its leading position in the industry. Speaking about NVIDIA, he acknowledged that it is well positioned to capture demand for the systems needed to scale AI software, but said Intel will soon start winning orders in the accelerator chip market.
"They are doing well, and we commend them. But we are about to show our strength," Gelsinger said.
NVIDIA's market value sheds over a trillion yuan: can domestic chips seize the opportunity?
NVIDIA's brilliant performance in 2023 seems to have weakened in the past two months.
According to Refinitiv data, although NVIDIA's stock price has risen by about 190% this year, its stock price performance in September has been poor: since August 31, NVIDIA's stock price has fallen by more than 10%, and its total market value has evaporated by more than $176 billion.
In fact, there are many factors contributing to the decline in NVIDIA's stock price.
First, the market is concerned about the Federal Reserve's prolonged maintenance of high interest rates to curb inflation, which has put pressure on the entire stock market, with the S&P 500 index averaging a 0.7% decline in September and a nearly 4% decline year-to-date.
Second, with the release of open-source models such as Llama 2, more enterprises are using these models directly and need only AI inference chips to deploy them, which reduces demand for training chips.
Finally, according to The Information, NVIDIA has been focusing on supplying graphics cards to some small and medium-sized cloud computing companies in the United States. With cards already "hard to get," follow-up service and supply to large customers such as Google, Meta, and Chinese enterprises no longer seem to be a top priority, which has directly raised market doubts about NVIDIA's ability to supply products.
Still, NVIDIA holds a significant first-mover advantage in the AI computing market. Beyond its leading GPU performance, its extensive CUDA AI software ecosystem has few rivals. NVIDIA's NVLink high-speed GPU interconnect has also become a "key weapon" for advancing large-model technology, with an impact that goes far beyond any single GPU card.
Wang Xiaochuan, founder and CEO of Baichuan Intelligence, noted that GPU computing power accounts for roughly 40% to 70% of costs in this industry, and that the ratio of network connection costs to GPU card costs is about 3:1.
"As we move toward more advanced models, computing power reserves will be crucial. Across training and inference, we need domestic AI chips for inference, not just NVIDIA, but for training NVIDIA is currently the best. In this 'computing power battle,' China's domestic AI chip companies must be able to compete," Wang Xiaochuan said.
Beyond the two chip giants, demand for AI computing power has surged with China's domestic "Battle of a Hundred Models," yet NVIDIA's A100, H100, and several other AI chips are barred from export to China, making it increasingly difficult for domestic enterprises to obtain high-end chips from the United States.
In October 2022, the Bureau of Industry and Security (BIS) of the U.S. Department of Commerce issued new export control rules that set quantitative thresholds for chip computing power, bandwidth, process technology, and other metrics, restricting U.S. companies from exporting high-performance computing chips to China. The rules directly affect the development of AI, supercomputing, data centers, and related industries in China. NVIDIA, a GPU maker that mainly serves AI needs, had received a notice from the U.S. government in August restricting exports of its advanced chips.
NVIDIA responded quickly, producing the A800 in the third quarter of 2022 to replace the A100, which could no longer be shipped to China; it was the first such "special supply" product launched by a U.S. company. NVIDIA did not publicly disclose detailed A800 specifications, but product manuals from its distributors show that the A800's peak computing power matches that of the export-restricted A100, while its interconnect transfer rate is capped at two-thirds of the A100's to comply with U.S. government requirements. The latest "China special edition" H800 has about 40% less training computing power than the H100, and without NVIDIA's interconnect module the gap can exceed 60%.
In July 2023, Intel launched a Chinese version of Gaudi 2, an ASIC (application-specific integrated circuit) designed mainly for high-performance deep learning training. Compared with the international version announced in May 2022, the Chinese version integrates 21 Ethernet ports instead of 24. At the time, Intel said this was a relatively small change with limited impact on actual performance. Gelsinger recently said the company is selling the Chinese version of Gaudi 2 in China and hopes to continue doing so.
As a result, with foreign chips restricted, Huawei, Horizon Robotics, Cambricon Technologies, and other domestic AI computing companies are actively positioning themselves to fill the gap in domestic AI computing power.
The domestic AI computing power market currently falls into three main camps: first, Huawei's Kunpeng and Ascend AI ecosystem computing solutions, which involve no NVIDIA GPUs; second, hybrid computing, which relies on large numbers of NVIDIA A100 chips and, in some environments, chips from AMD, Intel, Horizon Robotics, Cambricon Technologies, and Hygon to support large-model training; and third, renting cost-effective cloud server computing power to make up for shortfalls.
At the 19th Summer Summit of the 2023 Yabuli China Entrepreneurs Forum in August, Liu Qingfeng, founder and chairman of iFLYTEK, said that Huawei's GPU capabilities are now comparable to, and benchmarked against, the NVIDIA A100.
On September 20, Meng Wanzhou, Huawei's Deputy Chairman, Rotating Chairman, and CFO, said Huawei has launched an Ascend AI computing cluster with a new architecture that can support training of large models with trillions of parameters, and that Huawei will continue to build a solid foundation of computing power.
Guo Rujiang, Chairman and CEO of Horizon Robotics, revealed that several domestic large-model companies have begun using domestic GPU cards and that the company has supported training of models with 7 billion parameters. In addition, most other domestic GPU companies are still at the AI inference stage.
Guo Rujiang believes that NVIDIA's share of China's training market exceeds 95%, reaching 99% in some cases, which amounts to a monopoly. This is mainly due to its hardware architecture and the widely used CUDA ecosystem, which already has more than 3 million users worldwide. Domestic GPU companies now face the challenge of ecosystem migration: because so much code is written for CUDA, moving to a new ecosystem will take considerable time and money.
At a recent roundtable, Wang Ping, co-founder and chief architect of Horizon Robotics, said AIGC customers need not only text- and image-generation solutions but, more importantly, practical products, so computing products with high performance and strong versatility are needed to create value for them. Horizon Robotics' new-generation AI chips reportedly hold an energy-consumption advantage of more than three times over mainstream general-purpose GPUs worldwide.
Guo Rujiang stated that for Horizon Robotics, the next step is to optimize product iterations, relying on data, customer feedback, and technological innovation, and adjust to meet the specific needs in China. At the same time, the company will actively improve the ecosystem and software stack to ensure that users obtain the best experience in efficiency, cost, performance, and cost-effectiveness, to further promote the commercialization of products.
Wang Ping believes that, with high-end U.S. chips harder to obtain and no domestic company yet able to make chips that truly replace them, domestic computing power will continue to grow. Chips need continuous iteration, and the more users and feedback domestic AI chip companies have, the more they can improve subsequent iterations and the user experience.
"For domestic general-purpose GPU companies, this is a major opportunity," Guo Rujiang told Titanium Media App.