Source: Alpha Community
Insufficient computing power is a problem currently facing the entire AI industry. Just last week, following OpenAI's DevDay, a wave of new features drew large numbers of users to try them out, causing widespread and prolonged outages of ChatGPT and the GPT APIs. Sam Altman also announced a pause on new sign-ups for ChatGPT Plus.
Currently, NVIDIA's GPUs hold a near-monopoly in AI computing power. Whether it's the A100, the H100, or the recently released H200, they are the benchmarks for AI computing chips. But these GPUs face a problem: network connectivity in data center GPU clusters often cannot deliver data fast enough, leaving GPUs underutilized for stretches of time, wasting computing power and ultimately driving up total cost of ownership (TCO).
Enfabrica, a startup, builds network chips designed specifically for AI data centers, raising the utilization of GPU compute nodes by 50% and cutting the compute cost of AI inference and training.
Recently, Enfabrica completed a $125 million Series B financing round led by Atreides Management, with NVIDIA participating as a strategic investor. Other investors in this round of financing include IAG Capital Partners, Liberty Global Ventures, Valor Equity Partners, Infinitum Partners, and Alumni Ventures. Its early investor, Sutter Hill Ventures, also continued to increase its investment.
This round raised the company's valuation more than fivefold over the previous round, bringing its total financing to $148 million. Gavin Baker, founder of Atreides Management, has joined the board to help guide the company's development and strategy.
Two veteran chip industry professionals join forces to tackle the major challenges in AI computing power
According to 650 Group, a research firm specializing in cloud computing supply chains, the scale of AI/ML computing demand may increase by 8 to 275 times every 24 months. Over the next ten years, servers based on AI/ML are expected to grow from 1% of the market to nearly 20%.
However, given the characteristics of AI computing, the movement of data and metadata between distributed computing elements has become a bottleneck. SemiAnalysis analyst Dylan Patel has pointed out that with each generation of chips/packages, floating-point throughput (FLOPS) grows faster than data input/output speed, and this mismatch is becoming increasingly severe.
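To make the mismatch concrete, here is a back-of-envelope sketch (our illustration, not from SemiAnalysis) comparing peak compute to on-package memory bandwidth across two recent GPU generations, using approximate published specs. The ratio a workload must sustain just to keep the chip busy keeps climbing:

```python
# Back-of-envelope illustration of the compute vs. I/O mismatch.
# Specs are approximate public figures, rounded for illustration.
gpus = {
    "A100 (2020)": {"bf16_tflops": 312, "hbm_tb_s": 2.0},
    "H100 (2022)": {"bf16_tflops": 989, "hbm_tb_s": 3.35},
}

for name, spec in gpus.items():
    # FLOPs the chip can execute per byte of memory traffic. The
    # higher this ratio, the more data reuse (arithmetic intensity)
    # a workload needs just to keep the compute units fed.
    flops_per_byte = (spec["bf16_tflops"] * 1e12) / (spec["hbm_tb_s"] * 1e12)
    print(f"{name}: ~{flops_per_byte:.0f} FLOPs needed per byte moved")

# A100 -> ~156, H100 -> ~295: compute grew ~3.2x in one generation
# while bandwidth grew ~1.7x, so the gap nearly doubled.
```

The same arithmetic applies even more harshly at the network level, where per-GPU bandwidth is measured in hundreds of gigabits per second rather than terabytes per second.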
Enfabrica was co-founded by Rochan Sankar, previously an engineering director at chip giant Broadcom, and Shrijeet Mukherjee, who led network platforms and architecture at Google. Both bring deep expertise and long experience in chip and network architecture.
Organizationally, Sankar serves as CEO and Mukherjee as CTO. Enfabrica's core team includes senior AI, networking, and chip engineers from companies such as Cisco, Meta, and Intel.
Enfabrica targets the AI industry's growing demand for "parallel, accelerated, and heterogeneous" computing infrastructure, i.e., GPU-centric clusters.
Rochan Sankar stated, "The biggest challenge brought about by the current AI revolution is the expansion of AI infrastructure—both in terms of computational cost and sustainability. Traditional network chips, such as switches, have difficulty keeping up with the data movement demands of modern AI workloads, creating bottlenecks for AI training or fine-tuning that require large datasets."
The AI computing field urgently needs to close the gap between the growth of AI workloads and the cost, efficiency, sustainability, and scalability of the computing clusters that serve them.
Enfabrica has launched its Accelerated Compute Fabric (ACF-S) devices and solutions, which complement GPUs, CPUs, and accelerators and address critical network, I/O, and memory-scaling problems in data center AI and high-performance computing clusters. The company says ACF-S can cut the compute cost of data center GPU and accelerated-computing clusters by 50%, expand attachable memory 50-fold, and reduce the compute cost of large-model inference by roughly 50% at the same performance point, lowering total cost of ownership (TCO).
According to Dell'Oro Group, AI infrastructure investment will push data center capital expenditure above $500 billion by 2027. IDC, meanwhile, forecasts that hardware investment targeting AI will grow at a compound annual rate of 20.5% over the next five years.
By 2027, the market size of interconnect semiconductors used in data centers will double from nearly $12.5 billion in 2022 to nearly $25 billion.
Gavin Baker, who joined Enfabrica's board, is the Chief Investment Officer and Managing Partner of Atreides Management. The firm has previously invested in companies such as Nutanix, Jet.com, AppNexus, Dataminr, Cloudflare, and SpaceX, and he has served on the boards of several of them.
Discussing AI computing infrastructure, he pointed to several important improvements: "By using faster storage, better back-end networks (notably Enfabrica), and the now-emerging linear pluggable/co-packaged optics and improved CPU/GPU integration (NVIDIA's Grace Hopper, AMD's MI300, and Tesla's Dojo) to improve GPU utilization, these combined breakthroughs break the 'memory wall' and will further raise the return on investment for training, directly lowering training costs and indirectly improving profitability by lowering inference costs."
In summary, architectures that have an advantage in "useful computing per unit of energy" will win, and we are rapidly moving towards more useful computing per unit of energy.
Helping NVIDIA GPU computing clusters break the "memory wall"
In the field of AI accelerated computing, the "memory wall" is a real problem, referring to the increasing gap between processing performance and the memory bandwidth required to provide that performance.
Compared to traditional CPU computing, the problem is more severe in AI computing using GPUs because GPUs have more cores, higher processing throughput, and a huge demand for data.
The data used in AI must first be organized and stored in memory before it can be processed by the GPU. Providing the necessary memory bandwidth and capacity for AI is an urgent problem that needs to be solved.
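A rough worked example (ours, using round illustrative numbers rather than any vendor's benchmark) shows the stakes: for a dense decoder-style model, each generated token must stream essentially all of the weights through the GPU, so memory capacity determines whether the model fits at all, and memory bandwidth caps token throughput:

```python
# Illustrative sketch of the memory pressure of serving a large LLM.
# All numbers are assumptions for illustration, not vendor claims.
params_billion = 70          # e.g., a 70B-parameter dense model
bytes_per_param = 2          # FP16/BF16 weights
hbm_capacity_gb = 80         # one H100's onboard HBM
hbm_bandwidth_tb_s = 3.35    # approximate H100 HBM bandwidth

weights_gb = params_billion * bytes_per_param      # 140 GB of weights
print(f"Weights: {weights_gb} GB vs {hbm_capacity_gb} GB of HBM -> "
      "the model does not even fit on a single GPU")

# At batch size 1, every generated token streams all weights once, so
# the bandwidth-bound ceiling on decode speed is bandwidth / model size.
max_tokens_per_s = (hbm_bandwidth_tb_s * 1e12) / (weights_gb * 1e9)
print(f"Bandwidth-bound ceiling: ~{max_tokens_per_s:.0f} tokens/s")
```

Under these assumptions the weights alone overflow a single GPU's memory, and even with perfect efficiency, decode speed is capped at roughly 24 tokens per second per replica, which is exactly the kind of pressure that memory pooling and tiering aim to relieve.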
To address this issue, several key technologies can be leveraged: memory performance/capacity tiering and caching architectures previously used in CPU and distributed cluster computing; Remote Direct Memory Access (RDMA) network technology that supports scalable AI systems; and the widely recognized and adopted Compute Express Link (CXL) interface standard.
Enfabrica's solution integrates key technologies such as CXL.mem disaggregation, performance/capacity tiering, and RDMA networking, creating a scalable, high-bandwidth, high-capacity, latency-bounded memory hierarchy to serve any large-scale AI computing cluster.
Its first chip is called the Accelerated Compute Fabric (ACF) switching chip, which allows GPU computing pools to directly connect to tens of terabytes of local CXL.mem DRAM pools with extremely low latency.
Specifically, ACF extends this memory tiering further, providing high-bandwidth access through 800GbE network ports to petabytes of DRAM distributed across the rest of the cluster and the data center. This builds a hierarchical data store of near memory, near-far memory, and network-far memory, with strict latency bounds at each level. With ACF, NVIDIA GPUs doing data processing can pull data from many locations without running into speed barriers.
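To picture the hierarchy described above, here is a minimal sketch (our own; the tier names follow the article, while the latency and capacity figures are invented placeholders, not Enfabrica specifications) of a tier table and a placement routine that keeps data in the fastest tier with free space:

```python
# Minimal sketch of a latency-bounded memory hierarchy like the one
# described above. Tier names follow the article; latency/capacity
# figures are invented placeholders, not Enfabrica specifications.
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    latency_us: float    # assumed worst-case access latency
    capacity_gb: int     # assumed capacity of the tier
    used_gb: int = 0

# Ordered fastest-first: GPU-local HBM, the CXL.mem DRAM pool behind
# the ACF switch, then DRAM reached over the 800GbE fabric.
tiers = [
    Tier("near (HBM)",            0.5,      80),
    Tier("near-far (CXL DRAM)",   2.0,  40_000),   # "tens of TB"
    Tier("network-far (800GbE)", 10.0, 1_000_000), # "PB-level"
]

def place(size_gb: int) -> Tier:
    """Place a buffer in the fastest tier that still has capacity."""
    for tier in tiers:
        if tier.used_gb + size_gb <= tier.capacity_gb:
            tier.used_gb += size_gb
            return tier
    raise MemoryError("no tier can hold this allocation")

# Example: a KV-cache allocation spills out of HBM into the CXL pool.
print(place(60).name)   # near (HBM)
print(place(60).name)   # near-far (CXL DRAM) -- HBM is now too full
```

A real fabric would also enforce the per-tier latency bounds and migrate hot data upward, but even this toy version shows the key idea: capacity grows by orders of magnitude at each step down the hierarchy while latency stays bounded.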
Enfabrica's ACF-S solution combines multiple ACF chips into an 8-Tbps AI infrastructure network node with 800G Ethernet, PCIe Gen 5, and CXL 2.0+ interfaces. Compared with eight-GPU NVIDIA H100 systems such as the NVIDIA DGX H100 and Meta's Grand Teton, it can cut I/O power consumption by up to 50% (saving 2 kilowatts per rack).
"ACF-S is an integrated solution that eliminates the need for traditional, disparate server I/O and network chips, such as rack-level network switches, server network interface controllers, and PCIe switches," explained Rochan Sankar.
ACF-S devices enable companies handling AI inference tasks to use as few GPUs, CPUs, and other AI accelerators as possible. This is because ACF-S can more effectively utilize existing hardware by quickly moving large amounts of data.
Furthermore, Enfabrica's solution can be used not only for large-scale AI inference but also for AI training, as well as non-AI use cases such as databases and grid computing.
Enfabrica plans to sell chips and solutions to system builders (cloud providers and data center operators) rather than building systems itself. Sankar said Enfabrica fits closely with the NVIDIA ecosystem but also plans to work with a broader range of AI computing companies.
He said, "ACF-S remains neutral to the type and brand of AI processors used for AI computing, as well as the exact models deployed, allowing the construction of AI infrastructure across multiple different use cases and supporting multiple processor vendors without proprietary technology lock-in."
Faster Speed, Lower Energy Consumption: The Next Generation of AI Computing Systems Is Taking Shape
Just one year after releasing the H100, NVIDIA has introduced the H200, showing how urgently it wants to defend its lead in AI computing power. With generative AI's explosive growth over the past year, competitors have also launched powerful AI computing products, whether AMD's MI300 series chips or Microsoft's Maia chip, positioned against the H100.
AI computing power is an industry with a concentration of technology and capital. Faced with the "fight of the century" among industry giants, how can AI computing power startups survive? Enfabrica and d-Matrix, which we introduced earlier, have their own answers.
d-Matrix focuses on AI inference and has introduced AI inference-specific chips that are faster and more energy-efficient than NVIDIA's similar products. Enfabrica, on the other hand, is not directly "competing with NVIDIA," but rather, as an important part of the AI computing power system, it helps NVIDIA's GPUs (and other AI computing chips) break the "memory wall," reduce idle computing power, and overall improve the utilization of computing power systems.
Like all computing power systems, AI computing power systems have two important factors: speed and energy consumption. Although large-scale AI computing (whether training or inference) is run by computing clusters, faster computing speed and lower energy consumption are still the industry's overall focus.
NVIDIA's GPUs have a clear advantage in faster computing speed, while companies like Enfabrica are striving for lower energy consumption.
As Enfabrica's founder Rochan Sankar said, "For AI computing to truly become widespread, the cost curve must come down. The key is whether the computing power of GPUs is being better and more efficiently utilized."
Clearly, NVIDIA's investment in Enfabrica follows the same logic: as Enfabrica's technology further raises the utilization of NVIDIA's GPU computing power, NVIDIA's leading position in the industry should only be further consolidated.
However, Enfabrica is not the only company addressing this obvious and urgent demand. Industry giants are active here too: Cisco has introduced its Silicon One G200 and G202 AI networking hardware, and Broadcom is also working in this space. To keep growing, Enfabrica still faces real competition.
If the overseas AI industry is currently facing a temporary shortage of computing power, the Chinese AI industry is facing a long-term shortage of AI computing power. With further restrictions on NVIDIA's GPUs, the industry has a strong demand for domestic AI computing products. Currently, companies such as Huawei, Alibaba, Baidu, Moore Threads, and Cambricon are developing in the field of AI computing power, and it is hoped that they, as well as more companies, can help establish China's own AI computing power system.