Author: APPSO
In the past week, the DeepSeek R1 model from China has stirred the entire overseas AI community.
On one hand, it achieves performance comparable to OpenAI's o1 with lower training costs, illustrating China's advantages in engineering capabilities and large-scale innovation; on the other hand, it upholds the spirit of open source and is eager to share technical details.
Recently, a research team led by Jiayi Pan, a PhD student at the University of California, Berkeley, successfully replicated the key technology of DeepSeek R1-Zero—"Aha Moment"—at an extremely low cost (under $30).
So it’s no surprise that Meta CEO Mark Zuckerberg, Turing Award winner Yann LeCun, and DeepMind CEO Demis Hassabis have all given high praise to DeepSeek.
As the popularity of DeepSeek R1 continues to rise, this afternoon, the DeepSeek App briefly experienced server congestion due to a surge in user traffic, even "crashing" at one point.
OpenAI CEO Sam Altman, trying to reclaim the international headlines, also previewed the usage limits for o3-mini: ChatGPT Plus members will be able to make 100 queries a day.
What is less well known is that before its rise to fame, DeepSeek's parent company, Hquant, was one of the leading firms in China's quantitative private-fund (quant hedge fund) industry.
DeepSeek Model Shakes Silicon Valley, Value Continues to Rise
On December 26, 2024, DeepSeek officially released the DeepSeek-V3 large model.
This model performed excellently across multiple benchmarks, surpassing the industry's mainstream top models, especially in knowledge Q&A, long-text processing, code generation, and mathematics. For instance, on knowledge tasks such as MMLU and GPQA, DeepSeek-V3's performance is close to that of the internationally leading model Claude-3.5-Sonnet.
In terms of mathematical ability, it set new records in tests like AIME 2024 and CNMO 2024, surpassing all known open-source and closed-source models. Additionally, its generation speed improved by 200% compared to the previous generation, reaching 60 TPS, significantly enhancing user experience.
According to analysis from the independent evaluation site Artificial Analysis, DeepSeek-V3 surpassed other open-source models on multiple key metrics and performed on par with the world's top closed-source models GPT-4o and Claude-3.5-Sonnet.
The core technological advantages of DeepSeek-V3 include:
- Mixture of Experts (MoE) architecture: DeepSeek-V3 has 671 billion parameters, but only 37 billion are activated for each input. This selective activation greatly reduces computational cost while maintaining high performance (a simplified routing sketch follows this list).
- Multi-Head Latent Attention (MLA): This architecture has been validated in DeepSeek-V2, enabling efficient training and inference.
- Auxiliary-loss-free load balancing: experts are kept evenly utilized without the auxiliary balancing loss that usually degrades model performance.
- Multi-token prediction training objective: the model learns to predict several future tokens at once, which densifies the training signal and improves overall performance.
- Efficient training framework: Utilizing the HAI-LLM framework, it supports 16-way Pipeline Parallelism (PP), 64-way Expert Parallelism (EP), and ZeRO-1 Data Parallelism (DP), and reduces training costs through various optimization methods.
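To make the idea of selective activation concrete, here is a minimal sketch of top-k expert routing in PyTorch. The layer sizes, router, and expert count are illustrative placeholders, not DeepSeek-V3's actual configuration; the point is simply that each token is processed by only a small subset of the experts.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Toy Mixture-of-Experts layer: route each token to its top-k experts."""
    def __init__(self, d_model=64, d_hidden=256, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                           # x: (num_tokens, d_model)
        scores = self.router(x)                     # (num_tokens, num_experts)
        weights, idx = scores.topk(self.k, dim=-1)  # keep only the k best experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):                  # only the selected experts ever run
            for e in idx[:, slot].unique():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot, None] * self.experts[int(e)](x[mask])
        return out

tokens = torch.randn(16, 64)                        # 16 tokens, hidden size 64
print(TopKMoE()(tokens).shape)                      # torch.Size([16, 64])
```

At DeepSeek-V3's scale, this is why only about 37 billion of the 671 billion parameters participate in any given forward pass.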
More importantly, the training cost of DeepSeek-V3 is only $5.58 million, far lower than the $78 million training cost of GPT-4. Moreover, its API service pricing continues to follow a user-friendly approach.
Input tokens cost only 0.5 yuan per million (cache hit) or 2 yuan (cache miss), while output tokens cost only 8 yuan per million.
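As a rough back-of-the-envelope check of those list prices, the snippet below (a sketch that assumes simple per-token billing and no other fees) computes the cost of a single request:

```python
# List prices quoted above, in yuan per million tokens; assumes no other fees apply.
PRICE_INPUT_HIT = 0.5    # input tokens on a cache hit
PRICE_INPUT_MISS = 2.0   # input tokens on a cache miss
PRICE_OUTPUT = 8.0       # output tokens

def request_cost_yuan(input_tokens: int, output_tokens: int, cache_hit: bool = False) -> float:
    price_in = PRICE_INPUT_HIT if cache_hit else PRICE_INPUT_MISS
    return (input_tokens * price_in + output_tokens * PRICE_OUTPUT) / 1_000_000

# Example: 1M input tokens on a cache miss plus 200k output tokens costs 2.0 + 1.6 = 3.6 yuan.
print(request_cost_yuan(1_000_000, 200_000))
```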
The Financial Times described it as "a dark horse that shocked the international tech community," believing its performance is comparable to that of well-funded American competitors like OpenAI. Chris McKay, founder of Maginative, further pointed out that the success of DeepSeek-V3 may redefine the established methods of AI model development.
In other words, the success of DeepSeek-V3 is also seen as a direct response to the U.S. export restrictions on computing power, as this external pressure has stimulated innovation in China.
DeepSeek Founder Liang Wenfeng, the Low-Key Zhejiang University Genius
The rise of DeepSeek has left Silicon Valley restless, and its founder Liang Wenfeng, the man behind this disruption of the global AI industry, embodies the classic trajectory of the Chinese prodigy: early success, sustained over time.
A good AI company leader needs to understand both technology and business, have vision and pragmatism, possess innovative courage and engineering discipline. Such composite talents are themselves a scarce resource.
At 17, he was admitted to Zhejiang University, majoring in Information and Electronic Engineering, and at 30, he founded Hquant, leading a team to explore fully automated quantitative trading. Liang Wenfeng's story confirms that geniuses always do the right thing at the right time.
- 2010: With the launch of the CSI 300 stock index futures, quantitative investment welcomed development opportunities, and the Hquant team seized the momentum, rapidly increasing self-operated funds.
- 2015: Liang Wenfeng co-founded Hquant with fellow alumni; the following year it launched its first AI model, putting trading positions generated by deep learning into live use.
- 2017: Hquant claimed to have fully AI-automated its investment strategies.
- 2018: Established AI as the company's main development direction.
- 2019: Fund management scale surpassed 10 billion yuan, making it one of the "four giants" of China's quantitative private funds.
- 2021: Hquant became the first quantitative private fund manager in China to exceed 100 billion yuan in assets under management.
We should not notice this company only at its moments of success. The transformation of a quantitative trading firm into an AI company may look unexpected, but it is in fact a natural progression: both are data-driven, technology-intensive businesses.
Jensen Huang only wanted to sell gaming graphics cards to gamers, yet NVIDIA unexpectedly became the world's largest AI arsenal; Hquant's entry into AI followed a similar path. This kind of evolution has far more vitality than the many industries now awkwardly grafting large AI models onto their businesses.
During its years of quantitative investing, Hquant accumulated deep experience in data processing and algorithm optimization, and it holds a large number of A100 chips. Since 2017, Hquant has been investing heavily in AI computing, building high-performance clusters such as "Firefly One" and "Firefly Two" to provide robust computing power for training AI models.
In 2023, Hquant officially established DeepSeek, focusing on the research and development of large AI models. DeepSeek inherits Hquant's accumulation in technology, talent, and resources, quickly emerging in the AI field.
In an in-depth interview with "Dark Current," DeepSeek founder Liang Wenfeng also demonstrated a distinctive strategic vision.
Unlike most Chinese companies, which chose to replicate the Llama architecture, DeepSeek went straight at the model architecture itself, aiming for the ambitious goal of AGI.
Liang Wenfeng candidly acknowledged the significant gap between Chinese AI and the international state of the art: the combined shortfall in model architecture, training dynamics, and data efficiency means domestic models may need four times the computing power to achieve the same results.
▲ Image: screenshot from CCTV News
This attitude of facing challenges head-on stems from Liang Wenfeng's years of experience at Hquant.
He emphasizes that open source is not only about technology sharing but also a cultural expression; the real moat lies in the team's continuous innovation capability. DeepSeek's unique organizational culture encourages bottom-up innovation, downplays hierarchy, and values the enthusiasm and creativity of talent.
The team mainly consists of young people from top universities, adopting a natural division of labor model that allows employees to explore and collaborate autonomously. During recruitment, more emphasis is placed on employees' passion and curiosity rather than traditional notions of experience and background.
Regarding the industry's prospects, Liang Wenfeng believes that AI is currently in a period of technological innovation explosion, rather than an application explosion. He emphasizes that China needs more original technological innovations and cannot remain in a stage of imitation forever; someone needs to stand at the forefront of technology.
Even though companies like OpenAI are currently in a leading position, opportunities for innovation still exist.
Turning Silicon Valley Upside Down, DeepSeek Makes the Overseas AI Community Uneasy
Industry evaluations of DeepSeek vary; here are some comments we gathered from industry insiders.
NVIDIA GEAR Lab project leader Jim Fan gave high praise to DeepSeek-R1.
He pointed out that DeepSeek-R1 shows a non-American company carrying out OpenAI's original open mission, achieving influence by publicly releasing raw algorithms and learning curves, a remark that also implicitly criticized OpenAI.
DeepSeek-R1 not only open-sourced a series of models but also disclosed all training secrets. It may be the first open-source project to demonstrate significant and sustained growth in the RL flywheel.
Influence can be achieved through legendary projects like "ASI Internal Implementation" or "Strawberry Project," or simply by publicly sharing raw algorithms and matplotlib learning curves.
Marc Andreessen, co-founder of the top Silicon Valley venture capital firm a16z, believes DeepSeek R1 is one of the most astonishing and impressive breakthroughs he has ever seen, and that as an open-source project it is a profoundly meaningful gift to the world.
Former senior researcher at Tencent and postdoctoral fellow in artificial intelligence at Peking University, Lu Jing, analyzed from the perspective of technological accumulation. He pointed out that DeepSeek did not suddenly become popular; it inherited many innovations from the previous generation of model versions, and the related model architecture and algorithm innovations have been iteratively validated, making its impact on the industry inevitable.
Turing Award winner and Meta's chief AI scientist Yann LeCun offered a new perspective:
"For those who see DeepSeek's performance and think 'China is surpassing the U.S. in AI,' your interpretation is wrong. The correct interpretation should be, 'Open-source models are surpassing proprietary models.'"
DeepMind CEO Demis Hassabis's evaluation revealed a hint of concern:
"The achievements of DeepSeek are impressive. I think we need to consider how to maintain the leading position of Western frontier models. I believe the West is still ahead, but it is certain that China has extremely strong engineering and scaling capabilities."
Microsoft CEO Satya Nadella stated at the World Economic Forum in Davos, Switzerland, that DeepSeek has effectively developed an open-source model that not only performs excellently in inference computing but also has extremely high supercomputing efficiency.
He emphasized that Microsoft must respond to these groundbreaking advancements from China with the utmost seriousness.
Meta CEO Mark Zuckerberg provided a deeper evaluation, stating that the technical strength and performance demonstrated by DeepSeek are impressive, and pointed out that the AI gap between China and the U.S. has become minimal, with China's full-speed sprint making this competition increasingly intense.
Reactions from competitors may be the best recognition for DeepSeek. According to leaks from Meta employees on the anonymous workplace community TeamBlind, the emergence of DeepSeek-V3 and R1 has thrown Meta's generative AI team into a panic.
Meta engineers are racing against time to analyze DeepSeek's technology, trying to replicate any possible techniques.
The reason is that the training cost of DeepSeek-V3 is only $5.58 million, a figure that is even less than the annual salary of some Meta executives. Such a stark input-output ratio has put pressure on Meta's management when explaining its massive AI R&D budget.
International mainstream media have also paid close attention to the rise of DeepSeek.
The Financial Times pointed out that DeepSeek's success has overturned the traditional perception that "AI R&D must rely on huge investments," proving that a precise technological route can also achieve excellent research results. More importantly, the DeepSeek team's selfless sharing of technological innovations has made this company, which focuses more on research value, a particularly strong competitor.
The Economist stated that China's rapid breakthroughs in AI technology in terms of cost-effectiveness have begun to shake America's technological advantage, which could affect the U.S.'s productivity growth and economic potential over the next decade.
The New York Times approached from another angle, noting that DeepSeek-V3's performance is comparable to high-end chatbots from American companies, but at a significantly lower cost.
This indicates that even under chip export controls, Chinese companies can compete through innovation and efficient resource utilization. Furthermore, the U.S. government's chip restriction policies may backfire, inadvertently promoting China's innovative breakthroughs in open-source AI technology.
DeepSeek "Claims to be GPT-4"
Amidst the praise, DeepSeek also faces some controversies.
Many outsiders believe that DeepSeek may have used output data from models like ChatGPT as training materials during the training process, transferring the "knowledge" from this data to DeepSeek's own model through model distillation techniques.
This practice is not uncommon in the AI field, but skeptics are concerned about whether DeepSeek used OpenAI model output data without sufficient disclosure. This seems to be reflected in DeepSeek-V3's self-perception.
Earlier, users discovered that when asked about its identity, the model mistakenly identified itself as GPT-4.
High-quality data has always been a crucial factor in AI development. Even OpenAI has faced controversy over data acquisition, drawing numerous copyright lawsuits for scraping data from the internet; as of this writing, the first-instance ruling in its case with The New York Times has yet to be issued, and new cases keep appearing.
As a result, DeepSeek has also faced public insinuations from Sam Altman and John Schulman.
"Copying what you know works is (relatively) easy. When you don't know if it will work, doing something new, risky, and difficult is very hard."
However, the DeepSeek team explicitly stated in the R1 technical report that they did not use OpenAI model output data and claimed to have achieved high performance through reinforcement learning and unique training strategies.
For example, they adopted a multi-stage training approach, including base model training, reinforcement learning (RL) training, and fine-tuning. This multi-stage cyclical training method helps the model absorb different knowledge and capabilities at different stages.
Saving Money is Also a Technical Skill: The Valuable Insights Behind DeepSeek's Technology
The DeepSeek-R1 technical report mentions a noteworthy discovery: an "aha moment" that occurred during the training of R1-Zero. Midway through training, DeepSeek-R1-Zero began to actively re-evaluate its initial problem-solving approach and to allocate more time to refining its strategy (for example, trying several different solutions).
In other words, through the RL framework, AI may spontaneously develop human-like reasoning abilities, even surpassing the limitations of preset rules. This also holds promise for developing more autonomous and adaptive AI models, such as dynamically adjusting strategies in complex decision-making (medical diagnosis, algorithm design).
Meanwhile, many industry insiders are poring over DeepSeek's technical reports. Andrej Karpathy, a co-founder of OpenAI who has since left, commented after the release of DeepSeek-V3:
DeepSeek (this Chinese AI company) has made it look easy today: it publicly released a cutting-edge large language model (LLM) trained on an extremely low budget (2,048 GPUs for two months, about $6 million).
For reference, this capability typically requires a cluster of 16K GPUs to support, while most of these advanced systems now use around 100K GPUs. For example, Llama 3 (405B parameters) used 30.8 million GPU hours, while DeepSeek-V3 seems to be a more powerful model that only used 2.8 million GPU hours (about 1/11 of Llama 3's computational load).
If this model also performs well in practical tests (for example, the LLM Arena ranking is ongoing, and my quick tests performed well), then this will be a very impressive achievement that showcases research and engineering capabilities under resource constraints.
So, does this mean we no longer need large GPU clusters to train cutting-edge LLMs? Not necessarily, but it indicates that you must ensure that the resources you use are not wasted. This case demonstrates that data and algorithm optimization can still lead to significant progress. Additionally, this technical report is also very impressive and detailed, worth reading.
In response to the controversy surrounding DeepSeek V3's alleged use of ChatGPT data, Karpathy stated that large language models do not inherently possess human-like self-awareness. Whether a model can correctly identify its own identity depends entirely on whether the development team has specifically constructed a self-awareness training set. If not explicitly trained, the model will respond based on the closest information in the training data.
Moreover, the model identifying itself as ChatGPT is not the issue at hand. Given the ubiquity of ChatGPT-related data on the internet, such a response actually reflects a natural phenomenon of "emergent knowledge proximity."
After reading the technical report of DeepSeek-R1, Jim Fan pointed out:
The most important point of this paper: it is driven entirely by reinforcement learning, with no supervised fine-tuning (SFT) involved at all. This approach is similar to AlphaZero, which mastered Go, Shogi, and Chess from scratch through a "cold start," without imitating human players' moves.
- It uses real rewards computed from hard-coded rules, rather than a learned reward model that reinforcement learning could easily exploit.
- The model's thinking time increases steadily as training progresses; this is not pre-programmed but an emergent property.
- Phenomena of self-reflection and exploratory behavior appear.
- GRPO is used instead of PPO: GRPO removes the critic network used in PPO and instead uses the average reward of multiple samples. It is a simple method that reduces memory usage. Notably, GRPO was invented by the DeepSeek team in February 2024; this is truly a very strong team. (A minimal sketch of the group-relative idea follows this list.)
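To illustrate the two ideas above, that rewards come from hard-coded rules and that GRPO replaces the critic with group statistics, here is a minimal sketch. The reward rule, sample strings, and function names are hypothetical; a real implementation would compute per-token log-probabilities and apply PPO-style ratio clipping on top of these advantages.

```python
import torch

def rule_based_reward(completion: str, reference: str) -> float:
    """Hard-coded rule: 1.0 if the final answer matches the reference exactly, else 0.0."""
    return 1.0 if completion.strip() == reference.strip() else 0.0

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: normalise each reward against its own sampled group,
    so no learned value (critic) network is needed."""
    r = torch.tensor(rewards, dtype=torch.float32)
    return (r - r.mean()) / (r.std() + eps)

# Toy example: four sampled answers to the same prompt, two of them correct.
samples = ["42", "41", "42 ", "forty-two"]
rewards = [rule_based_reward(s, "42") for s in samples]
print(rewards)                             # [1.0, 0.0, 1.0, 0.0]
print(group_relative_advantages(rewards))  # positive for correct samples, negative otherwise
```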
On the same day, when Kimi also released similar research results, Jim Fan noticed that both companies' research paths converged:
- Both abandoned complex tree search methods like MCTS in favor of simpler linearized thinking trajectories, adopting traditional autoregressive prediction methods.
- Both avoided using value functions that require additional model copies, reducing computational resource demands and improving training efficiency.
- Both discarded dense reward modeling, relying as much as possible on real outcomes as guidance, ensuring training stability.
However, there are also significant differences between the two:
- DeepSeek adopts an AlphaZero-style pure RL cold start method, while Kimi k1.5 chooses an AlphaGo-Master-style warm-up strategy, using lightweight SFT.
- DeepSeek is open-sourced under the MIT license, while Kimi performs excellently in multimodal benchmark tests, with its paper containing richer details on system design, covering RL infrastructure, mixed clusters, code sandboxes, and parallel strategies.
Nevertheless, in this rapidly iterating AI market, leading advantages are often fleeting. Other model companies will undoubtedly quickly absorb DeepSeek's experiences and improve upon them, potentially catching up soon.
The Initiator of the Large Model Price War
Many people know that DeepSeek is nicknamed "the Pinduoduo of AI," but they may not understand that this nickname actually stems from the large model price war that erupted last year.
On May 6, 2024, DeepSeek released the DeepSeek-V2 open-source MoE model, achieving breakthroughs in both performance and cost through innovative architectures such as MLA (Multi-Head Latent Attention Mechanism) and MoE (Mixture of Experts).
The inference cost was reduced to only 1 RMB per million tokens, about one-seventh of Llama 3 70B and one-seventieth of GPT-4 Turbo at the time. This technological breakthrough enabled DeepSeek to provide highly cost-effective services without incurring losses, while also putting immense competitive pressure on other vendors.
The release of DeepSeek-V2 triggered a chain reaction, with ByteDance, Baidu, Alibaba, Tencent, and Zhipu AI all following suit and significantly lowering the prices of their large model products. The impact of this price war even crossed the Pacific, drawing significant attention in Silicon Valley.
As a result, DeepSeek earned the title of "the Pinduoduo of AI."
In response to outside doubts, DeepSeek founder Liang Wenfeng stated in an interview with "Dark Current":
"Attracting users is not our primary goal. We lowered prices partly because we have explored the structure of the next-generation model, and costs have come down; on the other hand, we believe that both APIs and AI should be inclusive and affordable for everyone."
In fact, the significance of this price war goes far beyond competition itself. The lower entry barriers allow more enterprises and developers to access and apply cutting-edge AI, while also forcing the entire industry to rethink pricing strategies. It was during this period that DeepSeek began to enter the public eye and emerge prominently.
Spending a Fortune to Hire AI Genius
A few weeks ago, DeepSeek experienced a notable personnel change.
According to Yicai, Lei Jun successfully poached Luo Fuli with an annual salary of ten million yuan, appointing her head of the large model team at Xiaomi's AI Lab.
Luo Fuli joined DeepSeek, a subsidiary of Hquant, in 2022, and her name appears in important reports such as those for DeepSeek-V2 and the latest R1.
Subsequently, DeepSeek, which had focused on the B-end, also began to move into the C-end, launching a mobile application. As of this writing, the DeepSeek mobile app had climbed to second place on the Apple App Store's free chart, demonstrating strong competitiveness.
A series of smaller peaks made DeepSeek well known, and they were building toward a higher climax: on the evening of January 20, the ultra-large model DeepSeek R1, with 660 billion parameters, was officially released.
The model performs excellently on mathematical tasks, achieving a pass@1 score of 79.8% on AIME 2024, slightly surpassing OpenAI-o1, and scoring as high as 97.3% on MATH-500, on par with OpenAI-o1.
In programming tasks, it achieved a 2029 Elo rating on Codeforces, surpassing 96.3% of human participants. In knowledge benchmark tests such as MMLU, MMLU-Pro, and GPQA Diamond, DeepSeek R1 scored 90.8%, 84.0%, and 71.5%, respectively, slightly lower than OpenAI-o1 but better than other closed-source models.
In the latest comprehensive ranking of the large model arena LM Arena, DeepSeek R1 ranked third, tied with o1.
- In "Hard Prompts," "Coding," and "Math," DeepSeek R1 ranked first.
- In "Style Control," DeepSeek R1 tied for first with o1.
- In the test of "Hard Prompt with Style Control," DeepSeek R1 also tied for first with o1.
In terms of open-source strategy, R1 is released under the MIT License, granting users maximum freedom of use and permitting model distillation, so its reasoning capabilities can be distilled into smaller models. For instance, the distilled 32B and 70B models match o1-mini across multiple capabilities, an open-source effort that goes even further than Meta's, which has drawn criticism in the past.
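For readers unfamiliar with distillation, the sketch below shows one common, generic form of it: training a smaller student model to match a larger teacher's output distribution. This is an illustrative example only, not the specific recipe DeepSeek used for its distilled 32B and 70B models.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soften both distributions and minimise KL(teacher || student)."""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)

# Toy usage: a batch of 4 positions over a 10-token vocabulary.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)          # would come from the larger, frozen model
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
print(float(loss))
```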
The emergence of DeepSeek R1 allows domestic users to access models comparable to o1 level for free for the first time, breaking down long-standing information barriers. The discussion wave it has stirred on social platforms like Xiaohongshu is comparable to the initial release of GPT-4.
Going Global, Avoiding Internal Competition
Looking back at DeepSeek's development trajectory, its success formula is clear: strength is the foundation, but brand recognition is the moat.
In a conversation with "Wandian" (LatePost), MiniMax CEO Yan Junjie shared his insights on the AI industry and the company's strategic shift. He emphasized two key turning points: first, recognizing the importance of a technology brand; second, understanding the value of an open-source strategy.
Yan Junjie believes that in the AI field, the speed of technological evolution is more important than current achievements, and open-source can accelerate this process through community feedback. Additionally, a strong technology brand is crucial for attracting talent and acquiring resources.
Taking OpenAI as an example, despite later experiencing management turmoil, its early establishment of an innovative image and open-source spirit has garnered it a first wave of good impressions. Even though Claude has since become technically competitive and gradually encroached on OpenAI's B-end users, OpenAI still leads significantly among C-end users due to user path dependence.
In the AI field, the real competitive stage is always global. Going global, avoiding internal competition, and promoting oneself is undoubtedly a good path.
This wave of going global has already created ripples in the industry. Earlier, Qwen, Mianbi Intelligence, and recently DeepSeek R1, Kimi v1.5, and Doubao v1.5 Pro have all made significant noise overseas.
While 2025 has been dubbed the year of intelligent agents and the year of AI glasses, this year will also be an important milestone for Chinese AI companies embracing the global market. Going out will become an unavoidable keyword.
Moreover, the open-source strategy is also a good move, attracting a large number of tech bloggers and developers to spontaneously become DeepSeek's "word-of-mouth promoters." Technology for good should not just be a slogan; from the slogan "AI for All" to true technological inclusivity, DeepSeek has forged a path that is purer than OpenAI's.
If OpenAI has shown us the power of AI, then DeepSeek has made us believe:
This power will ultimately benefit everyone.