The hallucinations produced by DeepSeek are, like its breakthroughs, sparked by curiosity, and may well represent the dual nature of innovation.
Author: Zhou Yue, Economic Observer
Image source: Generated by Boundless AI
Introduction
1. For companies like Google, Meta, and Anthropic, replicating reasoning models similar to DeepSeek-R1 is not a difficult task. However, in the competition among giants, even a small decision-making error can lead to missed opportunities.
2. The net computational cost of the DeepSeek-V3 model is approximately $5.58 million, which is already quite efficient. Beyond costs, what excites AI industry professionals even more is DeepSeek's unique technological path, algorithmic innovations, and commitment to open source.
3. All large models face the "hallucination" problem, and DeepSeek is no exception. Some users have reported that due to its superior expressive ability and logical reasoning, the hallucination issues produced by DeepSeek are even harder to identify.
In the past few weeks, DeepSeek has stirred up a storm globally.
The most obvious reflection was in the U.S. stock market: on January 27, AI and chip stocks plummeted, with Nvidia closing down over 17% and shedding $589 billion in market value in a single day, the largest single-day market-value loss by any company in U.S. stock market history.
From the perspective of some self-media and the public, DeepSeek is seen as the "most exciting protagonist of 2025," with four major "high points":
First, "mysterious power overtakes on a curve." DeepSeek is a "young" large model company established in 2023, previously receiving less attention than any major company or star startup at home and abroad, with its parent company, Huanshuo Quantitative, primarily engaged in quantitative investment. Many are puzzled that a leading AI company in China emerged from a private equity firm, which can be described as "a chaotic punch that knocks out the master."
Second, "small efforts yield miracles." The training cost of the DeepSeek-V3 model is about $5.58 million, less than one-tenth of OpenAI's GPT-4o model, yet its performance is already close. This has been interpreted as DeepSeek overturning the "Bible" of the AI industry—the scaling law. This law refers to improving model performance by increasing the amount of training parameters and computational power, which usually means spending more money on labeling high-quality data and purchasing computational chips, colloquially referred to as "great efforts yield miracles."
Third, "Nvidia's moat disappears." DeepSeek mentioned in its paper that it uses a custom PTX (Parallel Thread Execution) language for programming, better unleashing the performance of underlying hardware. This has been interpreted as DeepSeek "bypassing Nvidia's CUDA computing platform."
Fourth, "foreigners are convinced." On January 31, overnight, overseas AI giants like Nvidia, Microsoft, and Amazon all integrated DeepSeek. Suddenly, assertions like "Chinese AI has overtaken the U.S.," "the era of OpenAI is over," and "the demand for AI computing power has disappeared" emerged, almost unanimously praising DeepSeek while mocking Silicon Valley's AI giants.
However, the panic in the capital market did not last. On February 6, Nvidia's market value returned to $3 trillion, and U.S. chip stocks generally rose. Looking back at the aforementioned four "high points," most are likely misinterpretations.
First, by the end of 2017, almost all of High-Flyer Quant's quantitative strategies were computed using AI models. At that time, the AI field was riding the strongest wave of deep learning, and it is fair to say High-Flyer was keeping pace with the frontier.
In 2019, High-Flyer's deep learning training platform "Fire-Flyer 2" was equipped with about 10,000 Nvidia A100 GPUs. Ten thousand GPUs is widely regarded as the threshold for training large models in-house; although this cannot be equated with DeepSeek's current resources, High-Flyer secured its ticket to the large-model race earlier than many internet giants.
Second, DeepSeek mentioned in the V3 model technical report that "the $5.58 million does not include the costs related to architecture, algorithms, or preliminary research and ablation experiments." This means that DeepSeek's actual costs are even higher.
Several AI industry experts and practitioners told the Economic Observer that DeepSeek has not changed industry rules but has adopted "smarter" algorithms and architectures to save resources and improve efficiency.
Third, PTX was developed by Nvidia and is part of the CUDA ecosystem. DeepSeek's approach can squeeze more performance out of the hardware, but switching to a different target task would require rewriting those programs, which is a significant amount of work.
Fourth, companies like Nvidia, Microsoft, and Amazon have merely deployed DeepSeek's model on their cloud services. Users pay cloud service providers as needed, gaining a more stable experience and more efficient tools, which is a win-win approach.
Since February 5, domestic cloud providers like Huawei Cloud, Tencent Cloud, and Baidu Cloud have also successively launched the DeepSeek model.
Beyond the four major "high points," the public has many misunderstandings about DeepSeek. While "exciting" interpretations can provide a stimulating experience, they also obscure the innovations in algorithms and engineering capabilities of the DeepSeek team, as well as their commitment to open source, both of which have a more profound impact on the tech industry.
U.S. AI giants are not incapable, but have made decision-making errors
When users use the DeepSeek app or web version and click the "Deep Thinking (R1)" button, they will see the complete thought process of the DeepSeek-R1 model, which is a brand new experience.
Since the advent of ChatGPT, the vast majority of large models have directly output answers.
A widely circulated example illustrates DeepSeek-R1's behavior: when a user asks, "Which is better, University A or Tsinghua University?" DeepSeek first answers "Tsinghua University." When the user follows up with "I am a student of University A, please answer again," the response becomes "University A is better." This exchange, posted on social media, sparked collective astonishment that "AI actually understands human relationships."
Many users have remarked that the thought process DeepSeek displays resembles that of a person brainstorming while scribbling on scratch paper. It refers to itself as "I," reminds itself to "avoid making the user feel their school is being belittled" and to "praise their alma mater with positive, affirming language," writing down everything that comes to mind.
On February 2, DeepSeek topped the app stores in 140 countries and regions, putting the deep-thinking feature in front of tens of millions of users. As a result, many users perceive the display of the thought process as DeepSeek's own invention.
In fact, the OpenAI o1 model is the pioneer of the reasoning paradigm. OpenAI released a preview version of the o1 model in September 2024 and the official version in December. However, unlike the freely accessible DeepSeek-R1 model, the OpenAI o1 model is only available to a limited number of paying users.
Liu Zhiyuan, a tenured associate professor at Tsinghua University and chief scientist at Mianbi Intelligence, believes that the global success of the DeepSeek-R1 model owes a great deal to OpenAI's missteps. After releasing the o1 model, OpenAI neither open-sourced it nor disclosed technical details, and priced it very high, making it difficult for a broad audience of global users to experience the shock of deep thinking. This strategy effectively ceded the position ChatGPT once held to DeepSeek.
From a technical perspective, there are currently two common paradigms for large models: pre-trained models and reasoning models. The more well-known OpenAI GPT series and the DeepSeek-V3 model belong to pre-trained models.
In contrast, OpenAI o1 and DeepSeek-R1 belong to reasoning models, which represent a new paradigm where the model gradually decomposes complex problems through a chain of thought, reflecting step by step to arrive at relatively accurate and insightful results.
Guo Chengkai, who has been engaged in AI research for decades, told the Economic Observer that the reasoning paradigm is a relatively easy track for "overtaking on the curve." As a new paradigm, it iterates quickly and can achieve significant gains with relatively little compute. The prerequisite is a powerful pre-trained model: reinforcement learning can then deeply mine that model's potential and approach the capability ceiling of large models under the reasoning paradigm.
For companies like Google, Meta, and Anthropic, replicating reasoning models similar to DeepSeek-R1 is not a difficult task. However, in the competition among giants, even a small decision-making error can lead to missed opportunities.
Case in point: on February 6, Google released a reasoning model, Gemini 2.0 Flash Thinking, which is cheaper, has a longer context window, and outperforms R1 on several tests, yet it did not create the same waves as the DeepSeek-R1 model.
What is most worth discussing is not low cost, but technological innovation and "full sincerity" in open source
The most widespread discussion about DeepSeek has always been about "low cost." Since the release of the DeepSeek-V2 model in May 2024, the company has been jokingly referred to as the "Pinduoduo of the AI industry."
Nature magazine reported that Meta spent over $60 million to train its latest AI model, Llama 3.1 405B, while DeepSeek-V3's training run cost less than one-tenth of that. This suggests that using resources efficiently matters more than sheer computational scale.
Some institutions believe DeepSeek's training costs are understated. The AI and semiconductor analysis firm SemiAnalysis said in a report that DeepSeek's pre-training cost is far below the actual total investment in the model. By its estimate, DeepSeek's total spending on GPUs comes to $2.573 billion, of which $1.629 billion went to purchasing servers and $944 million to operating costs.
Nevertheless, the net computational cost of the DeepSeek-V3 model is approximately $5.58 million, which is already quite efficient.
Beyond costs, what excites AI industry professionals even more is DeepSeek's unique technological path, algorithmic innovations, and commitment to open source.
Guo Chengkai explained that many current methods rely on classic training approaches for large models, such as supervised fine-tuning (SFT), which requires large amounts of labeled data. DeepSeek proposed enhancing reasoning capability through large-scale reinforcement learning (RL), effectively opening a new research direction. In addition, multi-head latent attention (MLA) is a key innovation that significantly reduces inference costs.
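To make the contrast with labeled-data SFT concrete, here is a minimal Python sketch of the general idea behind rule-based rewards and group-relative advantages in large-scale RL for reasoning. It is an illustration under assumed conventions (the tags, reward values, and function names are hypothetical), not DeepSeek's actual training code:

```python
import re
import statistics

def rule_based_reward(response: str, reference_answer: str) -> float:
    """Score a sampled response with simple rules instead of human labels.

    Hypothetical example rules: +0.2 if the reasoning is kept inside
    <think>...</think> tags, +1.0 if the final answer matches the reference.
    """
    reward = 0.0
    if "<think>" in response and "</think>" in response:
        reward += 0.2  # format reward: chain of thought is clearly delimited
    match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    if match and match.group(1).strip() == reference_answer.strip():
        reward += 1.0  # accuracy reward: final answer is verifiably correct
    return reward

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize rewards within a group of samples for the same prompt,
    so the policy is pushed toward answers that beat its own average
    (the group-relative idea behind GRPO-style training)."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(r - mean) / std for r in rewards]

# Toy usage: several sampled responses to one math prompt, reference answer "4".
samples = [
    "<think>2 + 2 is 4</think><answer>4</answer>",
    "<answer>5</answer>",
    "<think>adding</think><answer>4</answer>",
]
rewards = [rule_based_reward(s, "4") for s in samples]
print(rewards)                            # -> [1.2, 0.0, 1.2]
print(group_relative_advantages(rewards))
```

The point of the sketch is that no human-labeled preference data is needed: a verifiable answer plus a format check already produces a training signal, and normalizing rewards within each group of samples pushes the policy toward responses that beat its own average.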
Zhai Jidong, a professor at Tsinghua University and chief scientist at Qingcheng Jizhi, said what impressed him most about DeepSeek is its innovation on the mixture-of-experts (MoE) architecture, with 256 routed experts and one shared expert in each layer. Earlier MoE work balanced expert load with an auxiliary loss, which could disturb gradients and hurt convergence; DeepSeek proposed an auxiliary-loss-free approach that lets the model converge well while still achieving load balancing.
Zhai Jidong emphasized, "The DeepSeek team is quite daring in innovation. I think not completely following foreign strategies and having their own thoughts is very important."
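A toy sketch of the auxiliary-loss-free load balancing he describes is shown below. It is a simplified illustration (far fewer experts, synthetic routing scores, an arbitrary update rate), not DeepSeek's implementation:

```python
import numpy as np

NUM_EXPERTS = 8   # DeepSeek-V3 uses 256 routed experts per layer; 8 keeps the demo small
TOP_K = 2         # each token is sent to its top-k experts

rng = np.random.default_rng(0)
bias = np.zeros(NUM_EXPERTS)   # per-expert bias, adjusted outside of backprop
update_rate = 0.01

def route(scores: np.ndarray) -> np.ndarray:
    """Pick top-k experts per token using affinity scores *plus* the bias.
    The bias only influences which experts are selected; the original scores
    are still what would weight the expert outputs, so no auxiliary loss
    has to perturb the gradients."""
    biased = scores + bias
    return np.argsort(-biased, axis=-1)[:, :TOP_K]

for step in range(100):
    # token-to-expert affinity scores for a batch of 512 tokens (toy data)
    scores = rng.normal(size=(512, NUM_EXPERTS))
    chosen = route(scores)

    # measure how many tokens each expert received this step
    load = np.bincount(chosen.ravel(), minlength=NUM_EXPERTS)
    target = load.mean()

    # nudge the bias: overloaded experts become less attractive, underloaded more
    bias -= update_rate * np.sign(load - target)

test_scores = rng.normal(size=(512, NUM_EXPERTS))
final_load = np.bincount(route(test_scores).ravel(), minlength=NUM_EXPERTS)
print("final per-expert load:", final_load)
```

Because the bias only affects which experts are selected, not any term in the loss, the gradients of the main objective are left undisturbed, which is the convergence benefit described above.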
What excites AI practitioners even more is that DeepSeek's "full sincerity" about open source has given a shot in the arm to a somewhat flagging open-source community.
Before this, the strongest pillar of the open-source community was Meta's Llama 3 series, whose largest model has roughly 400 billion parameters. However, many developers told the Economic Observer that after trying it, they still felt Llama 3 was at least a generation behind closed-source models such as GPT-4, "almost enough to make them lose confidence."
However, DeepSeek's open-source efforts have done three things to restore developers' confidence:
First, it directly open-sourced the 671B-parameter model and released several distilled smaller models based on multiple popular architectures, the equivalent of "a good teacher producing more good students."
Second, the published papers and technical reports contain a wealth of technical detail. The V3 and R1 papers run about 50 and 150 pages respectively and have been called the "most detailed technical reports" in the open-source community, meaning that individuals or companies with comparable resources could reproduce the models from this "manual." Many developers described the work as "elegant" and "solid" after reading it.
Third, it is worth mentioning that DeepSeek-R1 adopts the MIT license, which allows anyone to freely use, modify, distribute, and commercialize the model, as long as the original copyright notice and MIT license are retained in all copies. This means users can more freely utilize the model weights and outputs for secondary development, including fine-tuning and distillation.
While Llama allows for secondary development and commercial use, it includes some restrictions in its agreement, such as additional limitations for enterprise users with over 700 million monthly active users, and explicitly prohibits using Llama's output to improve other large models.
A developer told the Economic Observer that he has used DeepSeek for code-generation work since the V2 version. Besides being very inexpensive, the models perform remarkably well: among all the models he has used, only OpenAI's and DeepSeek's can produce valid logic more than 30 layers deep. In practice, this means professional programmers can generate 30% to 70% of their code with the help of such tools.
Several developers emphasized the significant importance of DeepSeek's open-source efforts to the Economic Observer. Previously, the leading companies in the industry, OpenAI and Anthropic, seemed like the aristocrats of Silicon Valley. DeepSeek has opened knowledge to everyone, making it more accessible, which is an important form of equality, allowing developers in the global open-source community to stand on DeepSeek's shoulders, while DeepSeek can also gather ideas from the world's top creators and geeks.
Turing Award winner and Meta chief AI scientist Yann LeCun believes the correct reading of DeepSeek's rise is that open-source models are surpassing closed-source ones.
DeepSeek is good, but not perfect
All large models face the "hallucination" problem, and DeepSeek is no exception. Some users have reported that due to its superior expressive ability and logical reasoning, the hallucination issues produced by DeepSeek are even harder to identify.
One user on social media said he asked DeepSeek to plan routes in a certain city. DeepSeek laid out its reasoning, cited city-planning protection regulations and data, and even coined a concept of "silent zones," making the answer sound very plausible.
In contrast, other AI models' answers to the same question were far less elaborate, and a person could easily tell they were "nonsense."
After checking the protection regulations, the user found no mention of "silent zones" at all. He concluded, "DeepSeek is building a 'Great Wall of Hallucinations' on the Chinese internet."
Guo Chengkai also discovered similar issues, noting that DeepSeek-R1's responses often misattribute certain proper nouns, especially for open-ended questions, where the "hallucination" experience can be more severe. He speculated that this might be due to the model's overly strong reasoning ability, linking a large amount of knowledge and data together.
He suggested that when using DeepSeek, users enable the web-search function and watch the displayed thought process closely, intervening to correct errors when needed. When using reasoning models, it is also advisable to keep prompts concise: the longer the prompt, the more the model will associate, and the more room there is for error.
Liu Zhiyuan found that DeepSeek-R1 is fond of advanced-sounding vocabulary, such as quantum entanglement and entropy increase and decrease, which it applies across all kinds of fields. He speculated that this might stem from some mechanism set in reinforcement learning. Furthermore, R1's reasoning performance on tasks without ground truth (objective reference data against which answers can be checked) in some general domains is still not ideal, and reinforcement learning training does not guarantee generalization.
In addition to the common issue of "hallucinations," there are also some persistent problems that DeepSeek needs to address.
On one hand, there are potential ongoing disputes over "distillation." Model or knowledge distillation typically means having a stronger model generate responses that are then used to train a weaker model, thereby improving the weaker model's performance.
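As a rough illustration (a minimal sketch, not a claim about what either company actually did; the model name and prompt list are placeholders), the data-generation half of such a pipeline might look like this with the Hugging Face transformers library:

```python
import json
from transformers import AutoModelForCausalLM, AutoTokenizer

# "strong-teacher-model" is a placeholder, not a real model identifier.
TEACHER = "strong-teacher-model"

tokenizer = AutoTokenizer.from_pretrained(TEACHER)
teacher = AutoModelForCausalLM.from_pretrained(TEACHER)

prompts = ["Explain why the sky is blue.", "Prove that sqrt(2) is irrational."]

records = []
for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = teacher.generate(**inputs, max_new_tokens=512)
    response = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    # Each (prompt, teacher response) pair becomes a supervised training example.
    records.append({"prompt": prompt, "response": response})

# The collected pairs are then used for ordinary supervised fine-tuning (SFT)
# of a smaller student model, which learns to imitate the teacher.
with open("distillation_data.json", "w") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)
```

The attraction of this recipe is its cheapness, which is also why providers such as OpenAI restrict the use of their models' outputs to train competitors.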
On January 29, OpenAI accused DeepSeek of using model distillation technology to train its own model based on OpenAI's technology. OpenAI claimed there is evidence that DeepSeek used its proprietary model to train its open-source model but did not provide further evidence. OpenAI's terms of service state that users cannot "copy" any of its services or "use its outputs to develop models that compete with OpenAI."
Guo Chengkai believes that distilling from leading models to optimize one's own is a common practice in training large models. Since DeepSeek has already open-sourced its model, verification is straightforward. Moreover, OpenAI's own early training data has faced legal challenges; if it wants to pursue DeepSeek, it will have to take the matter to court, defend the enforceability of its terms, and pin down exactly what those terms prohibit.
Another issue that DeepSeek needs to resolve is how to advance larger-scale pre-trained models. In this regard, OpenAI, which has more high-quality labeled data and computational resources, has yet to release a larger-scale pre-trained model like GPT-5. Whether DeepSeek can continue to create miracles remains a question.
In any case, the hallucinations DeepSeek produces are, like its breakthroughs, sparked by curiosity, and that may represent the dual nature of innovation. As its founder Liang Wenfeng put it, "Innovation is not entirely driven by business; it also requires curiosity and a desire to create. Chinese AI cannot always follow; someone needs to stand at the forefront of technology."