
Sam Gao | Jan 29, 2025 08:28
My view on DeepSeek (3/N)
3. Speaking Up for DeepSeek
DeepSeek R1’s paper boasts astonishing metrics but has also raised suspicions:
1. Mixture of Experts (MoE) Technology
This approach requires a high level of training expertise and vast datasets. It’s one reason people suspect DeepSeek might have used OpenAI’s data for training.
2. Reinforcement Learning (RL)
RL-based approaches demand heavy hardware resources. Compared to the tens of thousands of GPUs at Meta or OpenAI, DeepSeek allegedly used only 2,048 H800s for training.
Given the limited compute and the complexity of MoE, it seems almost too good to be true that DeepSeek R1 succeeded on a mere $5 million budget. Yet whether you view R1 as a “miracle of low cost” or dismiss it as “all show and no substance,” its dazzling functional innovations are hard to ignore.
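To see why a single-digit-million figure is at least arithmetically plausible, here is a rough back-of-the-envelope sketch. The GPU-hour total and hourly rental rate below are illustrative assumptions (the widely cited ~$5.6M number comes from DeepSeek-V3’s reported GPU-hours at an assumed $2 per H800 GPU-hour), not confirmed figures for R1 itself.

```python
# Back-of-the-envelope check of the "~$5 million" training-cost claim.
# All inputs are illustrative assumptions, not figures confirmed for R1.

num_gpus = 2_048              # H800 GPUs reportedly used
gpu_hours_total = 2_788_000   # assumed total GPU-hours for the training run
rate_per_gpu_hour = 2.00      # assumed rental cost in USD per H800 GPU-hour

cost_usd = gpu_hours_total * rate_per_gpu_hour
days_wall_clock = gpu_hours_total / num_gpus / 24

print(f"Estimated cost: ${cost_usd / 1e6:.2f}M")                      # ~ $5.58M
print(f"Implied wall-clock time: {days_wall_clock:.0f} days on {num_gpus} GPUs")  # ~ 57 days
```

Under these assumptions the arithmetic lands at roughly $5.6M for about two months of training on 2,048 GPUs, which is in the same ballpark as the figure quoted above.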
Arthur Hayes, co-founder of BitMEX, wrote:
“Will the rise of DeepSeek make global investors question American exceptionalism? Are U.S. assets vastly overvalued?”
At this year’s Davos Forum, Professor Andrew Ng of Stanford University publicly stated:
“I’m impressed by DeepSeek’s progress. I think they’ve managed to train their models in a very cost-effective way. Their latest inference model is outstanding… kudos to them!”
A16z co-founder Marc Andreessen said:
“DeepSeek R1 is one of the most astounding, most impressive breakthroughs I’ve seen—and as an open-source release, it’s a profound gift to the world.”