DeepSeek launches NSA mechanism to improve long-context training and inference efficiency

PANews|Feb 18, 2025 08:53
DeepSeek has announced NSA (Native Sparse Attention), a sparse attention mechanism that is hardware-aligned and natively trainable, designed for ultra-fast long-context training and inference. With a design optimized for modern hardware, NSA significantly reduces pre-training costs and speeds up inference without sacrificing model performance. According to DeepSeek, NSA matches or outperforms full-attention models on general benchmarks, long-context tasks, and instruction-based reasoning.