DeepSeek launches NSA mechanism to improve long-context training and inference efficiency

PANews|Feb 18, 2025 08:53
DeepSeek announced the launch of NSA (Native Sparse Attention), a hardware-aligned, natively trainable sparse attention mechanism designed for ultra-fast long-context training and inference. According to DeepSeek, NSA's design is optimized for modern hardware, which speeds up inference and significantly reduces pre-training costs without compromising model performance.
According to DeepSeek, NSA performs well on general benchmarks, long-context tasks, and instruction-based reasoning, matching or exceeding the performance of full-attention models.
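The announcement gives no algorithmic detail, so the following is only a generic illustration of what "sparse attention" means, not DeepSeek's NSA or its hardware-optimized kernels. It is a minimal pure-Python sketch of top-k block-sparse attention for a single query: keys are grouped into blocks, each block is scored by the query's similarity to the block's mean key, and ordinary softmax attention is computed over only the tokens in the best-scoring blocks. The function name `block_sparse_attention`, the mean-of-keys block summary, and the parameters `block_size` and `top_k` are hypothetical choices made for this example.

```python
import math
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def block_sparse_attention(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
    block_size: int,
    top_k: int,
) -> Tuple[List[float], List[int]]:
    """Hypothetical helper: attend from one query to only the top_k key blocks.

    Returns the attention output and the sorted indices of the tokens attended to.
    """
    n = len(keys)
    dim = len(query)
    scale = 1.0 / math.sqrt(dim)
    # 1. Split the sequence into contiguous blocks (the last one may be shorter).
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]
    # 2. Score each block by the query's similarity to the block's mean key.
    block_scores = []
    for idx in blocks:
        mean_key = [sum(keys[i][d] for i in idx) / len(idx) for d in range(dim)]
        block_scores.append(dot(query, mean_key) * scale)
    # 3. Keep only the top_k highest-scoring blocks.
    ranked = sorted(range(len(blocks)), key=lambda b: block_scores[b], reverse=True)
    token_ids = sorted(i for b in ranked[:top_k] for i in blocks[b])
    # 4. Ordinary scaled dot-product attention restricted to the selected tokens.
    weights = softmax([dot(query, keys[i]) * scale for i in token_ids])
    dim_v = len(values[0])
    out = [sum(w * values[i][d] for w, i in zip(weights, token_ids)) for d in range(dim_v)]
    return out, token_ids


if __name__ == "__main__":
    query = [1.0, 0.0]
    keys = [[0, 1], [0, 1], [5, 0], [4, 0], [0, -1], [0, -1], [-3, 0], [-2, 0]]
    values = [[float(i)] for i in range(8)]
    out, ids = block_sparse_attention(query, keys, values, block_size=2, top_k=1)
    print(ids)  # selected tokens: [2, 3]
    print(out)
```

Each query here attends to at most `top_k * block_size` tokens instead of all `n`, so the cost of the final softmax step no longer grows with the full context length; setting `top_k` to the total number of blocks recovers ordinary full attention.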