OpenAI employees publicly call Grok3's benchmark results misleading

PANews|Feb 23, 2025 03:11
According to a report by Jin Shi, an OpenAI employee recently publicly accused xAI, Elon Musk's AI company, of publishing misleading benchmark results for its latest AI model, Grok3. In response, xAI co-founder Igor Babuschkin insisted that the company had done nothing wrong. According to xAI's chart, two versions of Grok3, Grok3 Reasoning Beta and Grok3 mini Reasoning, outperformed OpenAI's strongest currently available model, o3-mini-high, on AIME 2025. However, OpenAI employees quickly pointed out on X that xAI's chart omitted o3-mini-high's AIME 2025 score at "cons@64" (consensus@64, in which the model gets 64 attempts at each problem and its most frequent answer is taken as final). Babuschkin countered on X that OpenAI has itself published similarly misleading benchmark charts in the past, although those charts compared the performance of OpenAI's own models.