Musk's Grok 3 is not yet the "smartest on Earth," but it is certainly the wealthiest.

1 day ago

Having money allows for indulgence, but there is still much to be done to become the "strongest."

Image source: Generated by Wujie AI

Grok 3, which Musk has billed as the "smartest AI on Earth," has arrived.

In a live broadcast watched by millions, Musk unveiled Grok 3 alongside two Chinese researchers, xAI co-founders Tony Wu and Jimmy Ba. Benchmark results suggest that Grok 3 is indeed impressively powerful, and the 200,000-GPU computing cluster behind it represents an astonishing capital investment.

The release of Grok 3 includes a series of models: Grok 3, Grok 3 mini, as well as updates for reasoning mode (Think), DeepSearch, Big Brain, and more.

#01, The "Smartest AI" Title Comes from Rankings; How Does It Perform in Tests?

In benchmark evaluations, Grok 3 outperformed other models such as GPT-4o, Gemini-2 Pro, Claude 3.5 Sonnet, and DeepSeek-V3 in mathematical reasoning, STEM, and scientific fields. Even the smaller version, Grok 3 mini, performs at a top-tier level.

An early version of Grok 3 also scored highly on Chatbot Arena, a crowdsourced evaluation platform where AI models compete head to head and users vote for the better answer. Grok 3 is the first model to break the 1400-point barrier, and it ranked first in every category.

Since Grok's first release in 2023, its MMLU scores have risen rapidly. Grok 2 in particular achieved a significant breakthrough in 2024, quickly closing the gap with the GPT series.
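To give a sense of how a score such as 1400 can emerge from pairwise votes, the following is a minimal sketch of a classic Elo-style rating update. It is an illustration only: the function names, the K-factor of 32, and the 400-point scale are assumptions chosen for this example, and Chatbot Arena's actual scoring methodology may differ.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one head-to-head vote.

    k is a hypothetical step size; real leaderboards choose their own.
    """
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - exp_a)
    # Whatever A gains, B loses, so the total rating pool is conserved.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two equally rated models: the winner of a single vote gains k/2 points.
    print(update_elo(1400.0, 1400.0, a_won=True))
```

Under this model, a higher rating means a higher predicted chance of winning a user vote against a lower-rated model, which is why crossing a round-number threshold is treated as a milestone.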

"Grok 3 has very powerful reasoning capabilities, and in all the tests we've conducted so far, Grok 3 has outperformed any released product we know of, which is a good sign," Musk stated via video call at the World Government Summit held in Dubai last week.

Grok 3 also introduced a reasoning mode (Think). Through Grok 3 Reasoning and Grok 3 mini Reasoning, it can think step by step in the manner of reasoning models such as DeepSeek-R1. The model tackles complex problems by considering multiple possible solutions, critiquing its own work, validating solutions, backtracking, and reasoning from first principles. However, to prevent distillation, parts of Grok 3's reasoning process are obscured.

Grok 3 Reasoning surpassed o3-mini-high, the best version of o3-mini, in multiple popular benchmark tests, including the new mathematical benchmark AIME 2025.

The team demonstrated Grok 3's Think mode by asking it to generate an animated 3D plot of a trajectory from Earth to Mars and back, timed for the next launch window.

In the demonstration, Grok 3 produced a Python script using Matplotlib and explained the code, which appeared to solve the orbital equations derived from Kepler's laws numerically. When the code was run, the animation showed the two planets, Earth and Mars, with green spheres marking the spacecraft's journey between them.

The demonstration was generated live, so there was no verification of whether the solution was entirely correct, but Musk, wearing a pendant showing the Earth-Mars transfer orbit, indicated it was close to the actual solution.
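The script from the demo is not reproduced here. As a rough illustration of what "solving Kepler numerically" usually involves, the following minimal sketch solves Kepler's equation, M = E - e*sin(E), with Newton's method and converts the result to a position in the orbital plane. The function names, tolerance, and starting guess are assumptions for this example, not the code Grok 3 generated; the Mars values used are approximate.

```python
import math


def solve_kepler(mean_anomaly: float, eccentricity: float,
                 tol: float = 1e-12, max_iter: int = 50) -> float:
    """Solve M = E - e*sin(E) for the eccentric anomaly E (radians)."""
    # Starting guess: M works well for low eccentricity, pi is safer otherwise.
    E = mean_anomaly if eccentricity < 0.8 else math.pi
    for _ in range(max_iter):
        f = E - eccentricity * math.sin(E) - mean_anomaly
        f_prime = 1.0 - eccentricity * math.cos(E)
        step = f / f_prime
        E -= step  # Newton update
        if abs(step) < tol:
            break
    return E


def orbital_position(semi_major_axis: float, eccentricity: float,
                     mean_anomaly: float):
    """Return (x, y) in the orbital plane, with the Sun at the focus (origin)."""
    E = solve_kepler(mean_anomaly, eccentricity)
    x = semi_major_axis * (math.cos(E) - eccentricity)
    y = semi_major_axis * math.sqrt(1.0 - eccentricity ** 2) * math.sin(E)
    return x, y


if __name__ == "__main__":
    # Approximate Mars orbit: a of about 1.524 AU, e of about 0.0934.
    print(orbital_position(1.524, 0.0934, math.radians(90.0)))
```

An animation like the one shown in the broadcast would call a routine of this kind once per frame for each body and pass the resulting coordinates to the plotting library.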

Andrej Karpathy, who had early access to Grok 3, stated that its Think mode accomplished tasks that DeepSeek-R1, Gemini 2.0 Flash Thinking, and Claude could not, but he noted that top OpenAI models, such as o1-pro, could also complete them.

Following OpenAI, Gemini, and Perplexity, Grok has also launched its own deep search feature, DeepSearch. The xAI team positions DeepSearch as a "next-generation search engine" and the first-generation product of Grok Agent. It is meant to be more than a simple information retrieval tool, assisting with programming, research, and everyday questions.

Judging from the demonstration, DeepSearch did not seem to have many unique features. xAI emphasized how it differs from the keyword matching of traditional search engines: it can understand the semantics and intent of a query, gather content from multiple sources, and cross-verify the results to improve accuracy. It also offers more control than a traditional search engine by allowing users to specify sources.

The xAI team particularly mentioned that DeepSearch's search process is transparent to users, allowing them to follow the AI's "thinking" process.

Andrej Karpathy believes that Grok 3's DeepSearch is roughly equivalent to Perplexity's Deep Research but has not yet reached the level of OpenAI's recently released Deep Research.

#02, Full-Power "Big Brain" Mode

For more complex queries, the "Big Brain" mode devotes more computation to reasoning. xAI describes these reasoning models as best suited to mathematical, scientific, and programming problems, which makes Big Brain sound like another name for a "full-power version."

The xAI team demonstrated Grok 3 in Big Brain mode creating a new game that combines Tetris and Bejeweled. The team cautioned that, because the code was generated live, Grok might make minor coding errors that would keep the game from running entirely as expected. In the live test, the generated game ran normally, but there were some issues with the color display, and it was unclear whether Tetris's mechanism for clearing a full line had been implemented.

The xAI team also confirmed during the live broadcast that they plan to launch an AI game studio, which Musk had also tweeted about the day before.

#03, Having Money Allows for Indulgence, but There is Still Much to Be Done to Become the "Strongest"

Grok 3 was trained on xAI's Colossus cluster. The first phase of 100,000 cards took only 122 days to build, and the cluster was then expanded to 200,000 cards in a further 92 days. About 200,000 GPUs were used to train Grok 3, and pre-training was completed in early January. Musk had previously posted on the X platform that the development of Grok 3 used "10 times" the computing resources of its predecessor, Grok 2, and that the training dataset had been expanded, reportedly to include court case documents. In the live broadcast, he put Grok 3's computing resources at about 15 times those of Grok 2.

Musk also revealed that xAI is building a new AI cluster that will have five times the power of the current cluster.

Additionally, regarding the voice mode, the team did not provide a specific release date, but Musk indicated it would be released "in about a week."

In terms of specifics, the voice will be generated directly by a model similar to Grok, capable of understanding spoken words and directly generating audio. This approach allows the AI to remember details and continue conversations more naturally. The voice mode functionality will be available in both the application and API.

xAI plans to launch the Grok 3 API in the coming weeks. The API will include Grok 3's reasoning model and DeepSearch capabilities. The xAI team is very optimistic about enterprise use cases, believing that Grok 3's capabilities combined with DeepSearch will bring significant value to enterprise users.

It is worth noting that xAI recently launched a promotion in which users receive $150 in API credits for a minimum top-up of $5, as long as they agree to share their data. Clearly, xAI does not mind giving away this small perk; what it values more is the users and data it acquires in return.

Regarding the open-source plan, Musk stated that they will continue the previous strategy and will open-source Grok 2 when Grok 3 matures and stabilizes (which is expected to happen in a few months).

Currently, users can experience Grok through X and the Grok website and app, although not all models and related features of Grok 3 are online yet (some are still in testing). Grok 3 will first be available to Premium+ subscribers on the X platform, and there will also be an independent subscription service called Super Grok, which offers advanced features and early access to Grok users for $30 per month or $300 per year. Super Grok unlocks more query counts in DeepSearch and provides unlimited image generation services.

The release of Grok 3 places xAI in fierce competition across the AI field, not only with OpenAI and Google but also under pressure from emerging Chinese companies. DeepSeek, for instance, has prompted AI companies worldwide to adjust their strategies, making deep thinking models the "standard." It has also led OpenAI to recently make its reasoning models available for free, signaling a move towards open-source.

For Musk, OpenAI may be xAI's biggest rival. Musk founded xAI in 2023 with the aim of becoming an alternative to OpenAI and has publicly criticized OpenAI's plans to restructure itself into a for-profit entity.

Musk has also filed two lawsuits against OpenAI, accusing it of deviating from its founding principles. He further proposed to acquire OpenAI's nonprofit division for $97.4 billion, but the OpenAI board rejected that proposal last week. Sam Altman said the acquisition offer was a strategy to "slow us down." Although Musk was involved in founding OpenAI, he has been critical of the company ever since leaving its board in 2018.

Both companies are raising remarkable sums, and their valuations are soaring. According to a Bloomberg report last week, Musk's xAI is in talks to raise about $10 billion, which would bring the company's valuation to $75 billion, up from $51 billion at its last valuation. Meanwhile, OpenAI is negotiating to raise up to $40 billion, with its valuation expected to rise to $300 billion.

How much both companies benefit from capital is also evident. In January, SoftBank, OpenAI, Oracle, and Abu Dhabi-backed MGX jointly announced plans to invest $100 billion in the U.S., ultimately committing $500 billion, to build data centers and other AI infrastructure. At the same time, Dell Technologies is close to completing a deal worth over $5 billion to supply xAI with servers optimized for AI.

As things stand, OpenAI is indeed xAI's main competitor. The two compete directly in technology, market positioning, and financing strategy, and OpenAI remains in the lead with its mature product line and strong market share. Although Grok 3 has an edge on certain metrics, the overall demonstration showed little innovation; it was more about catching up with industry leaders. What underpins Grok 3 appears to be the 200,000 GPUs and continuous capital support rather than a true technological breakthrough. This release falls short of what Musk described as "perhaps the last chance for AI to surpass Grok."

At the opening of the Grok 3 release, Musk reiterated xAI's and Grok's mission: to understand the nature of the universe, clarify what is happening, search for traces of extraterrestrial life, explore the meaning of life, understand the origins of the universe, and determine its end. xAI is driven by the pursuit of truth, aiming to become the ultimate truth-seeking artificial intelligence.

However, whether achieving these grand visions or facing more realistic competition, relying solely on "financial power" and the title of "strongest" on the rankings is clearly not enough. To become the truly "smartest AI on Earth," Musk and his xAI have a long way to go.
