How far has the development of large models in the United States progressed outside of ChatGPT?

巴比特
1 year ago

Source: Huashang Taolue

Author: Wang Mengxin

Image Source: Generated by Wujie AI

Since the beginning of the year, OpenAI's ChatGPT has set off a worldwide craze for large AI models. However, large AI models in the United States extend far beyond OpenAI's ChatGPT.

01 Explosive Development

According to various data, although China is developing rapidly, the United States still leads the world in the number of large models released. By May 2023, the number of foundation models with more than 1 billion parameters had exceeded 100.

The Economist reported that the total investment in large models in the United States in 2022 reached $47.4 billion, about 3.5 times that of the second-place China ($13.4 billion), and still maintains a strong growth momentum. Goldman Sachs further predicts that the United States' investment in large models by 2025 could reach $100 billion, about half of the global total.
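
The ratios quoted above can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the original reporting; the function names are hypothetical, and the implied global figure assumes that "about half" means exactly 50%.

```python
def investment_ratio(leader_billion: float, runner_up_billion: float) -> float:
    """Return how many times larger the leader's investment is."""
    return leader_billion / runner_up_billion


def implied_global_total(us_billion: float, us_share: float) -> float:
    """Return the global total implied by a US figure and its share of that total."""
    return us_billion / us_share


# Figures quoted in the article (billions of US dollars).
us_2022 = 47.4
china_2022 = 13.4
ratio = investment_ratio(us_2022, china_2022)
print(f"US / China, 2022: {ratio:.2f}x")

# Assumption: "about half" is treated as exactly 0.5.
global_2025 = implied_global_total(100.0, 0.5)
print(f"Implied global total, 2025: ${global_2025:.0f} billion")
```

Under these assumptions the ratio comes out slightly above 3.5, consistent with the article's "about 3.5 times," and the 2025 forecast implies a global total of roughly $200 billion.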

Goldman Sachs' survey shows that 16% of companies in the Russell 3000 index mentioned large models in their 2023 financial reports. Its economists estimate that large models will increase overall labor productivity by 1% over the next decade and bring about a 14% increase in the S&P 500 index.

In addition to OpenAI, representative general large model companies in the United States now include Anthropic, Cohere, and Google.

Among them, Anthropic, founded in 2021 by former OpenAI executives Dario and Daniela Amodei, is now valued at $30 billion, making it the second most valuable general large model company after OpenAI (valued at about $86 billion).

Anthropic employs several former core OpenAI staff who took part in developing GPT-2 and GPT-3. Its large model product, Claude 2, is widely regarded as second only to GPT-4, and some analysts even believe that Claude 2 outperforms GPT-4.

For example, Claude 2 can handle inputs of up to about 75,000 words, while ChatGPT handles about 3,000. This means Claude 2 can process and output more complex content and be applied to more demanding tasks, such as generating long-form content running to thousands of words.

What makes Claude 2 even more popular is that it is open to the public for free, whereas GPT-4 requires payment.

The excellent founding team and strong product performance have made Anthropic highly sought after by capital. Google, SK Telecom (one of the largest mobile operators in South Korea), and Amazon have all become its investors, with Amazon alone investing as much as $4 billion.

Besides Anthropic, another company worth noting is Cohere.

In June of this year, Cohere, founded in 2019, received a $270 million investment from NVIDIA, Oracle, Salesforce Ventures, and others, becoming a unicorn valued at about $2 billion, behind only OpenAI and Anthropic among foundation model companies.

Cohere offers products similar to OpenAI's, but it sees a market opportunity in data privacy. It differentiates itself from OpenAI by targeting the B2B (enterprise) market and committing firmly to commercial large models. Its product capabilities fall into three main categories: text retrieval, text generation, and text classification. These can be tailored to customer needs, with an emphasis on security, privacy, and customized services.

Another selling point of Cohere is that it is not tied to any single cloud platform, which helps ensure the privacy and security of data. It offers flexible storage and data privacy options, including local deployment, to meet customers' differing data storage needs.

Cohere's ability to quickly find its own differentiated positioning is inseparable from the distinctive views on talent and entrepreneurship held by co-founder Aidan Gomez and his fellow co-founders.

Aidan once said that Cohere looks for ambitious people from different backgrounds who are deeply interested in AI. They may not have an impressive resume from a big company, but they must have strong interest in and passion for their chosen field, and they must be able to build things hands-on as well as write papers.

Differentiated product strategy and a unique team background have made Cohere a breath of fresh air in the field of general large models.

Recently, Cohere released the world's first publicly available multilingual understanding model, which is trained based on real data from native speakers and can read and understand over 100 of the most commonly used languages worldwide.

Now let's take a look at the giant Google.

On December 6, Google DeepMind launched the multimodal AI model Gemini, which can learn from and understand text, images, audio, code, and other modalities at the same time.

Take customer service chatbots as an example. A chatbot built on Gemini can understand customers not only from the literal meaning of the conversation but also from the intent conveyed by their expressions and tone, and it can handle content including audio, code, images, and videos.

According to test results, Gemini is the first model to surpass human experts on the MMLU (Massive Multitask Language Understanding) benchmark, and it outperforms GPT-4 on 30 of 32 AI benchmarks.

With its powerful performance, Gemini quickly gained attention and created a huge buzz for its parent company, Alphabet. On December 7, Alphabet's stock price rose by 5.31% to $136.93, with a total market value of $1.72 trillion. Google plans to gradually integrate this model into its search, advertising, and other services.

However, when it comes to large models in the United States, what deserves more attention is their progress in industrial applications and their future potential.

02 Accelerating Industrial Implementation

According to the "2023 Artificial Intelligence Index Report" released by Stanford University, of the 35 large models released in the United States in 2022, only 3 came from academic laboratories, while 32 came from industry. This trend has continued this year.

On March 30, 2023, while the outside world was still caught up in the frenzy over general large models, Bloomberg single-handedly drew attention to a new industry track. On that day, it announced that it had built the largest financial-domain dataset to date and used it to train BloombergGPT, a large language model (LLM) with 50 billion parameters built specifically for the financial field.

Billed as the world's first financial large model, BloombergGPT draws on Bloomberg's extensive financial data sources to build a dataset of 363 billion tokens. According to analysis by Gaojin Think Tank, it can greatly improve the efficiency and stability of financial institutions, helping them reduce costs and work more efficiently.

In terms of cost reduction, BloombergGPT can reduce the staff needed for investment research, R&D programming, risk control, and process management. In terms of efficiency, it can automatically generate high-quality financial reports, financial analysis reports, and prospectuses from given topics and contexts. It can also assist with accounting and auditing work, and it can condense and organize financial news and information, freeing up professional manpower for more labor-intensive areas.

Tianfeng Securities pointed out in its report that because BloombergGPT has more specialized training data than ChatGPT, it will outperform general large models in financial scenarios, marking the beginning of the GPT revolution in the financial field.

BloombergGPT is just one typical case. Currently, financial large models in the United States follow three distinct approaches. The first is independent, full-stack in-house development, which emphasizes independence and control. The second is fine-tuning someone else's model on a company's own data and scenarios, producing a financial large model that fits its own needs. The third is calling models from the cloud, accessing various large model APIs for private deployment as needed. This last method is preferred by small and medium-sized financial companies with weaker technological foundations.

According to relevant statistics, financial AI in the United States accounts for about 6.7% of overall AI financing.

The healthcare industry is another hotbed for the implementation of large models in the United States. Tech giants like Google and Microsoft, healthcare technology companies like Sensely and Enlitic, biopharmaceutical startups like AbSci and Exscientia, and CXO (medical outsourcing) companies like Syneos are all involved.

Common application scenarios for large models in the United States include new drug development tasks such as compound synthesis and target discovery, as well as electronic medical records and assisted diagnosis in hospital medical services. Medical imaging equipment such as CT (computed tomography) and MRI (magnetic resonance imaging) scanners is also being enhanced by large models.

Among the many medical large models, Google's Med-PaLM 2 is a key focus. It is the first large model to reach "expert" level performance on the MedQA dataset of United States Medical Licensing Examination (USMLE) questions, with an accuracy of over 85%. It is also the first artificial intelligence system to achieve a passing score on the MedMCQA dataset, which includes questions from the AIIMS and NEET medical exams in India, with a score of 72.3%.

Med-PaLM 2 is also having a revolutionary impact on the industry.

With Med-PaLM 2, researchers can analyze large-scale biomedical data, discover genes, proteins, and metabolic pathways related to diseases, identify potential targets, and help screen for potentially active drug molecules. This narrows the range of candidate drugs and prioritizes compounds with higher activity for subsequent experimental verification, which shortens the drug development cycle and reduces development costs.

The success of Med-PaLM 2 has also prompted Google to invest more in the field of medical large models.

For example, it has partnered with healthcare software company Epic to develop a tool based on ChatGPT that can automatically send professional medical information to patients. Google's partner, healthcare provider Carbon Health, has also launched a GPT-4-based AI tool called Carby, which automatically generates diagnostic records from conversations between doctors and patients, greatly improving doctors' efficiency and the diagnostic experience. Currently, Carby is used by over 130 clinics and more than 600 medical personnel, and one clinic in San Francisco reported a 30% increase in patient visits after adopting it.

Besides Google, AI chip giant NVIDIA has also been building a presence in the field of medical large models for many years.

In 2021, NVIDIA announced a strategic partnership with Schrodinger, a US medical information technology company, to accelerate the development of new therapies by improving the speed and accuracy of Schrodinger's computing platform.

In September 2022, NVIDIA released BioNeMo, a framework for training and deploying large biomolecular language models at supercomputing scale, to help scientists better understand diseases and find optimal treatments. BioNeMo also provides cloud API services for pre-trained AI models. In July of this year, NVIDIA invested $50 million in the biotechnology company Recursion to support the development and training of AI foundation models in biology and chemistry.

The education sector is also an important setting for the implementation of large models in the United States, with core applications focused on language learning, online courses, and learning assistance. A landmark case is Khanmigo, a GPT-4-based AI assistant released in April by the US online education organization Khan Academy. It offers tutoring, lesson plan generation, writing training, and programming exercises.

Khan Academy now offers Khanmigo commercially, at $9 per month or $99 per year. Its tutoring feature gives students one-on-one support: Khanmigo explains the reasoning behind answers and guides students through thinking exercises until they work out the correct answer themselves. Khanmigo can also serve as a writing coach, prompting and advising students to write and debate from different perspectives based on specific details such as character traits and story backgrounds, which encourages their creativity.

With its strong intent understanding and natural language communication capabilities, as well as text and image generation, Khanmigo can understand students, provide personalized learning advice, and greatly increase the supply of teaching materials, including engaging courseware and abundant extracurricular materials. This makes the "thousand faces of education" (education tailored to each individual student) a real possibility and has a significant impact on the industry.

Overall, large models in the United States are accelerating their integration with industry, and a new industrial revolution is happening as a result.

03 Chinese Advantages and Opportunities

From a global perspective, China and the United States are leading the development of large models.

According to the "Research Report on the Map of Chinese Artificial Intelligence Large Models," there are currently a total of 202 large models released globally, with large models from China and the United States accounting for nearly 80% of the total number of large models worldwide. The global competition for large models is actually a competition between China and the United States.

China's participants in large models are also numerous, with leading technology companies (Alibaba, Baidu, Tencent, Huawei, etc.), startups (Zhipu AI, Baichuan Intelligence, etc.), traditional AI companies (iFlytek, SenseTime, etc.), and research institutes (Tsinghua University, Fudan University, Chinese Academy of Sciences, etc.) all deeply involved, gradually forming a pattern of internet giants leading in general model development, with a variety of AI vendors, startups, and research institutes flourishing.

Although the United States currently shows a leading trend in the field of large models and has taken measures to suppress China, such as prohibiting US companies from providing cloud computing and large model training services to China, China's large models still have tremendous development opportunities and the potential to surpass the United States.

Firstly, both the government and industry in China are pushing large models forward in an effort to catch up. According to the Financial Times, China holds four of the top ten places among large model research institutions globally: Baidu, the Beijing Academy of Artificial Intelligence (BAAI), Tsinghua University, and Alibaba Research Institute. Baidu's "Wenxin Yiyan" and Alibaba's "Tongyi Qianwen" are large models developed in China, with performance that can compete with large models in the United States.

Leonis Capital's analysis report indicates that American companies focus more on underlying research and development, whereas most Chinese companies, apart from leading giants such as Baidu and Alibaba, lean towards frameworks and industry applications. This difference will bring huge opportunities to China, allowing it to surpass the United States in applying generative AI and large model industry solutions, and ultimately using its lead in applications to drive or support catching up at the foundational level.

The reason is that, although China may lag slightly behind the United States in underlying research and development, it has a huge market and rich application scenarios. These provide ample room and favorable conditions for implementing large models, so that industry applications can drive overall breakthroughs.

An important feature of large models is the dual drive of application and technology. In other words, consumers using large models not only contribute to profits but also provide more feedback through data loops, thereby enhancing the capabilities of neural networks. Rich scenarios can make large models more practical, match demand for better results, and drive faster technological development.

Given this feature, if China can rely on its huge market and rich scenarios, focus on applications as the key, respect market laws, keep earning profits from market applications, and reinvest the accumulated funds and talent, then a breakthrough in underlying technology will follow naturally.

Robin Li, a leading figure in China's large AI models, recently stated at the Geek Park Innovation Conference 2024, "In the era of large models, the real value lies in native applications."

Robin Li believes that large models themselves are not the innovation and entrepreneurial opportunity for most people; native applications are. For large companies, small and medium-sized enterprises, and entrepreneurs alike, native applications are a great opportunity.

Robin Li said he is somewhat anxious to see that the media, society, and the public are still mainly excited about foundation models and have not shifted their attention to AI native applications. In recent public speeches, as well as in internal company talks, he has emphasized this point repeatedly: "We must focus on the native applications of AI. We have to make this thing happen, and then your model will have value."

In fact, China has already taken a leading position in the internet and mobile internet fields through rich scenarios and application innovation, ultimately driving the progress of the entire technology industry. This trend is continuing in the field of large models. However, drawing on how China and the United States each developed in the internet field, one thing Chinese large model enterprises should now take very seriously is expanding overseas earlier and developing globally.

Today's Chinese companies also have a better foundation for going global and finding broader development opportunities in the global market.

