Source: 硬AI
Author | Fang Jiayao
Image Source: Generated by Wujie AI
Currently, the field of artificial intelligence is attempting to develop smaller, lower-cost AI models, which could make AI technology more accessible and easier to use.
Last week, a research team led by Deutsche Bank's Jim Reid and Luke Templeman released its thematic outlook report, which includes AI among the bank's top ten themes for 2024 and points out that large AI models may gradually give way to smaller, more efficient, and lower-cost models.
Sam Altman, CEO of OpenAI and an undisputed pioneer in the field of artificial intelligence, has also acknowledged:
"The era of large models may be coming to an end, and in the future, we will improve them in other ways."
Some AI experts predict that in 2024, small language models will play a greater role in companies' deployment of AI for specific tasks.
01 Limitations of Large Models
Currently, large models have limitations in terms of cost and computational requirements.
Deutsche Bank pointed out that over the past five years, the AI field has generally measured a model's capability by its parameter count: the more parameters a model has, the more complex the tasks it can usually handle and the stronger its capabilities.
The parameter count of the largest models has grown tenfold or more each year, and each jump has brought unexpected new capabilities, such as programming and translation. Large neural network models have therefore generally been assumed to perform better.
Some opinions point out:
"Using the number of parameters as a measure of capability or risk is too rough, and we should pay more attention to the actual usage of the model."
These large models use an enormous number of parameters (some exceeding 100 billion), and every parameter consumes computational resources. Although large models such as the GPT series are technically advanced, their sheer scale makes their demand for compute extremely high, and every significant jump in performance comes with sharply higher training and operating costs.
Even though these models are open source, many researchers and small businesses find it difficult to afford the expensive computational costs they require.
Furthermore, many AI researchers find it challenging to iterate and develop their own models based on these large models due to their complexity.
Deutsche Bank also noted regulatory concerns: oversight of large language models is becoming stricter. For example, the executive order issued by the U.S. government at the end of October last year requires companies producing "dual-use" foundation models, such as those with "tens of billions of parameters," to meet higher transparency requirements.
02 Advantages of Small Models
For certain specific tasks, small, efficient AI models may be more suitable than large models.
As Matt Casey of Snorkel, a technology company specializing in artificial intelligence and machine learning, wrote:
"Using large models for certain tasks is like using a supercomputer to play 'Frogger'."
While large models have advantages in handling complex tasks, not every task requires such powerful computational capabilities.
The advantages of small language models are numerous.
- Lower resource requirements. Small models usually need far less compute to train and run, making them suitable for devices with limited computing power. For example, a small model can be installed directly on a user's computer or smartphone, with no need to connect to a remote data center (see the sketch after this list).
- Lower cost. Because they need fewer computational resources for training and deployment, small models are cheaper to operate and maintain.
- Better privacy protection. Small models can run on local devices without sending data to cloud servers, which improves the privacy and security of data processing.
- Faster processing speed. With fewer parameters, small models typically respond to requests more quickly, which is particularly important for applications that require real-time responses.
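As a rough illustration of the "run locally" point above, the sketch below loads a small open model entirely on a local machine using the Hugging Face transformers library. The model name is only an illustrative placeholder (a roughly 1-billion-parameter open model); any sufficiently small open model would follow the same pattern, assuming it fits in local memory.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative placeholder: any small open model of roughly 1-2B parameters
# would follow the same pattern and can run on an ordinary laptop CPU.
model_name = "microsoft/phi-1_5"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)  # no remote inference API needed

prompt = "Summarize the customer's request in one sentence:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because everything runs on the local device, the prompt and its output never leave the machine, which is exactly the privacy and latency advantage described above.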
Researchers are working to develop smaller, more efficient AI models by reducing their number of parameters while ensuring that they can achieve or even surpass the performance of large models in specific tasks.
One approach is knowledge distillation. Unlike conventional pre-training, "distillation" uses a large "teacher" model to guide the training of a small "student" model. Instead of learning directly from the massive datasets used to train large models, the student learns by imitating the teacher's outputs. It is like a student who does not absorb the teacher's entire body of knowledge, yet reaches a similar level of performance in the targeted areas.
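To make the idea concrete, here is a minimal, hypothetical distillation training step in PyTorch (a generic classification-style sketch, not the specific recipe used by any of the researchers mentioned in this article). The student and teacher models, the temperature, and the loss weight alpha are all placeholder assumptions; the key point is that the student is trained to imitate the teacher's softened output distribution rather than to learn everything from raw data.

```python
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, batch, optimizer,
                      temperature=2.0, alpha=0.5):
    """One hypothetical training step: the small 'student' learns to imitate
    the output distribution of the large, frozen 'teacher' while still
    fitting the ground-truth labels."""
    inputs, labels = batch

    with torch.no_grad():                      # the teacher is not updated
        teacher_logits = teacher(inputs)

    student_logits = student(inputs)

    # Soft targets: match the teacher's temperature-softened distribution.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Hard targets: the usual supervised loss on the true labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    loss = alpha * soft_loss + (1 - alpha) * hard_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The weighting between "imitate the teacher" and "fit the labels" is what lets a much smaller student reach teacher-like accuracy on the narrow tasks it is distilled for.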
Graham Neubig, a computer science professor at Carnegie Mellon University, said:
"In most cases, you can create a much smaller specialized model to handle specific tasks. Although this small model does not have the broad applicability of large models, it can perform exceptionally well in specific tasks."
In one experiment, Professor Neubig and his collaborators built a model 700 times smaller than the GPT model and found that it outperformed the large GPT model on three natural language processing tasks.
There are many examples of small models performing well.
For example, Microsoft researchers recently reported that they were able to reduce the GPT model to a small model with just over 1 billion parameters. This small model can compete with large models in certain specific tasks.
Furthermore, Deutsche Bank pointed out that Meta's Llama 2, open-sourced in July 2023, was released in three versions with 7 billion, 13 billion, and 70 billion parameters. Additionally, BloombergGPT, designed for financial applications, has 50 billion parameters. Although these models have relatively few parameters, they outperform comparable models on multiple tasks, demonstrating the potential of smaller models.
03 Limitations of Small Language Models
However, these advantages often come at the cost of sacrificing some performance. Some studies show that small "student" models may only perform well within a certain range of tasks. Large "teacher" models, due to their large number of parameters and complex structures, are usually more precise and powerful in understanding and generating language. Therefore, the choice between small and large models for more extensive or complex tasks depends on the specific application's needs and limitations.
Sara Hooker, head of Cohere for AI, the non-profit research lab of the AI company Cohere, said:
"The ability of small models to handle broad or rare tasks is still limited."
"There are still many unknown areas. How do we ensure that the data obtained from large models is diverse enough to cover all these tasks?"
In addition, due to the inherent risks of "imitation," "distillation technology" is currently in a legal gray area.