Source: First Voice
Author: Xia Yu
Image source: Generated by Wujie AI
According to IDC's forecast, China's digital human market is expected to reach 10.24 billion yuan by 2026. This includes not only the rapidly developing B-end (enterprise) market but also the C-end (consumer) market, which is seen as having strong growth potential. With AIGC large models developing rapidly, the penetration rate of digital humans is expected to rise further.
Many vendors have already entered the field. In August of this year, Huawei announced the Pangu digital human large model, which can help users generate a digital human in 12 hours. Before that, Tencent had launched virtual digital humans for specific scenarios, such as Ping An's digital employee, Xinhua News Agency's anchor "Xin Xiaowei," astronaut "Xiao Zheng," 3D sign language interpreter "Lingyu," and the Palace Museum guide "Fudaren." Alibaba's virtual digital humans serve its own business needs, focusing mainly on live streaming and sales…
So how are digital humans developing amid the wave of AIGC large models? In which scenarios will they be applied? What technical challenges do they face? How much do they cost? This article explores these questions with Chen Yang, product VP of Xinyi Universe; Ji Zhihui, founder and CEO of Shiyou Technology; and industry practitioner Li Yuan (pseudonym).
01 Boosted by AIGC, are digital humans taking off?
In Tencent's "Digital Human Industry Report," digital humans are defined as "virtual characters that exist in digital space in digital form, with human-like or real appearance, behavior, and characteristics."
In 2023, with the strong rise of AIGC and the emergence of large language models such as ChatGPT, the digital human sector has become increasingly lively. Digital humans are appearing frequently across application scenarios and in industries such as culture and tourism, e-commerce, and finance, where a variety of virtual digital humans are replacing real people as spokespersons, anchors, announcers, customer service agents, and intelligent assistants.
The number of market participants is also visibly increasing. Internet giants, startups, traditional AI companies, and digital service providers and investors previously engaged in intelligent customer service and marketing have all entered the field.
Data from Rui Guan Network shows that, as of December 2022, there had been over 140 investment and financing events in China's digital human industry. According to IDC's "China AI Digital Human Market Status and Opportunity Analysis 2022" report, China's AI digital human market is expected to reach 10.24 billion yuan by 2026.
Several interviewees pointed to three reasons why the digital human industry is heating up.
First, from a technical perspective, the emergence of AIGC has solved many pain points of digital humans, such as "only reading scripts and unable to interact." Generative algorithms have made content creation more efficient and convenient, lowered costs and barriers, made content more diverse and personalized, and met users' differing needs and preferences. Large natural language processing models improve the interactive experience of digital humans, helping them move beyond their "no brain, no soul" past.
At the same time, technologies such as modeling, rendering, and AI-generated motion capture keep advancing. As a result, digital humans as a whole behave more like people than ordinary AI robots do, search and organize information and knowledge more comprehensively, and phrase their sentences in ways closer to everyday human communication.
Second, "netizens," represented by the post-90s and post-00s generations, are increasingly accepting of virtual characters and readily become emotionally invested in them.
Third, as China's digital economy continues to develop, enterprises' need to reduce costs and increase efficiency has given the industry a further boost.
Industry practitioner Li Yuan (pseudonym) agreed with this view and used live streaming as an example. Live streaming with real people involves the cost of setting up a scene, and as platforms such as Douyin and Meituan open up local live streaming, a shortage of talent has also become a problem. Digital humans can fill this gap and can work 24/7.
"Through AI technology, the company has achieved cost control and large-scale production capacity across the three stages of 'creating, nurturing, and using' digital humans. In terms of AI products, this year Shiyou Technology launched a digital human product system consisting of Shiyou BOTA and the Shiyou AI digital human live broadcast system. By combining AI with digital humans, virtual humans can be produced in batches, reducing costs and increasing efficiency for the industry," said Ji Zhihui, founder and CEO of Shiyou Technology.
Digital humans are now being applied ever more widely and can be seen in both the B-end and C-end markets. The "Virtual Digital Human Research Report 2.0" released by Tsinghua University shows that digital humans have penetrated a range of industries and are becoming a new source of productivity and creativity. Judging by the layout of leading companies, digital human products and services account for 79% of the market in the B-end and 36% in the C-end.
Regarding B-end application scenarios, Chen Yang, product VP of Xinyi Universe, said, "They are mainly customer service, marketing, cultural and tourism guides, and AI live streaming, because AI live streaming is essentially about interacting with viewers and answering their questions. The difficulty lies in how a digital human customer service agent can quickly give the user the correct answer. Built on the ChatGPT large language model, digital humans can draw on a great deal of information, but they may also fail to answer the user's question accurately, or even fabricate answers. In a customer service scenario, that can be fatal."
As for real-world deployment, an IDC report notes that finance is currently a relatively mature field for digital human applications. By 2025, over 80% of banks are expected to deploy digital humans, handling 90% of customer service and financial consulting. For example, Shanghai Pudong Development Bank was the first bank in China to "hire" digital employees. Its 3D digital human "Xiao Pu" has so far served in over 20 positions, including wealth planner, document reviewer, lobby manager, and phone customer service agent.
In addition, a company's digital humans can be linked to internal systems, and employees can communicate with them to understand the company's rules and regulations, and inquire about various information.
In the future, digital humans are expected to play a role in fields such as healthcare, education, and manufacturing. In healthcare, for example, digital humans built on cognitive large models could assist doctors with diagnosis and treatment; in education, they could serve as personalized teaching assistants that help students learn more effectively.
Several industry insiders said that the C-end is also a promising market and that everyone may one day have their own digital human, but that cost, technology, and equipment still need time to mature.
02 The hard-to-climb cost wall: a 3D digital human can cost 1 million yuan
Enterprises that want to enter the field first need to understand the costs involved.
Digital humans currently fall into two categories. One is the human-driven "avatar," in which a real person drives a virtual anchor during live streaming. This approach requires a large amount of shooting and post-production work, resulting in high costs. Many 3D virtual humans are driven this way.
The other category is AI-driven digital humans, which are trained to perform specific tasks through machine learning, data feeding, and similar methods. They are usually used in service scenarios with highly repetitive workloads. Most digital human hosts in live-streaming rooms today are 2D digital humans modeled on real people, and they belong to this category.
3D digital humans often appear as animated characters and are suited to creating virtual IPs. For this type of digital human, everything from facial contours to clothing and scenes has to be custom-made, which usually means higher costs and longer production cycles, with quotes exceeding 200,000 yuan.
For example, Nvidia once stated in an official blog post that the 14-second appearance of a virtual Jensen Huang (Huang Renxun) at its press conference involved the collaboration of 34 3D artists and 15 software engineers, totaling nearly a thousand man-hours.
Li Yuan confirmed the high cost: "With traditional 3D modeling technology, a decent custom digital human costs tens of thousands of yuan, and that is just the tip of the iceberg."
According to Ji Zhihui, CEO of Shiyou Technology, the market generally divides the cost of digital humans into three parts: creating, nurturing, and using. The first part, creating, covers character concept design, original artwork, modeling, rigging, facial expressions, real-time rendering, and so on. Styles range from chibi-style (Q-version) cartoon, Disney-style, and anime to next-generation, beautified realistic, and ultra-realistic, with prices ranging from tens of thousands to millions of yuan.
The second part is nurturing. After a digital human is created, low-cost, high-frequency content output is still needed to build awareness of the IP. For example, if a digital human needs to appear in a short video or a TVC advertisement, the cost ranges from thousands to tens of thousands of yuan per minute, depending on the precision and quality of the content and, above all, on the difficulty of the script.
The third part is using, which mainly refers to the scenarios in which digital humans produce content. "Currently, Shiyou Technology is involved in ten major application scenarios, including radio and television media, brand marketing, e-commerce live streaming, short videos, government culture and tourism, education and entertainment, film and television dramas, AR/VR/AI, NFT, metaverse, and various online and offline scenarios. For example, digital humans can serve as virtual anchors, media reporters, event hosts, and offline exhibition receptionists, with the related execution costs depending on each client's project requirements," said Ji Zhihui.
Costs differ significantly between types of digital humans. 2D digital humans developed with AI generation technology are much cheaper than 3D ones. "Overall, the cost of a 2D digital human is only 1/10 or 1/20 that of a 3D digital human, which is currently a fairly down-to-earth level that the market accepts," said Ji Zhihui.
Shiyou Technology has a product line for 2D digital humans. 2D digital humans do not require modeling, and the production process is relatively simple: a video of a real person is shot and then used for AI training, so creating a digital human costs only a few thousand yuan. For nurturing, generating content requires only a script as input, after which the digital human can speak and present it, so the subsequent cost of nurturing approaches zero.
Of course, 2D digital humans are not suitable for every scenario. In gaming scenarios and in segments such as virtual idols, companies can only use high-priced 3D digital humans, and the cost burden is easy to imagine. Moreover, both 2D and 3D digital humans suffer from weaknesses in content.
In May of this year, Douyin issued an AI labeling order and began to regulate digital humans as a new phenomenon. Many uncompetitive 2D digital humans that merely repeated scripted content were banned. Ji Zhihui noted that because Douyin, Kuaishou, and WeChat are content and e-commerce platforms, hosts must provide high-quality content. The quality of AI-generated content is not high, so the platforms do not give it traffic, and the products ultimately go unsold. Achieving high sales through digital humans therefore takes several factors together, such as an operations team and good products, to ensure a good return on investment.
As for public concerns about the risks of "face swapping" and "immortality" brought about by AI, Li Yuan said, "The emergence and application of any new technology calls for corresponding regulation, and regulators are responding. In January of this year, the relevant departments issued regulatory policies on AI-generated content. Legitimate vendors in the market also take this very seriously and impose their own self-regulation and requirements."
03 Three technical challenges: digital humans cannot yet be as "vivid and flexible" as humans
It is worth noting that large-model-driven digital human products are still at an early stage of application. Beyond the cost constraints described above, the industry generally believes that technological maturity and efficiency remain challenges for digital humans at this stage.
A research report summarized three characteristics of digital humans, but current technology still falls short in realizing each of them.
First, digital humans have a human appearance, with specific features such as looks, gender, and personality.
"If the client does not choose an ultra-realistic digital human, that is, one that completely replicates a real person, then the appearance technology of digital humans is already relatively mature, but their expressions and movements are still somewhat stiff. In particular, when a digital human is fully self-driven rather than driven by motion capture of a real person, it is difficult for it to present natural expressions and movements," said Chen Yang.
Second, digital humans exhibit human behavior, with the ability to express themselves through language, facial expressions, and body movements. However, many current digital human products are still rather stiff in voice, expressions, and interactive behavior.
Li Yuan believes that digital humans lack the ability to express emotions and feelings. When they are meant to be angry or aggrieved, for instance, they cannot convey it through richer facial expressions and larger body movements. As a result, they have the appearance and voice of humans but are not as vivid and flexible.
Third, digital humans have human thoughts, with the ability to recognize the external environment and interact with people.
"Although the emergence of ChatGPT gives digital humans a brain, if the market wants to portray a character with a specific personality, or even with its own growth experience and worldview, it is difficult to achieve this using ChatGPT alone. Currently, the technology cannot fully support this," said Li Yuan.
Reportedly, because AI is not yet intelligent enough, AI-driven interactive digital humans (TTSA character models) can for now play only a supplementary role, except in gaming scenarios. Digital humans driven by real people still dominate the market, for example as hosts in video live streams and at exhibitions.
Chen Yang observed that holographic technology is used at exhibitions to interact with visitors. AI-driven digital humans mainly act as cultural and tourism guides and also appear in some Taobao live-streaming rooms, where they fill the time slots when real hosts are not available, so their application scenarios remain relatively limited.
However, Ji Zhihui believes that with the development of AI technology, the market for AI-driven interactive digital humans may be quite broad in the future. Digital humans driven by real humans will be more suitable for real-time interaction in 3D space, such as real-time interactive live streaming of 3D digital humans and applications like the metaverse.
From the hype at the start of the year to now, industry practitioners and clients have observed the market returning to rationality. Some noisy, speculative players, such as certain manufacturers and agents, are exiting at an accelerating pace. The hope is that digital humans will go on to deliver real cost reduction and efficiency gains for enterprises.