Article Source: Quantum Bit
ByteDance's version of ChatGPT is now available for everyone to use!
The web, Android, and iOS versions all launched simultaneously, and you can log in with a Douyin account, phone number, or Apple account.
There's no need to wait in line either: you can start chatting with this AI assistant, named "Dou Bao," as soon as you open it.
It's doubao.com, not douban.com~
As soon as the news came out, many users went to check it out.
We also quickly tested it out~
On par with ChatGLM
First, let's see how Dou Bao introduces itself:
Since that's the case, let's try out these "official functions" to see if they are reliable.
For example, we selected a few interesting mistranslations for Dou Bao to correct, and the feedback from Dou Bao was quite good:
Its translation of "鸳鸯锅" (literally "mandarin duck pot," a hot pot split into two broths) in particular can be said to be better than those from Google and DeepL.
In terms of knowledge, of course we have to ask some "tricky" questions: Is the electricity released by a thunderstorm direct current or alternating current?
Dou Bao's answer can be summarized as "not direct current, but direct current" 😂, but the previous part was still acceptable.
Setting aside the fun, Dou Bao's appetizers did give us a good first impression.
So now let's move on to the main course: a comparison with the well-received domestic open-source large model ChatGLM in copywriting, logical reasoning, mathematics, and code.
In terms of copywriting, Dou Bao claims to be able to write various styles of copy for platforms like Zhihu and Xiaohongshu.
Let's try something quirky to see whether it can pull it off, for example… a Xiaohongshu note about "豆汁美式," an Americano made with douzhi, Beijing's fermented mung bean drink.
The copy comes with emojis and even tags, so it seems Dou Bao really does understand Xiaohongshu.
But are you sure "醇香甘甜" (mellow and sweet) is how you would describe douzhi…
Although ChatGLM wrote a lot, it didn't quite grasp the topic and simply treated douzhi as soy milk…
(From this perspective, Dou Bao might also have mistaken it for soy milk, but at least it didn't say so outright, did it?)
It seems that Dou Bao does have a certain creative ability, so let's increase the difficulty.
Let's have it write a short video script promoting "豆汁美式" (douzhi Americano).
ChatGLM's version is somewhat more detailed, but Dou Bao's version is also fairly complete.
The copywriting abilities of both are comparable, so how about their logical reasoning abilities?
We presented a reasoning question, and neither of them got it right (the correct answer is A3, B1, C2):
Although neither got it right, it seems that Dou Bao's train of thought was heading in the right direction.
As for ChatGLM's answer, we simply couldn't follow it.
It's difficult to judge their performance here, so let's move on to the nightmare stage for all large models: mathematics.
We won't test them with simple problems like the chicken and rabbit in the same cage. We'll go straight to a difficult one and give them a question from a high school entrance exam to try.
△2023 Beijing Exam Question 16
(We didn't input the image, but it can be solved without the image; we also omitted the proof for the first question)
Dou Bao used a pure geometric method, and the final answer was correct, but unfortunately the process was incorrect.
△Errors start appearing from the red box
ChatGLM used a vector method:
To begin with, the result is wrong, but 120 degrees is the supplement of 60 degrees, so could it be just a small slip?
But we quickly found the flaw:
It shouldn't be approximated as a negative number… how did you get a negative number from dividing two positive numbers…
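For readers who want to see why the sign matters, here is a minimal sketch of the standard dot-product formula for the angle between two vectors. The vectors are made up for illustration and are not taken from the exam question or from ChatGLM's answer, and `angle_between` is our own hypothetical helper. When the dot product is positive, the cosine is positive and the angle is acute. Reversing one vector flips the sign and gives the supplementary angle, which is one way 60 and 120 degrees can get mixed up.

```python
import math


def angle_between(u, v):
    """Return the angle between vectors u and v in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("the angle is undefined for a zero vector")
    # Norms are always positive, so the sign of the cosine
    # is the sign of the dot product.
    cos_theta = dot / (norm_u * norm_v)
    # Clamp to [-1, 1] to guard against floating-point drift.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


# Illustrative vectors only (an assumption, not the exam data).
u = (1.0, 0.0, 0.0)
v = (1.0, math.sqrt(3), 0.0)
flipped_v = tuple(-x for x in v)

print(round(angle_between(u, v), 6))          # acute angle
print(round(angle_between(u, flipped_v), 6))  # its supplement
```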
The correct answer is quite different, and since the second question requires the proof from the first question, we also included the process for the first question:
In this regard, both large model participants still have a lot of room for improvement in their mathematical training.
So, how does Dou Bao handle the ever-popular coding questions?
Let's start with the basic bubble sort algorithm.
We tried running it (replacing the preset numbers), and it successfully output the answer:
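Dou Bao's own code is not reproduced in this text. For reference, a minimal sketch of what a bubble sort looks like is given below. The function name `bubble_sort` and the sample numbers are our own assumptions, not what the model produced.

```python
def bubble_sort(values):
    """Return a new list with the items of `values` in ascending order."""
    items = list(values)  # copy so the caller's list is left untouched
    n = len(items)
    # After each pass the largest remaining item has "bubbled" to position `end`.
    for end in range(n - 1, 0, -1):
        swapped = False
        for i in range(end):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        if not swapped:
            break  # no swaps in a full pass means the list is already sorted
    return items


sample = [64, 34, 25, 12, 22, 11, 90]  # illustrative input
print(bubble_sort(sample))  # [11, 12, 22, 25, 34, 64, 90]
```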
Next, let's go to LeetCode, and we chose a relatively simple problem of converting Arabic numerals to Roman numerals.
Dou Bao quickly generated a piece of code, and even provided an explanation:
ChatGLM's code is like this (also with an explanation):
The result was correct for Dou Bao, and incorrect for ChatGLM:
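Neither model's code is reproduced in this text. As a point of comparison, one common way to solve this kind of problem is a greedy pass over a value table, sketched below. The function name `int_to_roman` and the 1 to 3999 input range are our assumptions, and this is not the code either model generated.

```python
def int_to_roman(number):
    """Convert an integer in the range 1..3999 to a Roman numeral string."""
    if not 1 <= number <= 3999:
        raise ValueError("number must be between 1 and 3999")
    # Values in descending order, including the subtractive pairs (900, 400, ...).
    symbols = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    parts = []
    for value, symbol in symbols:
        # Take as many copies of the current symbol as fit, then move on.
        count, number = divmod(number, value)
        parts.append(symbol * count)
    return "".join(parts)


print(int_to_roman(1994))  # MCMXCIV
print(int_to_roman(58))    # LVIII
```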
But when it comes to slightly more complex problems, they both got it wrong.
In addition to designing algorithms, we also wanted to see if they could use code to "draw pictures".
We randomly generated two sets of data to see whether the models could write code that draws a line graph:
The result using Dou Bao's code produced this…
As for ChatGLM… well, it threw an error and couldn't run.
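For readers who want to try the same task, here is a minimal sketch of how two random series could be drawn as a line graph. It assumes matplotlib is installed. The names `make_series` and `plot_lines`, the output file name, and the data ranges are all hypothetical, and this is not the code either model produced.

```python
import random


def make_series(points=10, seed=0):
    """Return x positions and two random integer series of equal length."""
    rng = random.Random(seed)  # seeded so the example is repeatable
    x = list(range(1, points + 1))
    series_a = [rng.randint(0, 100) for _ in x]
    series_b = [rng.randint(0, 100) for _ in x]
    return x, series_a, series_b


def plot_lines(x, series_a, series_b, path="line_chart.png"):
    """Draw both series as lines and save the figure to `path`."""
    # Imported here so that data generation works even without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(x, series_a, marker="o", label="Series A")
    ax.plot(x, series_b, marker="s", label="Series B")
    ax.set_xlabel("Index")
    ax.set_ylabel("Value")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)
    return path
```

Calling `plot_lines(*make_series())` should write the chart to `line_chart.png` in the current directory.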
That's all for the code section. In a nutshell: they both need more practice.
After reading so much, readers may be feeling a bit tired, so we've prepared a "dessert" for you, with some light-hearted content.
Let's have some fun!
Q1: Can radishes really "stimulate the appetite"?
"Seek professional medical help when necessary," is this like performing surgery on oneself…
Q2: Are guide dogs not allowed inside for the benefit of the blind, or for the benefit of the guide dogs?
Dou Bao chose option C between A and B.
And when asked "Why do meteorites always hit meteorite craters," Dou Bao got it right, but the answer was a bit complicated.
In summary, the evaluation of "Let's Have Some Fun" is: AI is still too naive and can't understand the complex thoughts of humans.
ByteDance's Big Model Starts to Flex Its Muscles
ByteDance's decision to open testing for "Dou Bao" at this time seems somewhat unexpected.
But in reality, this timeline can be traced back:
In March and April of this year, when ChatGPT was making waves, there were already reports of ByteDance forming a big model team.
According to 36kr, their exploration direction mainly focused on language and image big models, hoping to integrate big models with ByteDance's own downstream businesses such as search and advertising.
However, at that time, the response from ByteDance's technical team was:
The technology platform is exploring in these areas, but it's still in the early stages and not mature.
During the "Battle of the Big Models," ByteDance seemed to have no intention of officially joining the fight. Its cloud platform, Volcano Engine, instead positioned itself as "building a technical foundation for big models" and as a flagship store that integrates third-party big models.
It wasn't until June that ByteDance was reported to have started internal testing of an AI conversation product, codenamed "Grace."
And the website for Grace, "gracebot.cn," now directly redirects to the Dou Bao official website.
Although the Dou Bao team does not officially acknowledge that Dou Bao is Grace, it currently appears that Dou Bao is the open testing version of Grace.
In addition, a friend from the "Dou Bao" project team revealed to us that Dou Bao is still in the early stages of development and testing. There are still many limitations during the testing period, and the generated content may not be accurate. They welcome feedback from test users.
It is worth noting that a multimodal big model called BuboGPT recently launched a demo on Hugging Face. ByteDance also took part in the technical work on this model.
According to the paper, BuboGPT supports three modalities: text, image, and audio, and can achieve fine-grained multimodal joint understanding.
For example, given a picture like this:
BuboGPT can not only recognize the frog and the banjo in the frog's hand, but also summarize the specific actions of the frog and the environment it is in.
One More Thing
Now that ByteDance has finally made a move, how would you rate Dou Bao's performance?
On the other hand, as the big models gradually emerge from the frenzy of new model releases every week, the quality of responses from domestic big model pioneers has quietly improved.
For example, the "My parents didn't take me to their wedding" question, which used to stump many big models, can now be answered with sound reasoning by many domestic big models.
△Search on Baidu, use iFly
The evaluation scale for domestic big models may have reached a new level.