A lengthy analysis of the future development potential of AI-Agents and their profound impact on the Internet and Web3.
By: VION WILLIAMS
Exploring the Innovative Possibilities of AI-Agents
Consensus and Non-Consensus of AI-Agents
AI-Agents are attracting increasing attention largely because LLMs now provide a feasible technical route to putting agents into practice, and because a number of AI-Agent projects have already gained significant visibility.
Lilian Weng has defined LLM-driven AI-Agents in her article, and DeepMind is also attempting to formulate a unified concept of intelligent agents. I believe the concept of AI-Agents will continue to diverge as different AI companies interpret it in their own ways.
One key consensus is already clear, however: in this period of explosive growth in large language models, what we collectively recognize as AI-Agents is the automation of general problems by LLM-driven agents.
Exploring Possibilities from the Relevance of Agents
At this stage, we should try to view AI-Agents from the perspective of "relevance": embrace the possible forms AI-Agents may take with a tolerance for trial and error and an appetite for innovation, rather than demanding a standardized answer or dismissing the field with the narrow view some critics take.
For example, Auto-GPT as one such possibility has inspired many agent projects, yet narrow criticism misses the chance to seize new opportunities, a phenomenon common among Chinese developers. Without creative developers, how can traditional competitive advantages survive in the era of natural language programming?
There are already many round-ups of AI-Agent projects, but most of them are homogeneous lists. They tell us which projects belong to the AI-Agents direction, yet they neither demonstrate, from the perspective of relevance, the potential of AI-Agents across different application fields, nor show the ecological position a given type of AI-Agent project occupies.
For example, in my introduction, Auto-GPT, BabyAGI, and MetaGPT are classified into one ecological category because they share continuity along a certain path.
Building a Comprehensive Understanding in the Puzzle of Agents
In summary, when introducing representative AI-Agent projects I use the perspectives of "relevance," "ecological position," and "continuity," which lets us begin to discern the future development trends of AI-Agents.
The following ten representative projects are presented, along with some related reference projects. Through these case studies I will piece together a relatively complete picture, one that should be enough to make more people realize how the potential of agents can change everything on the Internet, including reshaping the landscape of Web3.
Two Major Future Directions of AI-Agents
AI-Agents will roughly be divided into two major directions: Autonomous Agents and Generative Agents.
Autonomous Agents, represented by Auto-GPT, demonstrate the ability to automatically execute various tasks and achieve desired results from natural language task descriptions. In this collaborative relationship, Autonomous Agents serve humans and have clear tool attributes.
Generative Agents, represented by Stanford's virtual town of 25 agents, lean more towards "nativeness," with human-like characteristics, autonomous decision-making ability, and long-term memory. In this collaborative relationship, agents have digital-native social relationships rather than serving merely as tools for humans.
Auto-GPT
Auto-GPT is the best-known open-source agent project, and its introduction on GitHub is very simple: "An experimental open-source attempt to make GPT-4 fully autonomous."
In simple terms, Auto-GPT can fully automate the completion of a task from a single-sentence requirement. The core of its ability to work autonomously lies in using the language model's planning ability to break a task down step by step and automatically execute each step. Along the way it searches the web, feeds the results back to the language model, and then decomposes and executes further sub-tasks.
In plain language, Auto-GPT completes tasks through a process of "self-asking and self-answering," without humans having to supply prompts at every step.
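To make that loop concrete, here is a minimal sketch of an Auto-GPT-style plan-act-observe loop. It is purely illustrative, not Auto-GPT's actual code: it assumes the official `openai` Python client, and `web_search` is a hypothetical helper.

```python
# Minimal sketch of an Auto-GPT-style loop (illustrative, not the real implementation).
# Assumes the official `openai` Python client; `web_search` is a hypothetical helper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def web_search(query: str) -> str:
    """Placeholder: a real agent would call an actual search API here."""
    return f"(search results for: {query})"


def run_agent(goal: str, max_steps: int = 5) -> str:
    history = [
        {"role": "system",
         "content": "You are an autonomous agent. Plan the next step toward the goal. "
                    "Reply with either 'SEARCH: <query>' or 'FINISH: <answer>'."},
        {"role": "user", "content": f"Goal: {goal}"},
    ]
    for _ in range(max_steps):
        reply = client.chat.completions.create(
            model="gpt-4", messages=history
        ).choices[0].message.content
        history.append({"role": "assistant", "content": reply})
        if reply.startswith("FINISH:"):
            return reply.removeprefix("FINISH:").strip()
        if reply.startswith("SEARCH:"):
            # Feed the observation back so the model can plan the next step itself.
            observation = web_search(reply.removeprefix("SEARCH:").strip())
            history.append({"role": "user", "content": f"Observation: {observation}"})
    return "Step budget exhausted."
```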
Despite widespread criticism of Auto-GPT's huge token consumption and unstable results, as a case of LLM-based automation it has greatly aroused developers' curiosity. Projects such as BabyAGI and MetaGPT are similar to Auto-GPT, standing at the forefront of open-source experimentation with LLM-based automation.
Project link: Auto-GPT
BabyAGI
BabyAGI can automatically create, sort, and execute new tasks based on the results of previous tasks and a preset goal. It uses natural language processing to create new tasks from the goal and stores task results in a database for future reference.
BabyAGI is essentially a Python script that runs an infinite loop consisting of the following steps (a minimal sketch follows the list):
- Retrieve the first task from the task list.
- Send the task to the execution agent, which uses OpenAI's API to complete the task based on the context.
- Enrich the results and store them in Chroma/Weaviate.
- Create new tasks and reorder the task list based on preset goals and the results of the previous task.
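A simplified sketch of that loop might look as follows; it is illustrative only, with a placeholder `llm()` function standing in for the OpenAI API and an in-memory list standing in for the Chroma/Weaviate store.

```python
# Simplified sketch of BabyAGI's loop (illustrative; the real script calls the
# OpenAI API and stores enriched results in a Chroma/Weaviate vector store).
from collections import deque


def llm(prompt: str) -> str:
    """Placeholder for a call to a language model."""
    raise NotImplementedError


def baby_agi(objective: str, first_task: str) -> list[dict]:
    tasks = deque([first_task])
    results = []  # stands in for the vector database of task results
    while tasks:  # the real script runs as an infinite loop
        task = tasks.popleft()  # 1. retrieve the first task
        result = llm(f"Objective: {objective}\nTask: {task}\nComplete the task.")
        results.append({"task": task, "result": result})  # 2-3. execute and store
        new_tasks = llm(  # 4. create new tasks and reprioritize
            f"Objective: {objective}\nLast result: {result}\n"
            f"Open tasks: {list(tasks)}\n"
            "Return an updated, prioritized task list, one task per line."
        )
        tasks = deque(line.strip() for line in new_tasks.splitlines() if line.strip())
    return results
```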
Project link: BabyAGI
Both Auto-GPT and BabyAGI represent the earliest phase of the current LLM explosion and of our exploration of AGI built on LLMs. I believe an LLM-driven general-purpose task processor is the holy grail of the future AI-Agents field.
Generative Agents
The paper "Generative Agents: Interactive Simulacra of Human Behavior" published by researchers from Stanford and Google is a well-known AI-Agent project. In summary, this research places 25 AI intelligent agents in a pixel-style virtual town, where the agents can simulate interactive human behavior, interact with the environment of the virtual town, and interact with humans outside the virtual world.
There are two key solutions in this paper that are worth our attention:
1. Architecture of Generative Agents
The agents perceive their environment and save all perceptions in a comprehensive record called the memory stream, which records the agents' experiences. Based on their perceptions, the architecture retrieves relevant memories and uses these retrieved behaviors to determine an action. These retrieved memories are also used to form longer-term plans and create higher-level reflections, both of which are input into the memory stream for future use.
2. Memory Stream
Given the architecture of generative agents and the interactive environment of the experiment, the agents inevitably generate a large amount of memory data. The memory stream is a database that comprehensively records all of a generative agent's memories. It is a list of memory objects, each containing a natural language description, a creation timestamp, and a most-recent-access timestamp. The most basic element of the memory stream is an observation: an event directly perceived by the agent. Common observations include actions performed by the agent itself, or actions it perceives other agents or non-agent objects performing.
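As a rough illustration of that data structure, a memory stream could be sketched as below. This is a simplified sketch, not the paper's code: the paper's retrieval also scores memories by importance and embedding-based relevance, which are omitted here.

```python
# Simplified sketch of a memory stream (illustrative; the paper's retrieval also
# scores memories by importance and relevance, omitted here for brevity).
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Memory:
    description: str  # natural-language description of the event
    created_at: datetime = field(default_factory=datetime.now)
    last_accessed: datetime = field(default_factory=datetime.now)


class MemoryStream:
    def __init__(self) -> None:
        self.memories: list[Memory] = []

    def observe(self, description: str) -> None:
        """Record an observation, the most basic element of the stream."""
        self.memories.append(Memory(description))

    def retrieve(self, now: datetime, k: int = 5) -> list[Memory]:
        """Return the k most recently accessed memories (recency-only scoring)."""
        ranked = sorted(self.memories, key=lambda m: now - m.last_accessed)
        for m in ranked[:k]:
            m.last_accessed = now  # retrieval refreshes the access timestamp
        return ranked[:k]
```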
Based on these two key components, the overall behavior of generative agents is divided into three main parts: [Memory and Retrieval], [Reflection], and [Planning and Response]. For more details, please refer to the original paper.
The paper and its experiment verify that LLM-based agents can generate believable interactive behavior in a digital environment, and that generative agents can play a role in many digital settings, especially in forming human-machine interaction relationships.
The most intuitive takeaway is that generative agents are a kind of native digital resident of the metaverse, created within and interacting with the human metaverse environment. In fact, we could simulate a highly developed virtual world populated by AI-Agents, from which humans extract the agents' digital labor.
Agents: How They Become Working Partners
Because "Agents" is often translated as "代理" (proxy/intermediary), the concept is easily associated with middlemen, which makes it hard for many people to intuitively connect agents with their application scenarios. The following three cases show how agents can become hireable "human experts," run fully automated marketing companies that require no human involvement, and form teams that collaborate with one another.
Imagine using NexusGPT to create multiple expert staff, assembling them with GPTeam into a team employed by humans, and putting that team to work inside a fully automated company like AutoCorp. Only when we piece these puzzle pieces together can we intuitively feel that the future has arrived.
NexusGPT
This is a platform created by independent developer Assem, claiming to be the world's first AI freelancer platform. NexusGPT is built on the LangChainAI framework, using the GPT-3.5 API and Chroma (an AI-native open-source embedding database). The platform hosts over 800 AI agents with specific skills.
Agents on NexusGPT can adjust their approach according to the difficulty of a request, across three capability levels:
- Level 1: Simple conversation
- Level 2: Pre-trained operations/plugins
- Level 3: AutoGPT mode
However, all of this depends on function-calling support from OpenAI and on LangChainAI.
For task execution, the author designed a human-in-the-loop rating and observation system so that agents converge toward higher ratings. In effect, this provides a strategy for iteratively optimizing skill-specific agents as they communicate task requirements with human clients.
NexusGPT represents a future business model in which humans hire agents. The project still has many areas to improve, such as combining agents with expert modules (expert systems and expert models) and pricing an agent's hire based on the tokens a client's job consumes. Such models will change the traditional labor market and the way DAOs collaborate.
AutoCorp
AutoCorp was built in five hours by Mina Fahmi and their team at a GPT/LLM hackathon in New York. It is a fully autonomous brand-marketing company: it automatically creates brand advertisements and product designs for a direct-to-consumer T-shirt company, and when customers raise new demands it updates its themes and generates new design assets, constantly iterating toward better business directions.
First, AutoCorp generated an initial concept for the T-shirt brand from a seed idea. It then used that concept to generate the company's assets and a default style guide. When customers make demands, AutoCorp updates its plans accordingly; if a plan leads to fewer sales, AutoCorp adjusts it. This process has been run end to end and could in practice be connected to advertising APIs and custom T-shirt printing APIs for deployment in the real world.
The paragraph above is quoted from Mina Fahmi's tweet; the team's purpose in creating AutoCorp was to push the concept of "Autonomy" to the extreme.
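As a rough sketch of the feedback loop described above (purely illustrative; `llm`, `get_customer_feedback`, and `get_sales` are hypothetical stand-ins, not AutoCorp's actual code):

```python
# Illustrative sketch of an AutoCorp-style iteration loop; every helper here
# (llm, get_customer_feedback, get_sales) is a hypothetical stand-in.
def llm(prompt: str) -> str:
    raise NotImplementedError  # placeholder for a language-model call


def get_customer_feedback() -> str:
    raise NotImplementedError  # placeholder for incoming customer demands


def get_sales(assets: str) -> float:
    raise NotImplementedError  # placeholder for a sales metric on the assets


def run_brand(initial_idea: str, rounds: int = 3) -> str:
    plan = llm(f"Create a brand concept and style guide for: {initial_idea}")
    last_sales = float("-inf")
    for _ in range(rounds):
        assets = llm(f"Generate ad copy and T-shirt designs for this plan:\n{plan}")
        demand = get_customer_feedback()
        sales = get_sales(assets)
        instruction = ("Sales dropped, revise the plan." if sales < last_sales
                       else "Incorporate the new feedback into the plan.")
        plan = llm(f"{instruction}\nPlan:\n{plan}\nCustomer feedback: {demand}")
        last_sales = sales
    return plan
```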
AutoCorp's purpose is highly consistent with that of a DAO. If the ultimate goal of a decentralized organization is to remove the "human" factor, then fully automated production and operations is a natural demand of the DAO concept, and AutoCorp points to the future business direction of DAOs.
GPTeam
GPTeam is an open-source multi-agent simulation system. GPTeam uses GPT-4 to create multiple agents that collaborate to achieve predefined goals. The main goal of this project is to explore the potential of GPT models in improving multi-agent productivity and effective communication.
GPTeam uses independent agents, each equipped with memory, that interact through communication. Their memory and reflection mechanisms are inspired by the Generative Agents paper discussed above. Agents move around their world and perform tasks in different locations, depending on what they are doing and where other agents are. They can communicate with each other and collaborate on tasks, working in parallel toward common goals.
Project link: GPTeam
There are many other open-source projects similar to GPTeam, such as Dev-GPT, an automated development team that creates customized microservices for users. The team consists of a virtual product manager, developer, and DevOps engineer. Dev-GPT's technical approach focuses on identifying and testing effective task strategies: if an approach fails ten times in a row, it switches to the next one.
We will see more and more projects that design AI agents as AI teams. Defining an agent as a production role is not difficult; as with NexusGPT, developers can give each agent a specific skill. The hard part is coordinating those agents so that their individual skills combine into automated tasks and projects. Project Atlas Agents, in its exploration of natural-language-based automated operations, provides a good application scenario for such an agent team.
All of this makes me think of DAO again, an automated task collaboration organization based on automated governance logic.
Agents: How They Replace Repetitive Work
Before AI replaces our work entirely, the next direction for agents in business is to take over most of today's repetitive labor. Before LLM-based agents emerged, RPA (Robotic Process Automation) was the industry's solution, but traditional RPA has a high barrier to entry and has never reached ordinary users. RPA is a remedy for the lack of automation in traditional IT interaction logic, whereas today's agents can meet the same functional requirements through natural language alone.
The following two projects demonstrate how LLM-based agents will help us free ourselves from repetitive labor in our daily work and academic research (in fact, the potential of these two projects goes beyond this).
Cheat Layer
"Automate your business Using Natural Language" is the brand slogan of Cheat Layer. Cheat Layer solves impossible business automation problems using a customized trained GPT-4 machine learning model as an AI software engineer for each user.
Cheat Layer released two products on Product Hunt, one is Cheat Layer, and the other is Project Atlas Agents. Project Atlas Agents is a no-code project management interface that can be used to build and iterate AI agents.
Cheat Layer uses natural language to automate business processes through a Google Chrome plugin, enabling automation of most routine operations on web pages. Cheat Layer naturally brings RPA to mind; there has been much discussion of the relationship between agents and RPA, and traditional RPA being replaced by agents is an undeniable trend.
With Cheat Layer, business-process automations are configured in natural language, while Project Atlas Agents is used to manage the different automated processes. In simple terms, we can use natural language to create an agent that manages the automation of a given business, and as the business grows more complex we can keep iterating on and improving that agent.
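As a purely illustrative sketch of the general pattern, not Cheat Layer's implementation: an LLM translates an instruction into a structured list of browser actions, which are then replayed with a browser-automation library such as Playwright. The JSON action schema here is an assumption made for the example.

```python
# Illustrative sketch of natural-language-driven web automation.
# Not Cheat Layer's implementation; the JSON action schema is an assumption,
# and the model is assumed to return well-formed JSON.
import json

from openai import OpenAI
from playwright.sync_api import sync_playwright

client = OpenAI()


def plan_actions(instruction: str) -> list[dict]:
    """Ask the model to translate an instruction into structured browser actions."""
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": "Translate this instruction into a JSON list of actions, each "
                       '{"op": "goto|click|fill", "target": "...", "value": "..."}.\n'
                       f"Instruction: {instruction}",
        }],
    ).choices[0].message.content
    return json.loads(reply)


def execute(actions: list[dict]) -> None:
    """Replay the planned actions in a real browser."""
    with sync_playwright() as p:
        page = p.chromium.launch().new_page()
        for a in actions:
            if a["op"] == "goto":
                page.goto(a["target"])
            elif a["op"] == "click":
                page.click(a["target"])
            elif a["op"] == "fill":
                page.fill(a["target"], a["value"])
```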
WebArena
"The current agents are mainly created and tested in simplified synthetic environments, which greatly limits their representation of real-world scenarios. In this paper, we have established an environment for agent command and control that is highly realistic and reproducible. Specifically, we focus on agents performing tasks on the web and have created a fully functional environment with four common domains: e-commerce, social forum discussions, collaborative software development, and content management. Our environment is diverse, including tools (such as maps) and external knowledge bases (such as user manuals) to encourage human-like task-solving methods.
Based on our environment, we have released a set of benchmark tasks, focusing on evaluating the functional correctness of task completion. Our benchmark tasks are diverse in type, spanning a long time span, and are designed to simulate tasks that humans frequently perform on the internet. We have designed and implemented several autonomous agents, integrating the latest technologies, such as think-first-act-later.
The results indicate that solving complex tasks is challenging: our best GPT-4-based agent achieved only a 10.59% end-to-end task success rate. These results highlight the need for further development of powerful agents. The current state-of-the-art language models still have a long way to go in these real tasks, and WebArena can be used to measure such progress."
Research Paper Title: WebArena: A Realistic Web Environment for Building Autonomous Agents
Research Paper Link: https://arxiv.org/pdf/2307.13854.pdf
This is academic research from AI researchers at Carnegie Mellon University. WebArena complements the development stacks of well-known frameworks like LangChain and the various agent-team projects: we need a simulated testing environment for agents to ensure their robustness and effectiveness.
The main function of such a platform is to verify the feasibility of various agent projects. I can even envision a future in which, before hiring an agent on a platform, we test its actual working ability on something like WebArena, giving humans the evidence they need to price AI-Agents.
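A trivial sketch of what such an evaluation might compute is shown below. It is illustrative only: the `agent` callable and the per-task `check_success` functions are hypothetical stand-ins, not WebArena's actual harness, which checks functional correctness inside its simulated sites.

```python
# Illustrative sketch of scoring an agent on a benchmark of web tasks.
# The agent callable and per-task check_success functions are hypothetical
# stand-ins, not WebArena's actual evaluation harness.
from typing import Callable


def evaluate(agent: Callable[[str], str], tasks: list[dict]) -> float:
    """Return the end-to-end task success rate over a list of benchmark tasks."""
    successes = 0
    for task in tasks:
        outcome = agent(task["instruction"])  # the agent attempts the task
        if task["check_success"](outcome):    # functional-correctness check
            successes += 1
    return successes / len(tasks)
```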
AI-Agents and Their Impact on Everything
Automated Collaboration Network Based on Agents
The dozen or so projects introduced and analyzed above come together to form a relatively complete understanding of agents. Agents are the direction in which the potential of LLMs is truly unleashed: the LLM acts as a central hub, and agents give it the ability to act. The diversity of functions that LLM-based agents can take on will cause them to proliferate like a biological explosion, and humans and agents will form a digital companionship and coexistence.
The collaborative network of human society will also be transformed by the widespread application of Agents, and the production structure of human society will be upgraded, leading to changes in various aspects of society.
Changing Everything on the Internet
AI-Agents will completely change the way we access, process, produce, and use information on the internet, and with it the business models that currently depend on the internet. An intelligent network that can communicate and autonomously execute tasks is the future form of the internet, and agents are the intelligent medium through which we communicate and get tasks done.
Reshaping the Narrative of Web3
The narrative of NFTs in Web3 naturally aligns with that of the Metaverse, yet in Web3's early days the two were deliberately set against each other, which is a narrow view. Similarly, in today's AGI narrative, many Web3 practitioners understand AI only as tooling; they do not think deeply about the narrative logic of AGI and deliberately resist understanding its impact on DAOs.
Web3, Metaverse, and AGI are three highly related directions, and traditional mainstream technology media and investment institutions have not yet established a new paradigm for the future narrative of technology. They continue to influence the market with old narrative paradigms, leading to the dispersion and divergence of resources among practitioners in these directions.
The entire Chinese technology industry currently faces a key problem: the lack of a new narrative paradigm for technology. We stay focused on project work but lack a narrative that can unite technological forces. Whether Web3, the Metaverse, or AGI, none of these three narratives originated in China.
I look forward to an era with a variety of technological narratives and a diversity of voices. We urgently need a new understanding of these narratives in order to find the right path of development and secure a sustainable position in the overall technology ecosystem.