OpenAI early employee David Luan's latest interview: DeepSeek has not changed the narrative of AI technology.


Achieving more intelligence at a lower cost does not mean you will stop pursuing intelligence.

Author: MD

Produced by: Bright Company

Recently, on the Redpoint Ventures podcast "Unsupervised Learning," Redpoint partner Jacob Effron interviewed David Luan. They explored the insights that DeepSeek brings to the research and practice of large models from a technical perspective, and shared thoughts on the current bottlenecks of AI models and potential breakthrough directions.

David Luan is an early employee of OpenAI. After graduating from Yale University in 2009, he first joined iRobot to work on robotics, and then held positions at several companies (including Microsoft) until he joined the still-early-stage OpenAI in 2017, when the research team had only 35 people. In this interview, he also mentioned that his reason for joining an AI company stemmed from his interest in robotics, believing that "the biggest limitation of robots lies in the intelligence level of the underlying algorithms."

In 2020, David Luan left OpenAI to join Google, but after a short time, he co-founded Adept with two colleagues he met during his time at Google, serving as CEO. Last August, he joined Amazon as the head of the AGI San Francisco lab.

Below is the text of the interview compiled by "Bright Company" (slightly edited):

Limitations of Large Models and the Value of Reinforcement Learning

Jacob: David Luan is the head of Amazon's AGI lab. He was previously the co-founder and CEO of Adept, a company that raised over $400 million to develop AI Agents. He participated in many key breakthroughs during his tenure as Vice President of Engineering at OpenAI. I am Jacob Effron.

Today on the show, David and I discussed many interesting topics, including his views on DeepSeek, predictions for future model advancements, the current state of Agents, how to make them reliable, and when they will be ubiquitous. He also shared some interesting stories about the early days of OpenAI and its unique culture. This was a very engaging conversation, as David and I have known each other for over ten years. I think the audience will enjoy it. David, thank you for joining our podcast.

David: Thank you for having me. This will be very interesting since we have known each other for over ten years.

Jacob: I remember when you first joined OpenAI; it seemed intriguing, but I wasn't sure if it was a wise career choice. Then it became clear that you always saw opportunities earlier than others.

David: I was really lucky, because I have always been interested in robotics, and (at that time) the biggest limitation of robots was the intelligence level of the underlying algorithms. So I started working in artificial intelligence; seeing these technologies advance in our lifetime is really cool.

Jacob: Today I want to explore many topics with you. I want to start with a recent hot topic. Clearly, there has been a huge reaction to DeepSeek in the past few weeks. People have been talking about it, and stocks have plummeted. Some say this is bad for OpenAI and Anthropic. I feel that people's emotions have calmed down from the initial panic. But I'm curious, in the broader discussion, what views on the impact of this event are correct, and what are incorrect?

David: I remember that morning when everyone was focused on the news about DeepSeek. I woke up and saw five missed calls on my phone. I thought, what on earth happened? The last time this happened was when SVB (Silicon Valley Bank) collapsed, and all the investors were calling to tell me to pull funds out of SVB and First Republic Bank. So I thought something terrible must have happened. I checked the news and found that stocks had plummeted because of the release of DeepSeek R1. I immediately realized that people's understanding of this matter was completely wrong. DeepSeek did an excellent job, but it is part of a broader narrative: we first learn how to make new large models smarter, and then we learn how to make them more efficient.

So this is actually a turning point. Where people misunderstand is that just because you can achieve more intelligence at a lower cost does not mean you will stop pursuing intelligence. On the contrary, you will use more intelligence. So once the market realized this, it returned to rationality.

Jacob: Given that the DeepSeek base models appear, at least in part, to have been trained on OpenAI's outputs (you can make the basic DeepSeek model behave like ChatGPT in various ways), looking ahead, will OpenAI and Anthropic stop releasing these models so openly because of knowledge distillation?

David: What I think will happen is that people will always want to build the smartest models, but these models do not always reason efficiently. So I think we will increasingly see that, although people may not explicitly discuss it, they will train huge "teacher models" in internal labs, utilizing all the computational resources they can access, and then try to compress them into efficient models suitable for customer use.

The biggest problem I see right now is that I envision the use cases of artificial intelligence as concentric circles of complexity. The innermost circle might be simple conversational interactions with foundational language models, which we could already do well with GPT-2. Each additional layer of intelligence, such as performing mental arithmetic, programming, or later Agents, even drug discovery, requires smarter models. But each previous layer of intelligence has become so cheap that it is essentially commoditized.
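Note: to make the teacher-to-student compression described above concrete, here is a minimal sketch of soft-label knowledge distillation in PyTorch. It is purely illustrative; the temperature, shapes, and random logits are placeholders, not any lab's actual compression recipe.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Train the student to match the teacher's softened output distribution."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # KL divergence between teacher and student, scaled by T^2 as in standard distillation
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)

# Toy usage: a batch of 4 positions over a 10-token vocabulary
teacher_logits = torch.randn(4, 10)                      # from the large internal "teacher"
student_logits = torch.randn(4, 10, requires_grad=True)  # from the small deployable "student"
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```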

Jacob: This reminds me of the trend of test-time compute. This seems like a very exciting path forward, especially in programming, mathematics, and other easily verifiable fields. How far can this paradigm take us?

David: There is a series of papers and podcasts that document my discussions over the years about how to build AGI (Artificial General Intelligence).

Jacob: Let's add something new to these discussions.

David: So now we can prove that we had this conversation at this moment. But back in 2020, when we started seeing the emergence of GPT-2, GPT-3 might have already been in development or completed. We began to think about GPT-4, and we lived in a world where people were uncertain whether simply predicting the next token could solve all of AGI.

My point, and the point of some people around me, was actually "no." The reason is that if a model is trained for next-token prediction, it will essentially be penalized for discovering new knowledge, because new knowledge is not in the training set. Therefore, what we need to do is look at the other known machine learning paradigms that can truly discover new knowledge. We know that reinforcement learning (RL) can do this; RL can do this in search, or like AlphaGo, which may have been the first time the public realized we could use RL to discover new knowledge. The question has always been: when will we combine large language models (LLMs) with RL to build systems that have all of humanity's knowledge and can build on top of it?
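Note: as a toy illustration of combining a model with RL against a checkable reward, here is a minimal REINFORCE-style loop. The "policy" is just a distribution over four candidate answers and the verifier checks simple arithmetic; this is a deliberately tiny sketch of RL with verifiable rewards, not a real model training loop.

```python
import torch

candidates = [3, 4, 5, 22]                                  # candidate answers to "2 + 2"
logits = torch.zeros(len(candidates), requires_grad=True)   # the "policy" parameters
optimizer = torch.optim.SGD([logits], lr=0.5)

def verifier(answer: int) -> float:
    return 1.0 if answer == 4 else 0.0                      # checkable ground truth

for step in range(200):
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()                                  # sample a candidate answer
    reward = verifier(candidates[action.item()])
    loss = -dist.log_prob(action) * reward                  # REINFORCE: reinforce verified answers
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(torch.softmax(logits, dim=-1))                        # probability mass concentrates on "4"
```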

Jacob: So, for those areas that are not easily verifiable, such as healthcare or law, can this test-time compute paradigm allow us to build models that can handle these issues? Or will we become very good at programming and mathematics but still unable to tell a joke?

David: This is a topic worth debating, and I have a very clear viewpoint.

Jacob: What is your answer?

David: The generalization ability of these models is stronger than you think. Everyone is saying, "I used o1, and it seems better at math, but when you wait for it to think, it might be a bit worse than ChatGPT or other models." I think these are just small bumps on the way to something more powerful. Today, we have seen some signs that explicitly verifying whether the model correctly solved the problem (as we see in DeepSeek) does lead to some transfer to slightly more ambiguous problems in similar fields. And everyone is working hard; my team and other teams are working hard to capture human preferences on these more complex tasks so the models can meet those preferences.

Jacob: Yes. And you always need to be able to build a model to verify, like "Hey, this output is a good legal opinion," or "this output is a good medical diagnosis," which is clearly much more difficult than checking a mathematical proof or whether code runs.

David: I think what we are leveraging is a gap within these models: the ability of the same set of neural network weights to judge whether they have done a good job, versus their ability to generate the correct answer in the first place. We always see these models being stronger at judging whether they "did a good job" than at "generating good answers." To some extent, we are leveraging this by using RL tools to let the model sense whether it has done a good job.
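Note: one simple way to exploit that generation-versus-verification gap is best-of-n sampling with the model acting as its own judge. The sketch below is hypothetical; `generate` and `judge_score` stand in for whatever model calls you actually use.

```python
import random

def best_of_n(prompt, generate, judge_score, n=8):
    """Sample several candidates, then keep the one the judge rates highest."""
    candidates = [generate(prompt) for _ in range(n)]             # the model as generator
    scored = [(judge_score(prompt, c), c) for c in candidates]    # the model as judge
    return max(scored, key=lambda pair: pair[0])[1]

# Toy usage with stub functions, just to show the control flow
drafts = ["draft A", "draft B", "draft C"]
best = best_of_n(
    "Write a short legal opinion.",
    generate=lambda prompt: random.choice(drafts),
    judge_score=lambda prompt, candidate: len(candidate),        # stand-in scoring rule
)
print(best)
```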

Jacob: What research problems need to be solved to truly launch models like this?

David: There are so many problems; let me just list three that we need to address. The first is that you need to really know how to build an organization and processes that can reliably produce models.

I have always told my team and those I work with that today, if you operate a modern AI lab, your job is not to build models but to build a factory that can reliably produce models. When you think about it this way, it completely changes your investment direction. Until you reach reproducibility, I think there is not much progress to be made in some sense. We have just gone through a process of moving from alchemy to industrialization, and the way these models are built has changed. Without this foundation, these models cannot work.

I think the next part is that you must go slow to go fast, but that is really still the first point. I have always believed that people are attracted to algorithms because they look cool and sexy. But if you look at what really drives all this, it is actually an engineering problem. For example, how do you run large-scale clusters so that they keep working reliably for long enough? If a node crashes, you don't want to lose too much time on your job. To push the frontier of scale, this is a very real problem.

Now, in the entire field of reinforcement learning (RL), we are about to enter a world where there will be many data centers, each performing extensive reasoning on foundational models, perhaps even testing in new environments brought by customers to learn how to improve the models and feed this new knowledge back to a central location, allowing the models to learn to become smarter.

Jacob: Some people, like Yann LeCun, have recently been criticizing the limitations of large language models (LLMs). I would like you to summarize this criticism for our audience and then share your thoughts on those who say these models will never be capable of true original thinking.

David: I think we already have counterexamples; AlphaGo is an example of original thinking. If you look back at the early work of OpenAI, we used RL to play Flash games. If you are of that age, you might remember MiniClip and similar games. They were once a middle-school pastime, and it's really interesting to see them become a cornerstone of AI research. At that time, we were researching how to use our algorithms to play all of these games at once, and you quickly realize they learned how to speedrun by exploiting glitches, like clipping through walls, which humans had never done.

Jacob: In terms of validation, it mainly involves finding clever ways to validate these different domains.

David: You just use the model.

How to Build Reliable Agents

Jacob: I want to shift the topic to the world of Agents. How would you describe the current state of these models?

David: I am still incredibly excited about Agents. It reminds me of 2020 and 2021, when the first wave of truly powerful models like GPT-3 emerged. When you tried those models, you felt the immense potential: they could create excellent rap songs, deliver great roasts, and handle basic three-digit addition. But when you asked one to "help me order a pizza," it merely mimicked the dialogue patterns of a Domino's Pizza customer service representative, failing to complete the actual task. That clearly exposed significant flaws in these systems, right?

Since then, I have been convinced that we must solve the problem of Agents. When I was at Google, we began researching what later became known as "tool use": how to show large language models (LLMs) operational interfaces and let them decide autonomously when to take action. Although academia had always referred to these as "Agents," the public had not yet formed a unified understanding at that time. To address this, we tried to coin a new term, "Large Action Model," to set against "Large Language Model," which sparked some discussion. But ultimately, the industry chose the term "Agent," which has now been misused to the point of losing its original meaning, which is unfortunate. However, it's still cool to have been the first modern agent company exploring this field.

When we founded Adept, the best open-source LLMs were underperforming. Since there were no multimodal LLMs (like LLMs with image input, such as the later GPT-4v) at that time, we had to train our models from scratch. It was a bit like starting an internet company in 2000 and having to call TSMC to manufacture your own chips; it was absolutely crazy.

So along the way, we learned that large language models, without today’s RL technology, are essentially behavioral cloners. They do what they see in the training data—this means that once they encounter a situation they have never seen before, their generalization ability is poor, and their behavior becomes unpredictable. So Adept has always focused on useful intelligence. What does practicality mean? It’s not about launching a cool demo that goes viral on Twitter. It’s about putting these technologies in people’s hands so they no longer have to do the tedious work that most knowledge workers have to do, like dragging files on a computer. So these knowledge workers care about reliability. One of our early use cases was: can we handle invoices for people?

Jacob: Everyone loves handling invoices (laughs). For these general models, this seems like a natural starting point.

David: It's a great "Hello World." At that time, no one was really doing these things, so we chose an obvious "Hello World" use case. We also worked on other things, like Excel. If a system deleted a third of your QuickBooks entries one out of every seven times, you would never use it again. Reliability is still an issue; even today, systems like Operator are very impressive, and seem to outperform other computer-use agents. But if you look at these systems, they focus on end-to-end task execution: you input "I want you to help me find 55 weekend getaway spots," and it tries to complete that task. But end-to-end reliability is very low, requiring a lot of human intervention. We still haven't reached the point where businesses can truly trust these systems to do something "once and for all."

Jacob: We need to solve this problem. Perhaps you could explain to our audience, if you start with existing foundational multimodal models, what work is actually needed to transform them into a Large Action Model?

David: I can discuss this at a high level, but basically there are two things that need to be done. The first is an engineering problem: how to present what can be done in a way the model can understand. For example, here are the APIs that can be called, and here are the UI elements you can invoke; let's teach it a bit about how Expedia.com (note: travel service website) or SAP works. This involves some research engineering. This is the first step: giving it an awareness of its own capabilities and basic action abilities.
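Note: a rough sketch of that first step (showing the model what actions exist, then dispatching whatever it picks) might look like the following. The action names, the schema format, and the JSON reply are all illustrative placeholders, not any product's real API.

```python
import json

ACTIONS = {
    "search_flights": {"params": ["origin", "destination", "date"]},
    "click_element":  {"params": ["element_id"]},
    "fill_field":     {"params": ["element_id", "text"]},
}

def build_prompt(task):
    # Serialize the action schema into the prompt so the model knows its capabilities.
    return (
        f"Task: {task}\n"
        f"Available actions:\n{json.dumps(ACTIONS, indent=2)}\n"
        'Reply with JSON: {"action": "<name>", "args": {...}}'
    )

def dispatch(model_reply):
    choice = json.loads(model_reply)
    name, args = choice["action"], choice["args"]
    assert name in ACTIONS, f"model requested an unknown action: {name}"
    print(f"executing {name} with {args}")  # a real system would call the API or drive the UI here

# Example: pretend the model replied with this JSON.
dispatch('{"action": "fill_field", "args": {"element_id": "destination", "text": "Tokyo"}}')
```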

The second part is where it gets interesting: how to teach it to plan, reason, re-plan, and follow user instructions, even inferring what the user really wants and completing those tasks for them. This is a daunting R&D challenge, and it is very different from conventional language model work, because conventional language model work is "let's generate a piece of text"; even today's reasoning tasks, like math problems, end with a single final answer.

So that is more like a single-step process; even if it involves multi-step thinking, it just hands you the answer. This, by contrast, is a full multi-step decision-making process that involves backtracking, trying to predict the consequences of your actions, and realizing that the delete button might be dangerous, and you have to get all of that working in the base setting first.

Then you put it in a sandbox environment and let it learn on its own. The best analogy is what Andrej Karpathy (note: founding team member of OpenAI, who founded the AI+education institution Eureka Labs in 2024) said: modern AI training is organized somewhat like a textbook. First you have a complete explanation of a physical process, followed by some worked example problems. The first part is pre-training, the example problems are supervised fine-tuning, and the last step is the open-ended exercises, perhaps with answers at the back of the textbook. We are just following this process.

Andrej Karpathy's description of large models (Source: X.com, Bright Company)
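Note: read literally, the textbook analogy maps onto a three-stage recipe. The stub below is only a schematic of that mapping; the stage functions are empty placeholders, not real training code.

```python
def pretrain(model, raw_corpus):                  # stage 1: the textbook chapters
    return model

def supervised_finetune(model, worked_examples):  # stage 2: the worked example problems
    return model

def rl_finetune(model, graded_exercises):         # stage 3: open-ended exercises checked against the answer key
    return model

def train(model, raw_corpus, worked_examples, graded_exercises):
    model = pretrain(model, raw_corpus)
    model = supervised_finetune(model, worked_examples)
    return rl_finetune(model, graded_exercises)
```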

Jacob: I think you must have thought a lot about how these intelligent agents will truly enter the world. I want to ask a few questions. First, you mentioned that part of the problem is making the model aware of what it can access. So, over time, how will the model interact with browsers and programs? Will it be similar to human interaction, or just through code? Are there other methods?

David: If I were to comment on this field, I think the biggest problem right now is that people lack creativity in how to interact with these increasingly intelligent large models and Agents. Remember when the iPhone first came out, and the App Store launched, people started making all sorts of applications, like a button that makes a burp sound or an app that pours beer into your mouth by tilting the phone. Our interfaces today feel like that; they feel terrible because chatting is a super limited, low-bandwidth interaction, at least in some respects. For example, I don’t want to decide my pizza toppings through seven rounds of dialogue.

This lack of creativity frustrates me. I think part of the reason is that the excellent product designers who could help us solve these problems have not yet truly understood the limitations of these models. This situation is changing rapidly, but conversely, so far, those who have been able to drive technological advancement have always viewed it as "I’m delivering a black box here," rather than "I’m delivering an experience here."

When this changes, I look forward to seeing systems like this, where when you interact with the agent, it actually synthesizes a multimodal user interface for you, listing what it needs to get from you and establishing a shared context between humans and AI, rather than the current paradigm where you are just chatting with it. It’s more like you and it are doing something on the computer together, looking at the screen, more parallel than vertical.

Jacob: I think you mentioned that while Operator is impressive, it is not perfect at times. So when do you think we will have reliable intelligent agents?

David: I think Operator is amazing; it’s just that the entire field currently lacks the final piece of the puzzle.

Jacob: I think, considering the history of autonomous driving, they demonstrated autonomous driving as early as 1995, with vehicles able to cross the country and complete 99% of the journey.

David: Yes.

Jacob: Do we need to wait another 30 years?

David: I don’t think so, because I believe we actually have the right tools now.

Jacob: You mentioned earlier that AGI (Artificial General Intelligence) is not far off.

David: What I am looking for in the field of Agents is a major milestone where I can give an agent any task during training, come back a few days later, and it has gotten to 100%. It may have started out with something like 5% reliability, but the agent has learned how to solve the problem.

Jacob: As you mentioned earlier, when you founded Adept, there were no truly open-source models, let alone multimodal open-source models. Do you think if someone started a company like Adept today, could a startup succeed here? Or will it ultimately be the foundational model companies and hyperscale cloud service providers that drive the ball forward?

David: I have a lot of uncertainty about this issue. But my current view is that I personally believe AGI is not far off.

Jacob: When you mention AGI, how do you define it?

David: A model that can perform any useful task that humans do on a computer is part of the definition. Another definition I like is that it is a model that can learn to do these things as quickly as a human. I don’t think these are too far off, but I also don’t think they will spread rapidly throughout society. As we know, according to Amdahl's Law, once you really accelerate one thing, other things become bottlenecks, and the overall acceleration you gain is not as great as you might imagine.

So, I think what will happen is that we will have this technology, but the ability of humans to use these technologies efficiently will take quite a long time. Many of my colleagues refer to this as "capability overhang," a significant surplus of capability.
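Note: for readers who want the Amdahl's Law point above in numbers, here is a quick illustrative calculation; the percentages are made up purely for illustration.

```python
def amdahl_speedup(accelerated_fraction, part_speedup):
    """Overall speedup when only a fraction of the work is accelerated."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / part_speedup)

# Even if AI makes 60% of a workflow 100x faster, the workflow as a whole only gets
# about 2.5x faster, because the remaining 40% (review, coordination, adoption) is unchanged.
print(amdahl_speedup(0.6, 100))   # ~2.46
```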

Jacob: Have you done any preliminary thinking about the possible accelerators once we have these capabilities?

David: I think it depends on people. It’s about how we co-design interactions with the model and how we use these models. This will be a matter of societal acceptance. For example, imagine you have a model that comes out tomorrow and says, "I have invented a brand new way of doing things that everyone should use." Humans need to reconcile with it and decide whether this is truly a better solution, and that won’t happen as quickly as we might imagine.

Jacob: As you said, even if the lab is the first place to develop these models, there may be an opportunity for startups to truly bridge the gap between the capabilities of these models and what end users actually want to interact with.

David: I am pretty sure that’s what will happen. Because at the end of the day, I still firmly believe that in a world with AGI, human relationships really matter. Ultimately, understanding and having customers, and being closer to them to understand their needs, will be more important than just controlling this tool that many other labs have.

Jacob: How do you think humans will use computers in the next decade? All these models have reached your definition of AGI. Will I still be sitting in front of a computer? What is your vision for how humans will interact with these technologies in the future?

David: I think we will gain a new toolbox for interacting with computers. Today, there are still people using command lines, right? Just like people still use graphical user interfaces (GUIs). In the future, people will still use voice interfaces. But I think people will also use more ambient computing. Moreover, I think one metric we should focus on is the leverage gained per unit of energy when humans interact with computers. I believe that as these systems develop, this metric will continue to increase.

Jacob: Perhaps you could talk a bit about this future world of models and whether we will eventually have models for specific domains.

David: Let’s look at a hypothetical legal expert model. You might want this hypothetical legal expert to know some basic facts about the world.

Jacob: Many people get a general undergraduate degree before going to law school.

David: Exactly. So I think there will be some domain-specific models; I don't want to overstate the point, I'm just saying there will be some. I believe there will be domain-specific models for technical reasons, but there will also be policy reasons.

Jacob: That’s interesting; what does that mean?

David: It's that some companies really don't want their data to be mixed together. For example, imagine you are a large bank; you have sales and trading departments, and you have an investment banking department. AI employees or LLMs support these departments, and just as those human employees cannot share information today, the models should not be able to share information through their weights either.

Jacob: What other issues do you think need to be addressed? In terms of models, it seems you are confident that if we just expand our current computing capabilities, we can get very close to solving the problems we need to solve. But are there other significant technical challenges that need to be overcome to continue expanding the intelligence of the models?

David: In fact, I do not agree with the view that simply migrating existing technology directly to a computing cluster two years from now will miraculously work. While scale will still be a key factor, my confidence comes from assessing the current core open questions—we need to evaluate the difficulty of solving these problems. For example, are there super difficult problems that must be tackled through disruptive innovation? Like completely replacing the gradient descent algorithm (note: gradient descent, the core algorithm for optimizing parameters in current deep learning models, iteratively updates parameters by calculating the negative gradient direction of the loss function), or must we rely on quantum computers to achieve AGI? But I don’t think these are necessarily the technological paths.
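Note: for readers unfamiliar with the gradient descent mentioned in the note above, here is the algorithm in its simplest possible form, fitting one parameter to minimize a toy loss; the numbers are purely illustrative.

```python
# Minimize the toy loss (w - 3)^2 by stepping along the negative gradient.
w = 0.0
learning_rate = 0.1
for step in range(100):
    grad = 2 * (w - 3.0)         # derivative of (w - 3)^2 with respect to w
    w -= learning_rate * grad    # update in the negative gradient direction
print(round(w, 4))               # converges toward 3.0
```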

Jacob: When new models come out, how do you evaluate them? Do you have some fixed questions to test, or how do you judge the quality of these new models?

David: My evaluation methodology is based on two core principles: Methodological Simplicity: This is one of the most fascinating characteristics of the deep learning field—when a piece of research comes with a methodology paper (which has become increasingly rare today), you can simply look at its implementation path and find a solution that is simpler and more effective than traditional approaches. Such breakthroughs often make it into the deep learning canon and bring about those "this really showcases the beauty of the algorithm" moments.

Benchmark Misalignment: The current hype in the field has led to a disconnect between many benchmark tests and the actual needs of the models, yet they are overemphasized in the R&D process. These tests are essentially a game. The complexity of evaluation and measurement is severely underestimated—compared to many current research directions, they should receive more academic recognition and resource investment.

Differentiated Technical Accumulation is Actually Rare

Jacob: It seems everyone has their own internal benchmarks that they don’t publicly release, like things they believe in more. For example, you can see that OpenAI's models perform better in many programming benchmark tests, but everyone uses Anthropic's models because they know those models are better. It’s interesting to see the evolution of this field. I want to hear about your recent experiences at Amazon; how do you view Amazon's role in the broader ecosystem?

David: Yes, Amazon is a very interesting place. In fact, I learned a lot there. Amazon is very serious about building general intelligent systems, especially general intelligent Agents. What’s really cool is that I think everyone at Amazon understands that computation itself is shifting from the basic elements we know to calls to large models or large agents, which may be the most important basic element of computation in the future. So people are very concerned about this, which is great.

I think it’s interesting that I am responsible for Amazon's Agent business, and it’s cool to see how broad the reach of agents is in a large company like Amazon. Peter (phonetic) and I opened a new research lab for Amazon in San Francisco, largely because many people at the top of Amazon really believe we need to make new research breakthroughs to address the major issues we discussed on the path to AGI.

Jacob: Are you keeping an eye on any of these alternative architectures or more cutting-edge research areas?

David: Let me think. I always pay attention to things that might help us better map model learning to computation. Can we use more computation more effectively? This provides a huge multiplier effect on what we can do. But I actually spend more time focusing on data centers and chips because I find that very interesting. There are some interesting moves happening now.

Jacob: It seems one of the main factors driving model development is data labeling, and clearly all labs are spending a lot of money on this. Is this still relevant in the test-time compute paradigm? How do you view this issue?

David: The first two jobs I can think of for data labeling are, first, teaching the model the foundational knowledge to complete a task by cloning human behavior. If you have high-quality data, you can better elicit what the model has already seen during pre-training. The second job is teaching the model what is good and what is bad on those more ambiguous tasks. I think both of these are still very important. …
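Note: the second kind of labeling he describes (teaching the model what is good and what is bad) is commonly turned into a pairwise preference loss for a reward model. Here is a hedged sketch in the Bradley-Terry style; the tensors and shapes are illustrative, not any specific lab's setup.

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen, reward_rejected):
    """Push the reward model to score the labeled-better response above the labeled-worse one."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage: scalar rewards for a batch of three labeled comparison pairs
chosen = torch.tensor([1.2, 0.3, 0.9], requires_grad=True)
rejected = torch.tensor([0.8, 0.5, -0.1], requires_grad=True)
loss = preference_loss(chosen, rejected)
loss.backward()
```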

Jacob: You have clearly been at the forefront of this field for the past decade. Is there one thing that you have changed your mind about in the past year?

David: I have been thinking about building team culture. I think we have always known this, but I have become more convinced that hiring truly smart, energetic, intrinsically motivated people, especially early in their careers, is actually a key engine of our success. In this field, the best strategies change every few years. So if people become too accustomed to the previous best strategies, they will actually slow you down. So I think, compared to my previous thoughts, betting on newcomers will be better.

Another thing I have changed my mind about is that I used to think that building AI would actually have real long-term technological differentiation that you could continuously accumulate on. I once thought that if you did well in text modeling, it should naturally help you become a winner in the multimodal space. If you did well in multimodal, you should become a winner in reasoning and agent domains… these advantages should accumulate continuously. But in practice, I see very little accumulation. I think everyone is trying similar ideas.

Jacob: Implicitly, just because you broke through A first doesn’t mean you will have an advantage in B. For example, OpenAI made breakthroughs in language models, but that doesn’t necessarily mean they will make breakthroughs in reasoning.

David: They are related, but that doesn’t mean you will necessarily win the next opportunity.

When Will Robots Enter Homes

Jacob: I want to ask, you initially entered AI through the robotics field. So, what is your view on the current state of AI robotics today?

David: Similar to my view on Digital Agents, I think we already have many raw materials. Moreover, I think it’s interesting that Digital Agents provide us with an opportunity to solve some tricky problems before physical Agents.

Jacob: Can you expand on how the reliability of digital Agents carries over to physical Agents?

David: To give a simple example, suppose you have a warehouse that needs to be rearranged, and you have a physical agent that needs to work out the best plan for rearranging it. If you are learning this in the physical world, or even in a robotic simulation environment, it can be quite challenging. But if you have already done this in the digital space, and you have all the training recipes and tuning algorithms for learning from simulated data, it is as if you have already completed that part of the work in earlier training runs.

Jacob: That’s interesting. I think there are two extremes when people think about robots. Some believe that the scaling laws we find in language models will also be found in the robotics field, and we are on the verge of a huge change. You often hear Jensen (NVIDIA founder Jensen Huang) talk about this. Then there are others who think it’s like self-driving cars in 1995, a great demonstration, but it will take a long time to really work. Where do you fall on this spectrum?

David: I go back to what I mentioned earlier, what gives me the most confidence is our ability to build training recipes that allow us to complete tasks 100%. We can do this in the digital space. While there are challenges, it will ultimately transfer to the physical space.

Jacob: When will we have robots in our homes?

David: I think this actually goes back to the issue I mentioned earlier. I believe the bottleneck for many problems is not in building the models, but in the diffusion of what the models can do.

Jacob: What about video models? Clearly, many people are entering this field now, and it seems to be a new frontier that involves understanding world models and physics for more open exploration. Perhaps you could talk about what you see in this area and your views on it.

David: I am very excited about this. I think it addresses a major issue we mentioned earlier, which is that today we can get reinforcement learning to work on problems with verifiers (Verifiers), like theorem proving.

Then we discussed how to generalize this to the Digital Agents domain, where you don’t have a verifier, but you might have a reliable simulator because I can launch a staging environment for an application and teach the agent how to use it. But I think the remaining major question is what happens when there is no clear verifier or clear simulator? I believe world modeling is our way to answer this question.

The Organizational Growth Path of OpenAI

Jacob: That’s great. I want to shift gears a bit and talk about OpenAI and your time there. Clearly, you were involved during a very special period for the company and played a similar role in many advancements. I think in the future we will see a lot of analyses about OpenAI's culture, about what was special about the era that developed GPT-1 to GPT-4. What do you think those analyses will say? What made this organization so successful?

David: When I joined OpenAI, the research community was still very small. It was 2017, just over a year after OpenAI was founded. I knew the founding team and some early employees who were looking for someone who could blur the lines between research and engineering, and I happened to fit that need.

So joining OpenAI was a very fortunate thing. At that time, the team had only 35 people, but they were all extremely talented individuals who had done a lot of work in supercomputing, and there were many others I could name. They were all outstanding members of the team at that time.

Interestingly, my initial job was to help OpenAI build scalable infrastructure to expand from a small team to a larger scale. But quickly, my work began to shift towards how to define a differentiated research strategy that would allow us to make the right judgments for machine learning during this period. I think we realized earlier than others that the previous research model—writing a world-changing paper with your three best friends—was over. What we really needed to think about was this new era, where we tried to solve significant scientific goals with larger teams, combining researchers and engineers, regardless of whether the solution was defined as "novel" by academia. We were willing to take responsibility for that. When GPT-2 was first released, people said it looked like a Transformer, and we said, "Yes, it’s a Transformer." And we were proud of that.

Jacob: So what considerations led you to join OpenAI at that time?

David: I was very excited because I wanted to be at the forefront of research. At that time, the choices were OpenAI, DeepMind, or Google Brain. … As I mentioned earlier, betting on truly intrinsically motivated people, especially those early in their careers, is a very successful strategy, and many others who defined a field at that time actually didn’t have PhDs or ten years of work experience.

Jacob: Did you find any common traits among these outstanding researchers? What made them so exceptional? What did you learn about how to combine them into teams to achieve goals?

David: It largely comes down to intrinsic motivation and intellectual flexibility. There was one person who was very excited about and invested in the research he was doing in our team; I won't mention his name. About a month and a half later, I had a one-on-one conversation with him, and he suddenly mentioned that he had moved to the Bay Area to join us but hadn't had time to set up Wi-Fi or electricity in his apartment; he spent all his time in the office doing experiments, and it was completely unimportant to him.

Jacob: That kind of enthusiasm is truly impressive. I’ve heard you mention before that Google didn’t make progress on the GPT breakthrough, even though the Transformer was invented at Google. It was clear at the time how much potential this technology had, but as a whole, Google struggled to coalesce around it. What are your thoughts on that?

David: That credit goes to Ilya, who was our scientific leader in foundational research and later facilitated the birth of GPT, CLIP, and DALL·E. I remember he would often come to the office, like a missionary, telling people, "Guys, I think this paper is important." He encouraged people to experiment with Transformers.

Jacob: Given that these foundation model companies are now doing so many things, do you think another "recipe" will emerge at some point in the future?

David: I think losing focus is very dangerous.

Jacob: You might be one of NVIDIA and Jensen Huang's biggest fans. Aside from the well-known achievements, what do you think are some things about NVIDIA that haven't been widely discussed but are actually very important for the company?

David: I really like Jensen; he is a true legend. I think he has made many correct decisions over a long period, and the past few years have indeed been a huge turning point for NVIDIA as they internalized interconnect technology and chose to build their business around systems, which was a very wise move.

Jacob: We usually do a quick-fire round at the end of interviews. Do you think the progress of models this year will be more, less, or the same as last year?

David: On the surface, the progress may seem about the same, but in reality, it is more.

Jacob: What do you think are currently overhyped or underestimated aspects in the AI field?

David: What is overhyped is "scaling is dead, we are completely finished, don't buy chips anymore." What is underestimated is how we can really solve the problem of ultra-large-scale simulation so that these models can learn from it.

Jacob: David, this has been a fascinating conversation. I believe everyone will want to learn more about your work at Amazon and some exciting things you are doing. Where can people find more information?

David: For Amazon, you can follow the Amazon SF AI Lab. I actually don’t use Twitter very often, but I plan to start using it again. So you can follow my Twitter account @jluan.
