The Pixar of the next century will not be born from traditional film or animation, but from interactive video.
Author: Jonathan Lai
Translation: DeepTechFlow
Over the past century, technological shifts have given birth to many of our favorite stories. In the 1930s, for example, Disney invented the multiplane camera and produced the first sound-synchronized, full-color animation—a breakthrough that led to the groundbreaking animated feature "Snow White and the Seven Dwarfs."
In the 1940s, Marvel and DC Comics rose to prominence during what became known as the "Golden Age of Comics," thanks to the spread of four-color presses and offset printing, which enabled comics to be printed at scale. The limitations of that technology—low resolution, a narrow color range, and halftone dots on cheap newsprint—produced the iconic "pulp" look we still recognize today.
Similarly, in the 1980s, Pixar was uniquely positioned to exploit a new technology platform: computers and 3D graphics. Co-founder Ed Catmull was an early researcher at the NYIT Computer Graphics Lab and Lucasfilm, where he pioneered foundational CGI concepts, and Pixar went on to release "Toy Story," the first fully computer-generated animated feature film. Pixar's rendering suite, RenderMan, has been used in over 500 films to date.
In each wave of technological change, early prototypes that began as novelties gradually evolved into new, deeply narrative formats, led by successive generations of creators. Today, we believe the next Pixar is about to be born. Generative AI is driving a fundamental transformation in creative storytelling, enabling a new generation of human creators to tell stories in entirely new ways.
Specifically, we believe the next century's Pixar will not be born from traditional film or animation, but from interactive video. This new narrative format will blur the line between video games and TV/film, combining deep storytelling with audience agency and gameplay, and opening up a huge new market.
Gaming: The Frontier of Modern Narratives
Today, two major trends are emerging that may accelerate the formation of a new generation of narrative companies:
- The shift of consumers towards interactive media (rather than linear/passive media, i.e., TV/movies)
- Technological advancements driven by generative AI
Over the past 30 years, we have seen a sustained shift in consumer behavior, with games and interactive media growing more popular with each generation. For Gen Z and younger, games are now the preferred way to spend leisure time, surpassing TV and movies. In 2019, Netflix CEO Reed Hastings wrote in a shareholder letter, "We compete with (and lose to) Fortnite more than HBO." For many families, the question is now "What are we playing?" rather than "What are we watching?"
While TV, movies, and books still tell compelling stories, many of the most innovative and successful new stories are now being told in games. Take "Harry Potter." The open-world role-playing game "Hogwarts Legacy" lets players experience an unprecedented level of immersion as a new student at Hogwarts. It was the best-selling game of 2023, generating over $1 billion in revenue at launch and surpassing the box office of every "Harry Potter" film except the final one, "Harry Potter and the Deathly Hallows – Part 2" ($1.03 billion).
Game intellectual property (IP) has also found recent success in TV and film adaptations. Naughty Dog's "The Last of Us" became the most-watched series on HBO Max in 2023, averaging 32 million viewers per episode. "The Super Mario Bros. Movie" set the record for the biggest opening weekend ever for an animated film and went on to gross $1.4 billion. There are also critically acclaimed series such as "Fallout," Paramount's "Halo" series, the "Uncharted" movie starring Tom Holland, and Michael Bay's "Skibidi Toilet" movie—among many others.
One key reason interactive media is so powerful is that active participation builds an intimate connection with the story or universe. An hour spent actively playing a game commands far more attention than an hour of passive TV viewing. Many games are also social, with multiplayer mechanics at the core of their design. The most memorable stories are often the ones we create and share with the people we love.
Audiences now engage with IP across multiple media—watching, playing, creating, sharing—making stories not just entertainment but part of personal identity. The magic happens when someone evolves from a casual "Harry Potter viewer" into a devoted "Potterhead": the latter identity is far more enduring, built on community around what was once a solitary activity.
In short, while history's greatest stories have been told through linear media, games and interactive media are where the stories of the future will be told—and so we believe the most important narrative companies of the next century will be built here.
Interactive Video: The Fusion of Narrative and Gaming
Given gaming's dominance in culture, we believe the next Pixar will emerge through a media format that combines narrative and gaming. One format we see as especially promising is interactive video.
First, what is interactive video, and how does it differ from a video game? In a video game, developers preload a set of assets into the game engine. In "Super Mario Bros.," for example, artists design the Mario character, the trees, and the backgrounds. Programmers then write rules—when the player presses the "A" button, Mario jumps 50 pixels—and the jump frames are rendered through a traditional graphics pipeline. The result is a highly deterministic, hand-authored game architecture in which developers retain full control.
Interactive video, by contrast, relies entirely on neural networks to generate frames in real time. Beyond a creative prompt (text or a reference image), no assets need to be created or loaded. A real-time AI image model receives player input (say, a press of the "up" button) and probabilistically infers the next frame of gameplay.
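To make the contrast concrete, here is a minimal sketch of the two architectures. All names in it (render_frame, world_model, and so on) are hypothetical placeholders, not any real engine or model API.

```python
# Traditional game loop: deterministic, hand-authored rules over preloaded assets.
def traditional_step(state, player_input):
    if player_input == "A":
        state.mario_y += 50           # jump height hard-coded by a programmer
    return render_frame(state)        # classic graphics pipeline over fixed assets

# Interactive video loop: a neural world model predicts the next frame directly.
def generative_step(frame_history, player_input, world_model):
    # The model conditions on recent frames plus the control signal and
    # samples a probable next frame -- no assets, no hand-written rules.
    return world_model.sample_next_frame(
        context=frame_history[-16:],  # short window of prior frames
        action=player_input,          # e.g. the "up" button
    )
```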
The promise of interactive video is that it merges the accessibility and narrative depth of TV and film with the dynamic, player-driven systems of video games. Everyone knows how to watch TV and follow a linear story. Add video generated in real time from player input, and the experience becomes personalized and effectively infinite—potentially holding fans' attention for thousands of hours. Blizzard's "World of Warcraft" has run for over 20 years and still retains roughly 7 million subscribers today.
Interactive video also supports multiple modes of consumption—audiences can lean back and enjoy the content like a TV show, or lean in and play on a phone or controller. Letting fans experience a beloved IP universe in as many ways as possible is the heart of transmedia storytelling, and it deepens their attachment to the IP.
Storytellers have been chasing the vision of interactive video for the past decade. An early breakthrough was Telltale's "The Walking Dead"—an interactive experience based on Robert Kirkman's comic series in which players watch animated scenes unfold and, at critical moments, make choices through dialogue and quick-time events. Those choices—such as deciding which character to save during a zombie attack—created personalized story variants, making each playthrough unique. Released in 2012, "The Walking Dead" was a huge success, winning multiple Game of the Year awards and selling over 28 million copies to date.
In 2017, Netflix entered the interactive video space as well—starting with the animated title "Puss in Book" and culminating in the critically acclaimed "Black Mirror: Bandersnatch," a live-action film in which viewers make choices for a young programmer adapting a fantasy novel into a video game. Bandersnatch became a holiday phenomenon, attracting a fervent fan base that built flowcharts documenting every possible ending.
Yet despite positive reviews, both Bandersnatch and "The Walking Dead" struggled to survive—the branching storylines that define the format were simply too time-consuming and expensive to create by hand. As Telltale scaled to multiple projects, it fostered a culture of crunch, with developers complaining of "fatigue and burnout." Narrative quality suffered: while "The Walking Dead" debuted with a Metacritic score of 89, "Batman"—one of Telltale's biggest licensed IPs, released four years later—earned a disappointing 64. Telltale shut down in 2018, having failed to find a sustainable business model.
For "Bandersnatch," the production team shot 250 video clips, including 5+ hours of footage, to explain the movie's 5 endings. The budget and production time were reported to be twice that of a standard "Black Mirror" episode, with the showrunners stating that the complexity of the project was equivalent to "making 4 episodes at once." Ultimately, in 2024, Netflix decided to shut down the entire interactive special programs department and focus on traditional games instead.
Until now, the content cost of an interactive video project has scaled linearly with playtime—there was no way around it. But advances in generative AI models may be the key to making interactive video scale.
Generative Models Will Soon Be Fast Enough to Support Interactive Video
Recent advances in distilling image generation models have been remarkable. In 2023, the release of Latent Consistency Models and SDXL Turbo dramatically improved the speed and efficiency of image generation, enabling high-resolution rendering in a single step—where 20-30 steps were previously required—and cutting costs by more than 30x. Generating video—a sequence of consistent images with controlled frame-to-frame variation—suddenly became highly feasible.
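For a sense of what single-step generation looks like in practice, here is a minimal sketch using SDXL Turbo through Hugging Face's diffusers library, following the model's published usage notes; the prompt is an invented example, and a GPU is assumed.

```python
import torch
from diffusers import AutoPipelineForText2Image

# Load the distilled SDXL Turbo pipeline in half precision on the GPU.
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# One denoising step replaces the 20-30 steps a standard diffusion model
# needs; guidance is disabled because the model was distilled without it.
image = pipe(
    prompt="a stylized game character sprinting through a forest",
    num_inference_steps=1,
    guidance_scale=0.0,
).images[0]
image.save("frame.png")
```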
Earlier this year, OpenAI drew widespread attention when it announced Sora, a text-to-video model capable of generating videos up to one minute long while maintaining visual consistency. Shortly afterward, Luma AI released a faster video model, Dream Machine, capable of generating 120 frames (about 5 seconds of video) in 120 seconds; Luma recently shared that it attracted an astonishing 10 million users in just 7 weeks. Last month, Hedra Labs released Character-1, a character-focused multimodal video model that generates 60 seconds of video in 90 seconds, complete with expressive human emotion and voiceover. And Runway recently launched Gen-3 Turbo, which renders a 10-second clip in just 15 seconds.
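Taking those figures at face value, a useful yardstick is the "real-time factor"—seconds of video produced per second of compute, where 1.0 means real time. A quick sketch, using only the numbers cited above:

```python
# (seconds of video, seconds of compute) per the announcements above.
models = {
    "Luma Dream Machine": (5, 120),   # ~5 s of video in 120 s
    "Hedra Character-1":  (60, 90),   # 60 s of video in 90 s
    "Runway Gen-3 Turbo": (10, 15),   # 10 s of video in 15 s
}

for name, (video_s, compute_s) in models.items():
    rtf = video_s / compute_s
    print(f"{name:<20} real-time factor: {rtf:.2f}")
# Interactive video needs a factor of at least 1.0 at playable frame
# rates; the gap is closing quickly but is not yet closed.
```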
Today, an ambitious filmmaker can quickly generate several minutes of 720p HD video from text prompts or reference images, anchored by starting or ending keyframes for added specificity. Runway has also built a suite of editing tools for finer control over generated video, including camera controls, frame interpolation, and motion brushes. Luma and Hedra are expected to release their own creator toolkits soon.
Although production workflows are still early, we have already met several content creators using these tools to tell stories. Resemblance AI created "Nexus 1945," a captivating 3-minute alternate history of World War II, made with Luma, Midjourney, and ElevenLabs. Independent filmmaker Uncanny Harry partnered with Hedra on a cyberpunk short film, and creators have also produced music videos, trailers, travel vlogs, and even a fast-food burger ad. Since 2022, Runway has hosted an annual AI film festival showcasing 10 outstanding AI-made short films.
Significant limitations remain—in narrative quality and control, a 2-minute clip generated from prompts is still a long way from a 2-hour feature produced by a professional team. Getting the content a creator actually wants out of prompts or images is often difficult, and even experienced prompt engineers discard most of what they generate; AI creator Abel Art reports that producing one coherent minute of video takes roughly 500 generated clips. Visual consistency also tends to break down after one or two minutes of continuous video, often requiring manual editing—which is why most generated videos today top out at around a minute.
For most professional Hollywood studios, generated video is useful in pre-production for storyboarding—visualizing what a scene or character might look like—but cannot replace on-location shooting. There are also opportunities to use AI for audio and visual effects in post-production, but overall, AI creator toolkits remain early compared with traditional workflows refined over decades of investment.
In the near term, one of the biggest opportunities for generated video lies in new media formats such as interactive video and short films. Interactive video is naturally segmented into short 1-2 minute clips punctuated by player choices, and it is often animated or stylized, so lower-resolution footage is acceptable. More importantly, producing these short clips with generative models is far cheaper than it was in the Telltale/Bandersnatch era—Abel Art estimates a 1-minute video from Luma costs $125, roughly the cost of a day's worth of rented film footage.
And while the quality of generated video today is inconsistent, the popularity of vertical short-drama platforms like ReelShort and DramaBox proves there is audience demand for low-production-value, episodic short-form television. Despite criticism of amateurish cinematography and formulaic scripts, ReelShort has driven over 30 million downloads and more than $10 million in monthly revenue, launching thousands of mini-series such as "Forbidden Desire: Love of the Alpha."
The biggest technical barrier to interactive video is frame-generation speed: frames must be produced fast enough to render content in real time. Dream Machine currently generates about 1 frame per second. The minimum acceptable target for modern game consoles is a stable 30 FPS, with 60 FPS the gold standard. Techniques like Pyramid Attention Broadcast (PAB) can push this to 10-20 FPS for certain kinds of video, but that is still not fast enough.
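Back-of-envelope arithmetic on the figures above makes the gap concrete:

```python
# Latency budget for real-time generation, using the figures cited above.
target_fps = 30                      # minimum console standard (60 is the gold standard)
frame_budget_ms = 1000 / target_fps  # ~33 ms available per frame

current_fps = 1                      # roughly where Dream Machine is today
speedup_needed = target_fps / current_fps

print(f"Per-frame budget at {target_fps} FPS: {frame_budget_ms:.0f} ms")
print(f"Speedup needed from ~{current_fps} FPS: {speedup_needed:.0f}x")
# Even a 10-20x boost from techniques like PAB leaves roughly a
# further 1.5-3x gap before a stable 30 FPS is within reach.
```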
Current State: The Landscape of Interactive Videos
Given the pace of improvement we are seeing in foundation models and hardware, we estimate we are roughly 2 years away from commercially viable, fully generated interactive video.
Today we see progress from research players like Microsoft Research and OpenAI, both working on end-to-end foundation models for interactive video. Microsoft's model aims to generate fully "playable worlds" in 3D environments. OpenAI, meanwhile, demonstrated that Sora can simulate Minecraft "zero-shot," controlling the player while simultaneously rendering the world and its dynamics in high fidelity.
In February 2024, Google DeepMind published its own end-to-end foundation model for interactive video, Genie. What sets Genie apart is its latent action model, which infers the latent action between each pair of video frames. Trained on 300,000 hours of platformer videos, Genie learned to recognize character actions—such as how to clear an obstacle. These latent actions, together with tokens from a video tokenizer, are fed into a dynamics model that predicts the next frame, yielding an interactive video.
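Schematically, the pipeline described in the paper looks something like the sketch below; the class and method names are illustrative, not DeepMind's actual code.

```python
# A Genie-style pipeline built from its three described components:
# a video tokenizer, a latent action model, and a dynamics model.
class GenieStyleWorldModel:
    def __init__(self, video_tokenizer, latent_action_model, dynamics_model):
        self.tokenizer = video_tokenizer   # compresses frames into discrete tokens
        self.lam = latent_action_model     # infers latent actions between frame pairs
        self.dynamics = dynamics_model     # predicts next-frame tokens autoregressively

    def label_actions(self, frames):
        # Training-time role of the latent action model: annotate raw video
        # with the action that plausibly occurred between adjacent frames.
        return [self.lam.infer(a, b) for a, b in zip(frames, frames[1:])]

    def step(self, frame_history, user_action):
        tokens = self.tokenizer.encode(frame_history)
        # At play time, the user's input takes the place of a latent action.
        next_tokens = self.dynamics.predict(tokens, action=user_action)
        return self.tokenizer.decode(next_tokens)  # the next playable frame
```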
On the application front, we have seen some teams exploring new types of interactive video experiences. Many companies are focused on creating generative films or TV shows, designing and developing around the limitations of current models. We also see some teams integrating video elements into AI-native game engines.
Latens, from Ilumine, is developing a "lucid dream simulator" in which content is generated in real time as the user walks through a dream—the slight latency actually heightens the surreal feel. Developers in the open-source Deforum community are building real-world installations for immersive interactive video. And Dynamic is developing a simulation engine in which users control robots in first person via fully generated video.
In TV and film, Fable Studio is building Showrunner, an AI streaming service that lets fans create their own versions of popular shows. Fable's proof of concept, "South Park AI," premiered last summer and drew 8 million views. Solo Twin and Uncanny Harry are two cutting-edge AI filmmaking studios. Alterverse has created a D&D-inspired interactive video role-playing game in which the community decides what happens next. Late Night Labs is a new top-tier film studio integrating AI into the creative process. And Odyssey is building a visual storytelling platform powered by 4 generative models.
As the boundaries between film and gaming blur, we will see AI-native game engines and tools that empower creators with more control. Series AI has developed the Rho Engine, an end-to-end platform for AI game development, collaborating with major IP holders to create original works. We also see AI creation kits from Rosebud AI, Astrocade, and Videogame AI that enable novice programmers or artists to quickly get started in creating interactive experiences.
These new AI creation kits will expand the market for storytelling, allowing a new class of citizen creators to bring their imaginations to life using prompt engineering, visual sketches, and speech recognition.
Who Will Build the Interactive Pixar?
Pixar became a groundbreaking company by riding the foundational technology shift of computers and 3D graphics. Today, a similar shift is under way in generative AI. But it is worth remembering that Pixar's success owed much to the classic films—"Toy Story" among them—created by a world-class storytelling team led by John Lasseter. Human creativity, combined with new technology, produced the best stories.
Likewise, we believe the next Pixar will need to be both a world-class interactive storytelling studio and a frontier tech company. Given the pace of AI research, creative teams will need to work hand in hand with AI teams, fusing narrative and game design with technical innovation. Pixar had its own unique blend of art and technology, plus its distribution partnership with Disney; the opportunity today belongs to a new team that can integrate the disciplines of gaming, film, and AI.
To be clear, this will be a massive undertaking, and not only a technical one. The team will need to discover new ways for human storytellers to collaborate with AI tools that amplify, rather than diminish, their imagination. Many legal and ethical barriers also remain: ownership and copyright protection of AI-generated work is unclear unless creators can demonstrate ownership of all the data used to train their models, and compensation for the original writers, artists, and producers behind that training data has yet to be resolved.
However, it's clear today that there is a strong demand for new interactive experiences. In the long run, the next Pixar can not only create interactive stories but also build complete virtual worlds. We have previously discussed the potential of endless games—dynamic worlds that blend real-time level generation, personalized narratives, and intelligent agents—similar to the concept of HBO's "Westworld." Interactive videos solve one of the biggest challenges in making "Westworld" a reality—rapidly generating a large amount of personalized, high-quality interactive content.
One day, with AI's help, we may be able to start the creative process by building a story world—a fully realized IP universe of characters, storylines, visuals, and more—and then generate whatever media products we want for a given audience or context. This would be the ultimate form of transmedia storytelling, blurring the boundaries between traditional media formats.
Pixar, Disney, and Marvel all created unforgettable worlds that became core parts of their fans' identities. The opportunity for the next interactive Pixar is to do the same with generative AI—building new story worlds that blur the boundaries of traditional narrative formats, worlds unlike any we have seen before.