Born in 1997, sharing the stage with Jensen Huang: He wants to change the gaming industry in a different way

I met Song Yachen at his office at VAST around noon on January 10. He had just finished a morning meeting and was scheduled to meet with investors that afternoon. The day before, MiniMax—a company he co-founded—had gone public, with its market capitalization surpassing HK$100 billion on its first day of trading. Yet, just as usual, he had been playing video games until 3 a.m.

Founded in early 2023, VAST has now become a leading player in the 3D generation field—with over 6.5 million users and annual recurring revenue (ARR) exceeding $12 million as of last August. The team comprises dozens of PhDs and scientists specializing in the intersection of AI and computer graphics, and has published more than 60 research papers.

Just last month, they released Tripo Studio 1.0, and a few days ago they announced a game competition for independent developers.

Song Yachen is a man of striking contrasts.

Born in 1997, he earned his bachelor’s degree in international relations and economics from Johns Hopkins University. During his junior year, he spent a semester studying in Israel, where a professor told him, “Embrace the complexity of the world.”

Although he does not have a technical background, he has dedicated himself to the cutting-edge field of AI. In 2019, he joined SenseTime, where he worked on strategy in the CEO’s office, researching how AI could be integrated into and applied to animation and gaming. In 2021, he participated in the founding of MiniMax as Employee No. 001 and an early co-founder. However, at the end of 2022—just as GPT-3.5 was released and large language models were at the peak of their popularity—Song Yachen made a counterintuitive decision: he left MiniMax to go all-in on 3D.

At the time, Google had just released DreamFusion, making 3D generation a reality. His assessment was that moving from 3D to video to images to text is a process of continuously compressing information; 3D is the source code—more fundamental than the other media—and worth going all in on.

In early 2023, he founded VAST. That same year, he became the first Chinese person in SIGGRAPH’s 50-year history to deliver a keynote address, sharing the stage with NVIDIA’s Jensen Huang.


The first Tripo Game Jam kicked off on January 8

Beyond his reputation as an entrepreneur, Song Yachen is also an avid gamer: he loves strategy and role-playing games, and once served as an alliance leader in *Rateshu Zhibin*; he ranked eighth in the world among players his age in chess; and every week or two, he gets together with friends for a game of D&D (Dungeons & Dragons)…

In high school, he would go to bed at 11 p.m. every night and get up at 3 a.m. to do his homework. After school, he would first play chess for three or four hours, then play video games for another three or four hours. In college, because he liked to sit on his bed and play video games, he wore a dent into his Sealy mattress. Now that he’s started his own business, his schedule remains the same: he wakes up around 7 or 8 a.m., works until 9 or 10 p.m., and then plays video games until 3 or 4 a.m.

It was precisely his experience as a gamer that allowed him to identify the pain points in the gaming industry earlier than most AI entrepreneurs.

While working on "AI+Gaming" at SenseTime in 2019, he frequently attended industry events and noticed that while many game mechanics were quite interesting, the visuals were generally rough. Many developers had originally intended to create 3D games but ended up settling for 2D, opting for odd camera angles that prevented the gameplay’s full potential from being realized.

In 2020 and 2021, he participated in two consecutive Global Game Jams, taking on the role of game designer himself. He noticed a pattern: the biggest shortages in on-site teams were always for artists and engineers, while there were plenty of designers. People in the chat groups were constantly crying out for artists and engineers. Many teams didn’t have time to finish their projects, so they had to settle for a placeholder sprite just to make up the numbers.

The high costs and long development cycles of game production have long been a major challenge for the industry. At GDC, he once met an indie developer whose business card bore a studio name and looked very professional. After chatting for a while, he learned that the developer had run this one-man “company” for a decade, spending those ten years on a single game and taking on countless freelance projects along the way just to earn the right to keep pursuing his dream.

“This is really stifling everyone’s creativity.”

Song Yachen takes a sharply critical view of the gaming industry. A hardcore gamer who has spent a considerable amount of money on games himself, he makes no secret of his belief that today’s business models severely hinder the diversity of games.

“Why? For one simple reason: making games is too hard and too expensive.”

He said he’d love more variety in his digital diet—digital salads and other healthy, light, non-addictive fare alongside the digital burgers and digital chicken drumsticks. But why are there so few of them on the market? Because they don’t make enough money.

In his view, this is an inevitable outcome dictated by the cost structure. The high barriers to entry mean that most products are forced to prioritize the most profitable avenues.

Given the current pace of AI development, Song Yachen predicts that within the next year or two, 3D generation technology may advance to the point where “people will have no choice but to use it.”

By then, we may see a burst of creativity.

The following is an edited transcript of a conversation between Game Teahouse and Song Yachen.

01

Creativity in the gaming industry is severely stifled

Teahouse: Why did you leave MiniMax to start your own business?

Song Yachen: In late 2022, Google released DreamFusion, a pioneering work in 3D generation. With 3D generation now a reality, I believed large 3D models presented a tremendous opportunity—we should go all in.

But back then, GPT and OpenAI were at the height of their popularity, so it was hard to convince people to go all-in on 3D. However, I believe that 3D is more fundamental and has greater long-term potential, so I decided to strike out on my own and go all-in on it.

Teahouse: When did you first get involved in the gaming industry?

Song Yachen: When I was working on “AI+Gaming” and “AI+Animation” at SenseTime, I frequently attended events in the gaming industry. I remember attending a networking event at a teahouse in 2019, where I met many investors, major game publishers, and small-to-medium-sized development studios. Since many of these small studios were our target clients, I added them one by one on WeChat.

Sure, the big game companies are getting in on it too, but the really fun stuff still comes from the smaller studios—they come up with all sorts of innovative gameplay mechanics, rather than just buying in to please their bosses or meet some KPI like “we have to use this much AI this year.” They actually know how to put the fun stuff to good use.

Teahouse: I heard you’ve participated in a Game Jam before?

Song Yachen: Right, I participated twice—in 2020 and 2021. The first time, I teamed up with an artist from 37 Interactive Entertainment, a programmer from iFlytek, and a copywriter to create a 3D bar-tending game. It wasn’t very polished, but we had a lot of fun making it.

When I attended an in-person Game Jam, I noticed that the biggest shortage in on-site teams was artists and engineers, while there were the most designers. People in the chat were constantly shouting, “We need artists! We need engineers!” Plus, artists often didn’t have time to finish their work, so many people would just throw up a placeholder sprite or upload something they’d worked on in their spare time to make the project look a little more polished. If you don’t have ready-made assets, starting from scratch is a real pain.

Teahouse: Did you start paying attention to the art problem back then?

Song Yachen: Art has always been the biggest challenge.

There’s a community in Shanghai called Random Encounters, organized by a group of foreigners from Ubisoft. They host events where people bring their demos to try out each other’s games. I’ve attended a few times and seen some pretty interesting gameplay, but the art is generally pretty rough. Many people originally intended to make 3D games, but 3D development is difficult not only in terms of art but across the entire development process. In the end, they had to settle for 2D, choosing odd camera angles that made it hard to showcase the combat mechanics.

Large studios have the budget to invest in art and assets, so they can tackle these tasks after the project has been approved and testing is complete—it all falls into place naturally. But for small studios and independent developers, it’s a much tougher road. I’ve seen truly dedicated developers who spend ten years working on a single game, taking on countless freelance projects along the way just to keep their studio afloat.

Teahouse: It really takes a long time to polish a game, especially when the team is small.

Song Yachen: This is really stifling everyone's creativity.

Now that the code has become simpler, many people—especially product managers and designers—are using tools like Cursor. They’re already able to create quite a few 2D designs, which is definitely a step forward.

But 3D is a different story. On the one hand, the technology isn’t quite mature yet—especially when it comes to building large-scale scenes, where there’s still room for improvement. On the other hand, making the leap from 2D to 3D is a major hurdle in itself—the technical framework, architecture, and the entire engine have to change, which drives costs up significantly.

But we also know that 2D games inevitably fall short in terms of gameplay experience and immersion. It’s not that many indie developers don’t want to create 3D games; rather, the difficulty and cost of art production and programming are simply too high. Left with no other choice, they have to settle for 2D as a second-best option.

02

When the cost of creation approaches zero, new things emerge

Teahouse: How will the gaming industry change as 3D large-scale model technology and code generation mature?

Song Yachen: If we focus solely on the game itself, we might miss the real changes. What we’re actually seeing is the emergence of something new.

Video games are the ninth art form. Like films, literary masterpieces, and murals, they require a large team, substantial funding, and a significant amount of time to complete. It’s a bit like Michelangelo leading a team of dozens of people in painting the dome of the Sistine Chapel—a project that took several years to complete. Game development is essentially the same.

These art forms share one common characteristic: they are very costly. This is a fundamental point. Because the costs are high, the revenue requirements are also high. And when you have high commercial expectations, you tend to treat each individual work as a standalone product capable of generating a complete commercial cycle.

These days, every game is a product—whether it’s *Honor of Kings*, *PlayerUnknown’s Battlegrounds*, *Genshin Impact*, a match-3 title, or *King of Xianyu*—they’re all end-to-end, self-contained products. The same goes for movies and novels; isn’t every one of them meant to make money? Disney’s movies are meant to turn a profit, and the Medici family had to pay for Michelangelo’s paintings.

Teahouse: What exactly do you mean by “new things”?

Song Yachen: The novelty lies in this: when the cost of creating content approaches zero, I don’t necessarily have to rely on it to make money. If I don’t rely on it to make money, it can be a way to share my feelings, show off my ideas, convey information, vent my frustrations… it can be anything—it doesn’t have to be packaged into a complete product.

For example, the ad creatives used in many game marketing campaigns are actually quite fun, but the interactive elements in these creatives are typically hidden as secondary gameplay features within heavily monetized games. Why are they limited to being secondary features? There’s just one reason: they don’t generate revenue from start to finish. So they can’t be classified as “games” as we currently define them. But are they games? I think they’re incredibly game-like.

Today’s gaming business models are a major obstacle to diversity in the industry. Why? There’s just one reason: making games is too difficult and too expensive. You have to make money: first-time purchases, repeat purchases, cumulative spending, all kinds of holiday events—they’ll cram in as many monetization tactics as they can, leaving your wallet completely drained. I’m a hardcore gamer myself, and I’ve spent a lot of money on these games.

Teahouse: The barriers to entry are too high, so most products have no choice but to move toward more commercially viable approaches.

Song Yachen: Text, images, and video have all already cleared this hurdle. When the cost of creation drops to near zero, new things emerge—if we still had to carve characters onto turtle shells, how could platforms like Weibo, Zhihu, and Xiaohongshu have come into existence? If shooting videos still required bulky cameras and film, Douyin would never have emerged.

Nowadays, when people think of video content, their first thoughts go to TikTok, Kuaishou, and Bilibili—not movies. Movies account for only a tiny fraction of the entire video industry, and the future of gaming may well follow suit: among all interactive content, “games” in the traditional sense will make up only a small portion. When people think of interactive content, their first thought is no longer games, but rather these new forms of media.

I don’t know what this new thing is called, but it’s definitely not a “mini-game” platform. Someone asked me, “Do you want to build a mini-game platform?” I said no. The reason is simple: Zhang Yiming created TikTok—would you call that a “mini-video platform”?

Teahouse: When will this new trend take off?

Song Yachen: It’s still in the early stages. Before smartphone cameras came along, TikTok couldn’t have existed—there weren’t even any cameras, so how could it have been created?

That’s why the metaverse bubble burst. Back then, people who entered the metaverse all said the same thing: there was nothing to experience. The metaverse is vast, but if your house is empty, what’s there to do? When you jump into GTA, there are all kinds of stories—tattooed gangsters, neon-lit streets, and every billboard is unique. It’s only when the details are that refined that there’s so much content for you to consume. Of course, that cost a fortune—it was all handcrafted.

But what if something like GTA had zero barriers to entry, zero cost, and was procedurally generated? I could make one today—or even several. It’s pretty fun, but people probably wouldn’t spend much time on it—maybe two minutes, a minute, or even 20 seconds. That’s enough; then you can go create the next world or consume something else.

This new form—heavy on interaction, light on content—is definitely a bit premature right now. Those of us working in AI may need to wait a little longer.

03

3D is the source code of information; it is more fundamental than other media.

Teahouse: Do you think 3D is more fundamental than images and videos?

Song Yachen: The common perception is that as we move from text to images to video to 3D, information density continues to increase and the user experience gradually evolves to a higher dimension. This perception is incorrect.

Actually, the opposite is true: moving from 3D to video to images to text is a process in which information is constantly compressed and lost.

Over billions of years of cosmic evolution, there has been only 3D information. Over billions of years of biological evolution, there has been only 3D information. For hundreds of thousands of years of human history, right up until three to five thousand years ago, there was also only 3D information. At Liangzhu, Hongshan, and Sanxingdui, the artifacts unearthed were all masks, jewelry, and totems—3D objects—rather than text, images, or video.

Why? Because text, images, and videos are all compression formats invented relatively recently by humans. Three to five thousand years ago, people discovered they could record information on tortoise shells and bamboo slips, but because these media were limited in capacity, they had to be compressed—and that’s how text came about. The same goes for video: it was invented just over 100 years ago and is yet another form of compression.

That’s why 3D is the source code and source files, while text, images, and videos are all low-poly models. Why low-poly? Because GPUs and bandwidth are insufficient. 2G could only transmit text, while 4G made video possible. Once everyone has fiber-optic internet, people will return to source files because they offer the best experience.

The same principle applies to training AI. How arrogant must one be to think that general artificial intelligence should be trained on text—a compressed format? Shouldn’t it be trained on the source data itself? That’s why Fei-Fei Li and Yann LeCun are both working on world models and large 3D models. People are finally catching on: training must be based on the most fundamental information.

Teahouse: Why has the development of 3D large models been slower compared to other modalities?

Song Yachen: Three reasons.

First, there’s a shortage of talent. There aren’t many people in computer graphics to begin with, and even fewer at the intersection of computer graphics and AI—these fields never overlapped before, but now that they do, there are very few people capable of working in this area. It’s hard to find a 50-year-old professor who has spent their entire career researching this; it’s a completely new field. That’s why our team consists almost entirely of PhDs under the age of 30.

Second, there’s a lack of data. It’s perfectly normal these days to take photos and write posts for your social media feed, but how many people have you seen creating 3D models to post on social media? You have to find ways to scrape together data from every nook and cranny.

Third, the pipeline is too long. It doesn’t end once the geometry is generated—texturing, PBR, normals, UVs, topology, rigging, animation, VFX… there’s just so much. You’re creating a world.
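To give a sense of how long that pipeline is, here is a minimal illustrative sketch. All stage names are my own paraphrases of the steps Song lists, not VAST’s or Tripo’s actual tooling: even after the geometry exists, an asset still has to pass through many more stages before it is game-ready.

```python
from dataclasses import dataclass, field

# Illustrative only: these labels paraphrase the steps listed above,
# not any real VAST/Tripo API.
POST_GEOMETRY_STAGES = [
    "topology",       # clean up the raw generated mesh
    "uv_unwrap",      # lay out UV coordinates for texturing
    "texturing",      # paint or generate surface textures
    "pbr_materials",  # physically based material parameters
    "normal_maps",    # bake normal maps for surface detail
    "rigging",        # build a skeleton for the mesh
    "animation",      # author or retarget motion
    "vfx",            # particles, shaders, effects
]

@dataclass
class Asset:
    geometry: str
    completed: list = field(default_factory=list)

def run_pipeline(asset: Asset) -> Asset:
    """Walk an asset through every post-geometry stage in order."""
    for stage in POST_GEOMETRY_STAGES:
        asset.completed.append(stage)  # stand-in for real DCC/engine work
    return asset

asset = run_pipeline(Asset(geometry="generated_mesh"))
print(len(asset.completed))  # 8 stages remain after the mesh itself exists
```

The point of the sketch is the list’s length: geometry generation is only the first box in a chain, which is why Song compares 3D to code and an engine rather than to a one-shot text-in, image-out interface.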

Of all AI-generated content, only 3D bears a striking resemblance to code. Many people call us the “Cursor of 3D,” and they have a point. Almost all large language models follow a “model-as-product” pattern—a simple interface where you type in text and get a result. Only code and 3D are different. Cursor looks like VSCode, but we look like an engine, because 3D is a pipeline—not just a simple input-output process. Moreover, 3D artists and programmers are very similar in terms of numbers, salaries, and work habits.

More fundamentally, building a world requires only two things: creating everything and establishing rules. We are responsible for creating everything, while code is responsible for establishing the rules. When these two elements come together, they form a world—something more fundamental than text, images, or video.

Teahouse: What is the relationship between 3D large models and world models?

Song Yachen: 3D models will evolve into world models. World models, having analyzed vast amounts of data from both virtual and real worlds, will be able to automatically simulate logic and physics.

For example, a door should be able to open. And whether it opens inward or outward depends on the shape of the rod and spring at the top—if the spring curves inward, the door opens inward; if it curves outward, the door opens outward. It understands this. It first breaks things down, identifies what’s there, then understands, and finally generates the result.

Once you have a world model, many RPGs become much easier to develop.

Teahouse: Currently, games that incorporate AI mechanics are mostly concentrated in the RPG genre—chatbots, AI-powered NPCs, and so on. But RPGs aren’t actually the largest genre.

Song Yachen: I don’t think people in the AI field and those in the gaming industry fully understand each other yet. People tend to view gameplay styles like MOBA as rule-based, much like Go, and believe they’re difficult to adapt using AI.

However, cost reduction and efficiency improvements should not be completely separated from gameplay innovation. It’s not very realistic to directly use AI to overhaul *Honor of Kings*, but if we lower the barrier to entry for creation, countless people might develop similar prototypes, and one or two of those prototypes could turn out to be game-changing innovations.

Essentially, you just need to experiment more with new capabilities—that’s how interesting things tend to emerge. But right now, the cost of experimentation is still too high. If it only took 50 yuan and two days, everyone would be doing it. But if it costs 5 million yuan and takes two years, it’s a different story.

Teahouse: What are your thoughts on the pace of AI technology development?

Song Yachen: Many of today’s technologies would undoubtedly have been considered the greatest inventions of their time if they had been introduced over the past 1,000 years.

Take video generation, for example. In the past, there were only two types of videos: filming, which involved finding a location and angle in the physical world and shooting for a period of time; and animation, which involved finding a location and angle in the virtual world and shooting for a period of time. AI-generated video has achieved something truly remarkable: it doesn’t require a physical or virtual camera, doesn’t need to find a location or angle, and doesn’t require any shooting time—it simply creates something that has never existed before. This deserves a Nobel Prize.

The same goes for 3D generation. There used to be nothing, but now a single command can generate a model—it’s as if words become reality, like Ma Liang’s magic brush. But after three years of this spectacle, people have grown numb to it.

04

3D generation technology will soon evolve to the point where it can no longer be ignored

Teahouse: What has been the trajectory of VAST’s development?

Song Yachen: At first, we wanted to create a new content platform—a "TikTok for new content"—but it didn't work out.

Why did it fail? Because the equivalent of the “smartphone camera” didn’t exist yet—there were no creative tools accessible to the general public. Since ordinary people couldn’t participate, you were still relying on the same artistic elite—only now they were doing it in their spare time. There’s a big difference between a director shooting a video out of boredom and an old man down the street filming one.

Once we realized this, we started thinking about how we could make it accessible even to the average person on the street. So we turned to large 3D models—and eventually world models—with the goal of enabling everyone to create immersive, highly interactive, and content-light experiences.

Teahouse: What exactly did you do?

Song Yachen: We released version 0.1 of Studio—the beta version—on May 31, 2025. Revenue doubled or tripled within a month and increased five- or sixfold within a quarter. We did just one thing: we transformed the original “model-as-product” interface into an engine and a studio.

It was released in an incomplete state initially, which is why we called it 0.1. We’ve just launched Tripo Studio 1.0, and we hope to continuously iterate on this version to gradually establish a new workflow that replaces the existing, highly complex 3D production pipeline involving various plugins and engines.

It’s aimed partly at professional users, but also at what we call “Pro C” users—consumer users with a certain level of professional expertise. Many people who use this tool don’t rely on it for a living; they might not be 3D modelers, but rather concept artists creating 3D references, or perhaps advertising professionals.

Teahouse: How far are we from realizing the vision of a "3D TikTok" at this stage?

Song Yachen: We’re developing a 3D version of Jianying (CapCut’s Chinese counterpart)—or rather, an interactive version of it—that allows anyone to create 3D interactive content with zero barriers to entry and at no cost, whether on a PC or a mobile device. It’s not just a tool; it also serves as a platform and a community.

We’ve been at it consistently, constantly refining our work. We’ve been joined by a group of very supportive community creators who’ve helped us through this process.

Teahouse: What is the current competitive landscape for large 3D models? What are VAST’s core strengths?

Song Yachen: There aren’t many competitors overseas; AI 3D is currently performing quite well in the domestic market.

Our strengths lie in our advanced technology and high-quality output. First, we handle a massive volume of data—an order of magnitude greater than our competitors. Second, we have a talented team of dozens of PhDs and scientists specializing in the intersection of AI and computer graphics, who have published over 60 research papers.

Teahouse: Shouldn't there be more data from the big tech companies?

Song Yachen: Not nearly that many. How many 3D models are there in a single game? Even 10,000 would be a lot. Even major studios couldn’t possibly have developed more than 500 large-scale 3D games, right? Even if we assume 500 games, that’s only 5 million; we have at least 50 million.

Teahouse: Some industry professionals say that integrating 3D generation technology into existing workflows is a hassle. What are your thoughts on that?

Song Yachen: Different pipelines do indeed have different requirements—the layering methods, texture formats, and naming conventions are all different. But I don’t think this is the time to worry about that. If it’s easy to use, people will use it.

It’s like when smartphones first started taking 360p and 720p photos—game companies said, “It’s completely useless; it’s garbage.” When it reached 1080p, they said, “It’s usable, but the format doesn’t quite fit our internal systems.” But when it can shoot 4K and 8K—and it’s practically free—when what used to cost 100,000 yuan now costs a dime, and what used to take months now takes 10 seconds—would you still say, “My pipeline is too complex”? You wouldn’t. You’d be thinking, “How can I integrate AI to build a new pipeline? Whoever does it first will be the leader.”

This suggests that 3D generation isn’t good enough yet. If it were, they would be asking, “What’s your interface? How can we better align our pipeline with it?”

Teahouse: When do you think 3D generation will reach a point where everyone will have to use it?

Song Yachen: I think there’s a good chance by 2026. The pace of development is incredibly fast—just look at the technological advancements from 2023 to 2025; they’re on a completely different level. There are basically monthly updates, with no bottlenecks; it’s purely a matter of time and effort.

Teahouse: What kind of games do you like? What would you like to create using 3D generation technology?

Song Yachen: When it comes to SLGs and RPGs, I like competing against other players. Games with a strong strategic focus, like *Civilization*, *Total War*, and *Victoria*, are all great.

I want to make a 3D game with a strong strategy element. With 3D graphics at the core of the gameplay, there are many interesting possibilities.

Original article by 游茶妹儿. Reproduction prohibited: https://youxichaguan.com/en/archives/195289
