“We hope that after players experience this process of AI-generated and discovered spaces, they’ll go back to playing traditional ‘point-and-click’ building games like *The Sims* and *Animal Crossing* and find them ‘pointless’ or ‘boring.’”
In retrospect, 2025 may well prove to be a milestone year in the history of “AI + gaming.”
In addition to the explosive growth in the number of related products and their gradual move toward commercialization and market launch, several relatively well-defined category paradigms have also begun to take shape. Examples include text-driven narrative AI games, such as *Mysterious Inference* and *Sherlock: Night Stalker*; emotional companionship games like *Sister’s Tale* and *EVE*; and sandbox-style games inspired by “Stanford Town,” such as *Aivilization* and *Maggie’s Garden*, among others.
But in the ever-changing world of AI, the emergence of new technologies continues to give rise to countless new directions.
For example, Tea House recently came across a game called “Forest Box” (Senhe) on Xiaohongshu. Thanks to its core “AI home design” feature, the game has built a large following. Since the first half of this year, it has spread across platforms like Xiaohongshu and Douyin, repeatedly producing viral posts that rack up tens of thousands of likes. To date, it has amassed over a million followers across the internet.
During the first closed beta test held recently, *Forest Box* demonstrated a straightforward core gameplay mechanic: using AI to decorate a room. Players select a floor plan and furniture, describe their desired room style to the AI using natural language, and the AI generates a room that meets their specifications. Players can also upload photos to generate 3D models of their favorite furniture and place them in the room.
What makes this unique is that the AI-generated room layouts feature logical zoning and natural-looking furnishings, and they can be edited further. The AI not only truly “understands” the aesthetic and practical standards of a real-world room, but also leaves players free to create rooms and express their own personalities.
Last month, I spoke with Li Yongyi, founding partner and COO of Habitat Technology (Shengjing Technology), the company behind “Senhe” (Forest Box). He told me that this is the power of “spatial intelligence.”
Unlike familiar modalities such as text and video, “Senhe” uses AI to directly generate a spatial environment, training the AI to develop spatial awareness. Even within the field of AI, “spatial modalities” remain a brand-new interdisciplinary field where definitions and approaches are still being explored. Combined with the enthusiastic feedback from social media users, this indicates that Shengjing Technology is exploring an unprecedented and highly promising direction in the realm of “gaming + AI.”
Founded in 2023, this Shenzhen-based “tech startup” currently employs over 60 people, most of whom are graduates of Tsinghua University and prestigious overseas universities or former employees of major tech companies like Tencent, with interdisciplinary backgrounds in AI, algorithms, and gaming.
Shengjing Technology positions itself as a technology company specializing in AI-driven spatial design. In its early stages, the company focused primarily on B2B services for leading enterprises, partnering with companies such as Panasonic (a Fortune 500 firm) and Ashley, North America’s largest home furnishings brand, to use AI-powered spatial design to help customers create comprehensive home design solutions.
But Li Yongyi told me that this was only the previous phase for Shengjing Technology. The company has just completed Pre-A and Pre-A+ funding rounds totaling nearly 100 million RMB, and hopes to pivot into the B2C sector with “Senhe.” Their goal is to create an “AI-powered spatial version of TikTok,” where spaces can be created, shared, and monetized just like videos, becoming the next-generation platform for content and commerce. During my conversation with Li Yongyi, we also discussed how this “cross-disciplinary” AI tech team breaks down barriers between disciplines and teams to find inspiration for game creation.
The following is a transcript of the conversation between Tea House and Li Yongyi (some questions and answers have been edited for readability).
01
Technical professionals branching out into game development,
Expanding the “Spatial Dimensions” of Game AI
Q: Let’s start with a brief introduction.
Li: I am the founder and COO of Habitat Technology, and the project lead for the game *Senhe*. Habitat Technology is a global leader in the field of AI-driven spatial generation. My vision is to create a “3D spatial version of TikTok,” dedicated to “making spaces as easily created, shared, and monetized as videos.” We have recently secured nearly 100 million yuan in funding. Our team currently consists of about 60 members, most of whom are interdisciplinary experts and tech enthusiasts in algorithms and gaming from Tsinghua University, Tencent, and top international universities.
Q: Compared to a “game company,” does Habitat Technology focus more on AI?
Li: Whether it’s an AI company or a gaming company isn’t really that important; what matters is the unique value we create. In the past, we believed that general-purpose modalities like text, images, and video were dominated by major tech giants, leaving little opportunity for startups—except for 3D modalities. That’s why we chose to focus on AI spatial generation algorithms, providing tools, traffic, and data value to the home furnishings industry and the embodied intelligence sector.
We now aim to explore unique 3D interactive content and experiences based on our proprietary AI spatial generation algorithms and engine. Gaming is an ideal 3D medium, with billions of users, a trillion-dollar market, mature business models, and a strong willingness to pay—which is why we have chosen this direction.
Currently, excluding support staff, over 90% of our team is working on games: half are developing AI algorithms to provide a spatial generation engine for game development, while the other half is focused on game development, using the spatial generation engine to explore AI-native gameplay and interactive experiences. For now, we can describe ourselves as an “AI gaming” company.
Q: Could you please explain what “spatial intelligence” is? I’ve been seeing this term quite frequently since the second half of this year.
Li: Spatial intelligence gained traction in the second half of the year thanks to several key figures. The most prominent among them is Fei-Fei Li, who launched a spatial intelligence model called Marble, which recently entered open beta. This model can generate a 3D scene based on a single image—this is basic spatial generation.
“AI Godmother” Fei-Fei Li’s Marble World Generation Model
In fact, spatial intelligence is a crucial ability that enables humans to perceive the world. At its core, it is an innate human sense of space. The first level involves the perception and understanding of space: what objects are present in the environment, the spatial relationships between them, as well as their weight, friction, texture, and tactile sensations.
The second level involves decision-making based on perception. For example, if you want to get from point A to point B, you can use your perception of the space to plan your actions and route. Do you want to go to the kitchen to pour a glass of water, or open the fridge to grab a drink? After planning your actions, you then interact with the environment in a concrete way.
The third level is when, in addition to interacting with the space, you also want to transform it, for example by renovating your home, landscaping your yard, or rearranging your desk setup.
In summary, spatial intelligence essentially consists of three levels: the first is perception, the second is decision-making and planning, and the third is interaction and creation. This is our understanding of spatial intelligence.
Q: How are spatial intelligence models trained? It makes sense that large language models are trained using text, but how do you teach a model to perceive 3D space?
Li: There are currently two main approaches to training: one involves collecting data directly from the real world, such as recording real people performing tasks in their actual environments; the other involves using virtual simulations or synthetic data to recreate the real world on a one-to-one scale within a virtual environment and assigning physical properties to it, such as weight, friction, lighting, and heat conduction. With these properties in place, digital humans or virtual avatars can interact within the environment, thereby generating training data.
We are currently working on synthetic data. This is because we possess the ability to generate highly realistic 3D spaces, not just purely visual ones. Our understanding of spatial intelligence differs from that of Fei-Fei Li and her team; their pipeline is closer to text-to-image, then image-to-video, then video-to-Gaussian-splatting, followed by conversion into a mesh. Essentially, the result remains 3D-rendered imagery rather than a standardized 3D data format that intelligent agents or humans can truly perceive, interact with, or build on.
Q: So, in your understanding of spatial intelligence, 3D objects have more attributes beyond just shape and volume.
Li: That’s right. There’s even more information involved, including the spatial relationships between elements, the layout of functional combinations, aesthetic combinations, and even the projection of personal characteristics. These represent multidimensional data formats that were not available in traditional text and image modalities, yet they are particularly important in spatial intelligence. This is the most significant difference between spatial intelligence models and traditional text, image, and video modalities.
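To make the contrast with mesh-only output concrete, here is a minimal Python sketch of what a structured scene record might hold. Every class and field name is hypothetical and chosen only for illustration; the interview does not describe the company's actual data format.

```python
from dataclasses import dataclass, field


@dataclass
class SceneObject:
    """One item in a room, with attributes beyond shape (all fields hypothetical)."""
    name: str
    size_m: tuple[float, float, float]  # width, depth, height in metres
    position_m: tuple[float, float]     # centre of the footprint on the floor plan
    mass_kg: float                      # a physical property a simulator could use
    function: str                       # functional role, e.g. "sleep" or "storage"


@dataclass
class Scene:
    """A room as structured data: objects plus explicit relations between them."""
    objects: dict[str, SceneObject] = field(default_factory=dict)
    relations: list[tuple[str, str, str]] = field(default_factory=list)

    def add(self, obj: SceneObject) -> None:
        self.objects[obj.name] = obj

    def relate(self, subject: str, relation: str, target: str) -> None:
        # Reject relations that mention objects missing from the scene.
        if subject not in self.objects or target not in self.objects:
            raise KeyError("both objects must be in the scene")
        self.relations.append((subject, relation, target))

    def related_to(self, target: str) -> list[tuple[str, str]]:
        """Return (subject, relation) pairs that point at the given object."""
        return [(s, r) for s, r, t in self.relations if t == target]


scene = Scene()
scene.add(SceneObject("bed", (1.5, 2.0, 0.5), (1.0, 1.2), 40.0, "sleep"))
scene.add(SceneObject("nightstand", (0.4, 0.4, 0.5), (2.0, 0.3), 8.0, "storage"))
scene.relate("nightstand", "next_to", "bed")
print(scene.related_to("bed"))  # prints [('nightstand', 'next_to')]
```

A bare mesh could answer none of these queries; a record like this lets an agent ask which objects sit next to the bed, or how heavy something is, before acting on it.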
02
Take the initiative to source content from Xiaohongshu,
How Did a Game with a Million Followers Come to Be?
Q: Let’s talk in detail about *Forest Box*. How did you make the transition and get the project off the ground?
Li: OK. Actually, we decided to make a game about a year and a half ago. At that time, we knew absolutely nothing about games—whether it was technology, game design, or development. So we started by doing some research, and we discovered a key opportunity: most games on the market—especially mobile games—were still in a relatively rudimentary, low-graphics development phase.
Looking back at the PC gaming market, a turning point occurred roughly a decade ago. Prior to that, most games featured a stylized, non-realistic art style and were of relatively low quality; however, between 2015 and 2020, there was an explosion of large-scale commercial 3D realistic games, with their share of the market rising from 20% to as high as 70% or 80%. In recent years, the split has dropped back to roughly 50-50.
We believe mobile games will follow the same path. Over the next two to three years, there will be significant growth potential for high-quality, photorealistic 3D mobile games. Since we happen to be interested in developing games related to 3D interior design, we conducted technical research and ultimately chose the Unreal Engine—because it supports high-quality, photorealistic visuals.
However, the official Unreal Engine has many limitations on mobile platforms, such as poor support for Lumen and numerous low-level issues. So we sought out experienced teams and eventually discovered that only a handful of teams—or even just a few specific individuals—at a select few companies actually possessed the necessary capabilities. We recruited those people.
About a year and a half ago, we began by developing the underlying rendering architecture. Over the next six months, we essentially completed the work on the underlying architecture and technical research.
Q: How did you decide on the gameplay?
Li: That brings us to the next key question: What kind of gameplay should we focus on? What content should we include? How should we design the systems? We have no experience in this area, so we’re hesitant to just blindly hire people. We’ve spoken with many game designers and product managers, but we’ve found that they still tend to fall back on conventional thinking.
As we currently understand it, gameplay is largely based on established systems from the past—limited resources, limited objectives, and limited mechanics—which are combined to create a variety of gameplay experiences. To be honest, most of these possibilities were likely exhausted during the Famicom era; since then, the focus has shifted more toward graphical upgrades and the cross-pollination of gameplay styles.
But we still wanted to do something different. So we adopted a player-collaboration approach. I had our new media team publish a large amount of content as a test, such as hand-drawn floor plans and rooms in various styles.
Then one day, a video titled “Four Layout Ideas for a Cream-Colored Room” went viral—it wasn’t just a hit on Xiaohongshu; it even took Instagram and TikTok by storm, racking up tens of millions of views per video and likely over 100 million views in total. This, in turn, influenced the direction of our product.
A post from Senhe was reposted on TikTok and received over a million likes
Q: Why did this post go viral?
Li: We were puzzled too: why did such simple content go viral? After reading through a lot of comments, we realized that users felt it was directly relevant to them.
At first, we created a lot of content about whole-house floor plans, but the response wasn’t great. Later, we realized that a whole house is a matter for the “whole family”—spaces like the living room are actually decided on by parents and siblings together, and I don’t really have much say in the matter. But the bedroom and study are “my” space; everyone has their own say in those areas and a desire to renovate them.
Early Content Published on Senhe (Part 2) vs. Recent Content Published (Part 1)
So you can see that a lot of people have commented asking, “Can you help me redesign my room, too?” We’ve noticed that out of, say, 1,000 replies, maybe 500 come from players who have sketched their own rooms, crooked lines and all, in drawing or note-taking apps such as Notes.
I’m sure this process must have been incredibly painful. Perhaps these players didn’t have the best planning skills, but they still painstakingly drew each stroke, one by one, in an effort to renovate their homes. This experience really drove home the point that only content closely tied to the user’s own life can spark genuine engagement and interaction.
We don’t spend our days chasing trends or jumping on the latest fads—those are just superficial gimmicks. At its core, it’s all about the content: we’ve identified needs that users have always wanted to fulfill but have never been able to. I don’t think there’s ever been a product or game that truly allowed users to customize their own environment.
Q: So many of your product ideas come from social media and player interactions.
Li: That’s right. As you may have noticed, we’ve posted a lot of videos on our official account—featuring characters, DIY features, IPs, and more. Our goal is to design this product in a truly collaborative way, working together with our players. We carefully review every interaction and comment to better understand what players truly want. This approach provides invaluable insights for our product planning and serves as a powerful motivator for our team members.
By eight months ago, our direction had pretty much been set: to create an AI-powered, 3D version of TikTok. We wanted users to be able to use it as a DIY tool, while also giving players—who had been restricted by traditional home decoration and idle games—the freedom to let their creativity run wild. By using AI to break down creative barriers, we aim to fully satisfy the desire to explore, collect, and create.
The version of Senhe you’re currently seeing may seem more like a tool. Since we’re still in the early stages of building core functionality, many of the gameplay features we’ve designed haven’t been implemented yet. Here are some of the things we’ve been working on and thinking about recently.
Q: There are reports that *Forest Box* “reached one million followers without paid promotion.” Is that true?
Li: That’s right, we haven’t run any paid ads. Our new media team consists of just four people, and they also handle media PR, marketing and distribution, and some product operations. We don’t have anyone dedicated to advertising and promotion.
Q: So what aspects of *Forest Box* do you think sparked such a strong reaction?
Li: There are actually quite a few features that have received the best feedback so far—or that users are most looking forward to—and they align well with our plans for the product’s future gameplay.
First, we offer exceptional flexibility in floor plan creation, supporting freehand drawing, room scanning, video reconstruction, and 2D layout tools—making it the most flexible option available in any game currently on the market.
Second, our content is virtually limitless. This is because we’ve built an internal pipeline for automatically generating assets. Traditional mobile games may rely on manual labor to create a few thousand to tens of thousands of assets, which are then gradually released through in-game events; whereas, thanks to AI, we can produce hundreds of thousands of assets in a single month, reaching the scale of millions of assets in a year. For users, this means they’ve virtually achieved “item freedom.”
Third, we don’t just offer basic DIY editing tools—that’s merely a transitional approach from the era of manual creation. We’ve already achieved fully automated, end-to-end AI design, and we can integrate auxiliary features such as automatic furniture and decor matching, as well as customizable character interactions. What we strive for is a free-form experience characterized by high quality, with no material constraints, no creative limitations, and no restrictions on interaction.
Q: If you had to describe the core appeal of *Forest Box* to players in 50 words or fewer, what would you say?
Li: I hope that after players experience this process of AI-driven creation and discovery, they’ll go back to playing traditional “point-and-click” building games like *The Sims* and *Animal Crossing* and find them “pointless” or “too boring.”
Q: Actually, traditional games also use technologies like PCG (procedural generation). How will the spatial awareness in *Forest Box* create a different experience?
Li: Let me give you an example: Suppose you want to renovate your home but don’t want to move your existing large pieces of furniture. In “Senhe,” you can scan your space with a single click using a video or photo to recreate the entire room on a 1:1 scale—it can even analyze your existing furniture.
Then you can move on to the product selection and design phase: If you have a preferred major furniture brand, you can simply select it and let the AI arrange it for you with a single click; if you’re not sure what to choose but have been inspired by a viral post on Xiaohongshu, just upload the corresponding image, and the AI will generate a design for you.
You can also take it a step further with DIY: for example, swap out a painting for one of your own, or replace the sofa with a model from your favorite brand. Or use AI to instantly change the style or color scheme. The entire process—from room scanning (perception) → product selection (decision-making) → AI design + DIY (creation)—results in a home decoration plan that’s ready to implement.
This is a general workflow for users with genuine home renovation needs, resulting in a structured spatial design—not just a mesh model, but a realistic product selection plan with appropriate dimensions, materials, and visual effects.
The gaming experience is a bit different. For example, users might start playing to explore different styles and character stories, and gradually grow fond of a particular character’s interior design, leading them to want to recreate it exactly in their own room. But what if the player’s room is a different size from the reference room, making a direct copy impossible? In the past, you could only arrange the items manually and were limited by the number of props available; now, AI can generate a new layout with a single click that fits the contours of your room while maintaining the same style.
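As a rough illustration of the refitting problem described above, the Python sketch below rescales furniture positions from a reference room into a room of a different size and clamps each piece so that it stays inside the walls. This is a deliberately simple rule-based stand-in, not the generative approach Li describes: it ignores overlaps, doors, and style, and the function name and tuple layout are assumptions made for the example.

```python
def refit_layout(layout, src_room, dst_room):
    """Rescale furniture centres from src_room to dst_room, keeping each piece inside.

    layout: list of (name, x, y, width, depth); (x, y) is the footprint centre in metres.
    src_room, dst_room: (room_width, room_depth) in metres.
    """
    sx = dst_room[0] / src_room[0]
    sy = dst_room[1] / src_room[1]
    refit = []
    for name, x, y, w, d in layout:
        if w > dst_room[0] or d > dst_room[1]:
            raise ValueError(f"{name} does not fit in the target room")
        # Scale the position only; the furniture keeps its real-world size.
        nx, ny = x * sx, y * sy
        # Clamp the centre so the whole footprint stays within the walls.
        nx = min(max(nx, w / 2), dst_room[0] - w / 2)
        ny = min(max(ny, d / 2), dst_room[1] - d / 2)
        refit.append((name, nx, ny, w, d))
    return refit


reference = [("bed", 1.0, 1.0, 1.5, 2.0), ("desk", 3.5, 0.5, 1.2, 0.6)]
smaller = refit_layout(reference, src_room=(4.0, 4.0), dst_room=(3.0, 3.0))
for item in smaller:
    print(item)
```

Even this toy version shows why a direct copy fails: the desk's scaled position would push it through the wall of the smaller room, so it has to be moved inward. A learned layout model would additionally need to preserve function, circulation, and style, which simple scaling cannot do.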
It’s fair to say that traditional building and lifestyle simulation games still require players to invest a certain amount of patience and creativity, and this barrier to entry limits the user base. However, by incorporating AI-driven design, we can allow players to generate a room that scores a 70 or 80 with a single click, and then build upon that foundation. Just as taking photos used to require learning how to use a DSLR and finding the right angles, whereas now you can take great photos with a single tap on your phone, we’re making spatial design just as simple.
With the barrier to entry lowered in this way, younger students and older adults can all become potential users, greatly expanding the user base while ensuring that the experience remains enjoyable and accessible—anyone can create a great room to share.
Q: Are the objects placed by players pre-existing 3D assets in the asset library, or are they randomly generated?
Li: It can be assets that are already packaged in the game library, or AI-generated DIY content. Essentially, it’s freely available.
Q: What was the biggest challenge you faced when adapting an AI technology for a mainstream video game?
Li: There were many challenges. We had never developed a game before, so everything was completely new to us. We weren’t very familiar with engines like Unreal, Unity, and Cocos, so we had to learn them step by step on our own—studying the source code and documentation and building demos—before we finally settled on Unreal.
The team structures are also completely different. AI companies consist of front-end, back-end, and algorithm teams, while game companies are divided into three main departments: design, art, and programming.
The third issue is a clash in development approaches. Our AI team tends toward a full-stack development model, where individuals often handle multiple roles and prioritize rapid iteration. For example, a single person might handle all the product documentation, prototyping, and UI design, while the algorithms might be assigned to someone else. In contrast, game development emphasizes processes, documentation, and division of labor, with highly specialized roles: system designers and balance designers; concept artists and 3D artists; UI designers and motion designers. Each person’s responsibilities are broken down into very specific tasks, which means that individuals may not fully understand the work of the roles immediately upstream or downstream from their own.
This includes the hiring process. Since we’re not in the industry, we don’t even know where to look for the right kind of people. We can only identify candidates by researching products on the market that share a similar vibe to what we’re aiming for and reaching out to those who have worked on those projects. During interviews, we may also encounter differences in perspective with the candidates.
In the early stages, we could only reach out to people with sincerity and gradually find the right fit. Once we started working, we had to go through a period of adjustment. For example, the first engine programmer we hired asked, “Where’s the design document?” We replied, “We haven’t hired a game designer yet—we only have ideas.” We had to write the document and study game design theory, while they had to step slightly outside the boundaries of their role.
Fortunately, the team is relatively young and has an interdisciplinary background, so there aren’t many rigid boundaries between us, and we’re able to quickly reach consensus on technical, business, and aesthetic matters. Although we’ve encountered some challenges, the project has generally progressed smoothly.
03
Aiming for a market capitalization of 10 billion or 100 billion,
The future is about so much more than just “decorating your little nest”
Q: Currently, “Forest Box” focuses primarily on interior room design. Will it expand to include entire houses, gardens, or cities in the future?
Li: Absolutely, but here’s how we’ve approached it: Generating designs for indoor spaces is far more challenging than for outdoor ones. That’s because indoor design involves functional, circulation, and aesthetic considerations. There are simply too many factors to account for, making the training process much more difficult.
When it comes to outdoor environments, PCG (procedural generation) technology has already reached a relatively mature stage. For example, *Cities: Skylines* uses rules to generate cities. Although the level of freedom is limited, the spatial complexity is already quite usable. Moreover, outdoor environments are primarily about visual presentation and do not involve much in the way of deep interaction.
If we can create a single room like this, we can certainly build a house with a dozen or so rooms—no problem at all. So our plan is to first use a single room as a showcase to refine our product and technology to a commercially viable state, and then scale up to multiple rooms. Finally, we’ll incorporate the environment—which we believe is a universal aspect with relatively low technical barriers. Whether through our proprietary technology or by leveraging existing solutions, generating the environment will be straightforward.
Q: Are you worried that big tech companies might come in and “outcompete” you?
Li: This is a question people often ask. First, there are definitely barriers to entry here; large tech companies can’t just jump in and build something overnight. Second, these companies are focused on other areas—they’re still busy with general-purpose modalities like text, images, and video. Those markets are larger and offer faster commercialization opportunities.
In contrast, when it comes to the 3D modality, many technical challenges at the architecture level remain unresolved, and only a small number of people within those organizations are experimenting with it. For example, while some major tech companies have dedicated labs for spatial content generation, they tend to focus on specific scenarios like soccer fields, and their research into complex spaces such as home layouts isn’t as deep as ours. In fact, even former algorithm engineers specializing in 3D generation from Tencent and Tsinghua University have now joined us, because they believe our technical approach and final results represent the best solution in both academia and industry. We’ve been at this for two years, and the mistakes we’ve made are ones that these major companies would have to repeat if they were to start from scratch.
Of course, technical barriers will eventually be overcome. All we can do is accelerate our R&D and the pace of iteration. With platform-based products like this, it all comes down to who can iterate faster.
Q: What are your thoughts on the scalability and commercial potential of *Forest Box*? Many companies are reluctant to explore innovative directions precisely because it requires them to develop new revenue streams.
Li: We want to build something big, not just a small, one-off project that generates quick returns. There are three key points. First, it relates to our entrepreneurial goals—we want to build a company worth tens or even hundreds of billions. To do that, we must be highly innovative, and we must be prepared to go through a period where we don’t see immediate returns.
However, in the long run, we will certainly need to consider the balance between technological innovation and commercial viability. Currently, the mainstream monetization models for most games are one-time purchases, advertising, and in-app purchases. But we want to explore some new approaches.
For example, since our content is grounded in reality—the products generated within the game are actual goods—theoretically, as our user base grows, we could explore monetization through e-commerce. This would allow players to access most of the core content without being forced to make in-app purchases or watch ads. I believe this represents a promising avenue for exploration and experimentation within the industry.
In addition, we’ve had some unexpected benefits during our exploration. For example, these high-quality designs created by players could actually be sold directly to robotics companies for training purposes. This is because the designs players create are not only visually appealing but also more practical. As a result, user-generated content itself can support the game’s operations.
Ultimately, our primary goal is to use games to demonstrate our capabilities in generating 3D spaces. These capabilities aren’t limited to gaming; in fact, they can be fully applied across a wide range of fields, including film and television, live streaming, and AR/VR. These are some of our thoughts on commercialization.
Q: What do you envision for Habitat Technology in one year, five years, and twenty years?
Li: Twenty years is too far off—AI is changing so rapidly these days that we can’t even imagine what it will be like. But we do have plans for the next year and the next five years.
Within the next year, we hope to launch Senhe 1.0 as soon as possible, with the goal of reaching tens of millions of users. In this age of AI, game growth is no longer about slow, long-term operations: a product either misses, or it hits and quickly surges to millions or tens of millions of users. We still have high hopes for this product.
In five years, we aim to create a true “AI-powered spatial version of TikTok”—a kaleidoscope of 3D content that encompasses interactive elements like gaming while also integrating a wide range of 3D-related features, such as live streaming, e-commerce, VR, and more.
Q: There has been some pessimistic talk recently suggesting that, despite three or four years of discussion about “AI + gaming,” no true “AI-native games” have emerged yet. Do you think such native games will eventually appear? Is there a higher likelihood of this happening in the field of spatial intelligence?
Li: I think so. Essentially, the “premises have changed.” In the past, games were all about “limited resources + limited tasks + limited actions,” but now AI brings a nearly infinite library of materials. It’s like going straight from the agricultural age to an era of “abundant resources.” Gameplay will definitely undergo a complete transformation. This isn’t just about one new type of gameplay, but a complete reimagining of all existing gameplay mechanics. You can create so many new things now. That’s why I believe AI is the savior of the gaming industry; it breaks the deadlock of gameplay exhaustion.
However, AI-native games in the spatial modality are unlikely to be the first to hit the market, as games involve more than just 3D content; they also include text, images, and other elements. For example, we can see that many AI-powered emotional companionship products are already close to being fully operational, with some products seeing average daily user engagement of over 1.5 hours.
In my view, the spatial model is more of a fundamental manifestation of AI; it bridges the gap between the suddenness and variability of space, while also making assets and interaction methods virtually limitless. As a result, AI-native games based on the spatial model are likely to emerge later on, much like the concept of “next-gen AAA games.”
In the short term, we may see some AI-powered spatial games that excel in specific areas, such as more advanced spatial generation capabilities, an infinite mission system, or innovative interaction methods. However, it will take some time for a fully realized version to emerge.
Original article by gallonwang. Reproduction prohibited: https://youxichaguan.com/en/archives/195191