July 3, 2024

Meta's New "3D Gen" AI 3D Object Generator Is Mind-Blowing

Meta 3D Gen (3DGen) is designed to generate 3D assets quickly and accurately from text descriptions.

by Jim Clyde Monge

Meta has launched Meta 3D Gen (3DGen), a new and fast way to create high-quality 3D objects in less than 60 seconds from simple text prompts.

The AI community has barely caught up with the string of AI video generators released over the past few weeks, and now Meta has given us a 3D generator.

Is it time to switch our focus to this new modality?

While AI 3D generation isn’t entirely new, existing solutions like Genie from Luma Labs often fall short of the quality needed for production or commercial use. Judging by the examples Meta has released, 3D Gen appears to be a significantly more capable solution.

What is Meta 3D Gen?

Meta 3D Gen (3DGen) is designed to generate 3D assets quickly and accurately from text descriptions. It supports physically-based rendering (PBR), which is crucial for making 3D objects look realistic when lit in various ways.

Additionally, 3DGen can retexture previously generated or artist-created 3D models with new text prompts, making it highly versatile.

How does it work?

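Meta 3D Gen combines two main components: Meta 3D AssetGen, which creates the initial 3D asset, and Meta 3D TextureGen, which refines or replaces its textures.

Here’s a simple breakdown of how it works: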

Meta 3D Gen pipeline: Meta 3D AssetGen and Meta 3D TextureGen (image from Meta research)

Stage 1: 3D Asset Generation

Initial 3D Creation: When you give it a text prompt, Stage 1 gets to work, using Meta 3D AssetGen to create an initial 3D model. This includes the 3D shape, a basic texture, and PBR material maps. This stage takes about 30 seconds.
Multi-view Generation: The model generates several views of the object using a multi-view text-to-image generator. A reconstruction network then creates the first version of the 3D object, followed by mesh extraction to define its shape and texture.

Stage 2: Texture Refinement and Retexturing

Enhancing Textures: In Stage 2, the 3D model from Stage 1 is refined. Meta 3D TextureGen takes the initial model and the text prompt to produce high-quality texture and PBR maps. This takes around 20 seconds.
Retexturing: This stage can also generate textures for untextured 3D models, or replace the textures of previously generated or artist-created ones, based on new text prompts. This feature is useful for customizing or updating 3D models.

By combining these stages, 3DGen uses different spaces (view space, volumetric space, and UV space) to produce high-fidelity 3D assets efficiently.
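To make the two-stage flow more concrete, here is a minimal Python sketch of how the pieces fit together. Meta has not published an API for 3DGen as far as this article covers, so every name here (Asset3D, asset_gen, texture_gen, generate, estimated_seconds) is a hypothetical placeholder, the stage bodies are stubs that only pass strings around, and the number of views is an arbitrary value. The only figures taken from the article are the rough 30-second and 20-second stage times.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Approximate per-stage times quoted in this article, in seconds.
STAGE_SECONDS = {"asset_gen": 30, "texture_gen": 20}


@dataclass
class Asset3D:
    """Toy stand-in for a 3D asset; strings replace real mesh and image data."""
    mesh: str
    textures: Dict[str, str] = field(default_factory=dict)
    stages_run: List[str] = field(default_factory=list)


def asset_gen(prompt: str, num_views: int = 4) -> Asset3D:
    """Stage 1 stub: multi-view images -> reconstruction -> mesh extraction."""
    # num_views is an arbitrary illustrative value, not a figure from Meta.
    views = [f"view {i} of '{prompt}'" for i in range(num_views)]
    mesh = f"mesh reconstructed from {len(views)} views"
    textures = {"texture": "initial", "pbr_maps": "initial"}
    return Asset3D(mesh, textures, ["asset_gen"])


def texture_gen(asset: Asset3D, prompt: str) -> Asset3D:
    """Stage 2 stub: produce new texture and PBR maps, leaving geometry as is."""
    textures = {
        "texture": f"refined for '{prompt}'",
        "pbr_maps": f"refined for '{prompt}'",
    }
    return Asset3D(asset.mesh, textures, asset.stages_run + ["texture_gen"])


def generate(prompt: str) -> Asset3D:
    """Full text-to-3D path: Stage 1 followed by Stage 2 with the same prompt."""
    return texture_gen(asset_gen(prompt), prompt)


def estimated_seconds(asset: Asset3D) -> int:
    """Sum the rough per-stage times for the stages that produced this asset."""
    return sum(STAGE_SECONDS[stage] for stage in asset.stages_run)


hippo = generate("a hippo wearing a sweater")
print(hippo.stages_run, estimated_seconds(hippo))

# Retexturing an existing (for example artist-made) mesh only needs Stage 2.
chair = texture_gen(Asset3D(mesh="artist-made chair mesh"), "worn red leather")
print(chair.mesh, estimated_seconds(chair))
```

In this toy model, the full path adds up to roughly 50 seconds, which is consistent with the "less than 60 seconds" claim, while retexturing alone costs only the Stage 2 time because the geometry is reused untouched.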

If you want to know more about the details of 3D Gen, check out this whitepaper from Meta.

Examples

Here are a few fun examples of what Meta 3DGen can create from simple text prompts:

Prompt: a hippo wearing a sweater (left)

Prompt: a stack of pancakes covered in maple syrup (right)

Prompt: an adorable piglet in a field (left)

Prompt: a baby dragon hatching out of a stone egg (right)

These examples show how versatile and creative the tool can be, producing detailed 3D models quickly.

How does it compare to other solutions?

Meta 3DGen stands out from other industry solutions due to its speed and quality. Here’s a comparison with some leading text-to-3D generators:

According to Meta’s own evaluation, 3DGen outperforms these alternatives in both speed and quality, making it a highly efficient tool for 3D generation.

The researchers also analyze performance rates for visual quality, geometry, texture details, and the presence of texture artifacts, as functions of the scene complexity as described by the text prompt.

Meta 3D Gen benchmark plots (image from Meta research)

The plots above show that, while some of the baselines perform on par for simple prompts, 3DGen starts outperforming them strongly as the prompt complexity increases from objects to characters and their compositions.
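For readers curious how a comparison like this can be broken down by prompt complexity, here is a small Python sketch. It assumes the evaluation is recorded as pairwise preference votes, each tagged with a prompt category, and computes the share of votes one method wins per category. That setup, the win_rates function, and the method labels "A" and "B" are illustrative assumptions, and the votes are made-up placeholders, not Meta's data or results.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def win_rates(votes: Iterable[Tuple[str, str]], method: str) -> Dict[str, float]:
    """Return, per prompt category, the fraction of votes that preferred `method`.

    Each vote is a (category, preferred_method) pair.
    """
    wins = defaultdict(int)
    totals = defaultdict(int)
    for category, preferred in votes:
        totals[category] += 1
        if preferred == method:
            wins[category] += 1
    return {category: wins[category] / totals[category] for category in totals}


# Made-up placeholder votes for two generic methods, NOT real evaluation data.
toy_votes = [
    ("objects", "A"), ("objects", "B"),
    ("characters", "A"), ("characters", "A"), ("characters", "B"),
    ("compositions", "B"),
]

rates = win_rates(toy_votes, "A")
for category, rate in rates.items():
    print(f"{category}: {rate:.2f}")
```

Grouping by category before dividing is what lets a plot show one rate per complexity level instead of a single overall number.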

Unlike many state-of-the-art solutions, both AssetGen and TextureGen are feed-forward generators, making them fast and efficient after deployment. This efficiency is crucial for applications requiring quick turnaround times, such as VR and gaming.

Why is this important?

One of the major challenges in 3D generation is creating models that look good in both VR and real-world applications. VR, in particular, is unforgiving when it comes to fake detailing. You need as much detail as possible in the actual geometry to make the experience believable.

While current AI models often output low-resolution geometry and approximate detailing with textures, tools like Meta 3DGen are paving the way for more sophisticated solutions.

Final Thoughts

Meta has a strong track record with its large language models, like the recent Llama 3. The 3D generation space is particularly challenging due to the limited availability of 3D datasets for training compared to images and videos.

However, the examples Meta has shared for 3D Gen are remarkably promising, albeit with some limitations in handling extremely complex prompts or producing highly detailed assets. There’s also the challenge of producing clean topology, which is essential for applications like animation and 3D printing.

It’s encouraging to see a major tech company like Meta tackling this challenging research area and producing such an impressive solution. We may need to wait a few more months before 3D generators mature enough to integrate seamlessly into the workflows of 3D illustrators and graphic artists, but this is clear progress for the field.

