A new paper describes how AI tools can be used to generate printable 3D models with only a text prompt.
Researchers from Google and UC Berkeley have created “DreamFusion”, an AI tool that automatically transforms text prompts into full 3D models.
The technology is an extension of software developed to perform “text to image”, which can generate astonishingly detailed and realistic images from only a short descriptor sentence called a “prompt”. That technology in turn was preceded by “text to text” software that could generate paragraphs from a prompt.
All of these tools use AI techniques to create data that “fills in the blanks”. In the case of text generation, it’s predicting the next few words, much as Google suggests completions while you type a search. Similarly, the “text to image” tools generate an image that matches your “caption-like” prompt. It works incredibly well, and I’ve even used AI-generated images as “stock photos” for this publication, and most people didn’t even notice.
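To make the “fill in the blanks” idea concrete, here is a deliberately tiny toy, not the real technology: a bigram model that predicts the next word purely from word pairs seen in a small training corpus. The corpus and the `predict_next` helper are my own illustrative inventions; real text generators use vastly larger models and data, but the underlying task is the same.

```python
from collections import Counter, defaultdict

# Tiny made-up training corpus (an assumption for illustration only)
corpus = (
    "the printer extrudes plastic . "
    "the printer extrudes filament . "
    "the printer builds parts layer by layer . "
    "the nozzle extrudes plastic ."
).split()

# Count how often each word follows each other word
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the word most frequently seen after `word` in the corpus."""
    return following[word].most_common(1)[0][0]

print(predict_next("printer"))  # prints "extrudes"
```

The same predict-what-fits-the-context principle, scaled up enormously, is what powers the text and image generators described above.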
How can these tools do this? They are trained by looking at content we’ve all provided to the Internet. The “text to image” tools have been trained by examining literally billions of images and their matching captions on the web. Essentially, the tool generates what it believes to be the appropriate image for a given “caption”, or prompt.
DreamFusion uses the same techniques, except that instead of generating text or images, it generates full color 3D models. The researchers explain:
“Given a caption, DreamFusion uses a text-to-image generative model called Imagen to optimize a 3D scene. We propose Score Distillation Sampling (SDS), a way to generate samples from a diffusion model by optimizing a loss function. SDS allows us to optimize samples in an arbitrary parameter space, such as a 3D space, as long as we can map back to images differentiably. We use a 3D scene parameterization similar to Neural Radiance Fields, or NeRFs, to define this differentiable mapping. SDS alone produces reasonable scene appearance, but DreamFusion adds additional regularizers and optimization strategies to improve geometry. The resulting trained NeRFs are coherent, with high-quality normals, surface geometry and depth, and are relightable with a Lambertian shading model.”
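Stripped to its essentials, the Score Distillation Sampling step the researchers describe can be summarized as a single gradient update (notation loosely follows the paper; the exact weighting details are in the paper itself):

```latex
\nabla_\theta \mathcal{L}_{\text{SDS}}
\;=\;
\mathbb{E}_{t,\epsilon}\!\left[\,
w(t)\,\big(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\big)\,
\frac{\partial x}{\partial \theta}
\,\right]
```

Here \(\theta\) are the parameters of the 3D scene (the NeRF), \(x = g(\theta)\) is an image rendered from it, \(x_t\) is that image with noise \(\epsilon\) added at timestep \(t\), \(y\) is the text prompt, and \(\hat{\epsilon}_\phi\) is the frozen Imagen diffusion model’s noise prediction. Only the 3D scene is optimized: the gradient nudges the rendered views toward images the diffusion model considers a good match for the caption.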
I suspected this might be possible and wrote a prediction to that effect earlier this year. Now it seems to have actually been done.
Well then, are we done? Is this the end of CAD designers?
By no means. This is merely a research project intended to prove that the approach is possible.
In fact, DreamFusion can generate only a very small set of 3D models, and there’s a very good reason for that constraint.
It turns out that, unlike the text and images the web accumulates every second, there are not billions of 3D models available for use as training data. Instead, the researchers cooked up a small set of 3D models for training, providing at least something for the tool to use.
DreamFusion does indeed work; however, it can generate only a very small subset of 3D models due to the limited training data.
However, this demonstration shows that if a massive selection of 3D models were available for training, the approach could very likely generate a far wider range of 3D models.
Unfortunately, there is no such “clean” training set available. Sure, there are plenty of 3D models, but they’re just not as available as images and captions on the web, for example.
What large sources of 3D models do exist?
One might be Thingiverse, which, as of this writing, apparently has 5.5M 3D models. While these are indeed 3D models or derivations of them, there are massive issues with the “captions”. In the case of Thingiverse, the captions would likely be the description text. Unfortunately, it’s a huge mess. Consider the text for Thingiverse item 5,543,957, “AB-Haube”:
This item seems to be a customized lid of some kind. The description is as follows:
Customized version of https://www.thingiverse.com/thing:1943463
Created with Customizer! https://www.thingiverse.com/apps/customizer/run?thing_id=1943463
Using the following options:
show_slice = Off
hole_shape = Square
[and many other parameters omitted]
This is not usable as a caption. It doesn’t describe the object in any way. The training tool would have to follow the link and see what the original source said, and even then that might not be descriptive either. While there are some Thingiverse entries that do have proper, usable descriptions, there are as many — or more — that don’t.
In other words, the data from this type of source are really noisy and any training would be highly questionable.
The descriptions from other repositories would no doubt be similarly messy. I guess that’s what you get when you enable anyone to submit anything.
There are some more organized repositories, such as those from mechanical parts catalogs, which I previously suspected might be more useful for this purpose. They may be the only useful 3D model data for training available today.
One way to help this along would be for online repositories to force new submissions to include a proper descriptor: “one sentence description, please, what is this thing?” Over time then we’d gradually collect more useful data.
That could be a logical step for the researchers: see if they could access parts repositories and use them for training data. However, that might require considerable discussion, legal review and more before it could happen. Another possibility is to simply try using Thingiverse / Printables / YouMagine / MyMiniFactory / etc. data for training and just see what happens.
Another intermediate step could be some kind of “harvesting” process that would scrape through online repositories and compile them into a more standard format. This could, for example, reject items such as the Thingiverse model above and keep only those with rich, descriptive text. This could form a research-ready data set for training future tools.
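A harvesting filter of this kind might look something like the sketch below. Everything here is a hypothetical assumption of mine, including the function name, the boilerplate patterns, and the five-word threshold: a real pipeline would need far more careful heuristics.

```python
import re

# Hypothetical boilerplate patterns (assumed, not from any real tool):
# phrases that mark auto-generated Customizer descriptions like the
# Thingiverse example above.
BOILERPLATE = re.compile(
    r"customized version of|created with customizer", re.IGNORECASE
)

def usable_caption(description: str, min_words: int = 5) -> bool:
    """Heuristic filter: reject Customizer-style boilerplate and
    descriptions too short to actually describe the object."""
    if BOILERPLATE.search(description):
        return False
    return len(description.split()) >= min_words

# The AB-Haube style description would be rejected...
print(usable_caption(
    "Customized version of https://www.thingiverse.com/thing:1943463"
))  # prints False

# ...while a genuine caption-like description would pass.
print(usable_caption(
    "A square ventilation lid with snap-fit tabs for a 120mm fan"
))  # prints True
```

Even a crude filter like this would discard a large fraction of entries, which is exactly the point: better a smaller, cleaner training set than a large, noisy one.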
At this point it’s all theory and no real applications. But the new paper proves the concept is possible, and generating 3D models from text is such a beneficial goal that much more work should be undertaken.
In the meantime, please provide good descriptions for anything you upload to online 3D model repositories.
Via DreamFusion / Github (Hat tip to Shachar)