A new image-to-3D method promises to create even more detail.
Only a year ago, the idea of creating a full 3D model from a single 2D image would have seemed like science fiction. Today, there are multiple tools that attempt to do exactly that. But it's still early days, and there is plenty of room for improvement, as the generated models are not always usable.
One of the areas for improvement is the level of detail generated. As humans, we can see plenty of detail in a given image and can easily imagine how that detail might continue on the hidden sides of the subject. That's challenging for AI models.
But some improvement has been made by a research team that has developed a new tool called Sparc3D. What have they changed? Here’s their explanation:
“Existing two-stage pipelines—compressing meshes with a VAE (using either 2D or 3D supervision), followed by latent diffusion sampling—often suffer from severe detail loss caused by inefficient representations and modality mismatches introduced in VAE. We introduce Sparc3D, a unified framework that combines a sparse deformable marching cubes representation Sparcubes with a novel encoder Sparconv-VAE. Sparcubes converts raw meshes into high-resolution (1024³) surfaces with arbitrary topology by scattering signed distance and deformation fields onto a sparse cube, allowing differentiable optimization. Sparconv-VAE is the first modality-consistent variational autoencoder built entirely upon sparse convolutional networks, enabling efficient and near-lossless 3D reconstruction suitable for high-resolution generative modelling through latent diffusion.”
What does all that mean? Many current methods for generating 3D models compress shapes using a variational autoencoder and then use diffusion to generate new samples — but this often loses fine details due to mismatched data formats and inefficient encoding.
Sparc3D addresses this with two new pieces: Sparcubes, which represents complex 3D shapes as sparse, high-resolution surfaces, and Sparconv-VAE, the first VAE built entirely from sparse 3D convolutions, which compresses and reconstructs those surfaces with almost no loss of detail. Together, they allow for highly detailed, efficient, and accurate 3D model generation.
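To make the underlying representation a little more concrete, here's a minimal sketch of standard marching cubes extracting a surface from a signed distance field, using numpy and scikit-image. This is not the team's code: Sparcubes works on a sparse 1024³ grid with learned deformation fields rather than the small dense grid shown here, but the core idea of pulling a mesh out of the zero level set of a distance field is the same.

```python
# Illustrative sketch only (not Sparc3D's implementation): mesh extraction
# from a signed distance field (SDF) with classic dense marching cubes.
import numpy as np
from skimage import measure

# Dense SDF for a sphere of radius 0.3, sampled on a 64^3 grid.
n = 64
coords = np.linspace(-0.5, 0.5, n)
x, y, z = np.meshgrid(coords, coords, coords, indexing="ij")
sdf = np.sqrt(x**2 + y**2 + z**2) - 0.3  # negative inside, positive outside

# Marching cubes finds the zero level set of the SDF, i.e. the surface.
verts, faces, normals, _ = measure.marching_cubes(sdf, level=0.0)
print(f"{len(verts)} vertices, {len(faces)} triangles")
```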
Above is a sample generation from Sparc3D on the right and a previous model on the left. You can see there's quite a difference in the detail. These systems often guess wrong for the hidden elements, but Sparc3D seems to do a better job.
The research team provides a demo system where you can try out Sparc3D online at HuggingFace. There you can upload a 2D image (preferably one with a single subject with a blank background), and it will generate a 3D model for you. Note: it’s not very fast, so you will likely wait at least a few minutes to get your result.
I generated an image of a WWI biplane (above) to see how Sparc3D would fare with that design. The result is below:
That’s quite incredible. The surfaces are correct and smooth, with all expected elements present, even on the hidden side. The complex geometry of this item would, I thought, have been challenging. However, Sparc3D generated a near-perfect biplane 3D model!
You can download a .GLB file, and that in turn can be converted into an .STL or .3MF file for 3D printing.
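If you want to script that conversion, here's a short, hedged sketch using the trimesh Python package; the filename is just a placeholder for whatever the demo gives you.

```python
# Hedged example (assumes the trimesh package is installed): convert a
# downloaded .GLB file into an .STL suitable for slicing and 3D printing.
import trimesh

# force="mesh" flattens a GLB scene into a single mesh object.
mesh = trimesh.load("biplane.glb", force="mesh")  # placeholder filename
mesh.export("biplane.stl")  # binary STL, ready for a slicer
```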
Most of the examples shown by Sparc3D are gaming objects, as that’s the real intention of the system. But could it handle something from real life? I sent in a portrait image to see what would happen.
Here’s the result, which is pretty decent. What was most surprising was the detail of the stubble on the subject’s face, which carried through into the 2M-face 3D model.
I’m very impressed with Sparc3D’s ability to extrapolate hidden surfaces, and at the same time retain the detail in supplied 2D images.
Where is this technology going to be next year? I can only imagine.
Via GitHub