Google’s DeepMind project has developed some interesting 3D capabilities.
DeepMind, an AI startup acquired by Google a few years ago, has been developing unusual applications for AI using powerful algorithmic tools. Recently they announced a “Neural Scene Representation and Rendering” capability.
The idea here is that the AI can be presented with only a handful of 2D images of a scene, and from those it is able to reconstruct a 3D understanding of the scene and render alternative views – even though the AI has never actually seen those views.
They say it’s like “seeing” the fourth leg of a table when you can’t actually see it. You know it’s there.
This is a highly complex problem for AI, and many researchers, including those at DeepMind, have been working on it. It seems that DeepMind has made some significant progress in this direction.
In fact, they’ve produced a “Generative Query Network”, which they describe as:

“A framework within which machines learn to perceive their surroundings by training only on data obtained by themselves as they move around scenes. Much like infants and animals, the GQN learns by trying to make sense of its observations of the world around it. In doing so, the GQN learns about plausible scenes and their geometrical properties, without any human labelling of the contents of scenes.”
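To make that description a bit more concrete, here is a heavily simplified sketch of the GQN's basic structure: each (image, camera pose) observation is encoded into a vector, the vectors are summed into a single scene representation, and a generator renders a prediction for a camera pose that was never observed. This is purely illustrative – the dimensions, fixed random weights, and function names are my own stand-ins, not DeepMind's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen only for illustration.
IMG_DIM, POSE_DIM, REP_DIM = 64, 7, 32

# Fixed random matrices stand in for parameters the real GQN learns.
W_enc = rng.standard_normal((IMG_DIM + POSE_DIM, REP_DIM)) * 0.1
W_dec = rng.standard_normal((REP_DIM + POSE_DIM, IMG_DIM)) * 0.1

def encode(image, pose):
    """Encode one (image, camera pose) observation into a vector."""
    return np.tanh(np.concatenate([image, pose]) @ W_enc)

def render(scene_rep, query_pose):
    """Predict the view from a camera pose that was never observed."""
    return np.tanh(np.concatenate([scene_rep, query_pose]) @ W_dec)

# A few observations of the same scene from different viewpoints.
observations = [(rng.random(IMG_DIM), rng.random(POSE_DIM)) for _ in range(3)]

# Key idea: per-view encodings are summed into one scene representation,
# so the order and number of views don't matter.
scene_rep = sum(encode(img, pose) for img, pose in observations)

predicted_view = render(scene_rep, rng.random(POSE_DIM))
print(predicted_view.shape)  # (64,)
```

The aggregation-then-query structure is the point here: the “mental” representation is just a vector summarizing everything the system has seen, which can then be interrogated from any viewpoint.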
This is quite fascinating, as it seems that the AI is actually producing a “mental” representation of the scene.
But it also seems quite analogous to optical methods of capturing 3D scenes. As we’ve explained previously, optical methods involve taking a series of 2D images of a subject and then analyzing how features shift between viewpoints (parallax) to gradually develop a 3D model (or “understanding”) of the subject.
This usually requires quite a number of images to properly prepare the 3D model – at least a few dozen, and often over a hundred for best results. You then need an appropriate system to analyze the images, generate a 3D point cloud and convert that into a 3D model.
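The geometric core of that optical approach can be sketched quite compactly: if the same point is seen from two known camera positions, its 3D location can be recovered by triangulation. Below is a minimal, self-contained example using the standard linear (DLT) method with synthetic data – the camera poses and the test point are made up for illustration, and a real photogrammetry pipeline would do this for thousands of matched features across many images.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point seen in two views.

    P1, P2 are 3x4 camera projection matrices; x1, x2 are the 2D
    image coordinates of the same scene point in each view.
    """
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                # null vector of A, in homogeneous coords
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point into a camera's image plane."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two hypothetical cameras: one at the origin, one shifted along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.3, -0.2, 4.0])   # a point in front of both cameras
X_est = triangulate(P1, P2, project(P1, X_true), project(P2, X_true))

print(np.allclose(X_est, X_true, atol=1e-6))  # True
```

With noise-free synthetic data the point is recovered exactly; with real photographs, measurement noise is why so many images are needed to average the errors down.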
This seems to be quite similar to how DeepMind’s system works, except that the “system” is a neural net that was generated through experience and we have no specific idea of how it works internally.
It appears Google’s DeepMind system uses far fewer images, and at this time works only on contrived, computer-generated scenes. They have yet to unleash the system on “reality”, which is of course far more complex.
But eventually they will, and I have little doubt they will make it work at some point.
Then we would live in a world where only a few brief glimpses of an object would permit capture of its shape. Is this a bad thing?
I’m not sure. Today we have multiple methods of easily capturing objects in 3D and even methods of reproducing them using 3D printers, among other machines. But we don’t see large scale – or even small scale – theft of designs through such captures.
I think instead that this capability could make it easier for industrial users to perform quality control scans of print results. Imagine an advanced 3D printer that, once a job completes, unleashes the AI to check and confirm that the printed part is within tolerances with merely a quick glance.
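Such a tolerance check is straightforward once you have a captured point cloud and the nominal geometry: measure how far each scanned point sits from the reference shape and compare the worst case against the spec. Here is a minimal sketch with toy data – the tolerance value, point counts, and simulated scan noise are all hypothetical.

```python
import numpy as np

def max_deviation(scan_pts, ref_pts):
    """For each scanned point, find the distance to the nearest
    reference point; the worst case is the part's maximum deviation.

    Brute-force pairwise distances are fine for small clouds;
    a KD-tree would be used at realistic scan sizes.
    """
    d = np.linalg.norm(scan_pts[:, None, :] - ref_pts[None, :, :], axis=2)
    return d.min(axis=1).max()

rng = np.random.default_rng(1)
reference = rng.random((200, 3))                           # nominal geometry (toy data)
scan = reference + rng.normal(0, 0.002, reference.shape)   # simulated noisy scan

TOLERANCE = 0.05  # hypothetical spec, in the model's units
in_spec = max_deviation(scan, reference) <= TOLERANCE
print(in_spec)  # True
```

The interesting part is upstream of this snippet: a GQN-style system could, in principle, supply the captured geometry from just a glance, rather than from a lengthy structured scan.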
That’s more likely what the future holds.