Photo: Elena Zhukova/Embodied Intelligence
Embodied Intelligence wants to use AI and VR to teach robots new skills, like how to manipulate wires, much faster.
Depending on who you ask, robotic grasping has been solved for a while now. That is, the act of physically grasping an object, not dropping it, and then doing something useful is a thing that robots are comfortable with. The difficult part is deciding what to grasp and how to grasp it, and that can be very, very difficult, especially outside of a structured environment.
This is a defining problem for robotics right now: Robots can do anything you want, as long as you tell them exactly what that is, every single time. In a factory where robots are doing the exact same thing over and over again, this isn’t so much of an issue, but throw something new or different into the mix, and it becomes an enormous headache.
Over the past several years, researchers like Pieter Abbeel at UC Berkeley have been developing ways of teaching robots new skills, rather than actions, and how to learn, rather than just how to obey. This week, Abbeel and several of his colleagues from UC Berkeley and OpenAI are announcing a new startup (with US $7 million in seed funding) called Embodied Intelligence, which will “enable industrial robot arms to perceive and act like humans instead of just strictly following pre-programmed trajectories.”
A nice little summary of what Embodied has in mind, from their press release:
We are building technology that enables existing robot hardware to handle a much wider range of tasks where existing solutions break down, for example, bin picking of complex shapes, kitting, assembly, depalletizing of irregular stacks, and manipulation of deformable objects such as wires, cables, fabrics, linens, fluid-bags, and food.
To equip existing robots with these skills, our software builds on the latest advances in deep reinforcement learning, deep imitation learning, and few-shot learning, to all of which the founding team has made significant contributions. The result isn’t just a new set of skills in the robot repertoire, but teachable robots, that can be deployed for new tasks on short turn-around.
The background here will be familiar to anyone who has followed Abbeel’s research at UC Berkeley’s Robot Learning Lab (RLL). While the towel folding is probably the most famous research out of RLL, the lab has also been working on adaptive learning through demonstration, as with this robotic knot tying from 2013:
There are two important things that are demonstrated here. First, you’ve got the learning from demonstration bit, where a human shows the robot how to tie a knot without any explicit programming necessary, and the robot then generalizes the demonstration to apply the skill that it represents to future knot-tying tasks. This leads to the second important thing: Since there are no fixtures, the rope (being rope) can start off in all kinds of different configurations, so the robot has to be able to recognize that and modify its behavior accordingly.
While humans can do this kind of thing without thinking, robots still can’t, which is why there’s been such a big gap between the capabilities of humans and robotic manipulators. Embodied wants to bridge this gap with robots that can learn quickly and flexibly.
“Around 2012, we concluded that it would be really hard to get to the real-world capabilities that we’d want with the more engineered approaches that we’d been following,” Abbeel tells us. “They had a lot of learning in them, but it was a combination of learning and engineering to get everything to work.” Then came a breakthrough in the field of AI: The ImageNet project at Stanford showed that learning could do a lot more than it could before, if you were willing to collect enough data and train a big, deep neural net for your tasks.
Abbeel and his team have since been “pushing reinforcement learning and imitation learning pretty hard,” he says, “and we’ve reached a point where we really believe that the time is right to start putting this into practice, not necessarily for a home robot, which needs to deal with an enormous amount of variation, but in manufacturing and logistics.”
Embodied is targeting repetitive manipulation tasks where the current state-of-the-art in automation is simply not capable enough, as well as tasks that would require robots to be reprogrammed very frequently. “On a practical level,” Abbeel says, “we’re building a software system that can learn new skills very, very quickly, which makes it very different from traditional automation.”
The idea is that with a flexible enough learning framework, programming becomes trivial, because the robot can rapidly teach itself new skills with just a little bit of human demonstration at the beginning. As Abbeel explains, “The big difference is that we bring software that we only have to write once, ahead of time, for all applications. And then to make the robot capable for a specific application, all we need to do is collect new data for that application. That’s a paradigm shift from needing to program for every specific task you care about to programming once and then just doing data collection, either through demonstrations or reinforcement learning.”
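To make the "program once, then just collect data" idea concrete, here is a toy sketch in Python. It is not Embodied's software: the one-dimensional states, the demonstration values, and the names `fit_policy`, `centering_demos`, and `feeding_demos` are all invented for illustration, and a least-squares line stands in for the deep networks the team actually uses. The point is only that the learning code is written once and each new application supplies nothing but fresh demonstration data.

```python
from typing import Callable, List, Tuple

# Hypothetical data format: (observed state, action the human demonstrated).
Demo = Tuple[float, float]


def fit_policy(demos: List[Demo]) -> Callable[[float], float]:
    """Fit a linear policy, action = w * state + b, by least squares."""
    n = len(demos)
    mean_s = sum(s for s, _ in demos) / n
    mean_a = sum(a for _, a in demos) / n
    var_s = sum((s - mean_s) ** 2 for s, _ in demos)
    cov_sa = sum((s - mean_s) * (a - mean_a) for s, a in demos)
    w = cov_sa / var_s
    b = mean_a - w * mean_s
    return lambda state: w * state + b


# Two made-up applications. Only the demonstration data differs;
# the learning code above is shared and untouched.
centering_demos = [(-2.0, 1.0), (-1.0, 0.5), (1.0, -0.5), (2.0, -1.0)]
feeding_demos = [(0.0, 1.0), (1.0, 1.2), (2.0, 1.4), (3.0, 1.6)]

centering_policy = fit_policy(centering_demos)
feeding_policy = fit_policy(feeding_demos)

# Query each policy on a state that never appeared in its demonstrations.
print(centering_policy(4.0))
print(feeding_policy(5.0))
```

Both policies return a sensible action for a state they were never shown, which is the small-scale version of learning a skill rather than replaying a recorded sequence of actions.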
Teaching the robot new skills is a process that has been evolving rapidly over the last few years. As you saw in the knot-tying video, the way you used to have to do it was by physically moving the robot around and pushing buttons on a controller. Most industrial robots work the same way, through a teach pendant of some sort. It’s time-consuming and not particularly intuitive, and it also creates a void between what the robot is experiencing and what the human teacher is experiencing, since the human’s perspective (and indeed entire perception system) is quite different from that of the robot that’s being taught.
Drawing on some more recent research at RLL, Embodied is taking a new approach built around virtual reality. “What’s really interesting is that we’ve hit a point where virtual reality has become a commodity,” Abbeel says. “What that means is actually you can teach robots things in VR, such that the robot experiences everything the way that it will experience it when doing the job itself. That’s a big change in terms of the quality of data that you can get.”
Because the data collected in this way is of much higher quality, teaching robots new skills is much faster. You can read the paper here, but teaching each of the tasks in the video above took no more than 30 minutes of demonstration (and sometimes significantly less) to achieve high success rates (in the mid 80 percent to high 90 percent). Remember, the system is learning a skill rather than a sequence of actions, meaning that it can extrapolate to adapt to variability that it wasn’t explicitly trained on. This is crucial for operating outside of a research environment.
Once the initial demonstration phase is over, the robot is probably not moving as fast as a human moves, and it’s also probably not as reliable as a human. A success rate of 80 or 90 percent is research good, but it’s not good enough that any manufacturing customer would be okay with it for their robots, especially if it’s slow. Embodied understands this, but Abbeel says that the robots will get better very quickly: “It might not reach 100 percent accuracy, and it might not be moving at human speed, but the next phase of learning perfects and speeds up the execution through reinforcement learning, and that together gives you a new skill.”
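A bit of back-of-the-envelope arithmetic shows why "research good" falls short on a production line. The sketch below rests on a simplifying assumption that is ours, not Embodied's: every attempt succeeds or fails independently at a fixed rate. The function names and the choice of 10 and 1,000 attempts are also just illustrative.

```python
def chance_of_clean_run(success_rate: float, attempts: int) -> float:
    """Probability that every one of `attempts` independent tries succeeds."""
    return success_rate ** attempts


def expected_failures(success_rate: float, attempts: int) -> float:
    """Average number of failed tries out of `attempts` independent tries."""
    return (1.0 - success_rate) * attempts


for rate in (0.80, 0.90, 0.99):
    clean = chance_of_clean_run(rate, 10)
    failures = expected_failures(rate, 1000)
    print(f"{rate:.0%}: clean run of 10 = {clean:.1%}, "
          f"failures per 1,000 = {failures:.0f}")
```

Under that assumption, a robot that succeeds 90 percent of the time gets through 10 attempts in a row without a failure only about 35 percent of the time, and fails roughly 100 times per 1,000 attempts, which is why the reinforcement learning phase that follows matters so much.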
Embodied will be focusing on the kinds of visual motor skills that current robots don’t excel at, where you need continual visual feedback to execute on what you’re doing. Manipulating wires and cables is a good example of this: If you want your robot to be able to plug one thing into another thing, it has to be able to recognize and grasp a floppy thing in an arbitrary location and orientation, a skill that can be difficult to program explicitly.
As far as the complexity of skills that Embodied will be able to teach its system, Abbeel says that it’s really up to what’s possible to do in teleop. “The way we characterize it is as long as a human can teleoperate the robot to do the job, then it should be learnable. Of course, the more complex the task, the more data will be needed, and that’s what we’ll figure out over time—what is the amount of data collection that’s needed for a given task. But, the practical metric would be, we sit behind our teleop, we try to do a task with the robot, if we can do it, then we know it’s going to be within the scope of what we can provide.”
We should mention that there are a few other companies already in this space, including Kindred, Kinema Systems, and RightHand Robotics, which offer robot manipulation solutions that can (to some extent) manage variability and adapt to new tasks. We’ll have to wait and see how well Embodied Intelligence compares—Abbeel told us to expect some video demos within the next few months.