Google makes robots learn new skills faster by sharing their experience

Google Brain Team, together with fellow Alphabet companies DeepMind and X, has revealed a learning method that lets robots pool their experience with one another.

Humanity has an entire list of activities that we cannot wait to delegate to robots. Some are dangerous, others are unpleasant, and some are simply better suited for a mechanical mind. But even a task as simple as assisting the elderly with chores and daily activities is incredibly complex for a robot. If we think about it, we will find that even the most mundane task we can think of actually relies on a lot of decision-making and previous experience.
In other words, it takes a lot of work for a robot to learn a skill that we wouldn’t normally describe as robotic. The reason is that a robot relies on its own experience to hone its skills, which takes an impractical amount of time, no matter how sophisticated its learning algorithm is. This is especially true if motor skills are involved. We are naturally good at integrating our senses, reflexes, and muscles in a closely coordinated feedback loop – “naturally”, because our behaviors are well-honed for the variability and complexity of the environment. That is not the case for robots.
Thankfully, Sergey Levine (Google Brain Team), Timothy Lillicrap (DeepMind) and Mrinal Kalakrishnan (X) have developed and demonstrated a method that allows robots to learn from each other’s experiences. By learning collectively, they gain experience far more quickly.
These robots instantaneously transmit their experience to other robots over the network – sometimes known as “cloud robotics” – and it is this ability that can let them learn from each other to perform motion skills in close coordination with sensing in realistic environments.
Researchers have performed three experiments designed to investigate three possible approaches for general-purpose skill learning across multiple robots: learning motion skills directly from experience, learning internal models of physics, and learning skills with human assistance. In all three cases, multiple robots shared their experiences to build a common model of the skill.
Learning from raw experience with model-free reinforcement learning
Trial-and-error learning is ubiquitous among humans and animals, and it extends surprisingly well to robots. This kind of learning is called “model-free” because no explicit model of the environment is formed – the robots explore variations on their existing behavior, then reinforce and exploit the variations that yield bigger rewards. In combination with deep neural networks, model-free algorithms have been key to past successes, most notably in the game of Go. Having multiple robots learn this way speeds up the process significantly.
In these experiments, robots were tasked with moving their arms to goal locations, or reaching to and opening a door. Each robot has a copy of a neural network that allows it to estimate the value of taking a given action in a given state. By querying this network, the robot can quickly decide what actions might be worth taking in the world. When a robot acts, noise is added to the actions it selects, so the resulting behavior is sometimes a bit better than previously observed, and sometimes a bit worse. This allows each robot to explore different ways of approaching a task. Records of the actions taken by the robots, their behaviors, and the final outcomes are sent back to a central server. The server collects the experiences from all of the robots and uses them to iteratively improve the neural network that estimates value for different states and actions. The model-free algorithms employed here look across both good and bad experiences and distill them into a new network that is better at understanding how action and success are related. Then, at regular intervals, each robot takes a copy of the updated network from the server and begins to act using the information in its new network. Given that this updated network is a bit better at estimating the true value of actions in the world, the robots will produce better behavior. This cycle can then be repeated to continue improving on the task. In the video below, a robot explores the door opening task.
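The loop just described, robots acting with exploration noise, a central server distilling their pooled records into a better value estimate, and every robot syncing a fresh copy, can be sketched in miniature. This is not the researchers’ code: an invented 1-D “reach” task stands in for the real robot arm, a Q-table stands in for the deep neural network, and plain Q-learning stands in for their algorithm.

```python
import random

GOAL, START, STEPS = 5, 0, 20         # invented 1-D reach task: walk from 0 to 5
ACTIONS = (-1, +1)

def act(q, state, eps):
    """Pick the highest-value action, with noise added for exploration."""
    if random.random() < eps:
        return random.choice(ACTIONS)
    vals = {a: q.get((state, a), 0.0) for a in ACTIONS}
    best = max(vals.values())
    return random.choice([a for a, v in vals.items() if v == best])

def rollout(q):
    """One robot's episode, acting on its local copy of the value estimate."""
    s, records = START, []
    for _ in range(STEPS):
        a = act(q, s, eps=0.5)
        s2 = max(-GOAL, min(GOAL, s + a))
        records.append((s, a, 1.0 if s2 == GOAL else 0.0, s2))
        s = s2
    return records

def server_update(q, experience, alpha=0.5, gamma=0.9):
    """The server distills good and bad experience into a better estimate."""
    for s, a, r, s2 in experience:
        best_next = max(q.get((s2, b), 0.0) for b in ACTIONS)
        old = q.get((s, a), 0.0)
        q[(s, a)] = old + alpha * (r + gamma * best_next - old)

random.seed(0)
q = {}
for _ in range(100):                   # training rounds
    experience = []
    for _ in range(4):                 # four robots act in parallel...
        experience += rollout(dict(q))   # ...each on a copy of the shared estimate
    server_update(q, experience)       # pooled records improve the shared estimate

final = START                          # greedy rollout with the learned values
for _ in range(STEPS):
    final = max(-GOAL, min(GOAL, final + act(q, final, eps=0.0)))
    if final == GOAL:
        break
print(final)
```

After training, a greedy rollout should walk straight to the goal; a single robot would need four times as many rounds to gather the same amount of experience, which is the point of sharing.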

With a few hours of practice, robots sharing their raw experience learn to make reaches to targets, and to open a door by making contact with the handle and pulling. In the case of door opening, the robots learn to deal with the complex physics of the contacts between the hook and the door handle without building an explicit model of the world, as can be seen in the example below:

Learning how the world works by interacting with objects
Direct trial-and-error reinforcement learning is a great way to learn individual skills. However, humans and animals don’t learn exclusively by trial and error. We also build mental models about our environment and imagine how the world might change in response to our actions.
We can start with the simplest of physical interactions, and have our robots learn the basics of cause and effect from reflecting on their own experiences. In this experiment, researchers had the robots play with a wide variety of common household objects by randomly prodding and pushing them inside a tabletop bin. The robots again shared their experiences with each other and together built a single predictive model that attempted to forecast what the world might look like in response to their actions. This predictive model can make simple, if slightly blurry, forecasts about future camera images when provided with the current image and a possible sequence of actions that the robot might execute:

Top row: robotic arms interacting with common household items. Bottom row: Predicted future camera images given an initial image and a sequence of actions.

Once this model is trained, the robots can use it to perform purposeful manipulations, for example based on user commands. In this prototype, a user can command the robot to move a particular object simply by clicking on that object, and then clicking on the point where the object should go:

The robots in this experiment were not told anything about objects or physics: they only see that the command requires a particular pixel to be moved to a particular place. However, because they have seen so many object interactions in their shared past experiences, they can forecast how particular actions will affect particular pixels. In order for such an implicit understanding of physics to emerge, the robots must be provided with a sufficient breadth of experience. This requires either a lot of time, or sharing the combined experiences of many robots.
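The two ingredients at work here, a predictive model fit from shared random prodding and a planner that scores candidate actions by rolling them through that model, can be sketched with deliberately fake physics. Everything below is invented for illustration: a single coordinate stands in for camera pixels, the hidden dynamics is a simple scaled push, and a random-shooting planner stands in for whatever planning method the researchers actually used.

```python
import random

def collect(n=500):
    """Random prodding: record (position, push, next-position) triples."""
    data, pos = [], 0.0
    for _ in range(n):
        push = random.uniform(-1, 1)
        nxt = pos + 0.8 * push         # hidden "physics" the model must discover
        data.append((pos, push, nxt))
        pos = nxt
    return data

def fit_model(data):
    """Least-squares fit of next = pos + k * push; k is the learned physics."""
    num = sum((nxt - pos) * push for pos, push, nxt in data)
    den = sum(push * push for _, push, _ in data)
    return num / den

def plan(k, start, target, horizon=5, samples=300):
    """Random shooting: sample action sequences, keep the best under the model."""
    best, best_err = None, float("inf")
    for _ in range(samples):
        seq = [random.uniform(-1, 1) for _ in range(horizon)]
        pos = start
        for push in seq:
            pos += k * push            # forecast the outcome with the learned model
        if abs(pos - target) < best_err:
            best, best_err = seq, abs(pos - target)
    return best

random.seed(1)
k = fit_model(collect())               # shared model built from pooled prodding data
seq = plan(k, start=0.0, target=1.5)   # "move this pixel to that spot"
pos = 0.0
for push in seq:
    pos += 0.8 * push                  # execute the plan in the real "physics"
print(k, pos)
```

The robot is never told the physics; it only learns that pushes move things by a consistent amount, and the planner exploits that regularity, which mirrors how the real system forecasts pixel motion without any explicit object model.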

Learning with the help of humans
Robots can learn entirely on their own, but human guidance is important, not just for telling the robot what to do, but also for helping the robots along. We have a lot of intuition about how various manipulation skills can be performed, and it only seems natural that transferring this intuition to robots can help them learn these skills a lot faster. In the next experiment, the researchers provided each robot with a different door and guided each one by hand to show how these doors can be opened. These demonstrations are encoded into a single combined strategy for all robots, called a policy. The policy is a deep neural network which converts camera images to robot actions, and is maintained on a central server. The following video shows the instructor demonstrating the door-opening skill to a robot:

Next, the robots collectively improve this policy through a trial-and-error learning process. Each robot attempts to open its own door using the latest available policy, with some added noise for exploration. These attempts allow each robot to plan a better strategy for opening the door the next time around, and improve the policy accordingly:

Not surprisingly, robots learn more effectively if they are trained on a curriculum of tasks of gradually increasing difficulty. In the experiment, each robot starts off by practicing the door-opening skill on a specific position and orientation of the door that the instructor had previously shown it. As it gets better at performing the task, the instructor starts to alter the position and orientation of the door to be just a bit beyond the current capabilities of the policy, but not so difficult that it fails entirely. This allows the robots to gradually increase their skill level over time, and expands the range of situations they can handle. The combination of human guidance with trial-and-error learning allowed the robots to collectively learn the skill of door-opening in just a couple of hours. Since the robots were trained on doors that look different from each other, the final policy succeeds on a door with a handle that none of the robots had seen before:
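The demonstrate-then-refine loop with a curriculum can be caricatured in a few lines. The assumptions are loud ones: the entire policy collapses to a single gain `w` mapping an observed door offset to an arm action, the correct gain happens to be 1, the human demo supplies only a rough starting value, and hill-climbing with Gaussian exploration noise stands in for the real policy updates. The curriculum widens the range of door positions each stage, just beyond what the current policy has practiced.

```python
import random

def reward(w, offsets):
    """Negative squared error between the policy's action and the handle position."""
    return -sum((w * off - off) ** 2 for off in offsets)

random.seed(2)
w = 0.4                                    # rough gain cloned from the human demo
for stage in range(1, 4):                  # curriculum: door moved farther each stage
    offsets = [random.uniform(-stage, stage) for _ in range(8)]
    for _ in range(100):
        trials = [w + random.gauss(0, 0.1) for _ in range(4)]  # four robots explore
        w = max(trials + [w], key=lambda c: reward(c, offsets))  # server keeps the best
print(w)
```

Because the demo starts the policy near something workable, trial and error only has to refine it, which is why the combination converges in hours rather than the much longer time pure exploration would need.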

These are relatively simple tasks involving relatively simple skills, but the approach behind them is powerful, and possibly a key to enabling robots to assist us in our daily lives. Maybe start thinking about which chores you would like to get rid of first. Maybe consider how easily a robot could replace you at your job. Learn a new skill or two.
All media and experiment descriptions courtesy of Sergey Levine, Timothy Lillicrap, and Mrinal Kalakrishnan, Google Research Blog.


AI can now guess your age, evaluate your wrinkles, and suggest treatment

For thousands of years, humans were the only judges of beauty. Even though the criteria for beauty change, for objects and humans both, the judges were always the same, until now. Earlier this year, the first ever beauty contest judged by an AI took place. There were three judges. The Symmetry Master was capable of judging the symmetry of facial structure. MADIS was a model which scored people by their similarity to professional models within their racial group and for sharing common features with famous actors. Lastly, there was RYNKL, a program developed to judge one’s wrinkles and level of ageing.

RYNKL ran a successful campaign on Kickstarter, where the $12,187 pledged was almost double the project’s goal. The RYNKL app is now available on Google Play and the App Store. When a selfie is taken through the app, it compares the user’s wrinkles in specific areas to a large database in which the wrinkles have been scored by other people. An AI algorithm then estimates the level of ageing and wrinkling of the user’s face. It also suggests specific treatments and can track changes in the score over time. The score ranges from 0 to 100, with zero being the perfect score.

The app is a product of Youth Laboratories, a team working under Insilico Medicine – a major big data analysis company. Insilico Medicine is also working on several projects that apply deep learning to ageing and medicine. A paper titled “Deep biomarkers of human aging”, written by scientists from the company, was recently published in the journal Aging. It describes an ensemble of 21 deep neural networks (DNNs) that estimate a person’s age and gender from values recorded in a blood test. The ensemble’s combined accuracy for age estimation was 83.5%. The project even has its own website, where you can input your blood test data to get an age estimate.
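Why 21 networks instead of one? The ensemble idea, many models trained on the same task with their predictions averaged, can be illustrated with a deliberately fake sketch. The marker weights, blood values, and noise model below are all made up and have nothing to do with the paper’s actual features or architecture; random weight perturbations merely stand in for 21 independently trained DNNs.

```python
import random

def make_predictor(true_w, noise):
    """Stand-in for one trained DNN: the true mapping plus that network's own error."""
    w = [t + random.gauss(0, noise) for t in true_w]
    return lambda markers: sum(wi * m for wi, m in zip(w, markers))

random.seed(3)
TRUE_W = [0.5, 1.2, -0.3]                  # hypothetical marker-to-age weights
ensemble = [make_predictor(TRUE_W, 0.2) for _ in range(21)]  # 21 nets, as in the paper

markers = [40.0, 20.0, 10.0]               # one person's made-up blood values
preds = [p(markers) for p in ensemble]
age = sum(preds) / len(preds)              # averaging cancels individual errors
print(round(age, 1))                       # true value under TRUE_W would be 41.0
```

Averaging the 21 estimates shrinks the random part of the error by roughly a factor of the square root of 21, which is one reason an ensemble can outperform any single network on a noisy biological signal.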

The implications of this research go far beyond objectivity in judging beauty. It is yet another successful application of AI technology that we can see in use today. By taking two paths to the same objective, broadening its database through crowdfunded campaigns while also developing new methods, Insilico Medicine is moving forward fast in this very specific and narrow field.

“One of my goals in life is to minimize unnecessary animal testing in areas where computer simulations can be even more relevant to humans. Serendipitously, some of our approaches find surprising new applications in the beauty industry, which has moved away from human testing and is moving towards personalizing cosmetics and beauty products”, said Alex Zhavoronkov, CEO of Insilico Medicine, Inc.

Call for papers

COIOTE 2015 now accepting submissions!

COIOTE 2015, the 2nd International Conference on Cognitive Internet of Things Technologies, will take place in Rome, Italy on October 26-27. The conference aims to gather enthusiastic researchers and practitioners from AI and IoT-related areas who share the common goal of addressing the new challenges posed by the cognitive aspect of IoT, whether by devising new Artificial Intelligence techniques or leveraging existing ones. COIOTE 2015 is now open for submissions on topics ranging from knowledge representation to deep learning in IoT. The conference also caters for a limited number of workshops on dedicated session topics.

Participation in this event will give attendees the unique opportunity to be exposed to all aspects of IoT-related topics at co-located conferences, as well as have full access to the IoT marketplace at the IoT360 Summit.

All accepted papers will be published by Springer and made available through SpringerLink Digital Library, one of the world’s largest scientific libraries. Best papers will be invited to publish in the EAI Endorsed Transactions on Cognitive Communications.

Previously unpublished papers in English, up to 6 pages long, can be submitted through Confy by 15 June. Papers must be formatted using the Springer LNICST Authors’ Kit.


Abstract submission deadline: 15 June 2015

Notification deadline: 25 July 2015

Camera-ready deadline: 25 August 2015