Google is putting extra thought into AI safety concerns

Via Google Research Blog.
Development of machine learning and artificial intelligence is moving forward steadily. And with significant progress come significant risks, as well as public interest in the safety of advanced AI. Even if we set aside the worst-case scenarios that have been done and overdone in fiction, the practical problems we face while trying to develop capable and autonomous AI are numerous.
We have reported previously that Google is well aware of the dangers when it comes to advanced AI, but they have now gone one step further. Together with researchers from OpenAI, Stanford, and Berkeley, they have boiled the emerging issues down to five distinct categories. As they state on their Research Blog, these are issues that may seem trivial, even irrelevant, right now, but they are the cornerstones of current and future AI development. The goal is simple – to ground the debate and frame it in tangible, quantifiable problems for both engineers and the public. There are no nightmare scenarios to be found here, only practical ways of avoiding them.
It would be a stretch to put the paper they published yesterday, Concrete Problems in AI Safety, right next to Asimov’s Three Laws of Robotics, but the two have a lot in common. Google’s perspective is a little narrower, as the authors focus on accidents in machine learning systems, but safety is still the central theme. As they state in the paper, an accident constitutes “unintended and harmful behavior that may emerge from machine learning systems when we specify the wrong objective function, are not careful about the learning process, or commit other machine learning-related implementation errors.” This definition covers enough ground to catch the attention of anyone interested in AI safety.
And here are the five problems, which the authors describe as forward-thinking and long-term:

  • Avoiding Negative Side Effects: How can we ensure that an AI system will not disturb its environment in negative ways while pursuing its goals, e.g. a cleaning robot knocking over a vase because it can clean faster by doing so? (A toy sketch of this idea follows the list.)
  • Avoiding Reward Hacking: How can we avoid gaming of the reward function? For example, we don’t want this cleaning robot simply covering over messes with materials it can’t see through.
  • Scalable Oversight: How can we efficiently ensure that a given AI system respects aspects of the objective that are too expensive to be frequently evaluated during training? For example, if an AI system gets human feedback as it performs a task, it needs to use that feedback efficiently because asking too often would be annoying.
  • Safe Exploration: How do we ensure that an AI system doesn’t make exploratory moves with very negative repercussions? For example, maybe a cleaning robot should experiment with mopping strategies, but clearly it shouldn’t try putting a wet mop in an electrical outlet.
  • Robustness to Distributional Shift: How do we ensure that an AI system recognizes, and behaves robustly, when it’s in an environment very different from its training environment? For example, heuristics learned for a factory workfloor may not be safe enough for an office.
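The paper treats these as open research directions rather than solved problems, but the first one is easy to picture in code. Below is a minimal, hypothetical sketch – not code from the paper, with made-up states and weights – of one idea the authors discuss: adding an impact penalty to the reward function, so that a cleaning robot that knocks over a vase scores worse than one that leaves the room otherwise untouched.

```python
# Toy illustration of penalizing negative side effects with an "impact" term
# added to the task reward. The grid of objects, the distance-based task
# reward, and the penalty weight are all invented for this example.

import numpy as np

def task_reward(state, goal):
    """Hypothetical task reward: closer to the cleaning goal is better."""
    return -abs(state["robot_pos"] - goal)

def impact_penalty(state, baseline_state, weight=1.0):
    """Penalty proportional to how much the environment differs from what it
    would look like had the robot done nothing (a crude impact measure)."""
    changed = np.sum(state["objects"] != baseline_state["objects"])
    return -weight * changed

def total_reward(state, baseline_state, goal, weight=1.0):
    # The agent gains reward for progress on its task, but loses reward for
    # every object it disturbs along the way (e.g. a knocked-over vase).
    return task_reward(state, goal) + impact_penalty(state, baseline_state, weight)

# Example: both robots reach the goal, but one knocks over a vase on the way.
baseline = {"robot_pos": 0, "objects": np.array([1, 1, 1])}   # untouched room
careless = {"robot_pos": 5, "objects": np.array([1, 0, 1])}   # vase 1 broken
careful  = {"robot_pos": 5, "objects": np.array([1, 1, 1])}   # nothing disturbed

goal = 5
print(total_reward(careless, baseline, goal, weight=2.0))  # -2.0: penalized
print(total_reward(careful,  baseline, goal, weight=2.0))  #  0.0: preferred
```

The hard part, of course, is choosing the penalty weight and the measure of impact so the robot is neither reckless nor too timid to act at all – exactly the kind of trade-off the paper wants framed as a quantifiable problem.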

That is a tall order. We wish the entirety of Google’s Brain division the best of luck.