Name: Human Compatible: Artificial Intelligence and the Problem of Control review
Item: Human Compatible: Artificial Intelligence and the Problem of Control
Author: Superbook

What it argues

Human Compatible is Stuart Russell's argument, from inside mainstream AI research, that the standard model of AI (build a system that optimizes a fixed objective) is the wrong approach, and that the transition to much more capable AI systems requires a fundamental change in how AI is designed. Russell is one of the most distinguished AI researchers in the world and co-author of the most widely used AI textbook, so his treatment of the safety problem carries more technical credibility than most books in this space.

In Russell's analysis, the standard model specifies an objective function, a precise statement of what the system should maximize, and then builds a system that optimizes it. This works well for narrow AI systems in constrained domains. But as systems become more generally capable, objective misspecification becomes critical: the system achieves the objective as specified, which may diverge from what we actually wanted in ways we did not anticipate. Goodhart's Law ("when a measure becomes a target, it ceases to be a good measure") applies with particular force to powerful AI systems.
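The misspecification point can be made concrete with a toy sketch. It is not taken from the book: the functions `true_value`, `proxy`, and `optimize` and the numbers in them are illustrative assumptions, and "capability" is modelled crudely as the width of the range the optimizer can search.

```python
def true_value(effort: int) -> int:
    # Hypothetical "what we actually want": gains rise at first,
    # then side effects dominate. Peaks at effort = 5.
    return 10 * effort - effort ** 2


def proxy(effort: int) -> int:
    # Hypothetical specified objective: it only counts the part
    # that is easy to measure, so more always looks better.
    return effort


def optimize(objective, search_limit: int) -> int:
    # A more capable optimizer is modelled as one that can search
    # a wider range of actions (0..search_limit).
    return max(range(search_limit + 1), key=objective)


for limit in (3, 5, 10, 20):
    chosen = optimize(proxy, limit)
    print(f"limit={limit:2d} chosen={chosen:2d} true_value={true_value(chosen)}")
```

With a small search range, maximizing the proxy also improves the true value. Once the range extends past the true optimum, the same optimizer keeps pushing the proxy up while the true value falls, which is the Goodhart pattern the paragraph above describes.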

What it gets right

1. The standard model of AI (specify an objective, build a system to maximize it) is the wrong approach for highly capable systems because perfectly specifying what we want is practically impossible.
2. Goodhart's Law applies to AI: any objective that can be precisely specified will be optimized in ways that diverge from the underlying intention when the system is sufficiently capable.
3. The solution is to build systems that are uncertain about their objectives and infer them from human behavior, rather than systems that pursue specified objectives with certainty.
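The third point, a system that stays uncertain about the objective and infers it from human behavior, can be sketched as a simple Bayesian update. This is a minimal illustration under stated assumptions, not Russell's formal proposal: the two candidate preferences, the `update` function, and the assumed 4-in-5 chance that the human picks the option they actually prefer are all hypothetical.

```python
from fractions import Fraction


def update(belief, observed_choice, reliability=Fraction(4, 5)):
    # Bayes' rule over hypotheses about what the human prefers.
    # Assumption: the human picks their preferred option with
    # probability `reliability`, and the other option otherwise.
    unnormalized = {}
    for preferred, prior in belief.items():
        if preferred == observed_choice:
            likelihood = reliability
        else:
            likelihood = 1 - reliability
        unnormalized[preferred] = prior * likelihood
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


# The system starts out uncertain between two candidate objectives.
belief = {"tea": Fraction(1, 2), "coffee": Fraction(1, 2)}

# It watches the human choose and revises its belief after each choice.
for choice in ["tea", "tea", "coffee"]:
    belief = update(belief, choice)
    print(choice, belief)
```

The belief shifts toward "tea" but never reaches certainty, so a single contrary observation still moves it. That retained uncertainty is the property the list item contrasts with pursuing a specified objective with certainty.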

What it covers

Artificial intelligence, AI safety, Control problem, Machine learning, Future

Who wrote it

Stuart Russell is a professor of computer science at the University of California, Berkeley, where he holds the Smith-Zadeh Chair in Engineering. He is co-author of Artificial Intelligence: A Modern Approach, the most widely used textbook in the field, which has been translated into 13 languages. His research has covered Bayesian networks, reinforcement learning, and the theory of bounded rationality. He has been a fellow of the Association for the Advancement of Artificial Intelligence since 1990 and was appointed Honorary Officer of the Order of the British Empire for services to education. Human Compatible has been widely cited as the most technically credible popular…
