Artificial intelligence pioneer Geoffrey Hinton made headlines earlier this year when he raised concerns about the capabilities of AI systems. Speaking to CNN journalist Jake Tapper, Hinton said:
If it gets to be much smarter than us, it will be very good at manipulation because it would have learned that from us. And there are very few examples of a more intelligent thing being controlled by a less intelligent thing.
Anyone who has kept tabs on the latest AI offerings will know these systems are prone to “hallucinating” (making things up) – a flaw inherent to how they work.
Yet Hinton highlights the potential for manipulation as a particularly serious concern. This raises the question: can AI systems deceive humans?
We argue a range of systems have already learned to do this – and the risks range from fraud and election tampering to losing control over AI.
AI learns to lie
Perhaps the most disturbing example of a deceptive AI is found in Meta’s CICERO, an AI model designed to play the alliance-building world conquest game Diplomacy.
Playing as France, CICERO plans with Germany to deceive England. Park, Goldstein et al., 2023
What are the risks?
AI systems with deceptive capabilities could be misused in numerous ways, including to commit fraud, tamper with elections and generate propaganda. The potential risks are only limited by the imagination and the technical know-how of malicious individuals.

Beyond that, advanced AI systems can autonomously use deception to escape human control, such as by cheating safety tests imposed on them by developers and regulators.

In one experiment, researchers created an artificial life simulator in which an external safety test was designed to eliminate fast-replicating AI agents. Instead, the AI agents learned how to play dead, to disguise their fast replication rates precisely when being evaluated.

Learning deceptive behaviour may not even require explicit intent to deceive. The AI agents in the example above played dead as a result of a goal to survive, rather than a goal to deceive.

In another example, someone tasked AutoGPT (an autonomous AI system based on ChatGPT) with researching tax advisers who were marketing a certain kind of improper tax avoidance scheme. AutoGPT carried out the task, but followed up by deciding on its own to attempt to alert the United Kingdom’s tax authority.

In the future, advanced autonomous AI systems may be prone to manifesting goals unintended by their human programmers.

That story about a killer AI run amok seems fake. Here’s my much nerdier & less dramatic story: I set up an AI agent to find advisers marketing tax avoidance schemes. The AI agent did this, then decided – entirely on its own – to inform HMRC of its findings. pic.twitter.com/6sAgFlbmvr
— Dan Neidle (@DanNeidle) June 2, 2023