New method to better understand much-employed self-learning Artificial Intelligence

11/04/2019 - Recent advances in Artificial Intelligence (AI) research stem from the combination of deep neural networks and reinforcement learning. In reinforcement learning, a specific area of AI, agents learn rewarding behaviours in unknown environments through an iterative process of trial and error in which they repeatedly update their behaviour. But this process is not yet fully understood. Because AI can have a big impact on society, a better understanding of AI systems is crucial to assess potential challenges and risks. Already today, AI is employed to steer cars, manage production lines, or even draft texts. A team of scientists from the Potsdam Institute for Climate Impact Research has developed a new method to investigate these algorithms using insights from statistical physics. Their findings, published in the journal Physical Review E, can help to improve the design of large-scale AI reinforcement learning systems.

“With our work, we contribute to better understanding what impact artificial intelligence potentially has on society. Using techniques from dynamical systems theory, we find that self-learning agents may not evolve towards a single behaviour. Instead, they may enter a continuous cycle of different behaviours or even evolve on an unpredictable trajectory”, lead author Wolfram Barfuss explains. “Eventually, insights of such dynamical systems studies can be translated back to improve the design of large-scale AI reinforcement learning systems.”

One of the elements distinguishing the study from previous research is the role of the model's environment. Previous research typically limited the variability of the agents' environment. Yet in reality, environments evolve dynamically, and agents adapt their behaviour accordingly. "We present a methodological extension: By separating the interaction from the adaptation timescale, we obtain the deterministic limit of a general class of reinforcement learning algorithms. This is called temporal difference learning. This form of learning indeed functions in more realistic multistate environments", says Jürgen Kurths, co-author and chair of the Research Department Complexity Science at the Potsdam Institute.
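To make the contrast between noisy trial-and-error learning and its deterministic limit more concrete, here is a minimal Python sketch. It assumes a hypothetical two-agent matching-pennies game with a single state, softmax (Boltzmann) action selection and a discount factor of zero, so the temporal-difference target reduces to the immediate reward. `sampled_step` is the noisy update, in which each agent learns from one random interaction. `deterministic_step` averages the reward over the opponent's current policy before adapting, which is roughly what happens when many interactions take place between two updates. This is an illustrative simplification, not the authors' code or their multistate derivation; the game, the function names and the parameters `ALPHA` and `BETA` are all assumptions.

```python
import math
import random

# Hypothetical illustration: two agents repeatedly play matching pennies,
# a single-state game (not the multistate environments studied in the article).
# PAYOFF[a0][a1] is agent 0's reward; agent 1 receives the negative.
PAYOFF = [[1.0, -1.0], [-1.0, 1.0]]


def softmax(values, beta):
    """Boltzmann action probabilities with inverse temperature beta."""
    m = max(values)
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def reward(agent, a0, a1):
    """Reward of `agent` when agent 0 plays a0 and agent 1 plays a1."""
    r = PAYOFF[a0][a1]
    return r if agent == 0 else -r


def expected_reward(agent, action, opponent_policy):
    """Average reward of `action` against the opponent's mixed policy."""
    total = 0.0
    for b, p in enumerate(opponent_policy):
        if agent == 0:
            total += p * reward(0, action, b)
        else:
            total += p * reward(1, b, action)
    return total


def sampled_step(Q, alpha, beta, rng):
    """Trial-and-error update: act once, update only the chosen action's value."""
    policies = [softmax(Q[i], beta) for i in range(2)]
    acts = [0 if rng.random() < policies[i][0] else 1 for i in range(2)]
    new_Q = [row[:] for row in Q]
    for i in range(2):
        a = acts[i]
        r = reward(i, acts[0], acts[1])
        # Temporal-difference error with discount 0 (single state, no future term).
        new_Q[i][a] += alpha * (r - Q[i][a])
    return new_Q


def deterministic_step(Q, alpha, beta):
    """Noise-free update: use the expected reward of every action instead of a sample."""
    policies = [softmax(Q[i], beta) for i in range(2)]
    new_Q = []
    for i in range(2):
        opp = policies[1 - i]
        new_Q.append([Q[i][a] + alpha * (expected_reward(i, a, opp) - Q[i][a])
                      for a in range(2)])
    return new_Q


def run(step, steps, Q0):
    """Return agent 0's probability of playing action 0 after each update."""
    Q, history = Q0, []
    for _ in range(steps):
        Q = step(Q)
        history.append(softmax(Q[0], BETA)[0])
    return history


ALPHA, BETA = 0.05, 5.0             # hypothetical learning rate and inverse temperature
start = [[0.2, 0.0], [0.0, 0.1]]    # hypothetical initial action values

rng = random.Random(0)
noisy = run(lambda Q: sampled_step(Q, ALPHA, BETA, rng), 200, start)
smooth = run(lambda Q: deterministic_step(Q, ALPHA, BETA), 200, start)
```

Running `deterministic_step` from the same starting values always yields the same trajectory, whereas the `sampled_step` trajectory changes with the random seed. That reproducibility is what lets tools from dynamical systems theory, such as the search for fixed points, cycles or chaotic orbits, be applied to the learning process.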

Artificial intelligence also has great potential for understanding climate impacts. The Potsdam Institute therefore aims to increase its use of artificial intelligence in order to assess, for example, how people might react to shocks induced by climate change. This could help to better protect the public from such risks in the future.

Article: Wolfram Barfuss, Jonathan F. Donges, Jürgen Kurths (2019): Deterministic limit of temporal difference reinforcement learning for stochastic games. Phys. Rev. E 99, 043305 [DOI: 10.1103/PhysRevE.99.043305]


Weblink to the article: https://journals.aps.org/pre/abstract/10.1103/PhysRevE.99.043305#fulltext