Why do we learn to reward cooperation?

October 27, 2022

Researchers at the Max Planck Institute in Plön show that reputation plays a key role in determining which rewarding policies people adopt. Using game theory, they explain why individuals learn to use rewards to specifically promote good behaviour.

Social rewarding: Results from evolutionary simulations show the co-evolution of cooperation and social rewarding in a population. At low information transmissibility, most population members learn not to reward others. Consequently, cooperation rates are low. As information transmissibility increases, cooperation and social rewarding become highly abundant. In these simulations, individuals adopt their strategies through imitation and exploration. They are more likely to imitate a strategy from the population if it fares better than their current strategy (imitation, or *selection*). With some probability, they instead try out a random strategy (exploration, or *mutation*).

© MPI for Evolutionary Biology

Often, we use positive incentives like rewards to promote cooperative behaviour. But why do we predominantly reward cooperation? Why is defection rarely rewarded? Or more generally, why do we bother to engage in any form of rewarding in the first place? Theoretical work done by researchers Saptarshi Pal and Dr. Christian Hilbe at the Max Planck Research Group ‘Dynamics of Social Behaviour’ suggests that reputation effects can explain why individuals learn to reward socially.

With tools from evolutionary game theory, the researchers construct a model in which individuals in a population (the players) can adopt different strategies of cooperation and rewarding over time. In this model, the players’ reputation is a key element. The players know, with a degree of certainty (characterized by the information transmissibility of the population), how their interaction partners are going to react to their behaviour (that is, which behaviours they deem worthy of rewards). If the information transmissibility is sufficiently high, players learn to reward cooperation. In contrast, without sufficient information about peers, players refrain from using rewards. The researchers show that these effects of reputation play out in a similar way when individuals interact in groups of more than two.
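For readers who want to experiment with the idea, the imitation-and-exploration dynamics described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the researchers' model: the payoff values, the three action rules ("C", "D" and an opportunist "OPP"), the logistic imitation rule and all function names are hypothetical choices made for this example. The parameter lam stands in for information transmissibility.

```python
import math
import random
from itertools import product

# Hypothetical payoff parameters (assumptions, not taken from the study).
B, C = 3.0, 1.0          # benefit to the partner and cost to oneself of cooperating
BETA, GAMMA = 2.0, 1.0   # value of a reward to the recipient, cost to the provider

# A strategy is a pair (action_rule, rewards_cooperation).
# "OPP" is an opportunist: it cooperates only if it knows the partner rewards cooperation.
STRATEGIES = list(product(["C", "D", "OPP"], [False, True]))

def coop_prob(me, partner, lam):
    """Probability that `me` cooperates; lam is the chance of knowing the partner's reward policy."""
    action = me[0]
    if action == "C":
        return 1.0
    if action == "D":
        return 0.0
    return lam if partner[1] else 0.0

def payoff(me, partner, lam):
    """Expected payoff of `me` in one interaction with `partner`."""
    mine = coop_prob(me, partner, lam)
    theirs = coop_prob(partner, me, lam)
    return (B * theirs - C * mine
            + BETA * mine * partner[1]    # reward received for cooperating
            - GAMMA * theirs * me[1])     # reward paid to a cooperating partner

def mean_payoff(strategy, population, lam):
    # Simplification: the average includes an interaction with oneself.
    return sum(payoff(strategy, other, lam) for other in population) / len(population)

def imitation_probability(own, role_model, strength=1.0):
    """Logistic rule: better-performing role models are imitated more often."""
    return 1.0 / (1.0 + math.exp(-strength * (role_model - own)))

def update_step(population, lam, mutation_rate, rng):
    learner = rng.randrange(len(population))
    if rng.random() < mutation_rate:
        # Exploration ("mutation"): try a random strategy.
        population[learner] = rng.choice(STRATEGIES)
        return
    # Imitation ("selection"): compare with a randomly chosen role model.
    role_model = rng.randrange(len(population))
    p = imitation_probability(mean_payoff(population[learner], population, lam),
                              mean_payoff(population[role_model], population, lam))
    if rng.random() < p:
        population[learner] = population[role_model]

def simulate(lam, size=50, steps=5000, mutation_rate=0.01, seed=0):
    """Return the share of players who reward cooperation after `steps` updates."""
    rng = random.Random(seed)
    population = [("D", False)] * size  # start from non-rewarding defectors
    for _ in range(steps):
        update_step(population, lam, mutation_rate, rng)
    return sum(s[1] for s in population) / size
```

Calling simulate with a low and a high value of lam, for example simulate(0.1) and simulate(0.9), gives a rough feel for how the share of rewarding players responds to information in this toy setting. Outcomes depend on the assumed parameters and the random seed, so they should not be read as a reproduction of the published results.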

Antisocial rewarding

In addition to highlighting the role of reputation in catalyzing cooperation and social rewarding, the scientists identify a couple of scenarios where antisocial rewarding may evolve. Antisocial rewarding requires either that populations are assorted or that rewards benefit both the recipient and the provider of the reward. “These conditions under which people may learn to reward defection are, however, a bit restrictive, since they additionally require information to be scarce,” adds Saptarshi Pal.

The results from this study suggest that rewards are only effective in promoting cooperation when they can sway individuals to act opportunistically. These opportunistic players only cooperate when they anticipate a reward for their cooperation. A higher information transmissibility increases both the incentive to reward others for cooperating and the incentive to cooperate in the first place. Overall, the model suggests that when people reward cooperation in an environment where information transmissibility is high, they ultimately benefit themselves. This interpretation takes the altruism out of social rewarding: people may use rewards not to enhance others’ welfare, but to help themselves.