Prisoner's dilemma and the pigeon: Control by immediate consequences.
Even when cooperation pays three times more, immediate reinforcement alone drives impulsive choices.
01 Research in Context
What this study did
Researchers put pigeons in a chamber with two colored keys.
Each key stood for cooperate or defect in a prisoner's dilemma game.
The birds played 100 rounds against a computer partner.
Cooperating gave three times more food over time, but defection gave one immediate pellet.
What they found
The pigeons generally did not learn to cooperate.
They pecked the defect key on almost every trial.
This earned them only about one-third of the food they could have gained.
The immediate pellet consistently beat the bigger delayed reward.
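The one-third figure can be illustrated with a small simulation. The sketch below plays three simple strategies against tit-for-tat, where the computer copies the bird's previous choice. The pellet values (5, 3, 1, 0), the computer's opening move, and all function names are assumptions chosen for illustration, not values reported in this summary; the 100-trial length follows the description above.

```python
# Pellets earned by the bird, keyed by (bird_choice, computer_choice).
# "C" = cooperate, "D" = defect. These values are ASSUMED for illustration
# and are not taken from the study.
PAYOFF = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}


def play_against_tit_for_tat(choose, trials=100):
    """Return total pellets earned by a strategy playing against tit-for-tat."""
    computer = "C"  # assumed opening move for the computer
    total = 0
    for _ in range(trials):
        bird = choose(computer)
        total += PAYOFF[(bird, computer)]
        computer = bird  # tit-for-tat: the computer copies the bird's last choice
    return total


def always_defect(computer_move):
    return "D"


def always_cooperate(computer_move):
    return "C"


def greedy(computer_move):
    """Pick whichever key pays more on the current trial only."""
    return max("CD", key=lambda choice: PAYOFF[(choice, computer_move)])


defect_total = play_against_tit_for_tat(always_defect)
cooperate_total = play_against_tit_for_tat(always_cooperate)
greedy_total = play_against_tit_for_tat(greedy)
print(defect_total, cooperate_total, greedy_total)  # 104 300 104
```

Under these assumed payoffs, a chooser that looks only at the current trial defects every time, because defecting pays more than cooperating whatever the computer does on that trial. It ends with 104 pellets against 300 for steady cooperation, roughly the one-third ratio described above.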
How this fits with other research
Hoffman et al. (1963) showed that pigeons can keep learned patterns for 2.5 years.
Yet here the birds did not learn a new pattern even across 100 trials.
The difference is that the earlier study used punishment-based suppression, while the present study used reward-based choice.
Madden et al. (2003) showed that rats can learn complex stimulus rules for drug rewards.
This makes the pigeons' failure more striking, since even rats mastered schedule control when stakes were high.
Hart et al. (1974) found that tight DRL schedules create odd side behaviors like wheel running.
Both that study and the present one show how schedule design can trap animals in sub-optimal patterns.
Why it matters
Your clients may also pick small immediate rewards over bigger delayed ones.
This study warns that simply telling them about long-term benefits rarely works.
You need to change the immediate consequences to make the better choice pay off right now.
For example, put the better choice on a rich fixed-ratio schedule so the client gets an immediate payoff for cooperating.
03 Original abstract
In three experiments pigeons played (i.e., chose between two colored keys) iterated prisoner's dilemma and other 2 x 2 games (2 participants and 2 options) against response strategies programmed on a computer. Under the prisoner's dilemma pay-off matrix, the birds generally defected (i.e., pecked the color associated with not cooperating) against both a random response (.5 probability of either alternative) and a tit-for-tat strategy (on trial n the computer "chooses" the alternative that is the same as the one chosen by the subject on trial n - 1) played by the computer. They consistently defected in the tit-for-tat condition despite the fact that as a consequence they earned about one third of the food that they could have if they had cooperated (i.e., pecked the "cooperate" color) on all the trials. Manipulation of the values of the food pay-offs demonstrated that the defection and consequent loss of food under the tit-for-tat condition were not due to a lack of sensitivity to differences in pay-off values, nor to strict avoidance of a null pay-off (no food on a trial), nor to insensitivity to the local (current trial) reward contingencies. Rather, the birds markedly discounted future outcomes and thus made their response choices based on immediate outcomes available on the present trial rather than on long-term delayed outcomes over many trials. That is, the birds were impulsive, choosing smaller but more immediate rewards, and did not demonstrate self-control. Implications for the study of cooperation and competition in both humans and nonhumans are discussed.
Journal of the Experimental Analysis of Behavior, 1995 · doi:10.1901/jeab.1995.64-1