An Open Framework for the Reproducible Study of the Iterated Prisoner’s Dilemma



Introduction
Several Iterated Prisoner's Dilemma tournaments have generated much interest: Axelrod's original tournaments [2,3], two 2004 anniversary tournaments [20], and the Stewart and Plotkin 2012 tournament [44], which followed the discovery of zero-determinant strategies. Subsequent research has spawned a number of papers (many of which are referenced throughout this paper), but the results are rarely reproducible. Amongst well-known tournaments, the full original source code is available in only one case (Axelrod's second tournament [3], in FORTRAN). In no case is the available code well-documented, easily modifiable, or released with a significant test suite. To complicate matters further, a new strategy is often studied in isolation, with opponents chosen by the creator of that strategy. Often such strategies are not described in sufficient detail to enable reliable recreation (in the absence of source code), with [42] being a notable counter-example. In some cases, strategies are revised without updates to their names or published implementations [25,26]. As such, the results cannot be reliably replicated and therefore have not met the basic scientific criterion of falsifiability.

This paper introduces a software package: the Axelrod-Python library. The Axelrod-Python project has the following stated goals:
• To enable the reproduction of Iterated Prisoner's Dilemma research as easily as possible;
• To produce the de-facto tool for any future Iterated Prisoner's Dilemma research;
• To provide as simple a means as possible for anyone to define and contribute new and original Iterated Prisoner's Dilemma strategies.

The presented library is partly motivated by an ongoing discussion in the academic community about reproducible research [9,16,39,40], and is:
• Open: all code is released under an MIT license;
• Reproducible and well-tested: at the time of writing there is an excellent level of integrated tests with 99.73% coverage (including property based tests: [28]);
• Well-documented: all features of the library are documented for ease of use and modification;
• Extensive: 135 strategies are included, with infinitely many available in the case of parametrised strategies;
• Extensible: easy to modify to include new strategies and to run new tournaments.

Review of the literature
As stated in [6]: "few works in social science have had the general impact of [Axelrod's study of the evolution of cooperation]". In 1980, Axelrod wrote two papers [2,3] which describe a computer tournament that has been a major influence on subsequent game theoretic work [5,6,7,8,10,11,12,13,15,18,23,24,27,34,35,36,38,43,44]. As described in [6], this work has not only had impact in mathematics but has also led to insights in biology (for example in [43], a real tournament where Blue Jays are the participants is described) and in particular in the study of evolution.

The tournament is based on an iterated game (see [29] or similar for details) where two players repeatedly play the normal form game of (1) in full knowledge of each other's playing history to date. An excellent description of the one shot game is given in [13], which is paraphrased below:

Two players must choose between Cooperate (C) and Defect (D):
• If both choose C, they receive a payoff of R (Reward);
• If both choose D, they receive a payoff of P (Punishment);
• If one chooses C and the other D, the defector receives a payoff of T (Temptation) and the cooperator a payoff of S (Sucker).

The following reward matrix results from the Cartesian product of the two decision vectors (C, D):

\[
\begin{pmatrix}
R, R & S, T \\
T, S & P, P
\end{pmatrix}
\quad \text{such that } T > R > P > S \text{ and } 2R > T + S \tag{1}
\]

The game of (1) is called the Prisoner's Dilemma. Specific numerical values of (R, S, T, P) = (3, 0, 5, 1) are often used in the literature [2,3], although any values satisfying the conditions in (1) will yield similar results.

In [34] a description is given of how incomplete information can be used to enhance cooperation, in a similar approach to the proof of the Folk theorem for repeated games [29]. This aspect of incomplete information is also considered in [6,24,35], where "noisy" tournaments randomly flip the choice made by a given strategy. In [36], incomplete information is considered in the sense of a probabilistic termination of each round of the tournament.

As mentioned before, Iterated Prisoner's Dilemma (IPD) tournaments have been studied in an evolutionary context: [12,24,38,44] consider this in a traditional evolutionary game theory context. These works investigate particular evolutionary contexts within which cooperation can evolve and persist. This can be in the context of direct interactions between strategies or of population dynamics for populations of many players using a variety of strategies, which can lead to very different results. For example, in [24] a machine learning algorithm in a population context outperforms strategies described in [38] and [44] that are claimed to dominate any evolutionary opponent in head-to-head interactions.
Further to these evolutionary ideas, [8,10] are examples of using machine learning techniques to evolve particular strategies. In [4], Axelrod describes how similar techniques are used to genetically evolve a high performing strategy from a given set of strategies. Note that in his original work, Axelrod only used a base strategy set of 12 strategies for this evolutionary study. This is noteworthy as the library now boasts over 136 strategies that are readily available for a similar analysis.
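To make the one shot game of (1) and the notion of a "noisy" tournament concrete, the following is a minimal sketch in plain Python. It does not use the Axelrod library: the names is_prisoners_dilemma, PAYOFFS, apply_noise and score are hypothetical and chosen for illustration only.

```python
import random

# Payoff values often used in the literature: (R, S, T, P) = (3, 0, 5, 1).
R, S, T, P = 3, 0, 5, 1


def is_prisoners_dilemma(r, s, t, p):
    """Check the two conditions stated in (1)."""
    return t > r > p > s and 2 * r > t + s


# Payoffs to (first player, second player) for each pair of actions.
PAYOFFS = {
    ("C", "C"): (R, R),
    ("C", "D"): (S, T),
    ("D", "C"): (T, S),
    ("D", "D"): (P, P),
}


def apply_noise(action, noise, rng=random):
    """Flip the chosen action with probability `noise` (illustrative only)."""
    if rng.random() < noise:
        return "D" if action == "C" else "C"
    return action


def score(actions_a, actions_b):
    """Sum the payoffs of two players over a sequence of rounds."""
    total_a = total_b = 0
    for a, b in zip(actions_a, actions_b):
        payoff_a, payoff_b = PAYOFFS[(a, b)]
        total_a += payoff_a
        total_b += payoff_b
    return total_a, total_b
```

With these definitions, is_prisoners_dilemma(3, 0, 5, 1) holds, whereas (R, S, T, P) = (3, 0, 7, 1) fails the second condition because 2R = 6 is not greater than T + S = 7.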

Implementation and architecture

Description of the Axelrod Python package
The library is written in Python (http://www.python.org/), which is a popular language in the academic community with libraries developed for a variety of uses including:
• Algorithmic Game Theory [30] (http://gambit.sourceforge.net/);
• Astrophysics [1] (http://www.astropy.org/);
• Data manipulation [33] (http://pandas.pydata.org/);
• Machine learning [37] (http://scikit-learn.org/);
• Mathematics [46] (http://www.sagemath.org/);
• Visualisation [17] (http://matplotlib.org/).

Furthermore, in [18] Python is described as an appropriate language for the reproduction of Iterated Prisoner's Dilemma tournaments due to its object oriented nature and readability.

The library itself is available at https://github.com/Axelrod-Python/Axelrod. This is a hosted git repository. Git is a version control system, the use of which is one of the recommended aspects of reproducible research [9,40].

As stated in the Introduction, one of the main goals of the library is to allow for the easy contribution of strategies. Doing this requires the writing of a simple Python class (which can inherit from other predefined classes). All components of the library are automatically tested using a combination of unit, property and integration tests. These tests are run as new features are added to the library to ensure compatibility (they are also run automatically using travis-ci.org). When submitting a strategy, a simple test is required which ensures the strategy behaves as expected. Full contribution guidelines can be found in the documentation, which is also part of the library itself and is hosted using readthedocs.org. As an example, Figures 1 and 2 show the source code for the Grudger strategy as well as its corresponding test.
```python
# Player and the actions C and D are provided by the axelrod library.
class Grudger(Player):
    """A player starts by cooperating however will defect if at any point the
    opponent has defected."""

    name = 'Grudger'
    classifier = {
        'memory_depth': float('inf'),  # Long memory
        'stochastic': False,
        'inspects_source': False,
        'manipulates_source': False,
        'manipulates_state': False
    }

    def strategy(self, opponent):
        """Begins by playing C, then plays D for the remaining rounds if the
        opponent ever plays D."""
        if opponent.defections:
            return D
        return C
```

An overview of the structure of the source code is given in Figure 3. This shows the parallel collection of strategies and their tests. The underlying engine of the library is a class for tournaments which lives in the tournament.py module. This class is responsible for coordinating the play of generated matches (from the match.py module). The generation of matches is the responsibility of a match generator class (in the match_generator.py module), which is designed so as to be easily modifiable to create new types of tournaments. This is described further in a tutorial in the documentation which shows how to create a tournament where players only play each other with probability 0.5. This will be discussed further in the reuse section of this paper.

To date the library has had contributions from 26 contributors from a variety of backgrounds which are not solely academic. These contributions have been mostly in terms of strategies. One strategy is the creation of an undergraduate mathematics student with little prior knowledge of programming. Multiple other strategies were written by a 15 year old secondary school student. Both of these students are authors of this paper. As well as these strategy contributions, vital architectural improvements to the library itself have also been contributed.
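The behaviour of Grudger and the kind of check a strategy test performs can be illustrated without installing the library. The sketch below is self-contained and uses stand-in classes: Player, Scripted and play are hypothetical simplifications written for this example and are not the library's own classes or test helpers.

```python
C, D = "C", "D"


class Player:
    """Stand-in base class (hypothetical, much simpler than the library's)."""

    def __init__(self):
        self.history = []
        self.defections = 0

    def record(self, action):
        """Store an action and keep a running count of defections."""
        self.history.append(action)
        if action == D:
            self.defections += 1


class Grudger(Player):
    """Cooperates until the opponent defects once, then always defects."""

    def strategy(self, opponent):
        if opponent.defections:
            return D
        return C


class Scripted(Player):
    """Plays a fixed sequence of actions, ignoring the opponent."""

    def __init__(self, actions):
        super().__init__()
        self.actions = list(actions)

    def strategy(self, opponent):
        return self.actions[len(self.history)]


def play(player, opponent, turns):
    """Play `turns` rounds and return the first player's actions."""
    for _ in range(turns):
        # Both players choose before either action is recorded.
        action = player.strategy(opponent)
        reply = opponent.strategy(player)
        player.record(action)
        opponent.record(reply)
    return player.history
```

Against an opponent that plays C, C, D, C, C, this Grudger cooperates for the first three rounds and defects from the fourth round onwards, which is the expected behaviour that a test for the strategy would assert.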
(2) Availability

Operating system
The Axelrod library runs on all major operating systems: Linux, Mac OS X and Windows.

Programming language
The library is continuously tested for compatibility with Python 2.7 and the two most recent Python 3 releases.

Additional system requirements
There are no specific additional system requirements.

Support
Support is readily available in multiple forms:
• An online chat channel: https://gitter.im/Axelrod-Python/Axelrod.

Dependencies
The following Python libraries are required dependencies:
• Numpy 1.9.2
• Matplotlib 1.4.2 (only a requirement if graphical output is required)
• Tqdm 3.4.0
• Hypothesis 3.0 (only a requirement for development)

List of contributors
The names of all the contributors are not known, as contributions were mainly made through GitHub and some contributors have not provided their name or responded to a request for further details. Here is an incomplete list:

Reuse potential
The Axelrod library has been designed with sustainable software practices in mind.
The availability of a large number of strategies makes this tool a clear example of the benefits of open research, which should positively impact the game theory community. This is already evident, as the library has been used to study and create interesting and powerful new strategies. Installation of the library is straightforward via the standard Python package repository (https://pypi.python.org/pypi). The package name is axelrod and it can thus be installed by calling pip install axelrod on all major operating systems (Windows, OS X and Linux).
Figure 4 shows a very simple example of using the library to create a basic tournament giving the graphical output shown in Figure 5.
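As a library-independent illustration of what a basic round robin tournament computes, the following sketch plays every pair of strategies once and ranks them by total score. It is written under stated assumptions: payoffs (R, S, T, P) = (3, 0, 5, 1), 10 turns per match, no self-interactions and no repetitions. The function names (play_match, round_robin and the four strategy functions) are hypothetical and are not the library's interface.

```python
from itertools import combinations

# Payoffs to (first player, second player), using (R, S, T, P) = (3, 0, 5, 1).
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


# Each strategy maps (own history, opponent history) to an action.
def cooperator(own, other):
    return "C"


def defector(own, other):
    return "D"


def tit_for_tat(own, other):
    return other[-1] if other else "C"


def grudger(own, other):
    return "D" if "D" in other else "C"


def play_match(strategy_a, strategy_b, turns):
    """Play one match and return the total score of each player."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(turns):
        a = strategy_a(history_a, history_b)
        b = strategy_b(history_b, history_a)
        history_a.append(a)
        history_b.append(b)
        payoff_a, payoff_b = PAYOFFS[(a, b)]
        score_a += payoff_a
        score_b += payoff_b
    return score_a, score_b


def round_robin(strategies, turns=10):
    """Play every pair once; return (name, total score) sorted by score."""
    totals = {name: 0 for name in strategies}
    for (name_a, strat_a), (name_b, strat_b) in combinations(strategies.items(), 2):
        score_a, score_b = play_match(strat_a, strat_b, turns)
        totals[name_a] += score_a
        totals[name_b] += score_b
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = round_robin({
    "Cooperator": cooperator,
    "Defector": defector,
    "Tit For Tat": tit_for_tat,
    "Grudger": grudger,
})
```

In this tiny deterministic field the totals are 78 for Defector, 69 each for Tit For Tat and Grudger, and 60 for Cooperator. The outcome depends strongly on the chosen set of opponents, which is one reason a large shared collection of strategies is useful.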

New strategies, tournaments and implications
Due to the open nature of the library, the number of strategies included has grown at a fast pace, as can be seen in Figure 6. Nevertheless, due to previous research being done in an irreproducible manner with, for example, no source code and/or vaguely described strategies, not all previous tournaments can yet be reproduced. In fact, some of the early tournaments might be impossible to reproduce as the source code is apparently forever lost. This library aims to ensure reproducibility in the future.

One tournament that is possible to reproduce is that of [44]. The strategies used in that tournament are the following:

The tournament can be reproduced as shown in Figure 8, which gives the plot of Figure 7. Note that slight differences with the results of [44] are due to the stochastic behaviour of some strategies.

Figure 7: The results from [44].

In parallel to the Python library, a tournament is being kept up to date that pits all available strategies against each other. Figure 9 shows the results from the full tournament, which can also be seen (in full detail) here: http://axelrod-tournament.readthedocs.org/. Data sets are also available showing the plays of every match that takes place. Note that recreating this tournament simply requires changing a single line of the code shown in Figure 4, changing:

```python
>>> strategies = [s() for s in axelrod.demo_strategies]
```

to:

```python
>>> strategies = [s() for s in axelrod.ordinary_strategies]
```
The current winning strategy is new to the research literature: Looker Up. This is a strategy that maps a given set of states to actions. The state space is defined generically by m, n so as to map states to actions as shown in (2).

The example of (2) is an incomplete illustration of the mapping for m = 8, n = 2. Intuitively, this state space uses the initial plays of the opponent to gain some information about its intentions whilst still taking into account the recent play. The actual winning strategy is an instance of the framework for m = n = 2, which has been trained using a particle swarm algorithm. The second placed strategy was trained with an evolutionary algorithm [19,22]. In [21] experiments are described that evaluate how the second placed strategy behaves in environments other than those in which it was trained, and it continues to perform strongly.

There are various other insights that have been gained from ongoing open research on the library; details can be found in [14]. These include:
• A closer look at zero determinant strategies, showing that extortionate strategies obtain a large number of wins (the number of times they outscore an opponent during a given match). However, these do not perform particularly well from the overall tournament ranking point of view. This is relevant given the findings of [44], in which zero determinant strategies are shown to be able to perform better than any other strategy. This finding extends to noisy tournaments (which are also implemented in the library).
• This negative relationship between wins and performance does not generalise. There are some strategies that perform well, both in terms of matches won and overall performance: Back stabber, Double crosser, Looker Up, and Fool Me Once. These strategies continue to perform well in noisy tournaments; however, some of these have knowledge of the length of the game (Back stabber and Double crosser). This is not necessary to rank well in both wins and score, as demonstrated by Looker Up and Fool Me Once.
• Strategies like Looker Up and Meta Hunter seem to be generally cooperative yet still exploit naive strategies. The Meta Hunter strategy is a particular type of Meta strategy which uses a variety of other strategy behaviours to choose a best action. These strategies perform very well in general and continue to do so in noisy tournaments.
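The state-to-action mapping behind Looker Up, described earlier in this section, can be sketched as a plain lookup table. The code below is a hedged illustration rather than the library's implementation: LookupPlayer and make_state are hypothetical names, the ordering of each pair as (own action, opponent action) is an assumption, and so is the choice to cooperate until enough history exists to form a state. The example table for m = n = 1 is hand-written for illustration, not a trained one.

```python
from itertools import product

C, D = "C", "D"


def make_state(own_history, opponent_history, m, n):
    """Build the state: the opponent's first m actions and the last n pairs.

    Each pair is ordered (own action, opponent action); this ordering is an
    assumption made for this sketch.
    """
    first_actions = tuple(opponent_history[:m])
    last_pairs = tuple(zip(own_history[-n:], opponent_history[-n:]))
    return first_actions, last_pairs


class LookupPlayer:
    """Chooses actions by looking the current state up in a table."""

    def __init__(self, table, m, n, default=C):
        if m < 1 or n < 1:
            raise ValueError("this sketch requires m >= 1 and n >= 1")
        self.table = table
        self.m = m
        self.n = n
        self.default = default

    def strategy(self, own_history, opponent_history):
        # Until a full state can be formed, play the default action.
        if len(opponent_history) < max(self.m, self.n):
            return self.default
        state = make_state(own_history, opponent_history, self.m, self.n)
        return self.table.get(state, self.default)


# Hand-written table for m = n = 1: always defect against an opponent that
# opened with D, otherwise copy the opponent's last action.
table = {}
for first in (C, D):
    for own_last, opp_last in product((C, D), repeat=2):
        state = ((first,), ((own_last, opp_last),))
        table[state] = D if first == D else opp_last

player = LookupPlayer(table, m=1, n=1)
```

A table of this form has 2^m × 4^n entries (8 here, and 64 for m = n = 2), which gives a sense of the search space explored when such a table is trained rather than written by hand.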

Conclusion
This paper has presented a game theoretic software package that aims to address reproducibility of research into the Iterated Prisoner's Dilemma. The open nature of the development of the library has led rapidly to the inclusion of many well known strategies, many novel strategies, and new and recapitulated insights. The capabilities of the library mentioned above are by no means comprehensive; a list of the current abilities includes:

Figure 1: Source code for the Grudger strategy.

Figure 2: Test code for the Grudger strategy.

Figure 3: An overview of the source code.

Figure 5: The results from a simple tournament.

Figure 6: The number of strategies included in the library

\[
\Big( \underbrace{(C, D, D, D, C, D, D, C)}_{m \text{ first actions by opponent}}, \; \underbrace{\ldots}_{n \text{ last pairs of actions}} \Big) \tag{2}
\]