Why did punishment and the use of reputation evolve in humans? According to one family of theories, they evolved to support the maintenance of cooperative group norms; according to another, they evolved to enhance personal gains from cooperation. Current behavioral data are consistent with both hypotheses (and both selection pressures could have shaped human cooperative psychology). However, these hypotheses lead to sharply divergent behavioral predictions in circumstances that have not yet been tested. Here we report results testing these rival predictions. In every test where social exchange theory (the personal-gain account) and group norm maintenance theory made different predictions, subject behavior violated the predictions of group norm maintenance theory and matched those of social exchange theory. Subjects do not direct punishment toward those with reputations for norm violation per se; instead, they use reputation self-beneficially, as a cue to lower the risk that they personally will experience losses from defection. More tellingly, subjects direct their cooperative efforts preferentially toward defectors they have punished and away from those they have not punished; they avoid expending punitive effort on reforming defectors who pose a risk only to others. These results are not consistent with the hypothesis that the psychology of punishment evolved to uphold group norms. The circumstances in which punishment is deployed and withheld (its circuit logic) support the hypothesis that it is generated by psychological mechanisms that evolved to benefit the punisher, by allowing him to bargain for better treatment.