by Ted Goertzel
(published in The Skeptical Inquirer, 2002, 26 (1): 19-23)
Do you believe that every time a prisoner is executed in the United States, eight future murders are deterred? Do you believe that a 1% increase in the number of citizens licensed to carry concealed weapons causes a 3.3% decrease in the state’s murder rate? . . . .
If you were misled by any of these studies, you may have fallen for a pernicious form of junk science: the use of mathematical models with no demonstrated predictive capability to draw policy conclusions. These studies are superficially impressive. Written by reputable social scientists from prestigious institutions, they often appear in peer-reviewed scientific journals. Filled with complex statistical calculations, they give precise numerical “facts” that can be used as debaters’ points in policy arguments. But these “facts” are will-o’-the-wisps. Before the ink is dry on one study, another appears with completely different “facts.” Despite their scientific appearance, these models do not meet the fundamental criterion for a useful mathematical model: the ability to make predictions that are better than random chance.
Although economists are the leading practitioners of this arcane art, sociologists, criminologists, and other social scientists have versions of it as well. It is known by various names, including “econometric modeling,” “structural equation modeling,” and “path analysis.” All of these are ways of using the correlations between variables to make causal inferences. The problem with this, as anyone who has had a course in statistics knows, is that correlation is not causation. Correlations between two variables are often “spurious” because they are caused by some third variable. Econometric modelers try to overcome this problem by including all the relevant variables in their analysis, using a statistical technique called “multiple regression.” If one had perfect measures of all the causal variables, this would work. But the data are never good enough. Repeated efforts to use multiple regression to achieve definitive answers to public policy questions have failed. . . .
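The core difficulty is easy to demonstrate with a toy simulation (my own sketch, not taken from any of the studies discussed). When a confounding variable is measured perfectly, multiple regression does remove a spurious correlation, just as the textbooks promise; when the confounder is measured with error, a spurious “effect” survives the statistical control:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# A hypothetical confounder z drives both x and y; x has no causal effect on y.
z = rng.normal(size=n)
x = z + rng.normal(scale=0.5, size=n)
y = 2 * z + rng.normal(scale=0.5, size=n)

# A naive regression of y on x picks up a large spurious "effect" of x.
naive = np.polyfit(x, y, 1)[0]

# Multiple regression with a perfect measure of z: the coefficient on x
# collapses toward zero, as intended.
X = np.column_stack([x, z, np.ones(n)])
controlled = np.linalg.lstsq(X, y, rcond=None)[0][0]

# But if z is measured with error -- "the data are never good enough" --
# the control is incomplete and a sizable spurious effect remains.
z_noisy = z + rng.normal(scale=1.0, size=n)
X_noisy = np.column_stack([x, z_noisy, np.ones(n)])
biased = np.linalg.lstsq(X_noisy, y, rcond=None)[0][0]

print(naive, controlled, biased)
```

The names and numbers here are illustrative only; the point is that “controlling for” a variable removes its confounding influence only to the extent that the variable is measured accurately.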
John Lott, an economist at Yale University, used an econometric model to argue that “allowing citizens to carry concealed weapons deters violent crime, without increasing accidental deaths.” Lott’s analysis involved “shall issue” laws that require local authorities to issue a concealed weapons permit to any law-abiding citizen who applies for one. Lott estimated that each one percent increase in gun ownership in a population causes a 3.3% decrease in homicide rates. Lott and his co-author, David Mustard, posted the first version of their study on the Internet in 1997 and tens of thousands of people downloaded it. It was the subject of policy forums, newspaper columns, and often quite sophisticated debates on the World Wide Web. In a book with the catchy title More Guns, Less Crime, Lott taunted his critics, accusing them of putting ideology ahead of science.
Lott’s work is an example of statistical one-upmanship. He has more data and a more complex analysis than anyone else studying the topic. He demands that anyone who wants to challenge his arguments become immersed in a very complex statistical debate, based on computations so difficult that they cannot be done with ordinary desktop computers. He challenges anyone who disagrees with him to download his data set and redo his calculations, but most social scientists do not think it worth their while to replicate studies using methods that have repeatedly failed. Most gun control researchers simply brushed off Lott and Mustard’s claims and went on with their work. Two highly respected criminal justice researchers, Frank Zimring and Gordon Hawkins (1997), wrote an article complaining that:
just as Messrs. Lott and Mustard can, with one model of the determinants of homicide, produce statistical residuals suggesting that ‘shall issue’ laws reduce homicide, we expect that a determined econometrician can produce a treatment of the same historical periods with different models and opposite effects. Econometric modeling is a double-edged sword in its capacity to facilitate statistical findings to warm the hearts of true believers of any stripe.
Zimring and Hawkins were right. Within a year, two determined econometricians, Dan Black and Daniel Nagin (1998) published a study showing that if they changed the statistical model a little bit, or applied it to different segments of the data, Lott and Mustard’s findings disappeared. Black and Nagin found that when Florida was removed from the sample there was “no detectable impact of the right-to-carry laws on the rate of murder and rape.” They concluded that “inference based on the Lott and Mustard model is inappropriate, and their results cannot be used responsibly to formulate public policy.”
John Lott, however, disputed their analysis and continued to promote his own. Lott had collected data for each of America’s counties for each year from 1977 to 1992. The problem with this is that America’s counties vary tremendously in size and social characteristics. A few large ones, containing major cities, account for a very large percentage of the murders in the United States. As it happens, none of these very large counties have “shall issue” gun control laws. This means that Lott’s massive data set was simply unsuitable for his task. He had no variation in his key causal variable — “shall issue” laws — in the places where most murders occurred.
He did not mention this limitation in his book or articles. When I discovered the lack of “shall issue” laws in the major cities in my own examination of his data, I asked him about it. He shrugged it off, saying that he had “controlled” for population size in his analysis. But introducing a statistical control in the mathematical analysis did not make up for the fact that he simply had no data for the major cities where the homicide problem was most acute.
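A toy simulation (again my own sketch, not Lott’s data) shows why the missing variation matters. If the places that adopt a law differ systematically from the places that never do, a regression on the adoption dummy simply recovers that pre-existing difference, no matter how many observations are piled up:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical counties: many small rural ones adopt a "shall issue" law;
# a few big urban ones, where most homicides occur, never do.
n_rural, n_urban = 500, 50
rural = rng.normal(3.0, 1.0, n_rural)    # homicide rate per 100,000; law adopted
urban = rng.normal(15.0, 3.0, n_urban)   # homicide rate per 100,000; no law

law = np.concatenate([np.ones(n_rural), np.zeros(n_urban)])
rate = np.concatenate([rural, urban])

# The true effect of the law here is zero by construction, yet regressing
# the rate on the adoption dummy reports a large negative "deterrent
# effect" -- really just the rural/urban difference, which the regression
# cannot separate from the law because the law never varies within a region.
X = np.column_stack([law, np.ones(law.size)])
effect = np.linalg.lstsq(X, rate, rcond=None)[0][0]
print(effect)
```

The figures are invented for illustration; the logic is the same confounding of regional differences with legal regimes that Zimring and Hawkins describe below.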
It took me some time to find this problem in his data, since I was not familiar with the gun control issue. But Zimring and Hawkins zeroed in on it immediately because they knew that “shall issue” laws were instituted in states where the National Rifle Association was powerful, largely in the South, the West, and in rural regions. These were states that already had few restrictions on guns. They observed that this legislative history frustrates “our capacity to compare trends in ‘shall issue’ states with trends in other states. Because the states that changed legislation are different in location and constitution from the states that did not, comparisons across legislative categories will always risk confusing demographic and regional influences with the behavioral impact of different legal regimes.”
Zimring and Hawkins further observed that:
Lott and Mustard are, of course, aware of this problem. Their solution, a standard econometric technique, is to build a statistical model that will control for all the differences between Idaho and New York City that influence homicide and crime rates, other than “shall issue” laws. If we can “specify” the major influences on homicide, rape, burglary, and auto theft in our model, then we can eliminate the influence of these factors on the different trends. Lott and Mustard build models that estimate the effects of demographic data, economic data, and criminal punishment on various offenses. These models are the ultimate in statistical home cooking in that they are created for the data set by these authors and only tested on the data that will be used in the evaluation of the right-to-carry impacts.
Lott and Mustard were comparing trends in Idaho and West Virginia and Mississippi with trends in Washington, D.C. and New York City. What actually happened was that there was an explosion of crack-related homicides in major eastern cities in the 1980s and early 1990s. Lott’s whole argument came down to a claim that the largely rural and western “shall issue” states were spared the crack-related homicide epidemic because of their “shall issue” laws. This never would have been taken seriously if it had not been obscured by a maze of equations. . . .
In 1975 the American Economic Review published an article by a leading economist, Isaac Ehrlich of the University of Michigan, who estimated that each execution deterred eight homicides. Before Ehrlich, the best known specialist on the effectiveness of capital punishment was Thorsten Sellin, who had used a much simpler method of analysis. Sellin prepared graphs comparing trends in different states. He found little or no difference between states with or without the death penalty, so he concluded that the death penalty made no difference. Ehrlich, in an act of statistical one-upmanship, claimed that his analysis was more valid because it controlled for all the factors that influence homicide rates.
Even before it was published, Ehrlich’s work was cited by the Solicitor General of the United States in an amicus curiae brief filed with the United States Supreme Court in defense of the death penalty. Fortunately, the Court decided not to rely upon Ehrlich’s evidence because it had not been confirmed by other researchers. This was wise, because within a year or two other researchers published equally sophisticated econometric analyses showing that the death penalty had no deterrent effect.
The controversy over Ehrlich’s work was so important that the National Research Council convened a blue ribbon panel of experts to review it. After a very thorough review, the panel decided that the problem was not just with Ehrlich’s model, but with the idea of using econometric methods to resolve controversies over criminal justice policies. They (Manski 1978: 422) concluded that:
because the data likely to be available for such analysis have limitations and because criminal behavior can be so complex, the emergence of a definitive behavioral study laying to rest all controversy about the behavioral effects of deterrence policies should not be expected.
Most experts now believe that Sellin was right, that capital punishment has no demonstrable effect on murder rates. But Ehrlich has not been persuaded. He is now a lonely true believer in the validity of his model. In a recent interview (Bonner and Fessenden, 2000) he insisted “if variations like unemployment, income inequality, likelihood of apprehension and willingness to use the death penalty are accounted for, the death penalty shows a significant deterring effect.” . . . .
The journals that publish econometric studies of public policy issues often do not require predictive testing, which shows that the editors and reviewers have low expectations for their fields. So researchers take data for a fixed period of time and keep fine-tuning and adjusting their model until they can “explain” trends that have already happened. There are always a number of ways to do this, and with modern computers it is not terribly hard to keep trying until you find something that fits. At that point, the researcher stops, writes up the findings, and sends the paper off for publication. Later, another researcher may adjust the model to obtain a different result. This fills the pages of scholarly journals, and everybody pretends not to notice that little or no progress is being made. But we are no closer to having a valid econometric model of murder rates today than we were when Isaac Ehrlich published the first model in 1975.
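The fine-tuning problem is easy to reproduce. In the hypothetical sketch below (my own illustration), a sufficiently flexible model “explains” a purely random series of yearly rates almost perfectly, yet its out-of-sample forecasts are far worse than the in-sample fit suggests:

```python
import numpy as np

rng = np.random.default_rng(2)

# A hypothetical yearly homicide series: nothing but noise around a flat trend.
years = np.arange(1977, 1993)
rate = 8.0 + rng.normal(scale=1.0, size=years.size)

# "Fine-tune" an overly flexible model (here a degree-9 polynomial) until it
# closely "explains" the trends that have already happened.
t = years - years.mean()
coeffs = np.polyfit(t, rate, deg=9)
in_sample_error = np.abs(np.polyval(coeffs, t) - rate).mean()

# The same model, asked to predict the next three years, goes badly wrong:
# explaining the past is not the same as predicting the future.
future = np.arange(1993, 1996) - years.mean()
forecast_error = np.abs(np.polyval(coeffs, future) - 8.0).max()
print(in_sample_error, forecast_error)
```

The polynomial stands in for any overparameterized model tuned to a fixed historical period; predictive testing on data the model has not seen is precisely the check these journals fail to demand.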
The scientific community does not have good procedures for acknowledging the failure of a widely used research method. Methods that are entrenched in graduate programs at leading universities and published in prestigious journals tend to be perpetuated. Many laymen assume that if a study has been published in a peer-reviewed journal, it is valid. The cases examined above show that this is not always the case. Peer review assures that established practices have been followed, but it is of little help when those practices themselves are faulty.
In 1991 David Freedman, a distinguished statistician at the University of California at Berkeley and the author of textbooks on quantitative research methods, shook the foundations of regression modeling when he frankly stated “I do not think that regression can carry much of the burden in a causal argument. Nor do regression equations, by themselves, give much help in controlling for confounding variables” (Freedman, 1991: 292). Freedman’s article provoked a number of strong reactions. Richard Berk (1991: 315) observed that Freedman’s argument “will be very difficult for most quantitative sociologists to accept. It goes to the heart of their empirical enterprise and in so doing, puts entire professional careers in jeopardy.”
Berk, Richard 1991. “Toward a methodology for mere mortals,” Sociological Methodology 21: 315-324.
Black, Dan, and Daniel Nagin 1998. “Do right-to-carry laws deter violent crime?” Journal of Legal Studies 27: 209-219.
Bonner, Raymond, and Ford Fessenden 2000. “States with no death penalty share lower homicide rates,” New York Times, September 22.
Freedman, David 1991. “Statistical models and shoe leather,” Sociological Methodology 21: 291-313.
Lott, John 2000. More Guns, Less Crime: Understanding Crime and Gun Control Laws, University of Chicago Press, 2nd edition.
Zimring, Frank, and Gordon Hawkins 1997. “Concealed handguns: the counterfeit deterrent,” The Responsive Community 7: 46-60.