Why the Internet is on the verge of blowing up all of our methods courses
by Christian Sandvig, reblogged by permission, from his blog "multicast": the Internets, technology, and policy
March 20th, 2010
(or: Methodologists, atone!)
By far my favorite book on research methods, Unobtrusive Measures (first published in 1966), is a skeptical romp through social science in which the authors take the position that most of what we call social science is wrong. The theme of the book is that research is likely wrong because research design is very difficult, and researchers too easily substitute received wisdom and procedure for hard thinking about designing studies, experiments, measures, tests, and so on. Scientific conduct has a rote character that extensive training and preparation (e.g., making you get a Ph.D.) can reinforce. Peer review and the tenure system can be engines of conservatism.
So you perform a survey in which you ask a particular question of a particular group not because it means something as evidence or because it is a particularly good idea. You do it because your advisor did it that way, or someone else (cite, year) did it that way and it is therefore respectable. And if someone did it before, it's comparable. This is perfectly reasonable: it's likely you are interested in a particular problem, but not really in the methods or statistics relevant to tests of that problem, so you offload all of the thinking about statistics by performing the methods and statistics that everyone else does. It's efficient.
Yet when you stop and actually think about the intricacies of any particular research design, it gets ugly. Einstein said, "Theory is something nobody believes, except the person who made it. An experiment is something everybody believes, except the person who made it." For decades (since even before Webb in 1966), various writers have been alarmed at the misuse of quantitative research.
My own struggles with the topic led me to design a graduate course called "Unorthodox Research Methods." The premise is this: Most research courses teach procedure, but we need to train our students to think about research design and evidence first, and we are not doing a good job of that. (I'm revising the syllabus for this course and so I'm thinking about these issues again, hence this post.)
One big example of the pitfalls in our procedure-based methods education is the use of statistical significance. Even non-quants are familiar with those nagging asterisks that appear after all sorts of columns in all sorts of journal articles across the social sciences. Statistical significance is the end of the conversation about method in many research projects. Once p < .05, you pack up your kit and go home. I think it is fair to say that most researchers have internalized this approach despite the fact that it is totally wrong and the statistics literature has railed against it for decades.
Just so we are clear: statistical significance is often useless; in many situations it's not even a hint toward the right answer for your research project. Luckily for the truth, the rise of the Internet is about to cause this test to blow up in our face. We have taught statistics so badly in the social sciences that most academics do not appear to realize that the test of significance is about sampling. (Bam!) It is a test that helps you figure out if you are being excessively skeptical because of the small size of the sample that you've got. And our samples are now changing.
Data from the cloud now lets us test all kinds of social science questions (particularly if you are interested in human communication) that before would, by necessity, have been confined to a small-sample questionnaire. As social scientists turn toward big data they are going to trip over their bad habit of significance testing. The fact is, most methods courses and research procedures in wide use are obsessed with errors caused by sampling, especially small sample sizes. (Bam!) But as a sea of digital data opens up to the horizon, our problems are increasingly about specification error and not sample sizes.
Remember, statistical significance is about sampling. "Except in the limiting case of literally zero correlation, if the sample were large enough all of the coefficients would be significantly different from everything." (McCloskey, p. 202). Take your study of communication patterns from 60 paper-and-pencil questionnaires and replicate it with a random sample of a million Facebook accounts (if you can get access; see this editorial). You'll find that statistical significance (particularly at the arbitrary threshold of p < .05) tells you zip.
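The point is easy to demonstrate with a simulation. Here's a rough sketch (my own illustration, not from the original post): a substantively trivial true difference between two groups, just 0.01 standard deviations, sails past p < .05 once the sample is huge, while the same effect is invisible at questionnaire-sized n. The effect size, seed, and hand-rolled z-test are all assumptions chosen for illustration.

```python
import math
import random

def two_sample_p(n, effect=0.01, seed=42):
    """Two-sided p-value from a two-sample z-test, where the true
    group difference is a substantively tiny `effect` (in SD units)."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]      # control group
    b = [rng.gauss(effect, 1.0) for _ in range(n)]   # treatment group
    mean_diff = sum(b) / n - sum(a) / n
    se = math.sqrt(2.0 / n)  # standard error, known unit variance
    z = abs(mean_diff) / se
    # two-sided p-value from the standard normal CDF
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

print(f"n = 60:        p = {two_sample_p(60):.3f}")
print(f"n = 1,000,000: p = {two_sample_p(1_000_000):.2e}")
```

With a million cases per group the expected z-statistic is about 0.01 / sqrt(2/1,000,000) ≈ 7, so the asterisks arrive no matter how meaningless the difference is; the significance test is reporting your sample size back to you, not the substance of the effect.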
I think most of the solution is to de-emphasize procedure, as social science procedure is becoming much more volatile as information technology improves. We need to get people to understand that research design is a creative act, not the boring part of the research process. To that end, we need classes about evidence and research design. Figuring out how to do that is a challenge, but we've got to step up to it. (If you've got ideas for revising the syllabus for my last attempt, send me an email or a comment.)
Chant it with me: Statistical significance does not equal substantive significance. Please chant it with me.
Deirdre McCloskey: Rhetoric Within the Citadel: Statistics (http://deirdremccloskey.org/docs/pdf/Article_181.pdf) and Why Economic Historians Should Stop Relying on Statistical Tests of Significance (http://deirdremccloskey.org/docs/pdf/Article_180.pdf)
John P. A. Ioannidis: Why Most Published Research Findings Are False (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1182327/)
Jonathan A. C. Sterne and George Davey Smith: Sifting the Evidence: What's Wrong with Significance Tests? (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1119478/?tool=pubmed)
This work, unless otherwise expressly stated, is licensed under a Creative Commons Attribution-Share Alike 3.0 License.